Sign Language Recognition Using Deep Learning
Abstract
Sign Language Recognition (SLR) remains a significant challenge in human-computer interaction and in building assistive technologies for the Deaf and Hard-of-Hearing (DHH) community. Although deep learning approaches such as CNNs and LSTMs have proven effective, they raise privacy concerns when deployed in centralized systems and struggle with real-world conditions such as lighting variation, occlusion, and signer-dependent variability. This study introduces a Hybrid Vision Transformer-CNN (ViT-CNN) architecture trained with Federated Learning (FL) to address these issues while preserving user privacy. The model combines the global context modeling of Vision Transformers with the fine-grained local feature extraction of CNNs, and is trained within a secure federated learning framework. The proposed method achieves 98.76% accuracy on ASL datasets and 97.92% on ISL datasets without aggregating data at a central server. The federated learning component allows client devices to collaboratively improve the model without sharing personal data, which is especially important for communities that are often underserved. The paper also introduces a Multi-Head Spatial Attention (MHSA) mechanism designed to suit the motion and structure of sign language, along with a dynamic data augmentation strategy that adapts to individual signing styles. We further evaluate the model's robustness under challenging conditions, its inference speed on local devices, and its adaptability to other sign languages via transfer learning.
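To make the hybrid architecture concrete, the sketch below shows one plausible way to combine a CNN stem (local hand-shape detail) with a transformer encoder (global spatial context). This is a minimal illustration in PyTorch, not the authors' published architecture: the class name `HybridViTCNN`, all layer sizes, and the use of standard multi-head self-attention in place of the paper's specialized MHSA mechanism are assumptions.

```python
import torch
import torch.nn as nn

class HybridViTCNN(nn.Module):
    # Illustrative sketch only; layer sizes and names are assumptions.
    def __init__(self, num_classes=26, embed_dim=256, num_heads=8, depth=4):
        super().__init__()
        # CNN stem: fine-grained local features (hand shape, finger edges),
        # downsampling a 224x224 input to a 28x28 feature map.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, embed_dim, 3, stride=2, padding=1), nn.BatchNorm2d(embed_dim), nn.ReLU(),
        )
        num_patches = 28 * 28  # assumes a 224x224 input
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # Transformer encoder: multi-head self-attention over the spatial tokens
        # supplies the global context the abstract attributes to the ViT branch.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        f = self.stem(x)                       # (B, C, 28, 28)
        tokens = f.flatten(2).transpose(1, 2)  # (B, 784, C): one token per location
        cls = self.cls_token.expand(x.size(0), -1, -1)
        z = self.encoder(torch.cat([cls, tokens], dim=1) + self.pos_embed)
        return self.head(z[:, 0])              # classify from the [CLS] token

model = HybridViTCNN()
logits = model(torch.randn(2, 3, 224, 224))    # two RGB frames -> (2, 26)
```

Treating each CNN feature-map location as a token lets the attention layers relate distant regions of the frame (e.g. both hands), which convolution alone captures only through deep stacks of layers.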
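The privacy claim rests on the federated training loop: clients train on their own data and share only model weights. Below is a minimal federated-averaging (FedAvg-style) sketch under assumptions; the function `federated_average`, the single local epoch per round, and the unweighted mean (which assumes clients hold similar amounts of data) are illustrative choices, not the paper's protocol.

```python
import copy
import torch
import torch.nn.functional as F

def federated_average(global_model, client_loaders, rounds=10, lr=1e-3, device="cpu"):
    # Hypothetical FedAvg loop: raw signing videos/frames never leave the clients.
    for _ in range(rounds):
        client_states = []
        for loader in client_loaders:
            local = copy.deepcopy(global_model).to(device)
            opt = torch.optim.SGD(local.parameters(), lr=lr)
            local.train()
            for x, y in loader:  # one local epoch per round (an assumption)
                opt.zero_grad()
                loss = F.cross_entropy(local(x.to(device)), y.to(device))
                loss.backward()
                opt.step()
            client_states.append(local.state_dict())
        # Server step: element-wise mean of client weights; only parameters
        # are transmitted, never the underlying data.
        avg = {k: torch.stack([s[k].float() for s in client_states]).mean(0)
               for k in client_states[0]}
        global_model.load_state_dict(avg)
    return global_model
```

In a production deployment the aggregation would typically run behind secure aggregation or differential privacy, which is what the abstract's "secure federated learning setup" suggests.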