Developing Machine Translation for ASL

Though automatic translation engines like Google Translate are far from perfect, they have become useful tools to help individuals communicate, particularly for high-demand language pairs like English and Spanish. However, machine translation for signed languages like American Sign Language (ASL) lags far behind machine translation for spoken and written languages.

That could be changing soon, though: the COVID-19 pandemic has spurred the development of artificial intelligence-based technologies that can translate sign languages into written language. Most recently, Priyanjali Gupta, an engineering student at the Vellore Institute of Technology in Tamil Nadu, India, drew attention on social media for her efforts to develop an AI model that can translate basic ASL phrases into English with high accuracy rates. Her LinkedIn post sharing the model went viral, receiving more than 60,000 reactions on the platform.

Gupta’s AI model made headlines on Feb. 15 for its ability to identify simple ASL phrases with accuracy rates of roughly 90% or higher. While it doesn’t work as an all-purpose machine translation model (it can only identify six different phrases right now), Gupta’s model serves as a testament to the increased interest in developing automatic translation for signed languages. She plans to expand the model so that it can identify additional signs.

“The data set is manually made with a computer webcam and given annotations. The model, for now, is trained on single frames,” Gupta told Interesting Engineering. “To detect videos, the model has to be trained on multiple frames, for which I’m likely to use LSTM. I’m currently researching on it… I’m just an amateur student but I’m learning. And I believe, sooner or later, our open-source community, which is much more experienced than me, will find a solution.”

Unlike machine translation for written languages, machine translation for signed languages requires a model to identify specific gestures (that is, the placement, shape, and movement of an individual’s hands) with high precision.

This means developers need expertise in computer vision in addition to their knowledge of machine translation and sign language. As a result, building such a system is more difficult than developing machine translation for written languages, which generally have a standardized set of already-digitized characters.
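
To make the contrast with digitized text concrete, here is a minimal Python sketch of the three cues mentioned above: placement, shape, and movement. It is an illustration under stated assumptions, not Gupta's method. It assumes each video frame has already been reduced to a list of (x, y) hand keypoints with the wrist first, something a separate computer vision model would have to supply. The function names and the toy data are hypothetical.

```python
from math import hypot

# Assumption: a frame is a list of (x, y) hand keypoints, wrist first.
# In practice these would come from a separate vision model.


def placement(frame):
    """Centroid of the keypoints: roughly where the hand is in the image."""
    n = len(frame)
    return (sum(x for x, _ in frame) / n, sum(y for _, y in frame) / n)


def shape(frame):
    """Keypoints relative to the wrist, scaled by the hand's extent.

    The result does not change when the hand is merely shifted in the image.
    """
    wx, wy = frame[0]
    offsets = [(x - wx, y - wy) for x, y in frame]
    scale = max(hypot(dx, dy) for dx, dy in offsets) or 1.0
    return [(dx / scale, dy / scale) for dx, dy in offsets]


def movement(frames):
    """Centroid displacement between consecutive frames."""
    centers = [placement(f) for f in frames]
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(centers, centers[1:])]


# Toy data: the same four-point "hand" shifted right and down between frames.
first = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
second = [(x + 3.0, y + 1.0) for x, y in first]

print(shape(first) == shape(second))  # same hand shape in both frames
print(movement([first, second]))      # motion only appears across frames
print(movement([first]))              # a single frame yields no motion
```

A model trained on single frames can see placement and shape but not movement, which only exists across a sequence of frames. That is one reason recognizing signs in video calls for a sequence model such as the LSTM Gupta mentions.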
AW