As stragglers settle into their seats for general biology class, real-time captions of the professor’s banter about general and special senses – “Which receptor picks up pain? All of them.” – scroll across the bottom of a PowerPoint presentation displayed on wall-to-wall screens behind her. An interpreter stands a few feet away and interprets the professor’s spoken words into American Sign Language, the primary language of the Deaf community in the United States.
Except for the real-time captions on the screens in front of the room, this is a typical class at the Rochester Institute of Technology in upstate New York. About 1,500 students who are deaf and hard of hearing are an integral part of campus life at the sprawling university, which has 15,000 undergraduates. Nearly 700 of the students who are deaf and hard of hearing take courses with students who are hearing, including several dozen in Sandra Connelly’s general biology class of 250 students.
The captions on the screens behind Connelly, who wears a headset, are generated by Microsoft Translator, an AI-powered communication technology. The system uses an advanced form of automatic speech recognition to convert raw spoken language – ums, stutters and all – into fluent, punctuated text. The removal of disfluencies and addition of punctuation leads to higher-quality translations into the more than 60 languages that the translator technology supports. The community of people who are deaf and hard of hearing recognized that this cleaned-up, punctuated text could serve as an ideal complement to ASL for accessing spoken language.
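The cleanup step described above can be illustrated with a toy sketch. Microsoft's actual system uses trained models rather than hand-written rules, so the regexes, filler list and function name below are purely hypothetical, meant only to show what "removing disfluencies and adding punctuation" means in practice:

```python
import re

# Toy post-processor in the spirit of the cleanup the article describes.
# Illustrative only -- the real system uses machine-learned models.

FILLERS = re.compile(r"\b(um+|uh+|er+|you know)\b[, ]*", re.IGNORECASE)
REPEATS = re.compile(r"\b(\w+) \1\b", re.IGNORECASE)  # stutters like "the the"

def clean_transcript(raw: str) -> str:
    """Strip simple disfluencies, then add sentence-final punctuation."""
    text = FILLERS.sub("", raw)
    text = REPEATS.sub(r"\1", text)
    text = re.sub(r"\s{2,}", " ", text).strip()
    if text and text[-1] not in ".?!":
        text += "."
    return text[0].upper() + text[1:] if text else text

print(clean_transcript("um so the the receptors uh detect pain"))
# prints "So the receptors detect pain."
```

Producing complete, punctuated sentences matters downstream: machine translation models are trained on written text, so cleaner input yields better translations into the supported languages.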
Microsoft is partnering with RIT’s National Technical Institute for the Deaf, one of the university’s nine colleges, to pilot the use of Microsoft’s AI-powered speech and language technology to support students in the classroom who are deaf or hard of hearing.
“The first time I saw it running, I was so excited; I thought, ‘Wow, I can get information at the same time as my hearing peers,’” said Joseph Adjei, a first-year student from Ghana who lost his hearing seven years ago. When he arrived at RIT, he struggled with ASL. The real-time captions displayed on the screens behind Connelly in biology class, he said, allowed him to keep up with the class and learn to spell the scientific terms correctly.
Now in the second semester of general biology, Adjei, who is continuing to learn ASL, takes a seat in the front of the class and regularly shifts his gaze between the interpreter, the captions on the screen and the transcripts on his mobile phone, which he props up on the desk. The combination, he explained, keeps him engaged with the lecture. When he doesn’t understand the ASL, he turns to the captions, which fill in the content he missed from the interpreter.
The captions, he noted, occasionally confuse homophones that matter in a biology class, such as “I” and “eye.” “But it is so much better than not having anything at all.” In fact, Adjei uses the Microsoft Translator app on his mobile phone to help communicate with peers who are hearing outside of class.
“Sometimes when we have conversations they speak too fast and I can’t lip read them. So, I just grab the phone and we do it that way so that I can get what is going on,” he said.
AI for captioning
Jenny Lay-Flurrie, Microsoft’s chief accessibility officer, who is deaf herself, said the pilot project with RIT shows the potential of AI to empower people with disabilities, especially people who are deaf. The captions from Microsoft Translator add another layer of communication that, alongside sign language, could help people like her achieve more, she noted.
The project is in the early stages of rollout to classrooms. Connelly’s general biology class is one of 10 equipped for the AI-powered real-time captioning service, delivered through Presentation Translator, an add-in for Microsoft PowerPoint. Students can use the Microsoft Translator app running on their laptop, phone or tablet to receive the captions in real time in the language of their choice.
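Delivering captions in each student's chosen language relies on machine translation. As a rough sketch of how a client might ask the public Azure Translator REST API (v3) to translate one caption line into several languages, the snippet below only constructs the request URL and JSON body – no network call is made, and the choice of target languages is an assumption for illustration:

```python
from urllib.parse import urlencode

# Public endpoint of the Azure Translator Text API, version 3.0.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def build_translate_request(text, target_langs):
    """Build the URL and JSON body to translate one caption line.

    The API accepts repeated 'to' parameters, one per target language,
    and a JSON array of {"Text": ...} items in the request body.
    """
    params = [("api-version", "3.0")] + [("to", lang) for lang in target_langs]
    url = f"{ENDPOINT}?{urlencode(params)}"
    body = [{"Text": text}]
    return url, body

# Example: one caption line, translated to Spanish and Simplified Chinese.
url, body = build_translate_request("Which receptor picks up pain?", ["es", "zh-Hans"])
print(url)
print(body)
```

A real client would POST this body to the URL with a subscription key in the `Ocp-Apim-Subscription-Key` header; that credential is omitted here.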
“Language is the driving force of human evolution. It enhances collaboration, it enhances communication, it enhances learning. By having the subtitles in the RIT classroom, we are helping everyone learn better, to communicate better,” said Xuedong Huang, a technical fellow and head of the speech and language group for Microsoft AI and Research.
Huang started working on automatic speech recognition in the 1980s to help the 1.3 billion people in his native China avoid typing Chinese on keyboards designed for Western languages. The introduction of deep learning for speech recognition a few years ago, he noted, gave the speech technology human-like accuracy, leading to a machine translation system that translates sentences of news articles from Chinese to English and “the confidence to introduce the technology for everyday use by everyone.”
Continue on to Microsoft’s Blog Room to read the complete article.