Imagine you are deaf. You communicate with other deaf people in sign language, but with hearing people? All you can do is read lips or write. Wouldn’t it be great if you could communicate with them in your language? To address this problem, the European Union’s Horizon 2020 research and innovation programme has funded the Content4All project under grant agreement N°762021. The goal of the project is to open a new frontier in automated sign language interpreting with novel technologies and AI algorithms.
According to the UN Convention on the Rights of Persons with Disabilities, television content should be accessible to the deaf. Consequently, regulators are requesting more signed content. The UN Convention, however, does not only apply to television, but also to other areas such as education, events, business and politics. This means that we must create an ecosystem for access to information, education and culture.
Collecting Data is Key
“For us, the continuous improvement process is important. The key here is collecting data,” says Robin Ribback. SwissTXT, one of the partners in the project consortium, continuously collects this data from its mandates in broadcasting, education, events, business, and politics. As a result, the AI is constantly improved by means of deep learning. “At present, people still play a major role in accessibility. However, we are constantly improving our data so that the automatic systems can take over more. At some point everything will work automatically.” The Content4All consortium is convinced that the goal of providing 100% signed content to deaf people – always and everywhere – can thus be achieved in the not-so-distant future.
Of course, such a system should not only be available for education and television, but also for events, companies and political debates. Meetings of national parliaments, for example, should also be provided in sign language alongside other translations, while at some events the subtitling of stadium announcers is already done today: people with hearing disabilities can follow the commentary of the stadium announcer at FC Bayern Munich matches with AR glasses.
How spoken language becomes sign language
“Deaf people want sign language translations for all the content,” says Michaela Nachtrab, business developer for Access Services at SwissTXT and responsible for the exploitable innovation assets of Content4All. She herself is a sign language interpreter and teacher for the Deaf. “They want to communicate in their natural mother tongue.” This is not as simple as with subtitles. Sign language is a language in which several factors play a role in comprehension. The sign itself, for example, is just as important as the upper body and facial expressions. “Even small movements in the face can make a difference. For example, when I raise my eyebrows and look down, I formulate a question,” says Michaela Nachtrab. The upper body, in turn, is used to represent positions.
For this to succeed, a knowledge base of sign language interpreting needs to be built: an avatar, so to speak. “Avatar is often equated with gaming or low quality. That’s why we call it Realatar, a photorealistic virtual human in 3D with a high level of quality and detail,” says Giacomo Inches of Fincons Group, Project Coordinator of Content4All.
To create a Realatar, Content4All follows the same procedure as for subtitling. First, interpreters are recorded in a special studio and their digital replica is created. The Realatar generated in this way can be transferred to and displayed on devices such as TVs, notebooks or tablets. As in the example with the re-speaker, the interpreters can do their work from anywhere. All that is needed is a camera that films their face and a sensor, such as the Microsoft Kinect, that captures their movements.
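The capture setup described above can be sketched as a simple data flow: each frame bundles the facial landmarks from the camera with the body joints from the depth sensor, and is serialized for streaming to the remote Realatar renderer. The names, fields and JSON format below are hypothetical illustrations, not the project's actual protocol.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PoseFrame:
    """One capture frame: hypothetical structure for illustration only."""
    timestamp_ms: int
    face_landmarks: list   # e.g. (x, y) points from the face-tracking camera
    body_joints: dict      # e.g. joint name -> (x, y, z) from the depth sensor

def encode_frame(frame: PoseFrame) -> str:
    """Serialize a frame as JSON for transmission to the renderer."""
    return json.dumps(asdict(frame))

def decode_frame(payload: str) -> PoseFrame:
    """Reconstruct the frame on the renderer side."""
    return PoseFrame(**json.loads(payload))

# A single frame travelling from the interpreter's home setup to the renderer:
frame = PoseFrame(
    timestamp_ms=40,
    face_landmarks=[[0.31, 0.42], [0.35, 0.41]],
    body_joints={"right_hand": [0.6, 1.1, 0.4]},
)
restored = decode_frame(encode_frame(frame))
```

A real system would stream such frames at video rate over a network transport and drive the 3D model from them; the sketch only shows the per-frame packaging step.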
“This is currently the first phase we are in, which is called Live Remote Avatar Puppeteering, and we are first applying it to the broadcasting industry,” says Giacomo Inches. “This may sound like little, but it allows the interpreters to do their work from anywhere, even from home, creating more working opportunities for interpreters and lowering the costs for the broadcasters.” Now it’s time to collect more motion and facial expression data: “Until now, mankind has missed recording the optical motion data of sign language interpreters,” says Robin Ribback. “Speech recognition has been collecting data since 1987. Databases for sign language are only now being set up.”
The first step in building a sign language database is to collect data for news and weather forecasts. The language repertoire of weather forecasts is relatively limited and clear, so there is little room for false recognition. In speech understanding, people understand about 99 percent correctly, while machines reach only about 85 percent. With sign language, the figure for machines drops dramatically: if a machine recognizes 55 percent correctly, that is already a lot, yet usable speech understanding requires at least 90 percent recognition accuracy. After that, analogous to subtitling, sign language interpreting is to be automated in three phases: the AI will be trained on the generated database and will apply Natural Language Processing (NLP) and Deep Learning algorithms, so that spoken language is recognized and converted into the gestures and expressions of sign language, which the Realatar can then execute. Finally, thanks to HbbTV technologies, the Realatar is made accessible to people with hearing disabilities via a transparent browser overlay displayed on top of the original TV signal, allowing subtitles or a sign language interpreter to be shown. And while this is currently done only for limited domains in the broadcasting industry, it will become the standard for all content in other fields such as education, events, business and politics in the not-so-distant future.
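The middle step of that pipeline, turning recognized speech into a sign sequence the Realatar can execute, can be illustrated for a restricted weather domain. The vocabulary and gloss names below are invented for illustration; real sign language translation also reorders words and adds non-manual features such as eyebrow movements and upper-body position, which this toy word-by-word lookup deliberately omits.

```python
# Hypothetical spoken-word -> sign-gloss dictionary for a tiny weather domain.
# A gloss is a conventional written label for a sign (e.g. "RAIN").
WEATHER_GLOSSES = {
    "tomorrow": "TOMORROW",
    "sunny": "SUN",
    "rain": "RAIN",
    "cold": "COLD",
}

def text_to_glosses(recognized_text: str) -> list:
    """Map recognized words to known glosses, skipping words
    outside the limited domain vocabulary."""
    words = recognized_text.lower().split()
    return [WEATHER_GLOSSES[w] for w in words if w in WEATHER_GLOSSES]

glosses = text_to_glosses("Tomorrow will be sunny but cold")
# glosses -> ["TOMORROW", "SUN", "COLD"]
```

The narrow domain is exactly why weather forecasts make a good starting point: with a small, closed vocabulary, even a simple mapping covers most utterances, and the trained models that replace it have far less room for false recognition.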