Lip-reading computer outperforms humans

A Catz DPhil student, in collaboration with Google's DeepMind, has developed an artificial intelligence system that can read the movements of a person’s mouth better than professional human lip-readers. The news was picked up by the BBC technology blog this week.

Joon Son Chung (2014, DPhil Engineering Science), the creator of the ‘Watch, Attend and Spell’ (WAS) model, trained the AI system using thousands of hours’ worth of BBC News broadcasts. The videos contained more than 118,000 sentences in total, and a vocabulary of 17,500 words, from programmes including Newsnight, Breakfast and Question Time. The system accurately interpreted around 50 per cent of the clips it viewed. To put this into context, professional lip-readers viewing the same footage got only 12 per cent of the words correct.

“Lip-reading is an impressive and challenging skill,” explains Joon Son. “WAS can hopefully offer support to this task, for example by suggesting hypotheses for professional lip-readers to verify using their expertise. There are also a host of other applications, such as dictating instructions to a phone in a noisy environment, dubbing archival silent films, resolving multi-talker simultaneous speech and improving the performance of automated speech recognition in general.”

There is still scope for improvement, but Joon Son is hopeful that the technology can be developed further. “Currently the system only works with pre-recorded video content containing words it is familiar with. We hope to develop it to the point where it will be able to respond to new video in real time. For example, it could potentially be used for live TV subtitles.”

To follow Joon Son’s progress, visit his homepage.