Work Could Extend Reach of Video Conferencing
A researcher at Carnegie Mellon University and Karlsruhe Institute of Technology who has pioneered speech translation technologies dove to the wreck of the Titanic to test his latest work in the field.
From inside a submersible 13,000 feet beneath the North Atlantic, Alex Waibel recorded himself both narrating his dive and talking with the pilot as they journeyed to the legendary wreck. He then used speech recognition technology to convert that speech to text, and he transmitted selected messages to the surface via sonar. On the ship, new technology that Waibel and his team developed resynthesized the text into audio and video, displaying it as a video chat that used Waibel's voice and showed his lips moving in sync with the words.
Under water, and particularly in salt water, radio signals do not propagate, so communication with submersibles falls back on low-bandwidth sonar links that are mostly limited to text. When scientists dive to the Titanic, they can communicate with the surface only through text messages sent acoustically. It is a slow process that can be irritating and adds to the workload of the crews in the submersible and aboard the ship.
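A rough back-of-the-envelope calculation shows why only text is practical over such a link. The sketch below is illustrative only: the acoustic-modem rate and the sample message are assumptions, not figures from the expedition.

```python
# Illustrative comparison: sending text vs. raw speech audio over an
# underwater acoustic link. The 5 kbps throughput is an assumed,
# typical figure for acoustic modems, not a number from this dive.

ACOUSTIC_LINK_BPS = 5_000        # assumed acoustic-link throughput (bits/s)
AUDIO_BPS = 16_000 * 16          # 16 kHz, 16-bit mono speech (bits/s)

message = "Approaching the bow section now."   # hypothetical message
text_bits = len(message.encode("utf-8")) * 8

# Transmission time for one short text message vs. one second of audio.
text_seconds = text_bits / ACOUSTIC_LINK_BPS
audio_seconds = AUDIO_BPS / ACOUSTIC_LINK_BPS

print(f"text message: {text_seconds:.2f} s to transmit")
print(f"1 s of raw speech: {audio_seconds:.1f} s to transmit")
```

Under these assumptions, a full sentence of text crosses the link in a fraction of a second, while each second of raw speech would take nearly a minute, which is why speech is reduced to text before transmission and reconstructed on the surface.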
"By interpreting and recreating natural voice communication, we are trying to reduce the workload of scientists and pilots in such missions in a natural way, despite the challenges imposed by salt water, operational stress, conversational dialogue and poor acoustic condition," said Waibel, a professor in CMU's Language Technologies Institute and at the Karlsruhe Institute of Technology.
Waibel and his team developed synthesis methods that reconstruct video from text. The video features a synthesized voice adapted to sound like the speaker's, and the speaker's lips move in sync with the generated audio. The method is intended for scenarios where video conferencing is required over low-bandwidth transmission, or where intermittent connectivity leads to poor-quality video feeds, dropouts, disconnects or the absence of video conferencing altogether. The method could also be used to synthesize video in another language for video dubbing. The research is being conducted at Karlsruhe Institute of Technology and CMU's School of Computer Science in collaboration with Zoom Video Communications. Waibel is a Zoom research fellow and advises the company's AI research and language technology development.
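The end-to-end flow described above can be sketched as three stages: on-board speech recognition, text-only transmission over the acoustic link, and surface-side resynthesis of voice and lip-synced video. The stage functions below are stand-ins for illustration, not Waibel's actual models.

```python
# Minimal sketch of the pipeline: only text crosses the low-bandwidth
# link; audio and video are reconstructed entirely on the surface ship.
# All three functions are hypothetical stubs.

def recognize(audio: bytes) -> str:
    """Stub for on-board speech recognition (audio -> text)."""
    return "Approaching the bow section now."

def send_via_sonar(text: str) -> str:
    """Stub for the low-bandwidth acoustic link, carrying text only."""
    return text

def resynthesize(text: str) -> dict:
    """Stub for surface-side synthesis: a speaker-adaptive voice plus
    lip-synchronous video generated from the received text."""
    return {
        "audio": f"<{text!r} rendered in the speaker's voice>",
        "video": "<lip-synced talking-head frames>",
    }

payload = send_via_sonar(recognize(b"raw microphone samples"))
result = resynthesize(payload)
print(result["audio"])
```

The design point this illustrates is that the expensive, high-bandwidth media (voice and video) are never transmitted; they are regenerated on the receiving side from a few bytes of text.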
Waibel tested the technology during his dive to the Titanic on July 14 as a member of an OceanGate Expeditions voyage to the famous wreck. During this first dive, Waibel had to proceed cautiously and selectively. The wreck rests at extreme depths and under strong currents, and communication with the surface is mission critical. Waibel carried a laptop running advanced speech recognition software in the submersible. A selection of short text fragments was successfully transmitted via sonar to the surface and resynthesized on the ship with lip-synchronous video output. Others were recorded and processed later. With more experience, the technology could help reduce workload and improve the naturalness of communication.
"It is as if we can now carry out video conferences from the abyss," Waibel said.
The technology Waibel tested at the wreck of the Titanic builds on decades of his pioneering research in speech translation. Waibel and his collaborators demonstrated the first consecutive and simultaneous translation systems, developed the first commercial speech translation system on a mobile phone, and created the first simultaneous lecture interpretation service. He has also developed dialogue translators for humanitarian missions and interpretation support for the European Parliament. In 2021, Zoom acquired a machine translation company Waibel co-founded to bolster the platform's real-time translation capabilities.