We have been working on rapid deployment of unrestricted speech-to-speech translation for new language pairs, using MT techniques developed primarily during the Pangloss project.
Current work is being carried out as part of the Tongues project, led by Lockheed Martin.
(See the May 2000 issue of Wired. There is quite a bit on the LTI, and DIPLOMAT itself got 1.5 columns, starting at the bottom of one page and continuing onto the next.)
Unrestricted translation is achieved through user-driven incremental improvement: the user can add new words and phrases during use.
Rapid deployment is achieved by using Pangloss's example-based MT (EBMT) and transfer-based MT engines within the Multi-Engine MT architecture, which uses a statistical target language model to help select between competing translations. This technology also makes the system's user-driven incremental improvement possible.
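As a rough illustration of the selection step, the sketch below scores whole-sentence candidates from different engines with a small add-one-smoothed bigram model of the target language and keeps the highest-scoring one. This is a simplified stand-in under stated assumptions, not the Pangloss or DIPLOMAT implementation: the names `BigramLM` and `select_translation`, the toy corpus, and the use of length-normalized log probability are all hypothetical choices made for the example.

```python
import math
from collections import Counter


class BigramLM:
    """Toy add-one-smoothed bigram model over target-language tokens (illustrative only)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, sentence):
        """Return the average log probability per bigram of the sentence."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, cur)] + 1            # add-one smoothing
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / (len(tokens) - 1)


def select_translation(candidates, lm):
    """Pick the (engine_name, translation) pair the language model scores highest."""
    return max(candidates, key=lambda candidate: lm.logprob(candidate[1]))


if __name__ == "__main__":
    # Hypothetical target-language corpus and engine outputs.
    corpus = ["where is the hospital", "the hospital is near", "where is the road"]
    lm = BigramLM(corpus)
    candidates = [
        ("ebmt", "where is the hospital"),
        ("transfer", "hospital the where is"),
    ]
    print(select_translation(candidates, lm))
```

In a sketch like this, a phrase added by the user simply becomes another candidate in the list, which is one way to picture how the same selection machinery can accommodate incremental additions.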
The speech components of the system are the SPHINX HMM speech recognizer and a concatenation-based speech synthesis system developed by Kevin Lenzo.
The system was originally developed on laptops with WaveLANs, and was later ported first to one wearable platform and then to another.
We planned to eventually include multi-lingual field OCR, for scanning in and translating documents in the field.
We actually developed a prototype of this for Haitian Creole, in conjunction with ARL's FALCon system.
We have also used OCR for language resource development.
As you can see, this is a cooperative effort among a large number of projects within CMU SCS, headquartered at the LTI.
Our first test case was Serbo-Croatian/English. We deployed an initial system within three weeks of the start of the project. We produced Korean, Haitian Creole, and Spanish prototypes, and did some investigation of Arabic.
DIPLOMAT is an acronym.
So far, the best reference to cite is:
Frederking, R., Rudnicky, A., and Hogan, C. Interactive Speech Translation in the DIPLOMAT Project. Presented at the Spoken Language Translation workshop at the 35th Meeting of the Association for Computational Linguistics, ACL-97. Madrid, Spain. 1997.