My research interests center on multilingual processing, with sidelines in text categorization and information extraction:
I have worked primarily on example-based machine translation (EBMT), a data-driven approach to translation that originated a few years before statistical machine translation and is characterized by its use of individual examples from the training corpus during translation. I have also applied my EBMT system to cross-language information retrieval and speech-to-speech translation.
I have worked on reconstructing corrupted ZIP archives and on extracting text in arbitrary encodings from files and raw disk images. As part of this work, I developed language identification for more than 1,300 languages, and I am continuing to improve its accuracy.
My current work focuses on extracting actions and affected components from aircraft maintenance records.