
LTI Seminar Abstracts Spring 2002


May 9, 2002 -- Tanja Schultz, LTI Research Associate
Multilinguality in Automatic Speech Recognition Systems


In this talk I will give an overview of my current research and future plans in the investigation of multilinguality issues in automatic speech recognition. In the first part I will describe work which aims at dramatically reducing the amount of effort required to develop a speech-to-text LVCSR engine in a new language. The second part addresses applications of multilingual engines, such as the identification of non-verbal cues in speech. I will discuss general questions and challenges concerning the development of speech recognizers in many different languages and give examples of some language peculiarities which will explain why speech recognition in many languages requires much more than just retraining with new data.

May 8, 2002 -- Alexander Koller, Saarland University
Computational Linguistics and Theorem Proving in a Computer Game


Text adventures are a classical form of computer games which were most popular in the eighties, but have gone somewhat out of fashion since. The player interacts with the game world (e.g. the rooms and objects in a space station) by typing natural-language commands; the computer provides feedback in the form of natural-language descriptions of the world and of the results of the player's actions.

We have developed a text adventure engine based on current technology from computational linguistics and theorem proving. There is a "real" parser and generation system, and a simple algorithm for resolving referring expressions. The game world and the player's knowledge are modelled as description logic knowledge bases and accessed through a DL inference system. These inference services are used in almost every language-processing module.

Our system resolves some of the most annoying limitations of traditional text adventures. It is reasonably modular, and components can be replaced by improved versions. Among other things, this makes it interesting as a testbed for NLP modules and in education.

Alexander Koller is a PhD student in the Special Research Action 378 "Resource-Adaptive Cognitive Systems" at Saarland University. His research focuses on inference-based methods for natural language understanding and generation.


May 3, 2002 -- Bob Carpenter, SpeechWorks International, Inc.
A Portable, Server-Side Dialog Framework for VoiceXML


We describe a spoken dialog application framework that combines the power and flexibility of server-side Java Servlets and Java Server Pages (JSPs) with the deployment portability, reliability and scalability of standard web (HTTP) servers and VoiceXML clients. Applications are developed by extending a framework of Java classes in order to define dialogs through lower level actions such as speech recognition, audio prompting, speech synthesis, and backend data access. The framework delegates session data management to servlets, embedding frame-based representations for the application's global and session data. Dialog flow is controlled through general constructions such as loops, conditionals, scoped sub-dialogs, along with scoped command, error, and exception handling. Prompting and grammars are configured through simple JSP templates that generate the VoiceXML instructions for the server to return to the client. The framework is designed to be extensible, as demonstrated by the implementation of customizable backup and repeat commands integrated with session data, command handling and grammar scoping. We will further describe a high-level XML-based representation and its compilation to a portable web archive format.

BIO: I work mainly on models of parsing, classification and interpretation for written and spoken language. Recently I've been working on multi-modal interfaces, automatic call routing, and VoiceXML dialog frameworks. My current projects involve automatic e-mail parsing and language identification for synthesis. My current theoretical obsession is Kolmogorov Complexity.

I received a Ph.D. from the University of Edinburgh's Centre for Cognitive Science, spent eight years as faculty at CMU in Computational Linguistics, four years in the Bell Labs Multimedia lab, and the last two years at SpeechWorks in the Dialog Product Group.

I race vintage motorcycles (pre 1930) cross-country on summer weekends. During the week, I can be found playing the cello downtown in Roberto's and my post-modern rock ensemble, "Port 8080" (he plays analog synth). Look for our first album-length release, "Dynamic Programming", on Thrill Jockey records this summer, featuring the club hit, "Maximum Entropy".


April 26, 2002 -- Gareth J. F. Jones, Department of Computer Science, University of Exeter, U.K.
An Investigation of Mixed-Media Information Retrieval


Growth in the availability of electronic documents derived from sources other than typed text has led to significant interest in areas such as Spoken Document Retrieval and scanned Document Image Retrieval. Existing studies have investigated topics such as inaccurate document recognition and indexing, and methods to compensate for these in retrieval. These studies have all been conducted on the basis that documents within a collection are all of one media type. Thus a user might be required to search separately within a collection for each medium to find information of interest. This presentation will describe an investigation into Mixed-Media Information Retrieval where the document collection is composed of a mixture of electronic text, automatically transcribed spoken documents and document images indexed using OCR. Experimental results compare retrieval from separate parallel collections from the 3 media sources, and report retrieval results from mixed-media collections where each document is present from only one of the media sources. Results for this latter investigation concentrate on analysing the retrieval behaviour of the documents from each source within the mixed-media collection.


March 22, 2002 -- Talk 1: Judith Klein-Seetharaman
Comparative n-gram analysis of whole-genome protein sequences


M. Ganapathiraju, D. Weisser, R. Rosenfeld, J. Carbonell, R. Reddy & J. Klein-Seetharaman

A current barrier for successful rational drug design is the lack of understanding of the structure space provided by the proteins in a cell that is determined by their sequence space. The protein sequences capable of folding to functional three-dimensional shapes of the proteins are clearly different for different organisms, since sequences obtained from human proteins often fail to form correct three-dimensional structures in bacterial organisms. In analogy to the question "What kind of things do people say?" we therefore need to ask the question "What kind of amino acid sequences occur in the proteins of an organism?" An understanding of the sequence space occupied by proteins in different organisms would have important applications for "translation" of proteins from the language of one organism into that of another and design of drugs that target sequences that might be unique or preferred by pathogenic organisms over those in human hosts.

Here we describe the development of a biological language modeling toolkit (BLMT) for genome-wide statistical amino acid n-gram analysis and comparison across organisms (freely accessible at www.cs.cmu.edu/~blmt). Its functions were applied to 44 different bacterial, archaeal and the human genome. Amino acid n-gram distribution was found to be characteristic of organisms, as evidenced by (1) the ability of simple Markovian unigram models to distinguish organisms, (2) the marked variation in n-gram distributions across organisms above random variation, and (3) identification of organism-specific phrases in protein sequences that are greater than an order of magnitude standard deviations away from the mean. These lines of evidence suggest that different organisms utilize different "vocabularies" and "phrases", an observation that may provide novel approaches to drug development by specifically targeting these phrases. The results suggest that further detailed analysis of n-gram statistics of protein sequences from whole genomes will likely - in analogy to word n-gram analysis - result in powerful models for prediction, topic classification and information extraction of biological sequences.
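
To make the idea of amino acid n-gram statistics concrete, the following is a minimal sketch of counting n-grams in protein sequences and comparing the resulting distributions between two organisms. It is not the BLMT implementation: the function names, the toy sequences, and the choice of total variation distance as the comparison measure are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count overlapping n-grams in one amino acid sequence (a string)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_distribution(sequences, n):
    """Relative n-gram frequencies pooled over a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(ngram_counts(seq, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


def total_variation(p, q):
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint).

    Assumption: this is only one simple way to compare distributions; the
    abstract does not say which measure was used.
    """
    grams = set(p) | set(q)
    return 0.5 * sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in grams)


if __name__ == "__main__":
    # Hypothetical toy "proteomes"; real input would be whole-genome protein sets.
    organism_a = ["MKVLAAGIV", "MKKLLAAGL"]
    organism_b = ["MSTNPKPQR", "MSSTNPQRK"]
    dist_a = ngram_distribution(organism_a, 2)
    dist_b = ngram_distribution(organism_b, 2)
    print(total_variation(dist_a, dist_b))
```

Larger values of the distance indicate that the two sets of sequences use different n-grams, which is the kind of signal the abstract describes as organism-specific "vocabulary".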


March 22, 2002 -- Talk 2: Fei Huang, LTI Ph.D. student
An Adaptive Approach to Named Entity Extraction for Meeting Applications


Named entity extraction has been intensively investigated in the past several years. Both statistical approaches and rule-based approaches have achieved satisfactory performance for regular written/spoken language. However, when existing methods are applied to highly informal or ungrammatical language, e.g., meeting language, their performance decreases significantly because of the many mismatches in language genre.

In this paper we propose an adaptive method of named entity extraction for meeting understanding. This method combines a statistical model trained from broadcast news data with a cache model built online for ambiguous words, computes their global context name class probability from local context name class probabilities, and integrates name lists information from meeting profiles. Such a fusion of supervised and unsupervised learning has shown improved performance of named entity extraction for meeting applications. When evaluated using manual meeting transcripts, the proposed method demonstrates a 26.07% improvement over the baseline model. Its performance is also comparable to that of the statistical model trained from a small annotated meeting corpus.
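
As a rough illustration of how a cache of local-context name class probabilities could yield a global estimate for a word, the sketch below averages the per-occurrence probabilities and interpolates the result with a baseline model's output. The averaging rule, the interpolation weight, the class labels, and all names are assumptions for illustration only; the abstract does not specify how the paper combines these quantities.

```python
from collections import defaultdict


class NameClassCache:
    """Online cache of local name class probabilities, keyed by word."""

    def __init__(self):
        self._sums = defaultdict(lambda: defaultdict(float))
        self._counts = defaultdict(int)

    def add(self, word, local_probs):
        """Record the class probabilities observed for one occurrence of word."""
        for name_class, prob in local_probs.items():
            self._sums[word][name_class] += prob
        self._counts[word] += 1

    def global_probs(self, word):
        """Average the cached local probabilities (assumed combination rule)."""
        n = self._counts.get(word, 0)
        if n == 0:
            return {}
        return {cls: total / n for cls, total in self._sums[word].items()}


def combine(baseline_probs, cache_probs, weight=0.5):
    """Linearly interpolate baseline and cache estimates (hypothetical weight)."""
    if not cache_probs:
        return dict(baseline_probs)
    classes = set(baseline_probs) | set(cache_probs)
    return {
        cls: (1 - weight) * baseline_probs.get(cls, 0.0)
        + weight * cache_probs.get(cls, 0.0)
        for cls in classes
    }


if __name__ == "__main__":
    cache = NameClassCache()
    cache.add("Pittsburgh", {"LOCATION": 0.9, "ORGANIZATION": 0.1})
    cache.add("Pittsburgh", {"LOCATION": 0.5, "ORGANIZATION": 0.5})
    print(combine({"LOCATION": 0.4, "ORGANIZATION": 0.6},
                  cache.global_probs("Pittsburgh")))
```

The point of the sketch is only that evidence from several occurrences of an ambiguous word can be pooled and used to adjust the decision at any single occurrence.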


March 8, 2002 -- Ralf Brown, LTI Systems Scientist
TMI-2002 Preview: Two Topics

Ralf Brown will present draft versions of two talks that he will be giving at TMI-2002 in Japan.

Corpus-Driven Splitting of Compound Words by Ralf Brown
I present a method for splitting compound words into their constituents based on cognate words in the other language of a parallel corpus. A minor extension to the method allows the decompounding of words which do not have cognates in the other language. By decompounding the training corpus for an Example-Based MT system, the incidence of word alignment failure can be substantially reduced, yielding a modest improvement in performance.

Challenges in Automated Elicitation of a Controlled Bilingual Corpus by Katharina Probst and Lori Levin (to be presented at TMI on their behalf)
Learning translation rules from carefully elicited sentences is an uncommon but important approach to automated learning for machine translation. The approach is uncommon with good reason -- elicitation can easily go awry if not carefully monitored. This paper addresses eight challenges of automated elicitation and discusses their solution in the AVENUE machine translation project. The elicited sentences in AVENUE are used to semi-automatically infer transfer rules for the desired language pair.



March 1, 2002 -- Teruko Mitamura, LTI Senior Research Scientist
Pronominal Anaphora Resolution in the KANTOO Multilingual Machine Translation System

We present an approach to pronominal anaphora resolution using KANT Controlled Language and the KANTOO multilingual MT system. Our algorithm is based on a robust, syntax-based approach that applies a set of restrictions and preferences to select the correct antecedent. We report a success rate of 93.3% on a training corpus with 286 anaphors, and 88.8% on held-out data with 144 anaphors. Our approach translates anaphors to Spanish with 97.9% accuracy and to German with 94.4% accuracy on held-out data.

The KANT System is a knowledge-based, interlingual machine translation system, developed for multilingual translations of technical documents in various domains. Application domains include heavy equipment documentation, computer manuals, automotive documentation, and medical records written in controlled language. KANTOO is the reimplementation of the original KANT MT system, and also accepts Controlled English as input. The current input specification is referred to as KANT Controlled English (KCE). Although some of the sentences in this study were rewritten to conform to KCE, we did not edit pronominal anaphors or any other constituents relevant to the anaphor resolution process.

This work is the result of collaboration with Eric Nyberg, Enrique Torrejon, Dave Svoboda, Annelen Brunner and Kathryn Baker.

Feb 8, 2002 -- Keiichi Tokuda, LTI Visiting Researcher
HMM-Based Speech Synthesis - Toward Human-like Talking Machines

The increasing availability of large speech databases makes it possible to construct speech synthesis systems, which are referred to as data-driven, corpus-based, speaker-driven, or trainable approaches, by applying statistical learning algorithms. These systems, which can be automatically trained, not only generate natural and high quality synthetic speech but also can reproduce voice characteristics of the original speaker. This talk presents one of these approaches: hidden Markov model (HMM) based speech synthesis in which synthetic speech is generated directly from HMMs. Algorithms for speech parameter generation from HMMs, and a mel-cepstrum based vocoding technique are reviewed, and an approach to simultaneous modeling of phonetic and prosodic parameters (spectrum, F0, and duration) is also presented. The main feature of the system is the use of dynamic features: by inclusion of dynamic coefficients in the feature vector, the speech parameter sequence generated in synthesis is constrained to be realistic, as defined by the parameters of the HMMs. The attraction of this approach is that voice characteristics of synthesized speech can easily be changed by transforming HMM parameters. Indeed, it is shown that we can change voice characteristics of synthetic speech by applying a speaker adaptation technique which has been used in speech recognition systems. The relation between the HMM-based approach and other concatenative speech synthesis approaches is also discussed.
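
The role of dynamic features in parameter generation can be written compactly. The block below is a sketch of the standard maximum-likelihood formulation for a fixed state sequence; the symbols and the particular delta window are assumptions chosen for illustration and are not taken from the abstract.

```latex
% o_t stacks static and dynamic features; W is the window matrix with o = W c.
% The delta window below is one common choice, assumed here for illustration.
\mathbf{o}_t = \begin{bmatrix} \mathbf{c}_t \\ \Delta \mathbf{c}_t \end{bmatrix},
\qquad
\Delta \mathbf{c}_t = \tfrac{1}{2}\left(\mathbf{c}_{t+1} - \mathbf{c}_{t-1}\right),
\qquad
\mathbf{o} = W \mathbf{c}

% For a fixed state sequence q with stacked Gaussian means \mu_q and covariances \Sigma_q:
\hat{\mathbf{c}}
  = \arg\max_{\mathbf{c}} \log \mathcal{N}\!\left(W\mathbf{c};\, \boldsymbol{\mu}_q, \Sigma_q\right)
\quad\Longrightarrow\quad
W^{\top} \Sigma_q^{-1} W \,\hat{\mathbf{c}} = W^{\top} \Sigma_q^{-1} \boldsymbol{\mu}_q
```

Without the dynamic rows, W reduces to the identity and the generated sequence is simply the piecewise-constant sequence of state means; with them, the static parameters must also agree with the modeled deltas, which yields a smooth trajectory.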

BIOGRAPHY: Keiichi Tokuda received the B.E. degree in electrical and electronic engineering from the Nagoya Institute of Technology, Nagoya, Japan in 1984, and the M.E. and Dr.Eng. degrees in information processing from the Tokyo Institute of Technology, Tokyo, Japan, in 1986 and 1989, respectively. From 1989 to 1996 he was a Research Associate at the Department of Electronic and Electric Engineering, Tokyo Institute of Technology. Since 1996 he has been with the Department of Computer Science, Nagoya Institute of Technology as Associate Professor. He was an Invited Researcher at ATR-SLT, and is currently a Visiting Researcher at CMU-LTI. He is a co-recipient of both the Excellent Paper Award and the Inose Award from the IEICE in 2001, and the TELECOM System Technology Prize from the Telecommunications Advancement Foundation Award, Japan, in 2001. He is a member of the Speech Technical Committee of the IEEE Signal Processing Society. His research interests include speech coding, speech synthesis and recognition, and multimodal interfaces.


Feb 1, 2002 -- Adam Berger, Eizel Technologies Inc.
Algorithmic Content Repurposing for Small-screen Devices

This talk will describe some of the past, present, and future work at Eizel Technologies Inc., a Pittsburgh-based software firm devoted to building enabling technologies for mobile devices. The fundamental problem which Eizel faces is how to adapt a document - a web page, email, image, video, etc. - which was originally designed for display on a large screen so that it may be conveniently viewed on a small screen, such as one finds on a PDA or mobile phone. The problem is important because small-screen Internet-connected devices are much more prevalent worldwide than traditional PCs. The problem is interesting because solving it requires techniques from human-computer interaction, artificial intelligence, image processing, and, of course, natural language processing.

BIOGRAPHY: Adam Berger is a 2001 PhD graduate of the School of Computer Science, where he worked at the intersection of machine learning and statistical language processing under the direction of John Lafferty. Previously he worked in the statistical machine translation group at IBM's Thomas J. Watson Research Center.



Webmaster: ehn@cs.cmu.edu



LTI is part of the School of Computer Science at Carnegie Mellon University.
This page is maintained by ckoch+@cs.cmu.edu, and was last updated 06 Feb 2002.