
LTI Seminar Abstracts Spring 2002
May 9, 2002 -- Tanja Schultz, LTI Research Associate
Multilinguality in Automatic Speech Recognition Systems
In this talk I will give an overview of my current research and future plans
in the investigation of multilinguality issues in automatic speech recognition.
In the first part I will describe work which aims at dramatically reducing
the amount of effort required to develop a speech-to-text LVCSR engine in
a new language. The second part addresses applications of multilingual engines
like the identification of non-verbal cues in speech. I will discuss
general questions and challenges concerning the development of speech recognizers
in many different languages and give examples of some language peculiarities
which will explain why speech recognition in many languages requires much
more than just retraining with new data.
May 8, 2002 -- Alexander Koller, Saarland University
Computational Linguistics and Theorem Proving in a Computer Game
Text adventures are a classical form of computer games which were most
popular in the eighties, but have gone somewhat out of fashion since.
The player interacts with the game world (e.g. the rooms and objects in
a space station) by typing natural-language commands; the computer provides
feedback in the form of natural-language descriptions of the world and
of the results of the player's actions.
We have developed a text adventure engine based on current technology
from computational linguistics and theorem proving. There is a "real"
parser and generation system, and a simple algorithm for resolving referring
expressions. The game world and the player's knowledge are modelled as
description logic knowledge bases and accessed through a DL inference
system. These inference services are used in almost every language-processing
module.
Our system resolves some of the most annoying limitations of traditional
text adventures. It is reasonably modular, and components can be replaced
by improved versions. Among other things, this makes it interesting as
a testbed for NLP modules and in education.
Alexander Koller is a PhD student at the Special Research Action 378 "Resource-Adaptive
Cognitive Systems" at Saarland University. His research focuses on inference-based
methods for natural language understanding and generation.
May 3, 2002 -- Bob Carpenter, SpeechWorks International,
Inc.
A Portable, Server-Side Dialog Framework for VoiceXML
We describe a spoken dialog application framework that combines the power
and flexibility of server-side Java Servlets and Java Server Pages (JSPs)
with the deployment portability, reliability and scalability of standard
web (HTTP) servers and VoiceXML clients. Applications are developed by extending
a framework of Java classes in order to define dialogs through lower level
actions such as speech recognition, audio prompting, speech synthesis, and
backend data access. The framework delegates session data management to
servlets, embedding frame-based representations for the application's global
and session data. Dialog flow is controlled through general constructions
such as loops, conditionals, scoped sub-dialogs, along with scoped command,
error, and exception handling. Prompting and grammars are configured through
simple JSP templates that generate the VoiceXML instructions for the server
to return to the client. The framework is designed to be extensible, as
demonstrated by the implementation of customizable backup and repeat commands
integrated with session data, command handling and grammar scoping. We will
further describe a high-level XML-based representation and its compilation
to a portable web archive format.
BIO: I work mainly on models of parsing, classification and interpretation
for written and spoken language. Recently I've been working on multi-modal
interfaces, automatic call routing, and VoiceXML dialog frameworks. My current
projects involve automatic e-mail parsing and language identification for
synthesis. My current theoretical obsession is Kolmogorov Complexity.
I received a Ph.D. from the University of Edinburgh's Centre for Cognitive
Science, spent eight years as faculty at CMU in Computational Linguistics,
four years in the Bell Labs Multimedia lab, and the last two years at SpeechWorks
in the Dialog Product Group.
I race vintage motorcycles (pre-1930) cross-country on summer weekends.
During the week, I can be found playing the cello downtown in Roberto's
and my post-modern rock ensemble, "Port 8080" (he plays analog synth). Look
for our first album-length release, "Dynamic Programming", on Thrill Jockey
records this summer, featuring the club hit, "Maximum Entropy".
April 26, 2002 -- Gareth J. F. Jones, Department of
Computer Science, University of Exeter, U.K.
An Investigation of Mixed-Media Information Retrieval
Growth in the availability of electronic documents derived from sources other
than typed text has led to significant interest in areas such as Spoken
Document Retrieval and scanned Document Image Retrieval. These existing
studies have investigated topics such as issues of inaccurate document recognition
and indexing, and methods to compensate for these in retrieval. These studies
have all been conducted on the basis that documents within a collection
are all of one media type. Thus a user might be required to search separately
within a collection for each medium to find information of interest. This
presentation will describe an investigation into Mixed-Media Information
Retrieval where the document collection is composed of a mixture of electronic
text, automatically transcribed spoken documents and document images indexed
using OCR. Experimental results compare retrieval from separate parallel
collections from the 3 media sources, and report retrieval results from
mixed-media collections where each document is present from only one of
the media sources. Results for this latter investigation concentrate on
analysing the retrieval behaviour of the documents from each source within
the mixed-media collection.
March 22, 2002 -- Talk 1: Judith Klein-Seetharaman
Comparative n-gram analysis of whole-genome protein sequences
M. Ganapathiraju, D. Weisser, R. Rosenfeld, J. Carbonell, R. Reddy & J.
Klein-Seetharaman
A current barrier for successful rational drug design is the lack of understanding
of the structure space provided by the proteins in a cell that is determined
by their sequence space. The protein sequences capable of folding to functional
three-dimensional shapes of the proteins are clearly different for different
organisms, since sequences obtained from human proteins often fail to form
correct three-dimensional structures in bacterial organisms. In analogy
to the question "What kind of things do people say?" we therefore need to
ask the question "What kind of amino acid sequences occur in the proteins
of an organism?" An understanding of the sequence space occupied by proteins
in different organisms would have important applications for "translation"
of proteins from the language of one organism into that of another and design
of drugs that target sequences that might be unique or preferred by pathogenic
organisms over those in human hosts.
Here we describe the development of a biological language modeling toolkit
(BLMT) for genome-wide statistical amino acid n-gram analysis and comparison
across organisms (freely accessible at www.cs.cmu.edu/~blmt). Its functions
were applied to 44 different bacterial, archaeal and the human genome. Amino
acid n-gram distribution was found to be characteristic of organisms, as
evidenced by (1) the ability of simple Markovian unigram models to distinguish
organisms, (2) the marked variation in n-gram distributions across organisms
above random variation, and (3) identification of organism-specific phrases
in protein sequences that are greater than an order of magnitude standard
deviations away from the mean. These lines of evidence suggest that different
organisms utilize different "vocabularies" and "phrases", an observation
that may provide novel approaches to drug development by specifically targeting
these phrases. The results suggest that further detailed analysis of n-gram
statistics of protein sequences from whole genomes will likely - in analogy
to word n-gram analysis - result in powerful models for prediction, topic
classification and information extraction of biological sequences.
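The kind of n-gram counting and unigram comparison described above can be
illustrated with a minimal sketch. This is not the BLMT toolkit itself: the
function names, the add-one smoothing, and the toy sequences below are all
assumptions made up for illustration, and real genome-scale analysis would
read complete proteomes rather than a handful of short strings.

```python
from collections import Counter
import math

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_counts(sequences, n):
    """Count every overlapping amino-acid n-gram in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def unigram_model(sequences, alpha=1.0):
    """Estimate smoothed unigram probabilities (alpha is an assumed smoothing constant)."""
    counts = ngram_counts(sequences, 1)
    total = sum(counts[a] for a in AMINO_ACIDS) + alpha * len(AMINO_ACIDS)
    return {a: (counts[a] + alpha) / total for a in AMINO_ACIDS}


def log_likelihood(model, seq):
    """Log-probability of a sequence under a unigram model (unknown symbols are skipped)."""
    return sum(math.log(model[a]) for a in seq if a in model)


def most_likely_organism(models, seq):
    """Pick the organism whose unigram model assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(models[name], seq))


# Hypothetical toy "proteomes"; these are not real protein sequences.
TOY_A = ["MKKLLKKA", "MKLKKAL"]
TOY_B = ["MGGSSGGT", "MSGGTSG"]
models = {"organism_a": unigram_model(TOY_A), "organism_b": unigram_model(TOY_B)}

if __name__ == "__main__":
    print(ngram_counts(TOY_A, 2).most_common(3))
    print(most_likely_organism(models, "KKLLKK"))
```

Higher-order comparisons would follow the same pattern, with ngram_counts
called for n = 2, 3, ... and the resulting distributions compared across
organisms.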
March 22, 2002 -- Talk 2: Fei Huang, LTI Ph.D. student
An Adaptive Approach to Named Entity Extraction for Meeting Applications
Named entity extraction has been intensively investigated in the past several
years. Both statistical approaches and rule-based approaches have achieved
satisfactory performance for regular written/spoken language. However, when
applied to highly informal or ungrammatical languages, e.g., meeting languages,
because of the many mismatches in language genre, the performance of existing
methods decreases significantly.
In this paper we propose an adaptive method of named entity extraction for
meeting understanding. This method combines a statistical model trained
from broadcast news data with a cache model built online for ambiguous words,
computes their global context name class probability from local context
name class probabilities, and integrates name lists information from meeting
profiles. Such a fusion of supervised and unsupervised learning has shown
improved performance of named entity extraction for meeting applications.
When evaluated using manual meeting transcripts, the proposed method demonstrates
a 26.07% improvement over the baseline model. Its performance is also comparable
to that of the statistical model trained from a small annotated meeting
corpus.
March 8, 2002 -- Ralf Brown, LTI Systems Scientist
TMI-2002 Preview: Two Topics
Ralf Brown will present draft versions of two talks that he will be giving
at TMI-2002 in Japan.
Corpus-Driven Splitting of Compound Words by Ralf Brown
I present a method for splitting compound words into their constituents
based on cognate words in the other language of a parallel corpus. A minor
extension to the method allows the decompounding of words which do not
have cognates in the other language. By decompounding the training corpus
for an Example-Based MT system, the incidence of word alignment failure
can be substantially reduced, yielding a modest improvement in performance.
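As a rough illustration of the idea of splitting a compound by looking for
cognates in the aligned sentence of the other language, consider the sketch
below. It is not the algorithm presented in the talk: the two-way split, the
use of Python's difflib string similarity as a stand-in for cognate detection,
and the min_len and threshold parameters are all assumptions chosen for the
example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; used here as a crude cognate score."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(piece, other_words):
    """Best similarity between a candidate constituent and any word in the other language."""
    return max((similarity(piece, w.lower()) for w in other_words), default=0.0)


def split_compound(word, other_words, min_len=3, threshold=0.8):
    """Try every two-way split; keep the one whose weaker half still looks like a cognate.

    Returns a (left, right) tuple, or None if no split scores at least `threshold`.
    """
    word = word.lower()
    best_score, best_split = 0.0, None
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        score = min(best_match(left, other_words), best_match(right, other_words))
        if score > best_score:
            best_score, best_split = score, (left, right)
    return best_split if best_score >= threshold else None


if __name__ == "__main__":
    # Hypothetical aligned sentence pair fragment (German compound, English words).
    print(split_compound("Computerprogramm", ["the", "computer", "program", "crashed"]))
    # No cognates on the English side, so no split is proposed.
    print(split_compound("Handschuh", ["the", "glove"]))
```

Scoring each split by its weaker half means a split is only accepted when
both constituents have a plausible counterpart in the other language.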
Challenges in Automated Elicitation of a Controlled Bilingual Corpus
by Katharina Probst and Lori Levin (to be presented at TMI on their behalf)
Learning translation rules from carefully elicited sentences
is an uncommon but important approach to automated learning for machine
translation. The approach is uncommon with good reason -- elicitation
can easily go awry if not carefully monitored. This paper addresses eight
challenges of automated elicitation and discusses their solution in the
AVENUE machine translation project. The elicited sentences in AVENUE are
used to semi-automatically infer transfer rules for the desired language
pair.
March 1, 2002 -- Teruko Mitamura, LTI Senior Research
Scientist
Pronominal Anaphora Resolution in the KANTOO Multilingual Machine
Translation System
We present an approach to pronominal anaphora resolution using KANT Controlled
Language and the KANTOO multilingual MT system. Our algorithm is based on
a robust, syntax-based approach that applies a set of restrictions and preferences
to select the correct antecedent. We report a success rate of 93.3% on a
training corpus with 286 anaphors, and 88.8% on held-out data with 144 anaphors.
Our approach translates anaphors to Spanish with 97.9% accuracy and to German
with 94.4% accuracy on held-out data.
The KANT System is a knowledge-based, interlingual machine translation system,
developed for multilingual translations of technical documents in various
domains. Application domains include heavy equipment documentation, computer
manuals, automotive documentation, and medical records written in controlled
language. KANTOO is the reimplementation of the original KANT MT system,
and also accepts Controlled English as input. The current input specification
is referred to as KANT Controlled English (KCE). Although some of the sentences
in this study were rewritten to conform to KCE, we did not edit pronominal
anaphors or any other constituents relevant to the anaphor resolution process.
This work is the result of collaboration with Eric Nyberg, Enrique Torrejon,
Dave Svoboda, Annelen Brunner and Kathryn Baker.
Feb 8, 2002 -- Keiichi Tokuda, LTI Visiting Researcher
HMM-Based Speech Synthesis - Toward Human-like Talking Machines
The increasing availability of large speech databases makes it possible
to construct speech synthesis systems, which are referred to as data-driven,
corpus-based, speaker-driven, or trainable approaches, by applying statistical
learning algorithms. These systems, which can be automatically trained,
not only generate natural and high quality synthetic speech but also can
reproduce voice characteristics of the original speaker. This talk presents
one of these approaches: hidden Markov model (HMM) based speech synthesis
in which synthetic speech is generated directly from HMMs. Algorithms
for speech parameter generation from HMMs, and a mel-cepstrum based vocoding
technique are reviewed, and an approach to simultaneous modeling of phonetic
and prosodic parameters (spectrum, F0, and duration) is also presented.
The main feature of the system is the use of dynamic features: by inclusion
of dynamic coefficients in the feature vector, the speech parameter sequence
generated in synthesis is constrained to be realistic, as defined by the
parameters of the HMMs. The attraction of this approach is that voice
characteristics of synthesized speech can easily be changed by transforming
HMM parameters. Actually, it is shown that we can change voice characteristics
of synthetic speech by applying a speaker adaptation technique which has
been used in speech recognition systems. The relation between the HMM-based
approach and other concatenative speech synthesis approaches is also discussed.
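The role of the dynamic features in parameter generation can be made concrete
with the formulation commonly given in the HMM synthesis literature, sketched
here under stated assumptions: the state sequence is taken as fixed, each
state has a single Gaussian output distribution, and only a first-order delta
window is used. The symbols (c for the static parameter sequence, W for the
window matrix, mu and Sigma for the stacked state means and covariances) are
notation introduced for this sketch, not taken from the abstract.

```latex
\begin{align}
  o_t &= \begin{bmatrix} c_t \\ \Delta c_t \end{bmatrix},
  \qquad \Delta c_t = \tfrac{1}{2}\,\bigl(c_{t+1} - c_{t-1}\bigr)
  \quad\Longrightarrow\quad o = W c \\
  \hat{c} &= \arg\max_{c}\; \mathcal{N}\!\bigl(W c \,;\, \mu, \Sigma\bigr) \\
  W^{\top} \Sigma^{-1} W \,\hat{c} &= W^{\top} \Sigma^{-1} \mu
\end{align}
```

Because the delta features are a linear function of the static parameters,
the most likely trajectory is the solution of a linear system rather than the
piecewise-constant sequence of state means, which is what constrains the
generated parameters to vary smoothly.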
BIOGRAPHY: Keiichi Tokuda received the B.E. degree in electrical and
electronic engineering from the Nagoya Institute of Technology, Nagoya,
Japan in 1984, and the M.E. and Dr.Eng. degrees in information processing
from the Tokyo Institute of Technology, Tokyo, Japan, in 1986 and 1989,
respectively. From 1989 to 1996 he was a Research Associate at the Department
of Electronic and Electric Engineering, Tokyo Institute of Technology.
Since 1996 he has been with the Department of Computer Science, Nagoya
Institute of Technology as Associate Professor. He was an Invited Researcher
at ATR-SLT, and currently is a Visiting Researcher at CMU-LTI. He is a
co-recipient of both the Excellent Paper Award and the Inose Award from
the IEICE in 2001, and the TELECOM System Technology Prize from the Telecommunications
Advancement Foundation Award, Japan, in 2001. He is a member of the Speech
Technical Committee of the IEEE Signal Processing Society. His research
interests include speech coding, speech synthesis and recognition, and
multimodal interfaces.
Feb 1, 2002 -- Adam Berger, Eizel Technologies Inc.
Algorithmic Content Repurposing for Small-screen Devices
This talk will describe some of the past, present, and future work at
Eizel Technologies Inc., a Pittsburgh-based software firm devoted to building
enabling technologies for mobile devices. The fundamental problem which
Eizel faces is how to adapt a document - a web page, email, image, video,
etc. - which was originally designed for display on a large screen so
that it may be conveniently viewed on a small screen, such as one finds
on a PDA or mobile phone. The problem is important because small-screen
Internet-connected devices are much more prevalent worldwide than traditional
PCs. The problem is interesting because solving it requires techniques
from human-computer interaction, artificial intelligence, image processing,
and, of course, natural language processing.
BIOGRAPHY: Adam Berger is a 2001 PhD graduate of the School of Computer
Science, where he worked at the intersection of machine learning and statistical
language processing under the direction of John Lafferty. Previously he
worked in the statistical machine translation group at IBM's Thomas J.
Watson Research Center.