LTI Seminar Abstracts
Fall 06 - Spring 07






April 20
Nizar Habash
Center for Computational Learning Systems
Columbia University

Arabic Diacritization through Full Morphological Tagging

Arabic is written without certain orthographic symbols, called diacritics, which represent, among other things, short vowels. The restoration of diacritics to written Arabic is an important processing step for several natural language processing applications, including training language models for automatic speech recognition, text-to-speech generation, and so on. I present here a new diacritization system for written Arabic based on a new combination of known techniques: a lexical resource for morphological analysis, a multi-classifier tagger and a lexeme language model. This new diacritization system outperforms the best previously published results (Zitouni et al. 2006) by reducing the word error rate to 14.9% (17% relative reduction from Zitouni et al. 2006) and reducing the diacritic error rate to 4.8% (13% relative reduction from Zitouni et al. 2006). These results were produced on the same training and test data used by Zitouni et al. (2006). I will also present a detailed error analysis classifying the types of errors resolved by each of the different modules used.
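The relative reductions quoted above imply the baseline error rates they are measured against. The following is a minimal sketch of that arithmetic, assuming "relative reduction" means (baseline - new) / baseline; the function name is illustrative and not taken from the talk.

```python
def implied_baseline(new_rate: float, relative_reduction: float) -> float:
    """Return the baseline b such that (b - new_rate) / b == relative_reduction."""
    return new_rate / (1.0 - relative_reduction)


# Figures from the abstract: new error rate (%) and relative reduction.
wer_baseline = implied_baseline(14.9, 0.17)
der_baseline = implied_baseline(4.8, 0.13)

print(f"implied baseline word error rate: {wer_baseline:.1f}%")       # 18.0%
print(f"implied baseline diacritic error rate: {der_baseline:.1f}%")  # 5.5%
```

Because the quoted percentages are rounded, the implied baselines are approximate.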

April 13
Paul Constantinides
Senior Technical Staff, salesforce.com

Search at salesforce.com: challenges in a hosted, multi-tenant environment

Each day hundreds of thousands of people around the world use salesforce.com as part of their business processes. Our search functionality is their navigation tool to the information stored in our multi-tenant system. These users demand highly relevant and secure results with rapid indexing and querying, in more than a dozen supported languages. The performance and quality of the results we provide is tightly coupled with our system architecture. Our incremental and bulk indexers index millions of records per day: the majority within one minute. Our distributed querying complex handles millions of queries per day: the majority within one second.

I will discuss:

  • Our distributed search architecture
  • The features of our search system, which is based on a modified version of the Lucene search engine
  • The tools we use to measure system performance and accuracy
  • The challenges we have overcome to reach a high level of quality of service
  • Our goals and direction for the future

Bio: Paul Constantinides is the Software Development Manager for search technologies and Senior Member of Technical Staff at salesforce.com, where he has been working since 2004. At salesforce.com he has implemented both architectural and user-facing search features, and added new functionality built on top of search. His primary responsibility is the direction of the search effort, but he is occasionally still able to find time to program. Prior to salesforce.com, he implemented platforms for building, running, and tuning speech recognition applications at Nuance Communications. Before Nuance, he built voice dialog systems in the Sphinx Speech Group at Carnegie Mellon. He holds M.S. and B.S. degrees from the Electrical and Computer Engineering Department at Carnegie Mellon.

 

March 30
Kemal Oflazer
Visiting Professor, Carnegie Mellon University

Experiments with Different Representational Units in English-to-Turkish SMT

In this talk, we present some results of our ongoing work on English-to-Turkish statistical machine translation. Turkish is an agglutinative language with very rich inflectional and derivational morphology that results in complex word structures. Turkish also has free constituent order, with almost no formal ordering constraints at the sentence level. These properties, and the fact that Turkish-English parallel corpora are scarce compared to those for other languages popular in SMT research, raise interesting issues for SMT involving Turkish. After a discussion of the relevant aspects of Turkish, we investigate different representational granularities for sub-lexical representation. We find that (i) representing both Turkish and English at the morpheme level but with some selective morpheme grouping on the Turkish side of the training data, (ii) augmenting the training data with "sentences" comprising only the content words of the original training data, and (iii) re-ranking the n-best decoder outputs (based on a morpheme/morpheme-group language model) with a word-level language model by combining translation model scores with word-level language model scores, provide a non-trivial improvement over a fully word-based baseline model. We also experiment with an iterative model training procedure, which may loosely be called "statistical post-editing," in which we use decoded target training data along with the reference training data to build a second translation model that we expect to perform better than the original model. Despite our relatively limited training data, we improve from 20.16 BLEU for the baseline to 24.84 BLEU for all sentences. For English source sentences of length 1-20, the BLEU score is about 30.

Time allowing, we will also briefly touch on translating from Turkish to English and on the suitability of BLEU for languages like Turkish.

Brief Biography:

Kemal Oflazer is a Professor of Computer Science at Sabanci University in Istanbul, Turkey, currently on a sabbatical visit at LTI. He received his PhD in Computer Science from CMU in 1987. He has been working in the area of language processing, especially for Turkish. His current research interests are in finite state methods for language processing, dependency parsing and statistical machine translation. He was the program co-chair for the 43rd ACL Meeting in 2005, has served on the editorial boards of Computational Linguistics and Journal of AI Research (JAIR), and currently serves on the editorial boards of Machine Translation and Journal of Research on Language and Computation. More information is available at http://people.sabanciuniv.edu/~oflazer/

March 9
Christos Faloutsos
Carnegie Mellon University

Graph Mining: Patterns and Tools for Static and Time-Evolving Graphs

Given a graph that evolves over time, what can we say about it? How does its diameter change? What is normal and what is abnormal? We present recent tools for discovering anomalies and patterns in both static and time-evolving graphs. We also describe a realistic graph generator using 'Kronecker' matrix multiplication, a parameter-free graph partitioning algorithm ('cross-associations'), the 'CePS' algorithm to find important connections between two or more nodes, and a result on virus propagation in graphs.
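As a rough illustration of the idea behind a Kronecker-based graph generator, the sketch below repeatedly takes the Kronecker product of a small adjacency matrix with itself. This is only the deterministic textbook construction under assumed inputs: the initiator matrix and function names are made up for illustration, and the generator described in the talk may differ (for example, by being stochastic).

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def kronecker_power(initiator, k):
    """Adjacency matrix obtained by taking the k-th Kronecker power (k >= 1)."""
    result = initiator
    for _ in range(k - 1):
        result = kronecker(result, initiator)
    return result


# Hypothetical 2-node initiator graph (1 = edge, 0 = no edge).
initiator = [[1, 1],
             [1, 0]]

graph = kronecker_power(initiator, 2)  # 4 x 4 adjacency matrix
for row in graph:
    print(row)
edge_count = sum(sum(row) for row in graph)
```

Each Kronecker step multiplies the number of nodes by the initiator size and the number of nonzero entries by the initiator's nonzero count, so the graph grows in a self-similar way.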

BIOGRAPHICAL NOTE
Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award at ICDM 2006, nine "best paper" awards, and several teaching awards. He has served as a member of the executive committee of SIGKDD; he has published over 160 refereed articles, 11 book chapters and one monograph. He holds five patents and has given over 20 tutorials and over 10 invited distinguished lectures. His research interests include data mining for streams and networks, fractals, indexing for multimedia and bio-informatics data, and database performance.

March 2
David A. Smith
Johns Hopkins University

Bootstrapping Monolingual Parsers from Multilingual Data

The creation of the Penn Treebank and similar datasets ca. 1990 produced a flowering of research on empirically trained parsers, which is now bearing fruit in information extraction and machine translation (e.g. Weischedel 2004, Quirk et al. 2005, Marcu et al. 2006). This revolution has bypassed most languages and domains, however, due to the expense of creating treebanks. Semisupervised learning methods such as bootstrapping and cotraining have the potential to leverage diverse sources of knowledge for robust statistical parsing in these new settings.

We argue that bootstrapping a parser from limited supervised data (~50-100 trees) is most promising when the model uses a rich set of redundant features, as in recent supervised models for scoring dependency parses (McDonald, Crammer, and Pereira, 2005). We show how to aid bootstrapping efficiently by drawing new features from a parser in another domain or even another language, via parallel corpora or dictionaries. These quasi-synchronous grammars extend prior bootstrapping work with synchronous grammars (Hwa et al. 2002, Smith & Smith 2004, Jansche 2005) and also have applications in translation modeling (Smith & Eisner 2006).

Drawing on Abney's (2004) analysis of the Yarowsky algorithm, we present a family of new on-line bootstrapping algorithms that optimize a likelihood-like loss function with generalized entropy regularization. We show that this approach avoids the losses in accuracy incurred by EM-based learning. Combining diverse knowledge sources in a conditional model of graph spanning trees, we learn improved parsers of Czech, German, and other non-projective languages. We argue that these techniques are broadly applicable to bootstrapping in other NLP domains with a wealth of overlapping features.

Bio:
David Smith received his A.B. in classics from Harvard University. An NSF graduate fellow, he is currently a Ph.D. student in computer science at Johns Hopkins University's Center for Language and Speech Processing. His interests are in machine translation, natural language parsing, and semi-supervised machine learning methods. David was formerly head programmer for the Perseus Digital Library Project at Tufts University, where he strayed from the path of classical philology toward automatic morphological processing, geocoding, and information extraction.

February 2
Mahadev Satyanarayanan
Carnegie Group Professor of Computer Science
Carnegie Mellon University

Finding Needles in a Haystack with Diamond

How does an expert discover something relevant to a task in a large distributed repository of complex and loosely-structured data? For example, how does a military intelligence analyst identify suspicious events from recent satellite images and surveillance videos? Or, how does a pharmaceutical researcher identify adverse effects of a drug in a large collection of automated cell microscopy images? The terms "suspicious" and "adverse effects" refer to vague concepts. More precise definitions can only be given after examining the data in some depth. In other words, hypothesis-formation and hypothesis-validation proceed hand-in-hand in a tightly-coupled and iterative sequence. We refer to this inherently human-centric activity as "interactive data exploration."

Diamond is an open-source software platform for interactive search of complex data that has been jointly developed by Intel Research and Carnegie Mellon. It implements the concept of "early discard." This makes brute-force interactive search practical by eliminating irrelevant data as cheaply as possible. Further, Diamond embodies the concept of "self-tuning." This allows it to adapt dynamically to different hardware configurations, workloads, and data content in a manner that is completely transparent to users and applications.
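The "early discard" idea can be sketched in a few lines: run the cheapest checks first and stop evaluating an item as soon as any check rejects it. The code below is an illustrative toy only, not Diamond's actual API; the filter costs, record fields, and function names are all assumptions, and it does not model self-tuning.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# A filter is (estimated cost, predicate); both are hypothetical here.
Filter = Tuple[float, Callable[[Dict], bool]]


def early_discard_search(items: Iterable[Dict], filters: List[Filter]) -> Iterator[Dict]:
    """Yield items that pass every filter, trying the cheapest filters first."""
    ordered = sorted(filters, key=lambda f: f[0])
    for item in items:
        # all() short-circuits, so costly filters never run on items
        # that a cheaper filter has already rejected.
        if all(predicate(item) for _, predicate in ordered):
            yield item


images = [
    {"name": "a.png", "width": 64, "mean_red": 0.9},
    {"name": "b.png", "width": 1024, "mean_red": 0.2},
    {"name": "c.png", "width": 2048, "mean_red": 0.8},
]
filters = [
    (10.0, lambda img: img["mean_red"] > 0.5),  # stands in for a costly image analysis
    (1.0, lambda img: img["width"] >= 512),     # cheap metadata check
]

matches = [img["name"] for img in early_discard_search(images, filters)]
print(matches)  # ['c.png']
```

Ordering by cost alone is a simplification; a more careful ordering would also weigh how many items each filter is expected to reject.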

Medical and pharmaceutical researchers at University of Pittsburgh Medical Center, University of Pittsburgh School of Medicine, and Merck are collaborating with Diamond researchers to apply Diamond to their domain-specific tasks. This may open the door to research and diagnostic strategies that were not considered feasible until now.

January 26
Jaime Carbonell
Allen Newell Professor of Computer Science
Director, Language Technologies Institute

Context-Based Machine Translation

In 2001, Eli Abir, an off-the-grid inventor without formal schooling, came up with a novel way of combining long n-grams, which he called "linguistic DNA." This idea formed the basis of a new Machine Translation paradigm now called Context-Based MT, or CBMT, after many years of hard work including infusion of relevant MT and IR component technologies. The presentation will focus on how CBMT works, including how CBMT achieved the highest BLEU scores reported to date on unseen newswire text, albeit so far only on Spanish-English.

The data ingredients required by CBMT are a comprehensive bilingual dictionary, a very large target-language-only corpus and an optional smaller source-language corpus, and nothing else: no grammar rules and no bilingual corpus. These resources are used by two key processes, one called the "flooder", corresponding to a translation model, and the other called the "n-gram overlap resolver", corresponding to the decoder. A third component finds phrasal near-synonyms via an unsupervised learning process to overcome impasses when long n-grams fail to resolve.

December 15
Candace L. (Candy) Sidner
Senior Research Scientist
Mitsubishi Electric Research Labs Cambridge, Massachusetts

Collaborative Interface Agents: On the screen and on the robot

This talk will present work on building interface agents that collaborate with users in their activities. In particular, I will discuss two different kinds of interfaces and illustrate collaboration with each. The first is the DiamondHelp system, a general tool which provides guidance to users of consumer electronic products. DiamondHelp was designed to use a GUI interface and optional speech capabilities to support collaborative conversation with the DiamondHelp agent while at the same time allowing the user to interact with the product itself. I will demo DiamondHelp guiding a user in programming a high-end washer/dryer, and discuss the architecture of the system and how it might be used in applications beyond consumer electronics.

The talk will then discuss work on collaboration with a humanoid robot. Our goal has been to understand how collaborative interactions take advantage of the engagement, or perceived connection, between the participants and apply that to human interactions with a robot. In this talk I will focus on recent efforts in interpreting human conversational nodding and in locating people in an open environment. I will briefly discuss how aspects of this technology can be applied to non-robotic interfaces.

Bio:

Candace L. (Candy) Sidner is an expert in user interfaces, especially those using speech, natural language understanding, and collaboration. Candy is senior research scientist at Mitsubishi Electric Research Labs in Cambridge, Massachusetts. She is currently working on human-robot interaction, focused on the role of engagement in those interactions, and on interface applications involving collaborative interface agents in the COLLAGEN project. She is a Fellow and past Councilor of the American Association for Artificial Intelligence, a senior member of the IEEE, and a member of the scientific advisory board for the EU Cognitive Systems for Cognitive Assistants (CoSy) project. She is currently general chair for HLT-NAACL 2007. She has served as program cochair of Intelligent User Interfaces 2006 and SIGDIAL 2004, chair of the International Conference on Intelligent User Interfaces in 2001, and President of the Association for Computational Linguistics. She received her Ph.D. from MIT in Computer Science.

URL: www.merl.com/people/sidner

November 17
Brian MacWhinney
CMU Psychology Department

Computational Linguistics and Language Learning

The study of language learning is a major research focus in linguistics, psycholinguistics, neuroscience, philosophy, psychology, education, sociology, and applied linguistics. Language learning data play a central role in important debates regarding language innateness, modularity of mind, brain plasticity, and processes of language change. The resolution of these debates often hinges on having access to large quantities of transcribed interactions of parents (or teachers) and children (or second language learners). To address this need, we have constructed the CHILDES and TalkBank databases which now contain over 14 million utterances for English and lesser amounts for a collection of 26 additional languages. Many of these transcripts are directly linked to audio and video.

To process this database, we are relying increasingly on tools from computational linguistics, including taggers, parsers, annotators, schema, and web-delivery mechanisms. In this talk, we will review recent work in three areas:

1. The logical problem of language acquisition. Here, we will see how searches of the CHILDES database point to a resolution of the logical problem of language acquisition that is very different from that proposed in Chomsky’s minimalist theory. Instead, we will see how item-based patterns can be configured to acquire complex structures on the basis of positive evidence.

2. The extraction of grammatical relations. Here we have constructed taggers for eight languages that are tuned to the specific requirements of child-parent discourse. The output of these taggers can now be processed through a deterministic grammatical relations parser to yield tagged dependency structures for one important English training corpus at accuracy rates rising above 95%. These results can then be used to compute developmental indices of proven clinical and diagnostic importance.

3. Interactional analysis. By linking transcripts to video over the web, we have been able to provide support for a wide range of microgenetic conversation-analytic studies of learning in classroom contexts.

Finally, in the context of work supported by the PSLC (Pittsburgh Science of Learning Center), we have used basic computational linguistic tools and Bayesian principles to predict language learning difficulties in French (gender, dictation) and Chinese (pinyin, vocabulary, characters). These methods rely on the principle of graduated interval recall for maximization of learning efficiency.

November 10
Anatole Gershman
Accenture Technology Labs

The Paradox in Services R&D: Moving Innovation from the Field into the Labs

While services constitute about 80% of the U.S. economy, the leading companies in this sector spend very little on formal R&D. This is true not only for companies such as Wal-Mart or Citibank but also for the largest IT consulting companies. Yet, the services sector is very innovative - the rise of eCommerce being but the most visible recent example. Innovation in the services sector has been happening in the field as an entrepreneurial activity rather than as a deliberate R&D process either in industry or academia. Historically, institutional R&D and academic programs emerged only when a sector reached an advanced state of industrialization: for services that would include repeatability, measurability and mass customization. We believe that services are entering the age of industrialization and require dedicated industrial-strength R&D that goes beyond entrepreneurial innovation.

What should be the scope of services R&D and what are the main challenges? Clearly, we need to explore what kinds of services will be possible in the future and how they could be implemented in a scalable fashion. I argue that services are evolving along 5 dimensions:
1. Granularity – the object of services will be individuals and individual objects as opposed to groups and categories today.
2. Ubiquity – services will be delivered everywhere they are needed and not at fixed service locations.
3. Timing – services will be delivered increasingly in real time.
4. Contextualization – services will be increasingly aware of the specific context of each user, including his or her location, intentions and surrounding objects.
5. Intelligence – the ability to deliver the right stuff at the right time and to adapt to changes.

Progress along these dimensions presents formidable challenges to applied computer science. These challenges can be grouped into three categories. The first is the challenge of “sensing.” Services will need to identify and track people, objects and events relying on often sketchy information from physical sensors or on textual or voice descriptions. The second challenge is the challenge of “thinking.” Services will need to recognize patterns of behavior and optimize their responses. This will require extensive modeling of people and activities which will have to be constructed, maintained and adapted automatically with minimum human intervention. The third challenge is the challenge of “acting.” Rapidly evolving technology creates many means for interaction with customers - from wearable or even implanted devices to large public displays and “smart” objects. Services will need to recognize and use the best means for achieving the task at hand. Whereas sensing-thinking-acting is the typical cycle of an intelligent agent, we mean the above at a full distributed systems level.

Accenture was one of the first IT services companies to recognize the need for industrial-strength R&D and started its first dedicated lab in 1989. We systematically scanned the emerging technologies and evaluated their potential for disruptive changes in services. In this presentation, I will discuss our approach to building services that “sense, think and act” in the world and will give many examples of the prototypes we have built.

Exciting as it is working in an applied lab, there are many critical problems that cannot be properly addressed there - problems that can only be pursued at academic labs because they require more fundamental long-range research. I will discuss the four areas which I think are particularly important for achieving the “sense, think and act” vision:
• fusion of information from disparate sources without which “sensing” is of limited use,
• modeling of complex activities which is the basis of “thinking,”
• knowledge and language acquisition through active learning, essential for scaling up, and finally
• creation of smart objects that enable intelligent “acting”

December 3
Mohit Kumar
LTI PhD student

Learning from the Report-writing Behavior of Individuals

In this talk, I describe a briefing system that learns to predict the contents of reports generated by users who create periodic (weekly) reports as part of their normal activity. The system observes content-selection choices that users make and builds a predictive model that could, for example, be used to generate an initial draft report. Using a feature of the interface the system also collects information about potential user-specific features. The system was evaluated under realistic conditions, by collecting data in a project-based university course where student group leaders were tasked with preparing weekly reports for the benefit of the instructors, using the material from individual student reports.

We address the question of whether data derived from the implicit supervision provided by end-users is robust enough to support not only model parameter tuning but also a form of feature discovery. Results indicate that this is the case: system performance improves based on the feedback from user activity. We find that individual learned models (and features) are user-specific, although not completely idiosyncratic. This may suggest that approaches which seek to optimize models globally (say over a large corpus of data) may not in fact produce results acceptable to all individuals.

Joint work with Prof Alex Rudnicky & Nikesh Garera.

Simon Fung
LTI PhD student

Designing an Elicitation Corpus with Semantic Representations

This talk will describe the design and creation of the LDC Elicitation Corpus, part of the AVENUE project. An elicitation corpus is a set of sentences that illustrate various semantic categories (e.g. number, gender) and constructions (e.g. relative clauses), to be translated into a language being studied. Before creating the source sentences, in this project we first create a semantic representation for each sentence, which specifies the values in various semantic categories (e.g. singular or plural for number) that the sentence should contain. This way, we can fix semantic details more precisely than is possible with only the sentence in the source language.

October 27
Rong Jin
Michigan State University

Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

Maximum margin clustering extends the theory of support vector machines to unsupervised learning, and has shown promising performance in recent studies. However, it has three major problems that limit its use in real-world applications: (1) it is computationally expensive and difficult to scale to large datasets; (2) it requires data preprocessing to ensure that the clustering boundary passes through the origin, which makes it unsuitable for clustering unbalanced datasets; and (3) its performance is sensitive to the choice of kernel function. In this paper, we propose the "Generalized Maximum Margin Clustering" framework, which addresses all three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm in that (1) it allows any clustering boundary, including those not passing through the origin; (2) it significantly improves computational efficiency by reducing the number of parameters; and (3) it automatically determines the appropriate kernel matrix without any labeled data. Our empirical studies demonstrate the efficiency and effectiveness of the generalized maximum margin clustering algorithm. Furthermore, in this talk, I will show the theoretical connection among spectral clustering, maximum margin clustering, and generalized maximum margin clustering.

Bio: Dr. Rong Jin has been an Assistant Professor in the Computer Science and Engineering Department at Michigan State University since 2003. He works in the areas of statistical machine learning and its application to information retrieval. In the past, Dr. Jin has worked on a variety of machine learning algorithms, and he has extensive experience with the application of machine learning algorithms to information retrieval. Dr. Jin holds a B.A. in Engineering from Tianjin University, an M.S. in Physics from Beijing University, and an M.S. and Ph.D. in the area of language technologies from Carnegie Mellon University.

October 20
Stephan Vogel
LTI Research associate

Statistical Machine Translation at LTI: What was Done, What’s to Come

Statistical Machine Translation (SMT) was proposed in the 1990s and has since been taken up by many research groups, making it a very active research field. One could even get the impression that SMT has become the dominant paradigm in machine translation. Most SMT systems are so-called phrase-based translation systems, using phrasal translations learned from large bilingual corpora as building blocks. In this talk I will give an overview of the SMT system which is under development at LTI in close collaboration with the University of Karlsruhe.

Statistical machine translation is data-driven in that existing translations are used to learn the mapping, in terms of lexical transfer and word order, from source to target language. This requires word and phrase alignment algorithms. A quick overview of word alignment approaches will be given, followed by a presentation of phrase alignment as currently used in our SMT system.

A second important component in an SMT system is the language model. Recent development has moved towards long-history n-gram language models trained on very large corpora. We address this requirement with a suffix-array-based language model, which does not impose any restriction on history length (whatever is in the corpus, we can use) and which can be distributed over many computers to scale up to large corpora.
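To illustrate why a suffix array places no limit on history length, the sketch below counts occurrences of an n-gram of any length by binary search over the sorted suffixes of a tokenized corpus, then forms a relative-frequency estimate. It is a toy under stated assumptions: the corpus, function names, and unsmoothed estimate are illustrative, the construction is the naive one, and none of this is the LTI system's actual code or its distributed design.

```python
def build_suffix_array(tokens):
    """Start positions of all suffixes, sorted lexicographically (naive construction)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count_ngram(tokens, suffix_array, ngram):
    """Count occurrences of ngram (any length) using two binary searches."""
    ngram = list(ngram)
    n = len(ngram)

    def prefix(rank):
        start = suffix_array[rank]
        return tokens[start:start + n]

    # First rank whose length-n prefix is >= ngram.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < ngram:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First rank whose length-n prefix is > ngram.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= ngram:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


def relative_frequency(tokens, suffix_array, history, word):
    """Unsmoothed estimate count(history + word) / count(history)."""
    denominator = count_ngram(tokens, suffix_array, history)
    if denominator == 0:
        return 0.0
    return count_ngram(tokens, suffix_array, list(history) + [word]) / denominator


corpus = "the cat sat on the mat the cat ran".split()
sa = build_suffix_array(corpus)

print(count_ngram(corpus, sa, ["the", "cat"]))                # 2
print(relative_frequency(corpus, sa, ["the", "cat"], "sat"))  # 0.5
```

The same lookup works for a history of one word or twenty, which is the property the abstract highlights; a practical system would add smoothing and a faster suffix array construction.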

The next component of the SMT system is the decoder, which implements a search for the best translation given the models. Interesting aspects of the decoder are the overall search strategy, especially how word reordering is realized, and how the size of the search space is controlled by hypothesis recombination and pruning. Multiple components interact in the decoder by combining the different models log-linearly. This leads to the problem of finding optimal weights for the different models. N-best translation lists are used in this optimization process. A second benefit of generating n-best lists comes from n-best list rescoring with additional features that cannot be used in full decoding.

The presentation will conclude with an overview of the research topics our group is currently concentrating on: speech translation, i.e. translating lattices and using information from the speech recognizer inside the SMT decoder; building SMT systems that run on handheld devices; bringing morphology and syntax into SMT systems; learning from non-parallel corpora; and developing reliability measures to annotate the translation output for downstream processing such as summarization and question answering.
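A log-linear combination scores each hypothesis as a weighted sum of its feature values (typically log-probabilities), and n-best rescoring simply re-sorts the list under that score. The sketch below shows this under assumed inputs: the feature names, values, and weights are invented for illustration, and weight optimization itself is not shown.

```python
def loglinear_score(features, weights):
    """Weighted sum of feature values; features are typically log-probabilities."""
    return sum(weights[name] * value for name, value in features.items())


def rescore(nbest, weights):
    """Return the n-best list sorted from best to worst under the given weights."""
    return sorted(nbest, key=lambda hyp: loglinear_score(hyp[1], weights), reverse=True)


# Hypothetical 3-best list: (translation, feature values).
nbest = [
    ("the house is small", {"tm": -4.0, "lm": -6.0, "length": 4}),
    ("the house small is", {"tm": -3.5, "lm": -9.0, "length": 4}),
    ("house is small", {"tm": -5.0, "lm": -5.5, "length": 3}),
]

balanced = {"tm": 1.0, "lm": 1.0, "length": 0.5}
tm_only = {"tm": 1.0, "lm": 0.0, "length": 0.0}

print(rescore(nbest, balanced)[0][0])  # the house is small
print(rescore(nbest, tm_only)[0][0])   # the house small is
```

The two weight settings pick different winners from the same list, which is why finding good weights matters.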

October 16
Hermann Helbig
University of Hagen, Germany

Multilayered Extended Semantic Networks as a Knowledge Representation Paradigm and Interlingua for Meaning Representation

Abstract:

The talk gives an overview of Multilayered Extended Semantic Networks (abbreviated MultiNet), which is one of the most comprehensively described knowledge representation paradigms used as a semantic interlingua in large-scale NLP applications and for linguistic investigations into the semantics and pragmatics of natural language.

As with other semantic networks, concepts are represented in MultiNet by nodes, and relations between concepts are represented as arcs between these nodes. In addition, every node is classified according to a predefined conceptual ontology forming a hierarchy of sorts, and the nodes are embedded in a multidimensional space of layer attributes and their values. MultiNet provides a set of about 150 standardized relations and functions which are described in a very concise way including an axiomatic apparatus, where the axioms are classified according to predefined types. The representational means of MultiNet claim to fulfill the criteria of universality, homogeneity, and cognitive adequacy. The talk also shows how MultiNet can be used for the semantic representation of different semantic phenomena.

To overcome the quantitative barrier in building large knowledge bases and semantically oriented computational lexica, MultiNet is associated with a set of tools including a semantic interpreter NatLink for automatically translating natural language expressions into MultiNet networks, a workbench LIA for the computer lexicographer, and a workbench MWR for the knowledge engineer for managing and graphically manipulating semantic networks. The applications of MultiNet as a semantic interlingua range from natural language interfaces to the Internet and to dedicated databases, over question-answering systems, to systems for automatic knowledge acquisition.

Biographical Note:

Hermann Helbig is head of the chair "Intelligent Information and Communication Systems" at the University of Hagen, Germany. He holds an M.S. (Diploma) in Physics and a Ph.D. in Computer Science (Dr.rer.nat.) in the field of AI, and received his habilitation (Dr.rer.nat.habil.) in the field of Knowledge Representation. From 1970 through 1992 he held several positions in industrial research in AI and Computational Linguistics and was simultaneously a lecturer in Artificial Intelligence at the University of Technology at Dresden. His main research interests lie in the field of semantically based natural language processing with applications in question-answering systems and natural language interfaces to the Internet. Former sabbatical stays led him to ICSI in Berkeley and to the Universities of Buffalo (USA), and Edinburgh, Sheffield and London (UK). He has written several books in the field of AI; the most recent, "Knowledge Representation and the Semantics of Natural Language", is one of the standard textbooks on semantic networks.

September 29
Dan Melamed
New York University

Scalable Discriminative Learning for Powerful Translation Models

Abstract:
The translational equivalence relations in a surprisingly high fraction of ordinary bitexts cannot be effectively explained by commonly used translation models. Models of translational equivalence with more expressive power are necessary. However, currently popular machine learning techniques do not scale up well for these more powerful models, which limits these models' practical utility. I shall present a new purely discriminative learning method for structured prediction problems, including parsing and translation. This method scales up to millions of features over large training sets, such as those used for statistical MT. Experiments have shown that even context-sensitive models can be effectively trained using this method. If there is interest, I will also give an overview of the GenPar toolkit, which made all of this work possible.

September 22
Tomoki Toda
Faculty, Shikano Laboratory
Nara Institute of Science and Technology (NAIST), Japan

Improving Body Transmitted Unvoiced Speech with Statistical Voice Conversion towards Silent-Speech Telephone

Abstract:
Cellular phones have enabled us to communicate with each other by speech whenever and wherever we like. However, this has caused a problem: in some situations, such as a meeting, speech is perceived as noise by the people around the speaker. To address this problem, we aim to realize a "silent-speech telephone" that allows speech communication without annoying anyone in any situation. Non-Audible Murmur (NAM) enables us to talk while remaining silent. However, it is hard to use NAM directly as a medium for human communication because of its low intelligibility and unfamiliar sound. To address this, we propose a method for converting NAM to ordinary speech (NAM-to-Speech). In advance, we train GMMs to represent the correlations between the acoustic features of NAM and those of speech, using around 50 utterance pairs of NAM and speech. Once those GMMs are trained, we can convert any sample of NAM to speech with statistical feature conversion based on maximum likelihood estimation. Although NAM-to-Speech converts NAM to intelligible voices with quality similar to speech, a large problem remains: the difficulty of estimating F0 from unvoiced speech. To avoid this problem, we propose another method that converts NAM to whisper, which is a familiar and intelligible form of unvoiced speech (NAM-to-Whisper). Moreover, we enhance NAM-to-Whisper so that multiple types of body-transmitted unvoiced speech, such as NAM and Body Transmitted Whisper (BTW), are accepted as input voices. We evaluate the performance of the proposed conversion methods. Experimental results demonstrate that 1) the intelligibility and naturalness of NAM are significantly improved by NAM-to-Whisper, 2) NAM-to-Whisper outperforms NAM-to-Speech, and 3) a single conversion model can be trained that successfully converts both NAM and BTW to the target voice.
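
To make the idea of GMM-based feature conversion more concrete, the following Python sketch maps a source feature to a target feature using a joint Gaussian mixture over (source, target) pairs. It rests on several assumptions that are not from the abstract: features are single scalars rather than spectral vectors, the mixture parameters are set by hand rather than trained on utterance pairs, and the mapping is the simple conditional mean E[y | x] rather than the maximum likelihood estimation used in the talk. The function and key names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def convert(x, components):
    """Map a source feature x to the conditional mean E[y | x].

    Each component is a dict describing one Gaussian of a joint GMM
    over (x, y), with keys: weight, mean_x, mean_y, var_x, cov_xy.
    This is a conditional-mean mapping, a simplification of the
    maximum-likelihood conversion mentioned in the abstract.
    """
    # Posterior probability of each mixture component given x.
    scores = [c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"])
              for c in components]
    total = sum(scores)
    posteriors = [s / total for s in scores]

    # Weighted sum of the per-component linear regressions of y on x.
    return sum(
        p * (c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"]))
        for p, c in zip(posteriors, components)
    )


# Hand-set, purely illustrative parameters (not trained on real data).
single = [{"weight": 1.0, "mean_x": 0.0, "mean_y": 1.0,
           "var_x": 2.0, "cov_xy": 1.0}]
symmetric = [
    {"weight": 0.5, "mean_x": -1.0, "mean_y": -2.0, "var_x": 1.0, "cov_xy": 0.0},
    {"weight": 0.5, "mean_x": 1.0, "mean_y": 2.0, "var_x": 1.0, "cov_xy": 0.0},
]
```

With one component the mapping reduces to a linear regression of the target on the source; with several, the posteriors blend the per-component regressions, which is what lets a mixture capture a nonlinear relation between the two feature spaces.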

Bio: Tomoki Toda received the B.E. degree in electrical engineering from Nagoya University in 1999 and the M.E. and Ph.D. degrees in engineering from the Graduate School of Information Science, NAIST in 2001 and 2003, respectively. From 2001 to 2003, he was an intern researcher and a visiting researcher at ATR-SLT. He was a Research Fellow of the JSPS in the Graduate School of Engineering, NITECH from 2003 to 2005. He was a visiting researcher at LTI, CMU from October 2003 to September 2004. He is currently an Assistant Professor in the Graduate School of Information Science, NAIST and a visiting researcher at ATR-SLC.

August 16, 2006
Tong Zhang
Yahoo! Research

Some Theoretical and Algorithmic Issues in Large Scale Ranking and Scoring

I will discuss prediction problems encountered at internet companies such as Yahoo, and then focus on the web-search problem, mainly discussing some theoretical issues in this setting. I will then talk about some related algorithmic issues in the context of optimizing the scoring function of a statistical translation system, where the decoding procedure is treated as a black box. I will present some initial results as well as problems encountered.

Bio: Tong Zhang received a B.A. in mathematics and computer science from Cornell University in 1994 and a Ph.D. in Computer Science from Stanford University in 1998. After working as a research staff member at the IBM T.J. Watson Research Center in Yorktown Heights, New York, he joined Yahoo in 2005. His research interests include machine learning, numerical algorithms, and their applications.

 
 



LTI is part of the School of Computer Science at Carnegie Mellon University.
This page is maintained by The LTI Webmaster.