April 20 Nizar Habash Center for Computational Learning Systems
Columbia University
Arabic Diacritization through Full Morphological Tagging
Arabic is written without certain orthographic symbols, called
diacritics, which represent among other things short vowels. The
restoration of diacritics to written Arabic is an important processing
step for several natural language processing applications, including
training language models for automatic speech recognition, text-to-
speech generation, and so on. I present here a new diacritization system
for written Arabic based on a new combination of known techniques: a
lexical resource for morphological analysis, a multi-classifier tagger
and a lexeme language model. This new diacritization system outperforms
the best previously published results (Zitouni et al. 2006) by reducing
the word error rate to 14.9% (17% relative reduction from Zitouni et al.
2006) and reducing the diacritic error rate to 4.8% (13% relative reduction
from Zitouni et al. 2006). These results were produced on the same
training and test data used by Zitouni et al. (2006). I will also present
a detailed error analysis classifying the types of errors resolved by each of the
different modules used.
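The abstract reports word and diacritic error rates and relative reductions. Below is a minimal sketch of how such figures are usually computed. It assumes simplified definitions: DER is counted over letters, and WER counts words with at least one diacritic error. It also assumes a hypothetical per-letter label encoding. The talk's exact scoring conventions (for example, how case endings or punctuation are treated) are not stated here.

```python
def diacritic_error_rates(gold, pred):
    """Return (DER, WER) for word-aligned gold and predicted diacritizations.

    Each word is a list of per-letter diacritic labels ("" means no diacritic).
    Simplified definitions (an assumption): DER is the fraction of letters whose
    diacritic is wrong; WER is the fraction of words with at least one such error.
    """
    letters = letter_errors = word_errors = 0
    for gold_word, pred_word in zip(gold, pred):
        errors = sum(g != p for g, p in zip(gold_word, pred_word))
        letters += len(gold_word)
        letter_errors += errors
        word_errors += int(errors > 0)
    return letter_errors / letters, word_errors / len(gold)


def relative_reduction(old_rate, new_rate):
    """Fraction of the old error rate that the new system eliminates."""
    return (old_rate - new_rate) / old_rate


# Two toy words; the second letter of the first word gets the wrong vowel.
gold = [["a", "u", ""], ["i", ""]]
pred = [["a", "a", ""], ["i", ""]]
der, wer = diacritic_error_rates(gold, pred)
print(der, wer)  # 0.2 0.5
```

Working backwards, a 17% relative WER reduction that ends at 14.9% implies a baseline of roughly 14.9 / (1 - 0.17) ≈ 18.0%. Likewise, 4.8 / (1 - 0.13) ≈ 5.5% for DER. Both are approximate because the reported percentages are rounded.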
April 13 Paul Constantinides Senior Technical Staff, salesforce.com
Search at salesforce.com: challenges in a hosted, multi-tenant
environment
Each day hundreds of thousands of people around the world use
salesforce.com as part of their business processes. Our search
functionality is their navigation tool to the information stored in our
multi-tenant system. These users demand highly relevant and secure
results with rapid indexing and querying, in more than a dozen supported
languages. The performance and quality of the results we provide are
tightly coupled with our system architecture. Our incremental and bulk
indexers index millions of records per day: the majority within one
minute. Our distributed querying complex handles millions of queries per
day: the majority within one second.
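The following toy sketch makes the multi-tenant constraint concrete. It assumes a hypothetical in-memory design in which postings are partitioned by tenant, so a query can only ever touch its own tenant's records. It is not salesforce.com's architecture, which the abstract says is built on a modified Lucene. It also ignores ranking, sharing rules and distribution.

```python
import re
from collections import defaultdict


class TenantIndex:
    """Toy in-memory inverted index whose postings are partitioned by tenant id.

    Hypothetical illustration only: partitioning by tenant means a query can
    never see another tenant's records, whatever terms it contains.
    """

    def __init__(self):
        # tenant id -> term -> set of record ids
        self._postings = defaultdict(lambda: defaultdict(set))

    @staticmethod
    def _terms(text):
        return re.findall(r"\w+", text.lower())

    def add(self, tenant, record_id, text):
        """Incrementally index one record for one tenant."""
        for term in self._terms(text):
            self._postings[tenant][term].add(record_id)

    def search(self, tenant, query):
        """Conjunctive (AND) query evaluated only against this tenant's postings."""
        terms = self._terms(query)
        postings = self._postings.get(tenant)
        if not terms or postings is None:
            return set()
        result = set(postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= postings.get(term, set())
        return result


index = TenantIndex()
index.add("acme", "r1", "Quarterly sales report")
index.add("globex", "r2", "Sales forecast")
print(index.search("acme", "sales"))           # {'r1'}
print(index.search("globex", "sales report"))  # set()
```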
I will discuss:
• Our distributed search architecture
• The features of our search system, which is based on a modified version of the Lucene search engine
• The tools we use to measure system performance and accuracy
• The challenges we have overcome to reach a high level of quality of service
• Our goals and direction for the future
Bio: Paul Constantinides is the Software Development Manager for search
technologies and Senior Member of Technical Staff at salesforce.com,
where he has been working since 2004. At salesforce.com he has
implemented both architectural and user facing search features, and
added new functionality built on top of search. His primary
responsibility is directing the search effort, but he still occasionally
finds time to program. Prior to
salesforce.com, he implemented platforms for building, running, and
tuning speech recognition applications at Nuance Communications. Before
Nuance, he built voice dialog systems in the Sphinx Speech Group at
Carnegie Mellon. He holds M.S. and B.S. degrees from the Electrical and
Computer Engineering Department at Carnegie Mellon.
March 30 Kemal Oflazer Visiting Professor, Carnegie Mellon University
Experiments with Different Representational Units in
English-to-Turkish SMT
In this talk, we present some results of our ongoing work on English-to-Turkish
statistical machine translation. Turkish is an agglutinative
language with very rich inflectional and derivational morphology that results
in complex word structures. Turkish also has free constituent order, with
almost no formal ordering constraints at the sentence level. These properties, and the
fact that Turkish-English parallel corpora are scarce compared to those for
the language pairs popular in SMT research, raise interesting issues
for SMT involving Turkish. After a discussion of the highlights of relevant
aspects of Turkish, we investigate different representational granularities
for sub-lexical representation. We find that (i) representing
both Turkish and English at the morpheme level, with some selective
morpheme grouping on the Turkish side of the training data, (ii)
augmenting the training data with "sentences" comprising only the
content words of the original training data, and (iii) re-ranking
the n-best decoder outputs (based on a morpheme/morpheme-group
language model) with a word-level language model by combining
translation model scores with word-level language model scores,
provide a non-trivial improvement over a fully word-based baseline
model. We also experiment with an iterative training scheme, which may loosely
be called "statistical post-editing," in which we use the decoder output for the
training data along with the reference training data to build a second
translation model, which we expect to perform better than the
original model. Despite our relatively limited training data, we improve
from 20.16 BLEU for the baseline to 24.84 BLEU over all sentences. For
English source sentences of length 1-20, the BLEU score is about 30.
Time permitting, we will also touch briefly on translating from Turkish
to English and on the suitability of BLEU for languages like Turkish.
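Item (iii) combines a morpheme-level decoder with a word-level language model at re-ranking time. The sketch below illustrates that idea under several assumptions:

- bound morphemes are written with a leading "+" (a hypothetical convention);
- the word-level model is a toy bigram table;
- one interpolation weight combines the decoder score with the word-level LM score.

It is not the speakers' actual segmentation scheme or scoring function.

```python
import math


def morphemes_to_words(hypothesis):
    """Glue bound morphemes onto the preceding token.

    Assumed convention: the decoder emits stems as plain tokens and bound
    morphemes with a leading '+', e.g. "ev +ler +de" -> ["evlerde"].
    """
    words = []
    for token in hypothesis.split():
        if token.startswith("+") and words:
            words[-1] += token[1:]
        else:
            words.append(token)
    return words


def word_lm_logprob(words, bigram_logprob, unseen=-10.0):
    """Word-level bigram log-probability with a fixed penalty for unseen bigrams."""
    total, previous = 0.0, "<s>"
    for word in words + ["</s>"]:
        total += bigram_logprob.get((previous, word), unseen)
        previous = word
    return total


def rerank(nbest, bigram_logprob, lm_weight=0.5):
    """Pick the best word sequence from (morpheme hypothesis, decoder log-score) pairs."""
    best_score, best_words = -math.inf, None
    for hypothesis, decoder_score in nbest:
        words = morphemes_to_words(hypothesis)
        score = decoder_score + lm_weight * word_lm_logprob(words, bigram_logprob)
        if score > best_score:
            best_score, best_words = score, words
    return best_words


# Toy numbers: the decoder slightly prefers the wrong suffix order,
# but the word-level LM only knows the well-formed word "evlerde".
toy_lm = {("<s>", "evlerde"): math.log(0.2), ("evlerde", "</s>"): math.log(0.5)}
toy_nbest = [("ev +de +ler", -4.0), ("ev +ler +de", -4.2)]
print(rerank(toy_nbest, toy_lm))  # ['evlerde']
```

Scoring whole words catches suffix-order errors ("evdeler") that look locally plausible to a morpheme-level model.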
Brief Biography:
Kemal Oflazer is a Professor of Computer Science at Sabanci University in
Istanbul, Turkey, currently on a sabbatical visit at LTI. He received his
PhD in Computer Science from CMU in 1987. He has been working in the area of
language processing, especially for Turkish. His current research interests
are in finite state methods for language processing, dependency parsing and
statistical machine translation. He was the program co-chair for the 43rd
ACL Meeting in 2005, and served on the editorial boards of Computational
Linguistics and Journal of AI Research (JAIR) and currently serves on the
editorial boards of Machine Translation and Journal of Research on Language
and Computation. More information is available at
http://people.sabanciuniv.edu/~oflazer/
March 9 Christos Faloutsos Carnegie Mellon University
Graph Mining: Patterns and Tools for Static and Time-Evolving Graphs
Given a graph that evolves over time, what can we say about it? How does its diameter change? What is normal and what is abnormal? We present recent tools for discovering anomalies and patterns in both static and time-evolving graphs. We also describe a realistic graph generator using 'Kronecker' matrix multiplication, a parameter-free graph partitioning algorithm ('cross-associations'), the 'CePS' algorithm to find important connections between two or more nodes, and a result on virus propagation in graphs.
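The idea behind a Kronecker graph generator is to repeatedly take the Kronecker product of a small initiator matrix of edge probabilities. The sketch below samples such a stochastic Kronecker graph naively, with one coin flip per node pair (O(N²)). The initiator values are illustrative assumptions, and practical generators use much faster edge-sampling schemes.

```python
import random


def kronecker_power(initiator, k):
    """k-th Kronecker power of a square matrix stored as nested lists."""
    result = [[1.0]]
    m = len(initiator)
    for _ in range(k):
        n = len(result)
        # (A ⊗ B)[i][j] = A[i // m][j // m] * B[i % m][j % m]
        result = [
            [result[i // m][j // m] * initiator[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)
        ]
    return result


def stochastic_kronecker_graph(initiator, k, seed=None):
    """Sample directed edges on len(initiator)**k nodes; edge (i, j) appears with probability P[i][j].

    Naive O(N^2) sampling, fine for small k only.
    """
    rng = random.Random(seed)
    probabilities = kronecker_power(initiator, k)
    return [
        (i, j)
        for i, row in enumerate(probabilities)
        for j, p in enumerate(row)
        if rng.random() < p
    ]


# Illustrative 2x2 initiator; k = 8 gives a 256-node graph.
edges = stochastic_kronecker_graph([[0.9, 0.5], [0.5, 0.1]], 8, seed=0)
print(len(edges))
```

The same few initiator parameters control the whole graph. The recursive block structure of the power matrix is what produces the heavy-tailed degrees that make such graphs look realistic.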
BIOGRAPHICAL NOTE
Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award in ICDM 2006, nine "best paper" awards, and several teaching awards. He has served as a member of the executive committee of SIGKDD; he has published over 160 refereed articles, 11 book chapters and one monograph. He holds five patents and has given over 20 tutorials and over 10 invited distinguished lectures. His research interests include data mining for streams and networks, fractals, indexing for multimedia and bio-informatics data, and database performance.
March 2 David A. Smith Johns Hopkins University
Bootstrapping Monolingual Parsers from Multilingual Data
The creation of the Penn Treebank and similar datasets ca. 1990
produced a flowering of research on empirically trained parsers, which
is now bearing fruit in information extraction and machine translation
(e.g. Weischedel 2004, Quirk et al. 2005, Marcu et al. 2006). This
revolution has bypassed most languages and domains, however, due to
the expense of creating treebanks. Semi-supervised learning methods
such as bootstrapping and co-training have the potential to leverage
diverse sources of knowledge for robust statistical parsing in these
new settings.
We argue that bootstrapping a parser from limited supervised data
(~50-100 trees) is most promising when the model uses a rich set of
redundant features, as in recent supervised models for scoring
dependency parses (McDonald, Crammer, and Pereira, 2005). We show how
to aid bootstrapping efficiently by drawing new features from a parser
in another domain or even another language, via parallel corpora or
dictionaries. These quasi-synchronous grammars extend prior
bootstrapping work with synchronous grammars (Hwa et al. 2002, Smith &
Smith 2004, Jansche 2005) and also have applications in translation
modeling (Smith & Eisner 2006).
Drawing on Abney's (2004) analysis of the Yarowsky algorithm, we
present a family of new on-line bootstrapping algorithms that optimize
a likelihood-like loss function with generalized entropy
regularization. We show that this approach avoids the losses in
accuracy incurred by EM-based learning. Combining diverse knowledge
sources in a conditional model of graph spanning trees, we learn
improved parsers of Czech, German, and other non-projective languages.
We argue that these techniques are broadly applicable to
bootstrapping in other NLP domains with a wealth of overlapping
features.
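To illustrate the kind of bootstrapping the talk builds on, here is a minimal Yarowsky-style self-training loop with a decision list. This is the classic baseline analyzed by Abney, not the speaker's entropy-regularized on-line algorithm. The features, the smoothing constant and the confidence threshold are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(labeled, smoothing=0.1):
    """Rank (feature -> label) rules by smoothed log-odds, highest first."""
    counts = defaultdict(Counter)
    labels = set()
    for features, label in labeled:
        labels.add(label)
        for feature in features:
            counts[feature][label] += 1
    rules = []
    for feature, by_label in counts.items():
        total = sum(by_label.values())
        for label in labels:
            hits = by_label[label] + smoothing
            misses = total - by_label[label] + smoothing
            rules.append((math.log(hits / misses), feature, label))
    rules.sort(reverse=True)
    return rules


def classify(rules, features):
    """Apply the first (highest-scoring) rule whose feature is present."""
    for score, feature, label in rules:
        if feature in features:
            return label, score
    return None, -math.inf


def bootstrap(seed, unlabeled, threshold=1.0, rounds=5):
    """Self-training: repeatedly label the confident part of the pool and retrain."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        rules = train_decision_list(labeled)
        confident, rest = [], []
        for features in pool:
            label, score = classify(rules, features)
            (confident if score >= threshold else rest).append((features, label))
        if not confident:
            break
        labeled.extend(confident)
        pool = [features for features, _ in rest]
    return train_decision_list(labeled), labeled


seed = [({"river", "water"}, "geo"), ({"money", "loan"}, "fin")]
unlabeled = [{"water", "shore"}, {"loan", "interest"}, {"shore", "fishing"}, {"interest", "rate"}]
rules, labeled = bootstrap(seed, unlabeled)
```

The example {"shore", "fishing"} shares no feature with the seeds. It is labeled geo only in the second round, through "shore", which the first round picked up. This propagation through overlapping features is why a rich, redundant feature set helps bootstrapping.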
Bio:
David Smith received his A.B. in classics from Harvard University. An
NSF graduate fellow, he is currently a Ph.D. student in computer
science at Johns Hopkins University's Center for Language and Speech
Processing. His interests are in machine translation, natural
language parsing, and semi-supervised machine learning methods. David
was formerly head programmer for the Perseus Digital Library Project
at Tufts University, where he strayed from the path of
classical philology toward automatic morphological processing,
geocoding, and information extraction.
February 2 Mahadev Satyanarayanan
Carnegie Group Professor of Computer Science
Carnegie Mellon University
Finding Needles in a Haystack with Diamond
How does an expert discover something relevant to a task in a large
distributed repository of complex and loosely-structured data? For
example, how does a military intelligence analyst identify suspicious
events from recent satellite images and surveillance videos? Or, how
does a pharmaceutical researcher identify adverse effects of a drug in
a large collection of automated cell microscopy images? The terms
"suspicious" and "adverse effects" refer to vague concepts. More
precise definitions can only be given after examining the data in some
depth. In other words, hypothesis-formation and hypothesis-validation
proceed hand-in-hand in a tightly-coupled and iterative sequence. We
refer to this inherently human-centric activity as "interactive data
exploration."
Diamond is an open-source software platform for interactive search
of complex data that has been jointly developed by Intel Research and
Carnegie Mellon. It implements the concept of "early discard." This
makes brute-force interactive search practical by eliminating
irrelevant data as cheaply as possible. Further, Diamond embodies the
concept of "self-tuning." This allows it to dynamically adapt to
different hardware configurations, workloads, and data content in a
manner that is completely transparent to users and applications.
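Early discard and self-tuning can be illustrated with a cascade of filters. Each object is rejected by the first filter it fails, and the cascade is periodically re-ordered from observed pass rates. This is a sketch under stated assumptions:

- filters are independent;
- each filter has a known, hypothetical per-object cost;
- filters are ordered by cost / (1 - pass rate), a standard rule for ordering independent predicates.

It is not Diamond's actual scheduler or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Filter:
    """One filter in the cascade, with an assumed per-object evaluation cost."""
    name: str
    predicate: Callable[[dict], bool]
    cost: float
    seen: int = 0
    passed: int = 0

    def pass_rate(self) -> float:
        # Laplace-smoothed estimate of how often this filter lets an object through.
        return (self.passed + 1) / (self.seen + 2)

    def rank(self) -> float:
        # For independent filters, ascending cost / (1 - pass rate) minimizes expected cost.
        return self.cost / (1.0 - self.pass_rate())


def search(objects, filters, reorder_every=100):
    """Run a filter cascade with early discard, periodically re-ordering filters from observed statistics."""
    order = list(filters)
    kept, total_cost = [], 0.0
    for i, obj in enumerate(objects):
        if i % reorder_every == 0:
            order.sort(key=Filter.rank)  # self-tuning step
        for f in order:
            total_cost += f.cost
            f.seen += 1
            if not f.predicate(obj):
                break  # early discard: skip every remaining (possibly expensive) filter
            f.passed += 1
        else:
            kept.append(obj)
    return kept, total_cost


objects = [{"id": i, "bright": i % 10 == 0, "has_cell": i % 2 == 0} for i in range(1000)]
cheap = Filter("brightness", lambda o: o["bright"], cost=1.0)
expensive = Filter("cell_detector", lambda o: o["has_cell"], cost=50.0)
kept, cost = search(objects, [expensive, cheap])
print(len(kept), cost)  # 100 6000.0
```

Running the expensive filter first on every object would cost 50,500 units here. Putting the cheap, selective filter first discards 90% of the objects for about 1 unit each.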
Medical and pharmaceutical researchers at the University of Pittsburgh
Medical Center, the University of Pittsburgh School of Medicine,
and Merck are collaborating with Diamond researchers to apply Diamond
to their domain-specific tasks. This may open the door to research
and diagnostic strategies that were not considered feasible until now.
January 26 Jaime Carbonell
Allen Newell Professor of Computer Science
Director, Language Technologies Institute
Context-Based Machine Translation
In 2001, Eli Abir, an off-the-grid inventor without
formal schooling, came up with a novel way of
combining long n-grams, which he called "linguistic
DNA." This idea formed the basis of a new Machine
Translation Paradigm now called Context-Based MT, or
CBMT, after many years of hard work including infusion
of relevant MT and IR component technologies. The
presentation will focus on how CBMT works, including
how CBMT achieved the highest BLEU scores reported to
date on unseen newswire text, albeit so far only on
Spanish-English.
The data ingredients required by CBMT are a
comprehensive bilingual dictionary, a very large
target-language-only corpus and an optional smaller
source-language corpus -- and nothing else: no grammar
rules and no bilingual corpus. These resources are used
by two key processes, one called the "flooder",
corresponding to a translation model, and the other
called the "n-gram overlap resolver", corresponding to
the decoder. A third component finds phrasal
near-synonyms via an unsupervised learning process to
overcome impasses when long n-grams fail to resolve.
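One way to picture overlap resolution is as follows. For each overlapping source window, choose one candidate target n-gram so that adjacent choices share as many words as possible. Then merge the chosen n-grams. The sketch below does this with Viterbi-style dynamic programming. It is a hypothetical simplification: the candidate lists stand in for flooder output, and the real resolver's scoring is not described in the abstract.

```python
from typing import List, Sequence, Tuple

Ngram = Tuple[str, ...]


def overlap(left: Ngram, right: Ngram) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def resolve(candidates: Sequence[Sequence[Ngram]]) -> List[str]:
    """Pick one candidate per source window so that total overlap is maximal, then merge.

    candidates[i] holds hypothetical flooder output for the i-th overlapping source window.
    """
    # best[c] = (total overlap so far, chosen path ending in candidate c)
    best = [(0, [c]) for c in candidates[0]]
    for column in candidates[1:]:
        best = [
            max(((score + overlap(path[-1], c), path + [c]) for score, path in best),
                key=lambda item: item[0])
            for c in column
        ]
    _, path = max(best, key=lambda item: item[0])
    words = list(path[0])
    for previous, current in zip(path, path[1:]):
        words.extend(current[overlap(previous, current):])
    return words


candidates = [
    [("the", "red", "house"), ("a", "red", "home")],
    [("red", "house", "is", "big"), ("house", "was", "large")],
]
print(resolve(candidates))  # ['the', 'red', 'house', 'is', 'big']
```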
December 15 Candace L. (Candy) Sidner
Senior Research Scientist
Mitsubishi Electric Research Labs
Cambridge, Massachusetts
Collaborative Interface Agents:
On the screen and on the robot
This talk will present work on building interface agents that
collaborate with users in their activities. In particular, I will
discuss two different kinds of interfaces and illustrate collaboration
with each. The first is the DiamondHelp system, a general tool which
provides guidance to users of consumer electronic products.
DiamondHelp was designed to use a GUI and optional speech
capabilities to support collaborative conversation with the
DiamondHelp agent while at the same time allowing the user to interact
with the product itself. I will demo DiamondHelp guiding a user in
programming a high-end washer/dryer, and discuss the architecture of
the system and how it might be used in applications beyond consumer
electronics.
The talk will then discuss work on collaboration with a humanoid robot.
Our goal has been to understand how collaborative interactions take
advantage of the engagement, or perceived connection, between the
participants and apply that to human interactions with a robot. In
this talk I will focus on recent efforts in interpreting human
conversational nodding and in locating people in an open environment.
I will briefly discuss how aspects of this technology can be applied to
non-robotic interfaces.
Bio:
Candace L. (Candy) Sidner is an expert in user interfaces, especially
those using speech, natural language understanding, and collaboration.
Candy is senior research scientist at Mitsubishi Electric Research
Labs in Cambridge, Massachusetts. She is currently working on
human-robot interaction, focused on the role of engagement in those
interactions, and on interface applications involving collaborative
interface agents in the COLLAGEN project. She is a Fellow and past
Councilor of the American Association for Artificial Intelligence, a
senior member of the IEEE, and a member of the scientific advisory
board for the EU Cognitive Systems for Cognitive Assistants (CoSy)
project. She is currently general chair for HLT-NAACL 2007. She has
served as program co-chair of Intelligent User Interfaces 2006 and SIGDIAL
2004, chair of the International Conference on Intelligent User
Interfaces in 2001, and President of the Association for Computational
Linguistics. She received her Ph.D. from MIT in Computer Science.
URL: www.merl.com/people/sidner
November 17 Brian MacWhinney
CMU Psychology Department
Computational Linguistics and Language Learning
The study of language learning is a major research focus in linguistics, psycholinguistics, neuroscience, philosophy, psychology, education, sociology, and applied linguistics. Language learning data play a central role in important debates regarding language innateness, modularity of mind, brain plasticity, and processes of language change. The resolution of these debates often hinges on having access to large quantities of transcribed interactions of parents (or teachers) and children (or second language learners). To address this need, we have constructed the CHILDES and TalkBank databases which now contain over 14 million utterances for English and lesser amounts for a collection of 26 additional languages. Many of these transcripts are directly linked to audio and video.
To process this database, we are relying increasingly on tools from computational linguistics, including taggers, parsers, annotators, schema, and web-delivery mechanisms. In this talk, we will review recent work in three areas:
1. The logical problem of language acquisition. Here, we will see how searches of the CHILDES database point to a resolution of the logical problem of language acquisition that is very different from that proposed in Chomsky’s minimalist theory. Instead, we will see how item-based patterns can be configured to acquire complex structures on the basis of positive evidence.
2. The extraction of grammatical relations. Here we have constructed taggers for eight languages that are tuned to the specific requirements of child-parent discourse. The output of these taggers can now be processed through a deterministic grammatical relations parser to yield tagged dependency structures for one important English training corpus at accuracy rates rising above 95%. These results can then be used to compute developmental indices of proven clinical and diagnostic importance.
3. Interactional analysis. By linking transcripts to video over the web, we have been able to provide support for a wide range of microgenetic conversation-analytic studies of learning in classroom contexts.
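The abstract does not say which developmental indices are computed. Mean length of utterance (MLU) in morphemes is one widely used example, so the sketch below uses it as an illustrative assumption. The toy notation (bound morphemes attached with "-") is also an assumption; CHILDES transcripts use the CHAT format.

```python
def mean_length_of_utterance(utterances):
    """MLU in morphemes.

    Assumed toy notation: each utterance is a list of words, with bound
    morphemes attached by '-' (e.g. "doggie-s"). Empty utterances are skipped.
    """
    counted = [u for u in utterances if u]
    morphemes = sum(len(word.split("-")) for u in counted for word in u)
    return morphemes / len(counted)


sample = [["doggie-s", "run"], ["more", "cookie"], ["I", "want-ed", "it"]]
print(mean_length_of_utterance(sample))  # 3.0
```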
Finally, in the context of work supported by the PSLC (Pittsburgh Science of Learning Center), we have used basic computational linguistic tools and Bayesian principles to predict language learning difficulties in French (gender, dictation) and Chinese (pinyin, vocabulary, characters). These methods rely on the principle of graduated interval recall to maximize learning efficiency.
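Graduated interval recall spaces reviews of an item at increasing delays. Below is a minimal scheduler sketch. It assumes geometric delays starting at 5 seconds with ratio 5, and it restarts the ladder after a failed recall. Both the numbers and the restart rule are illustrative choices, and the Bayesian difficulty prediction mentioned above is not modeled.

```python
def graduated_delay(step, first_seconds=5.0, ratio=5.0):
    """Delay before the next review at a given step; delays grow geometrically (assumed ratio)."""
    return first_seconds * ratio ** step


def next_review(step, recalled):
    """Advance one step after a successful recall; restart the ladder after a failure."""
    new_step = step + 1 if recalled else 0
    return new_step, graduated_delay(new_step)


print([graduated_delay(k) for k in range(4)])  # [5.0, 25.0, 125.0, 625.0]
```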
November 10 Anatole Gershman
Accenture Technology Labs
The Paradox in Services R&D: Moving Innovation from the Field into the Labs
While services constitute about 80% of the U.S. economy, the leading companies in this sector spend very little on formal R&D. This is true not only for companies such as Wal-Mart or Citibank but also for the largest IT consulting companies. Yet, the services sector is very innovative - the rise of eCommerce being but the most visible recent example. Innovation in the services sector has been happening in the field as an entrepreneurial activity rather than as a deliberate R&D process either in industry or academia. Historically, institutional R&D and academic programs emerged only when a sector reached an advanced state of industrialization: for services that would include repeatability, measurability and mass customization. We believe that services are entering the age of industrialization and require dedicated industrial-strength R&D that goes beyond entrepreneurial innovation.
What should be the scope of services R&D, and what are the main challenges? Clearly, we need to explore what kinds of services will be possible in the future and how they could be implemented in a scalable fashion. I argue that services are evolving along five dimensions:
1. Granularity – the object of services will be individuals and individual objects as opposed to groups and categories today.
2. Ubiquity – services will be delivered everywhere they are needed and not at fixed service locations.
3. Timing – services will be delivered increasingly in real time.
4. Contextualization – services will be increasingly aware of the specific context of each user, including his or her location, intentions and surrounding objects.
5. Intelligence – the ability to deliver the right stuff at the right time and to adapt to changes.
Progress along these dimensions presents formidable challenges to applied computer science. These challenges can be grouped into three categories. The first is the challenge of “sensing.” Services will need to identify and track people, objects and events relying on often sketchy information from physical sensors or on textual or voice descriptions. The second challenge is the challenge of “thinking.” Services will need to recognize patterns of behavior and optimize their responses. This will require extensive modeling of people and activities which will have to be constructed, maintained and adapted automatically with minimum human intervention. The third challenge is the challenge of “acting.” Rapidly evolving technology creates many means for interaction with customers - from wearable or even implanted devices to large public displays and “smart” objects. Services will need to recognize and use the best means for achieving the task at hand. Whereas sensing-thinking-acting is the typical cycle of an intelligent agent, we mean the above at a full distributed systems level.
Accenture was one of the first IT services companies to recognize the need for industrial-strength R&D and started its first dedicated lab in 1989. We systematically scanned the emerging technologies and evaluated their potential for disruptive changes in services. In this presentation, I will discuss our approach to building services that “sense, think and act” in the world and will give many examples of the prototypes we have built.
Exciting as it is to work in an applied lab, there are many critical problems that cannot be properly addressed there: problems that can only be pursued at academic labs because they require more fundamental, long-range research. I will discuss the four areas which I think are particularly important for achieving the “sense, think and act” vision:
• fusion of information from disparate sources without which “sensing” is of limited use,
• modeling of complex activities which is the basis of “thinking,”
• knowledge and language acquisition through active learning, essential for scaling up, and finally
• creation of smart objects that enable intelligent “acting.”
December 3 Mohit Kumar, LTI PhD student
Learning from the Report-writing Behavior of Individuals
In this talk, I describe a briefing system that learns to predict the contents of reports generated by users who create periodic (weekly) reports as part of their normal activity. The system observes the content-selection choices that users make and builds a predictive model that could, for example, be used to generate an initial draft report. Using a feature of the interface, the system also collects information about potential user-specific features. The system was evaluated under realistic conditions by collecting data in a project-based university course where student group leaders were tasked with preparing weekly reports for the benefit of the instructors, using the material from individual student reports.
We address the question of whether data derived from the implicit supervision provided by end-users is robust enough to support not only model parameter tuning but also a form of feature discovery. Results indicate that this is the case: system performance improves based on the feedback from user activity. We find that individual learned models (and features) are user-specific, although not completely idiosyncratic. This may suggest that approaches which seek to optimize models globally (say, over a large corpus of data) may not in fact produce results acceptable to all individuals.
Joint work with Prof. Alex Rudnicky & Nikesh Garera.
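Learning from implicit supervision can be sketched as online logistic regression. Each include/exclude decision a user makes while assembling a report triggers one gradient step. This is a hypothetical illustration: the feature names, the learning rate and the choice of logistic regression are assumptions, not the system described in the talk. Keeping one such model per user would mirror the finding that learned models are user-specific.

```python
import math
from collections import defaultdict


class ContentSelector:
    """Online logistic regression over sparse, binary item features.

    Each observed choice (an item kept in or left out of the final report)
    is treated as implicit supervision for one stochastic gradient step.
    """

    def __init__(self, learning_rate=0.5):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate

    def probability(self, features):
        """Estimated probability that an item with these features is included."""
        z = sum(self.weights.get(f, 0.0) for f in features)
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, included):
        # Gradient of the log-likelihood for binary features: (y - p) per active feature.
        error = (1.0 if included else 0.0) - self.probability(features)
        for f in features:
            self.weights[f] += self.learning_rate * error


selector = ContentSelector()
observed = [({"mentions_deadline", "from_team_lead"}, True), ({"social_note"}, False)]
for _ in range(20):
    for features, included in observed:
        selector.update(features, included)
print(round(selector.probability({"mentions_deadline"}), 2))
```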
Simon Fung, LTI PhD student
Designing an Elicitation Corpus with Semantic Representations
This talk will describe the design and creation of the LDC Elicitation Corpus, part of the AVENUE project. An elicitation corpus is a set of sentences that illustrate various semantic categories (e.g. number, gender) and constructions (e.g. relative clauses), to be translated into a language being studied. Before creating the source sentences, in this project we first create a semantic representation for each sentence, which specifies the values in various semantic categories (e.g. singular or plural for number) that the sentence should contain. This way, we can fix semantic details more precisely than is possible with only the sentence in the source language.
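Fixing the semantic representation first makes coverage systematic. One can cross every value of the semantic features being varied, then write one source sentence per resulting specification. A minimal sketch of that enumeration follows. The feature names and values are hypothetical, not the corpus's actual schema.

```python
from itertools import product


def enumerate_feature_structures(base, dimensions):
    """Yield one semantic specification per combination of the varied feature values.

    base: features held fixed; dimensions: feature name -> values to cover.
    Feature names below are hypothetical, not the corpus's actual schema.
    """
    names = sorted(dimensions)
    for values in product(*(dimensions[name] for name in names)):
        spec = dict(base)
        spec.update(zip(names, values))
        yield spec


specs = list(enumerate_feature_structures(
    {"predicate": "sleep", "tense": "present"},
    {"actor-number": ["singular", "plural"], "actor-gender": ["masculine", "feminine"]},
))
print(len(specs))  # 4
```

Each specification, for example plural and feminine, would then be realized as a source sentence such as "The women sleep." That sentence is translated by an informant while the intended values stay on record.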
October 27 Rong Jin
Michigan State University
Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Maximum margin clustering extends the theory of support vector machines to unsupervised learning and has shown promising performance in recent studies. However, it has three major problems that limit its use in real-world applications: (1) it is computationally expensive and difficult to scale to large datasets; (2) it requires data preprocessing to ensure that the clustering boundary passes through the origin, which makes it unsuitable for clustering unbalanced datasets; and (3) its performance is sensitive to the choice of kernel function. In this work, we propose the "Generalized Maximum Margin Clustering" framework, which addresses all three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm in that (1) it allows any clustering boundary, including those not passing through the origin; (2) it significantly improves computational efficiency by reducing the number of parameters; and (3) it automatically determines an appropriate kernel matrix without any labeled data. Our empirical studies demonstrate the efficiency and effectiveness of the generalized maximum margin clustering algorithm. Furthermore, in this talk I will show the theoretical connection among spectral clustering, maximum margin clustering, and generalized maximum margin clustering.
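For reference on the spectral side of that connection, here is plain normalized spectral bipartition. It thresholds the second eigenvector of the normalized graph Laplacian (the normalized-cut relaxation of Shi and Malik). It is not the generalized maximum margin clustering algorithm. The Gaussian affinity and its width `sigma` are assumptions, and the code requires NumPy.

```python
import numpy as np


def spectral_bipartition(points, sigma=1.0):
    """Split points into two clusters using the sign of the second Laplacian eigenvector."""
    X = np.asarray(points, dtype=float)
    squared = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-squared / (2.0 * sigma ** 2))  # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Map back to the random-walk (normalized-cut) eigenvector before thresholding at 0.
    indicator = d_inv_sqrt * vectors[:, 1]
    return (indicator > 0).astype(int)


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(spectral_bipartition(points, sigma=3.0))
```

Which cluster gets label 1 depends on the arbitrary sign of the eigenvector. Only the partition itself is meaningful.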
Bio: Dr. Rong Jin has been an assistant professor in the Department of Computer Science and Engineering at Michigan State University since 2003. He works in the areas of statistical machine learning and its application to information retrieval. In the past, Dr. Jin has worked on a variety of machine learning algorithms and has extensive experience applying them to information retrieval. Dr. Jin holds a B.A. in Engineering from Tianjin University, an M.S. in Physics from Beijing University, and an M.S. and Ph.D. in language technologies from Carnegie Mellon University.
October 20 Stephan Vogel
LTI Research associate
Statistical Machine Translation at LTI: What was Done, What’s to Come
Statistical Machine Translation (SMT) was proposed in the 1990s and has since been taken up by many research groups, making it a very active research field. One could even get the impression that SMT has become the dominant paradigm in machine translation. Most SMT systems are so-called phrase-based translation systems, which use phrasal translations learned from large bilingual corpora as building blocks. In this talk I will give an overview of the SMT system which is under development at LTI in close collaboration with the University of Karlsruhe.
Statistical machine translation is data-driven in that existing translations are used to learn the mapping, in terms of lexical transfer and word order, from source to target language. This requires word and phrase alignment algorithms. A quick overview of word alignment approaches will be given, followed by a presentation of phrase alignment as currently used in our SMT system.
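As a concrete reference point for word alignment, here is IBM Model 1 trained with EM. It is a classic starting point for alignment pipelines. The talk does not say which aligner the LTI system uses, so this is a generic illustration on toy data, with a NULL source token added as usual.

```python
from collections import defaultdict


def ibm_model1(bitext, iterations=10):
    """EM estimates of t(f | e) for IBM Model 1, with a NULL source token.

    bitext: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict keyed by (f, e).
    """
    foreign_vocab = {f for fs, _ in bitext for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in bitext:
            es = ["NULL"] + list(es)
            for f in fs:
                # E-step: distribute f's alignment probability over all e in the sentence.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalize expected counts per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = ibm_model1(bitext)
print(round(t[("das", "the")], 2), round(t[("Haus", "the")], 2))
```

Phrase alignment is then typically extracted on top of such word alignments (often symmetrized over both directions). That step is not shown here.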
A second important component of an SMT system is the language model. Recent development has moved towards long-history n-gram language models trained on very large corpora. We address this requirement with a suffix-array-based language model, which imposes no restriction on history length (whatever is in the corpus, we can use) and which can be distributed over many computers to scale up to large corpora.
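The reason a suffix array removes the history-length limit is that it can count any n-gram, of any length, with two binary searches over sorted suffixes. The single-machine sketch below shows this idea with naive construction and an unsmoothed relative frequency that crudely backs off by dropping the oldest history word. Smoothing, proper normalization, and distribution over many machines are not shown.

```python
def build_suffix_array(tokens):
    """Start positions of all suffixes, sorted lexicographically (naive construction)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count(tokens, suffix_array, ngram):
    """Occurrences of `ngram` (any length) via two binary searches over the suffix array."""
    ngram, n = list(ngram), len(ngram)

    def prefix(rank):
        start = suffix_array[rank]
        return tokens[start:start + n]

    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose n-token prefix is >= ngram
        mid = (lo + hi) // 2
        if prefix(mid) < ngram:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(suffix_array)
    while lo < hi:  # first suffix whose n-token prefix is > ngram
        mid = (lo + hi) // 2
        if prefix(mid) <= ngram:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


def relative_frequency(tokens, suffix_array, history, word):
    """Unsmoothed P(word | history), dropping the oldest history word until the n-gram is seen.

    Crude back-off without discounting, so the result is not a normalized distribution.
    """
    history = list(history)
    while True:
        seen = count(tokens, suffix_array, history + [word])
        if seen:
            return seen / count(tokens, suffix_array, history)
        if not history:
            return 0.0
        history = history[1:]


corpus = "the cat sat on the mat the cat ran".split()
sa = build_suffix_array(corpus)
print(count(corpus, sa, ["the", "cat"]))  # 2
print(relative_frequency(corpus, sa, ["the"], "cat"))
```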
The next component of the SMT system is the decoder, which searches for the best translation given the models. Interesting aspects of the decoder are the overall search strategy, especially how word reordering is realized, and how the size of the search space is controlled by hypothesis recombination and pruning. Multiple components interact in the decoder by combining the different models log-linearly. This leads to the problem of finding optimal weights for the different models; n-best translation lists are used in this optimization process. A second benefit of generating n-best lists is that additional features, which cannot be used in full decoding, can be applied in n-best list rescoring.
The presentation will conclude with an overview of the current research topics our group is concentrating on: speech translation, i.e. translating lattices and using information from the speech recognizer inside the SMT decoder; building SMT systems that run on handheld devices; bringing morphology and syntax into SMT systems; learning from non-parallel corpora; and developing reliability measures to annotate the translation output for downstream processing such as summarization and question answering.
October 16 Hermann Helbig
University of Hagen, Germany
Multilayered Extended Semantic Networks as a Knowledge Representation Paradigm and Interlingua for Meaning Representation
Abstract:
The talk gives an overview of Multilayered Extended Semantic Networks (abbreviated MultiNet), which is one of the most comprehensively described knowledge representation paradigms used as a semantic interlingua in large-scale NLP applications and for linguistic investigations into the semantics and pragmatics of natural language.
As with other semantic networks, concepts are represented in MultiNet by nodes, and relations between concepts are represented as arcs between these nodes. In addition, every node is classified according to a predefined conceptual ontology forming a hierarchy of sorts, and the nodes are embedded in a multidimensional space of layer attributes and their values. MultiNet provides a set of about 150 standardized relations and functions, which are described very concisely, including an axiomatic apparatus in which the axioms are classified according to predefined types. The representational means of MultiNet are claimed to fulfill the criteria of universality, homogeneity, and cognitive adequacy. The talk also shows how MultiNet can be used to represent different semantic phenomena.
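The data structure described above has three parts: nodes classified by sort, layer attributes on nodes, and labeled arcs between nodes. It can be sketched as follows.

The relation names (SUB, SUBS, AGT) and layer attributes (GENER, FACT) are meant to resemble MultiNet's inventory, but treat them and their values as assumptions made for illustration. The sort labels are plain-English placeholders for MultiNet's actual sort hierarchy. No axioms are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    sort: str                                   # position in the sort hierarchy (illustrative label)
    layers: dict = field(default_factory=dict)  # layer attributes and their values


class SemanticNetwork:
    """Nodes carry a sort and layer attributes; arcs are labeled with relation names."""

    def __init__(self):
        self.nodes = {}
        self.arcs = set()  # (relation, source, target)

    def add_node(self, name, sort, **layers):
        self.nodes[name] = Node(name, sort, dict(layers))
        return self.nodes[name]

    def relate(self, relation, source, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be declared before they are related")
        self.arcs.add((relation, source, target))

    def targets(self, relation, source):
        return {t for r, s, t in self.arcs if r == relation and s == source}


# "The dog barks": an event c2 whose agent is c1, an instance of the concept "dog".
net = SemanticNetwork()
net.add_node("dog", "object-concept", GENER="ge")
net.add_node("bark", "situation-concept", GENER="ge")
net.add_node("c1", "object", GENER="sp")
net.add_node("c2", "situation", FACT="real")
net.relate("SUB", "c1", "dog")
net.relate("SUBS", "c2", "bark")
net.relate("AGT", "c2", "c1")
print(net.targets("AGT", "c2"))  # {'c1'}
```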
To overcome the quantitative barrier in building large knowledge bases and semantically oriented computational lexica, MultiNet is associated with a set of tools including a semantic interpreter NatLink for automatically translating natural language expressions into MultiNet networks, a workbench LIA for the computer lexicographer, and a workbench MWR for the knowledge engineer for managing and graphically manipulating semantic networks. The applications of MultiNet as a semantic interlingua range from natural language interfaces to the Internet and to dedicated databases, over question-answering systems, to systems for automatic knowledge acquisition.
Biographical Note:
Hermann Helbig is head of the chair "Intelligent Information and Communication
Systems" at the University of Hagen, Germany. He holds an M.S. (Diploma) in
Physics, a Ph.D. in Computer Science (Dr.rer.nat.) in the field of AI, and
received his habilitation (Dr.rer.nat.habil.) in the field of Knowledge
Representation. From 1970 through 1992 he held several positions in industrial research
in AI and Computational Linguistics and was simultaneously lecturer
for Artificial Intelligence at the University of Technology at Dresden.
His main research interests lie in the field of semantically based natural
language processing with applications in question-answering systems
and natural language interfaces to the Internet. Former sabbatical stays
led him to ICSI in Berkeley and to the Universities of Buffalo (USA), and
Edinburgh, Sheffield and London (UK). He has written several books in the field
of AI; the most recent, "Knowledge Representation and the Semantics of Natural
Language", is one of the standard textbooks on semantic networks.
September 29 Dan Melamed
New York University
Scalable Discriminative Learning for Powerful Translation Models
Abstract:
The translational equivalence relations in a surprisingly high
fraction of ordinary bitexts cannot be effectively explained by
commonly used translation models. Models of translational equivalence
with more expressive power are necessary. However, currently popular
machine learning techniques do not scale up well for these more
powerful models, which limits these models' practical utility. I
shall present a new purely discriminative learning method for
structured prediction problems, including parsing and translation.
This method scales up to millions of features over large training
sets, such as those used for statistical MT. Experiments have shown
that even context-sensitive models can be effectively trained using
this method. If there is interest, I will also give an overview of
the GenPar toolkit, which made all of this work possible.
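The abstract does not spell out the new learning method, so the sketch below does not try to reproduce it. As a minimal illustration of what purely discriminative training for structured prediction looks like, here is the structured perceptron, a different, well-known, and much simpler algorithm, applied to a toy tagging task. It shows two ingredients the abstract alludes to: decoding to find the best-scoring structure, and weight updates over sparse feature dictionaries that can grow to very many features. All function names, tags, and training data are hypothetical, and candidates are enumerated by brute force, which is only feasible for tiny inputs.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Features = Dict[Hashable, float]


def score(weights: Dict[Hashable, float], feats: Features) -> float:
    """Sparse dot product: only features that fire contribute."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def predict(weights, x, candidates, phi):
    """Decode: return the highest-scoring candidate structure for input x."""
    return max(candidates(x), key=lambda y: score(weights, phi(x, y)))


def train_structured_perceptron(data, candidates, phi, epochs=5):
    weights: Dict[Hashable, float] = defaultdict(float)
    for _ in range(epochs):
        mistakes = 0
        for x, gold in data:
            guess = predict(weights, x, candidates, phi)
            if guess != gold:
                mistakes += 1
                # Move weights toward the gold structure's features ...
                for f, v in phi(x, gold).items():
                    weights[f] += v
                # ... and away from the wrongly predicted structure's features.
                for f, v in phi(x, guess).items():
                    weights[f] -= v
        if mistakes == 0:
            break  # every training example is decoded correctly; stop early
    return dict(weights)


# --- Toy usage: tagging two-word inputs with hypothetical tags N and V. ---
TAGS = ("N", "V")


def all_taggings(words: Sequence[str]) -> Iterable[Tuple[str, ...]]:
    # Brute-force candidate set; a real system would use dynamic programming.
    return product(TAGS, repeat=len(words))


def tagging_features(words: Sequence[str], tags: Sequence[str]) -> Features:
    feats: Features = defaultdict(float)
    prev = "<s>"
    for word, tag in zip(words, tags):
        feats[("emit", word, tag)] += 1.0   # word/tag co-occurrence
        feats[("trans", prev, tag)] += 1.0  # tag bigram
        prev = tag
    return feats


train = [(("dogs", "bark"), ("N", "V")), (("cats", "run"), ("N", "V"))]
w = train_structured_perceptron(train, all_taggings, tagging_features)
print(predict(w, ("dogs", "run"), all_taggings, tagging_features))
```

On this toy data the perceptron makes a single update in the first epoch, decodes both training examples correctly in the second, and the final line prints `('N', 'V')` for the unseen input.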
September 22 Tomoki Toda
Faculty, Shikano Laboratory
Nara Institute of Science and Technology,
Japan
Improving Body Transmitted Unvoiced Speech with Statistical Voice Conversion towards Silent-Speech Telephone
Abstract:
Cellular phones let us communicate with each other by speech whenever
and wherever we like. However, they have also created a problem: in some
situations, such as meetings, speech is perceived as noise by the people
around the speaker. To address this problem, we aim to realize a
"silent-speech telephone" that allows speech communication without
annoying anyone in any situation. Non-Audible Murmur (NAM) enables us
to talk while keeping silent. However, it is hard to use NAM directly
as a medium for human communication because of its low intelligibility
and unfamiliar sound. We therefore propose a conversion method from NAM
to ordinary speech (NAM-to-Speech).
Beforehand, we train Gaussian mixture models (GMMs) that represent the
correlation between acoustic features of NAM and those of speech, using
around 50 utterance pairs of NAM and speech. Once the GMMs are trained,
any NAM sample can be converted to speech by statistical feature
conversion based on maximum likelihood estimation. Although NAM-to-Speech
converts NAM into intelligible voices with quality similar to speech, a
major problem remains: F0 is difficult to estimate from unvoiced speech.
To avoid this problem, we propose another conversion method, from NAM to
whisper, which is a familiar and intelligible form of unvoiced speech
(NAM-to-Whisper).
Moreover, we extend NAM-to-Whisper so that it accepts multiple types of
body-transmitted unvoiced speech, such as NAM and Body Transmitted
Whisper (BTW), as input. We evaluate the performance of the proposed
conversion method. Experimental results demonstrate that
1) NAM-to-Whisper significantly improves the intelligibility and
naturalness of NAM, 2) NAM-to-Whisper outperforms NAM-to-Speech,
and 3) a single conversion model can be trained that successfully
converts both NAM and BTW into the target voice.
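The abstract describes converting NAM with GMMs that model the correlation between source and target acoustic features. A minimal sketch of the core per-frame mapping, assuming scalar features and already-trained joint-GMM parameters: each source frame is mapped to the posterior-weighted conditional mean of the target feature. This is the simpler minimum mean-square-error mapping, substituted here for the maximum-likelihood conversion the talk uses; it omits EM training, multidimensional spectral features, dynamic features, and F0 handling. The class and function names and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JointComponent:
    """One component of a GMM over joint (source, target) scalar features."""
    weight: float  # mixture weight
    mu_x: float    # mean of the source (e.g. NAM) feature
    mu_y: float    # mean of the target (e.g. whisper or speech) feature
    var_x: float   # variance of the source feature
    cov_xy: float  # covariance between source and target features


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert_frame(x: float, gmm: List[JointComponent]) -> float:
    """Map one source frame to the target as E[y | x] under the joint GMM."""
    # Posterior probability of each component, given the source feature only.
    likes = [c.weight * gaussian_pdf(x, c.mu_x, c.var_x) for c in gmm]
    total = sum(likes)
    y_hat = 0.0
    for c, like in zip(gmm, likes):
        # Conditional mean of y given x within one bivariate Gaussian.
        cond_mean = c.mu_y + (c.cov_xy / c.var_x) * (x - c.mu_x)
        y_hat += (like / total) * cond_mean
    return y_hat


# Hypothetical two-component model (values made up for illustration).
gmm = [JointComponent(0.5, -1.0, -10.0, 1.0, 0.0),
       JointComponent(0.5, 1.0, 10.0, 1.0, 0.0)]
print([round(convert_frame(x, gmm), 3) for x in (-5.0, 0.0, 5.0)])
```

Frames far from both source means are assigned almost entirely to the nearer component, while a frame halfway between them receives an equal blend; the soft posterior weighting is what keeps the mapping smooth across component boundaries.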
Bio: Tomoki Toda received the B.E. degree in electrical engineering from
Nagoya University in 1999 and the M.E. and Ph.D. degrees in
engineering from the Graduate School of Information Science, NAIST, in
2001 and 2003, respectively. From 2001 to 2003, he was an intern
researcher and a visiting researcher at ATR-SLT. He was a Research
Fellow of the JSPS in the Graduate School of Engineering, NITECH, from
2003 to 2005. He was a visiting researcher at LTI, CMU, from October 2003
to September 2004. He is currently an Assistant Professor of the
Graduate School of Information Science, NAIST and a visiting
researcher at ATR-SLC.
August 16, 2006
Tong Zhang
Yahoo! Research
Some Theoretical and Algorithmic Issues in Large Scale Ranking and Scoring
I will discuss prediction problems encountered at internet companies such as Yahoo and then focus on
the web-search problem, mainly on the theoretical issues that arise in this setting. I will then talk
about related algorithmic issues in optimizing the scoring function of a statistical translation
system, where the decoding procedure is treated as a black box. I will present some initial results as
well as problems encountered.
Bio: Tong Zhang received a B.A. in mathematics and computer science from Cornell
University in 1994 and a Ph.D. in Computer Science from Stanford University
in 1998. After working as a research staff member at the IBM T.J. Watson Research
Center in Yorktown Heights, New York, he joined Yahoo in 2005. His research
interests include machine learning, numerical algorithms, and their
applications.