LTI Projects
There are currently a number of active projects at the LTI, in the fields of machine translation, speech, information retrieval, knowledge acquisition, educational applications, computational biology and other related areas.
Machine Translation Projects
The Avenue project has both social and scientific goals in Machine
Translation.
Contact: Jaime
Carbonell
|
Babylon
Two-way speech-to-speech translation on a handheld computer |
This project applies our speech-to-speech translation
technology to limited consumer hardware. The Speechalator demonstrator
offers two-way speech-to-speech translation from English to Arabic
and Arabic to English in the domain of medical interviews, running
on a standard consumer iPAQ PDA. This project investigates techniques
for rapid development of speech and translation support in new
languages as well as ensuring the results can be used on a truly
portable device.
Contacts: Alan
W Black, Tanja
Schultz and Alex
Waibel
|
KANT
Knowledge-based Machine Translation |
The KANT project, Knowledge-based, Accurate
Translation for technical documentation, was founded in 1989 for
the research and development of large-scale, practical translation
systems for technical documentation. KANT uses a controlled vocabulary
and grammar for each source language, and explicit yet focused
semantic models for each technical domain to achieve very high
accuracy in translation. Designed for multilingual document production,
KANT has been applied to the domains of electric power utility
management and heavy equipment technical documentation.
Contacts: Eric
Nyberg and
Teruko Mitamura
GALE (Global Autonomous Language Exploitation)
The goal of the GALE program is to develop and apply computer software
technologies to absorb, analyze and interpret huge volumes of speech
and text in multiple languages. The LTI's machine translation efforts
within this program are aimed at generating the highest-quality
translation by improving statistical, example-based, and
transfer-based machine translation systems and producing a
multi-engine combination of their outputs which outperforms any single
translation system used in the combination.
Contacts:
Main: Jaime Carbonell
Architecture: Eric Nyberg
Distillation: Yiming Yang
EBMT: Ralf Brown
Statistical MT: Stephan Vogel
Speech Projects
|
CAMMIA
Dialogue
Management System
|
The CAMMIA project (A Conversational Agent for
Multilingual Mobile Information Access) is focused on research
and development of a multi-tasking dialog management system that
can be used with automatic speech recognition and VoiceXML to
provide mobile information access.
Contacts: Teruko
Mitamura and Eric
Nyberg
|
Fluency
Foreign language accent correction |
Fluency uses speech recognition (SPHINX II) to
help users perfect their accents in a foreign language. The system
detects pronunciation errors, such as duration mistakes and incorrect
phones, and offers visual and aural suggestions as to how to correct
them. The user can also listen to himself and to a native speaker.
Contact: Maxine
Eskenazi
This project is designed to provide the tools,
scripts and documentation to allow people to build synthetic voices
for use with general speech applications. Support for English
and other languages is provided. Voices produced by these methods
run within Edinburgh University's
Festival Speech Synthesis System. We also are developing a
small, fast synthesis engine suitable for these voices called
Flite. This project involves
a number of aspects of speech synthesis research including prosodic
modelling, unit selection synthesis, diphone synthesis, text analysis,
lexicon representation, and limited domain synthesis. It also provides
a forum for research and development of automatic labelling tools
and synthesis evaluation tools. Voices built with these methods
have been used in other CMU and external projects such as the
CMU DARPA Communicator spoken dialog system, and a Croatian
synthesizer for the DIPLOMAT/Tongue
project.
Contact: Alan
W Black
We have always wanted our machines to talk to us, but most people have
strong preferences for particular voices. Current techniques in speech synthesis can build
voices that sound very close to the original speaker, capturing the style, manner and
articulation of the source voice. However such systems require many hours of
carefully recorded speech and expert tuning to reach an acceptable level of quality.
An exciting new alternative method for building synthetic voices is voice transformation.
Here we use an existing recorded database and convert it to a target voice using as
little as 10-20 sentences. These techniques offer the potential to make speech synthesizers
talk in whatever voice we desire, with significantly less effort required than previous techniques.
This project offers a new direction in voice transformation. Current transformation techniques
concentrate on a spectral mapping of the voice, i.e. converting the properties of the speech
signal. Instead we can use the underlying positions of the vocal tract articulators (i.e.
the position of the teeth, tongue, lips, velum) which give rise to the spectral output of the voice.
Using new statistical modeling techniques we can successfully predict the positions of a speaker's
articulators from the speech signal. Then, in the virtual vocal tract domain, we map between speakers
and regenerate the speech for the target voice.
This work enables the easy construction of new synthetic voices allowing personalization of
speech output. It increases our knowledge of the speech generation process and characterizes
what makes a voice personal.
Contact: Alan
W Black
|
LET'S
GO! A Spoken Dialog System For the General Public |
The Let's Go! project is building a spoken dialog
system that can be used by the general public. While there has
been success in building spoken dialog systems that are able to
interact well with people (for example, the
CMU Communicator system), these systems often work only for
a limited group of people. The system we are developing for Let's
Go! is designed to work with a much wider population, including
groups which typically have trouble interacting with dialog systems,
such as non-native English speakers and the elderly.
The Let's Go! project works in the domain of
bus information for Pittsburgh's Port Authority Transit bus system.
The system provides a telephone-based interface to bus schedules
and route information.
Contacts: Maxine
Eskenazi and Alan
W Black
This project applies aspects of speech technology
and machine learning to aid communication with dolphins. Working
with Prof Denise Herzing of the Wild Dolphin Project (http://www.wilddolphinproject.com)
who has studied, recorded and documented dolphin populations over
the last 20 years, we are looking at automatically identifying
dolphins by their signature whistle, classifying other signals
as well as developing more general techniques to aid communication.
Contacts: Robert
Frederking, Tanja
Schultz and Alan
W Black
|
Ravenclaw
|
Ravenclaw is an advanced architecture for dialogue management based on a
dynamic representation that captures knowledge about the task that
humans perform in given domains. It is the base for a number of dialogue
system projects in LTI and elsewhere at Carnegie Mellon. Current
dialogue management research centers on two topics: 1) Developing
techniques for self-awareness that allow systems to adaptively detect
and recover from misunderstandings, 2) Automating the configuration of
dialogue systems by inferring task and dialogue structure from
human-human interactions in limited domains.
Contact: Alex Rudnicky
|
TeamTalk
|
Dialogue systems mostly involve one human talking to one machine. What
about interaction between the members of a human-robot team? The project
focuses on two specific research issues: The management of
multi-participant dialogues (touching on issues such as turn-taking) and
the development of grounding strategies that allow humans and robots to
agree on mutually-understandable descriptions of objects and actions in
the context of a treasure hunt.
Contact: Alex Rudnicky
|
Sphinx |
The Sphinx project is an umbrella for research in basic speech
technologies. Current activities include real-time adaptive speech
recognition in meetings, the exploration of ensemble techniques
(creating and using multiple decoders to improve recognition accuracy)
and techniques for real-time recognition. Sphinx recognition also
supports research in meeting understanding and related activities. The
Sphinx recognition code-base is open-source and is used by a number of
projects in LTI, elsewhere in the University as well as by a large
number of other sites.
Contact: Alex Rudnicky
|
SPICE
Speech Processing Interactive Creation and Evaluation Toolkit |
Speech technology potentially allows everyone to participate in today's information revolution and can bridge the language barrier gap. Unfortunately, construction of speech processing systems requires significant resources. With some 4500-6000 languages in the world, traditionally speech processing is prohibitive to all but the most economically viable languages. In spite of recent improvements in speech processing, supporting new languages is a skilled job requiring significant effort from trained individuals. This project aims to overcome both limitations by providing innovative methods and tools for naive users to develop speech processing models, collect appropriate data to build these models, and evaluate the results allowing iterative improvements. By integrating speech recognition and synthesis technologies into an interactive language creation and evaluation toolkit usable by unskilled users, speech system generation will be revolutionized. Data and components for new languages will become available to everybody improving the mutual understanding and the educational and cultural exchange between the U.S. and other countries.
Contacts: Tanja Schultz and Alan
W Black
|
TransTac Spoken Language Communication and Translation System for Tactical Use |
In this program we develop technologies that enable robust, spontaneous two-way tactical speech communication between American warfighters and native speakers. In this context we are investigating issues surrounding the rapid deployment of new languages, especially low-resource languages and colloquial dialects. Currently we are working on two-way translation between English and Iraqi Arabic. TransTac builds on our existing speech translation technology.
Contacts: Tanja Schultz and Alan
W Black
|
SUBLIME
Speech and Language Based Information Management Environment
|
Project SUBLIME develops and tests speech-
and language-based interfaces for information management. We seek
to develop cognitively palatable methods for people to access,
add to, and modify both personal and public information spaces.
Whereas most existing speech interfaces provide wide functionality
in a given narrow domain, a SUBLIME interface seeks to provide
a relatively narrow functionality (information management) in
unrestricted domains.
Contact: Roni
Rosenfeld
|
SLT4ID: Speech and Language Technologies for International Development
|
In underserved communities around the world, spoken language systems are potentially more natural,
cheaper to deploy/maintain/upgrade, and place fewer requirements on the user (such as literacy) than traditional PC/GUI-based
systems, while still offering valuable services such as information access and education. Our first project in this domain,
in collaboration with Aga Khan University (Karachi, Pakistan), focuses on creating a speech user interface for accessing
health information resources by community health care workers in Pakistan. We are investigating both audio-only & multi-modal
interfaces, and aim to continuously ground the research empirically through user studies with the target population. Through
this research, we hope to better understand the levels of literacy (if any) at which speech-based information access interfaces
are compelling.
Contact: Roni
Rosenfeld
Human-machine speech based communication, especially
for mobile speech applications and internet speech portals, is
fast becoming a reality. Communication with such machines and
information-servers does not require the full strength of natural
language, nor should it have to cope with its ambiguities. What
then is the ideal form of human-machine speech communication?
Will there develop a particular style for talking to machines?
If so, can we help this process along by developing principles
for it? In the Universal Speech Interface (USI) project, we develop
and test such principles. In essence, we are trying to do for
speech communication what Graffiti(tm) has done for mobile text
entry (see also The USI Manifesto).
Contact: Roni
Rosenfeld
Information Retrieval Projects
The Information Retrieval projects include a wide range of issues
related to finding, organizing, analyzing, and communicating information.
|
Adaptive Information Filtering
|
Automatically monitoring a stream of documents
(e.g., news stories, news groups, etc.) to find just those stories
that are interesting to you. Learning from example what kinds
of documents you find interesting.
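As a rough illustration of learning a filter from examples, the sketch below keeps a weighted term profile that is updated from user feedback and delivers a new story only when it is similar enough to that profile. The class name FilterProfile, the 0.2 threshold, the -0.5 penalty for disliked stories and the bag-of-words similarity are all assumptions made for this example; they are not taken from the project itself.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class FilterProfile:
    """Hypothetical user profile learned from relevance feedback."""

    def __init__(self, threshold=0.2):
        self.weights = Counter()
        self.threshold = threshold  # assumed cut-off, chosen for the example

    def feedback(self, text, interesting):
        """Add terms of liked stories, subtract (more weakly) terms of disliked ones."""
        sign = 1.0 if interesting else -0.5  # assumed weighting
        for term, count in vectorize(text).items():
            self.weights[term] += sign * count

    def accepts(self, text):
        """Deliver a story only if it resembles the positive part of the profile."""
        positive = Counter({t: w for t, w in self.weights.items() if w > 0})
        return cosine(positive, vectorize(text)) >= self.threshold


profile = FilterProfile()
profile.feedback("speech recognition for bus schedules", interesting=True)
profile.feedback("celebrity gossip and fashion news", interesting=False)
for story in ["new speech recognition toolkit released",
              "fashion week gossip roundup"]:
    print(story, "->", profile.accepts(story))
```

A real filtering system would also adapt the threshold over time and use better term weighting; the sketch only shows the feedback loop.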
Contact:
Jamie Callan and Yiming
Yang
|
Distributed Information Retrieval
/ Federated Search |
Hundreds of thousands of specialized search engines
are available on the Internet, but the contents of many are hidden
from general purpose search engines such as Google. This "hidden"
Web is estimated to be at least as large as the more traditional
"visible" Web. Distributed Information Retrieval (now often called
Federated Search) systems provide a single point of access for
documents that are in different formats, in different languages,
in different types of search engines, and controlled by other
people. This research area also covers large, peer-to-peer networks
of heterogeneous digital libraries.
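One practical difficulty behind a single point of access is that each search engine scores documents on its own scale. The sketch below shows one simple way to merge such results: rescale each engine's scores to a common range and keep the best score per document. The function names, the min-max rescaling and the sample scores are assumptions for illustration, not the method used in this research.

```python
def normalize(results):
    """Min-max rescale one engine's (doc_id, score) list to the range [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - low) / (high - low)) for doc, score in results]


def merge(result_lists, top_k=5):
    """Merge several engines' rankings, keeping the best rescaled score per document."""
    best = {}
    for results in result_lists:
        for doc, score in normalize(results):
            best[doc] = max(score, best.get(doc, 0.0))
    # Sort by descending score, breaking ties by document id for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical results from two engines with incomparable score scales.
engine_a = [("a1", 12.0), ("a2", 7.0), ("shared", 2.0)]
engine_b = [("b1", 0.9), ("shared", 0.8), ("b2", 0.1)]
merged = merge([engine_a, engine_b])
for doc, score in merged:
    print(doc, round(score, 3))
```

Min-max rescaling is easy to follow but sensitive to outliers, and it ignores which engines are more trustworthy for a given query.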
Contact:
Jamie Callan
|
Email Classification and Prioritization
|
Automatically assigning messages to user-defined
folders (classes) based on content, importance, communication
threads and users' organization strategies is a new challenge
for machine learning. Statistical modeling of multi-type interconnected
objects (users, messages, folders, keywords, etc.) is an important
step towards the development of a truly useful email classification
system.
Contact: Yiming
Yang
|
eRulemaking
Text Mining Techniques for Large Public Comment Databases |
Citizens and government administrators need
a variety of navigation aids and text analysis tools to help them
understand the contents of large public comment databases. These
aids and tools include full-text search, automatic construction
of browsing hierarchies, frequency analysis of discussion topics,
and summarization of similar comments, as well as more complex
analysis tools that identify stakeholder communities represented
in a set of comments. The underlying technologies are primarily
Information Retrieval, Text Datamining, and simple forms of Natural
Language Processing.
Contact:
Jamie Callan
Lemur is a collection of search engine algorithms and information retrieval applications used for IR research, development and education. Lemur provides a rich query language that supports search against simple texts, structured (XML) texts, and texts annotated with part-of-speech, named-entity, and other annotations used in NLP and text-mining applications. Lemur's search engines comfortably support collections ranging from a few gigabytes to a few terabytes of text. The software is distributed under an open-source license, and is used widely in the IR research community.
Contact:
Jamie Callan
|
Briefing Assistant
|
The BA project addresses the problem of creating customized summaries
based on the preferences and information demands of human report
preparers. The BA is learning-based and models both the information
selection and feature detection behavior of human summarizers. Current
work centers on temporal summarization, creating narrative accounts of
events that unfold over time.
Contact: Alex Rudnicky
|
JAVELIN
Open-Domain Question Answering |
Typical IR systems return a set of documents,
or perhaps a set of queries. LTI Question Answering software extracts
information from documents in large, open-domain corpora to answer
questions in subject areas that are not known in advance.
Contact: Eric
Nyberg and Teruko
Mitamura
 |
The SIDE Project
The Summarization Integrated
Development Environment - Supporting the Guide on the SIDE |
We are developing a configurable summarization environment that uses
multi-level analyses of discourse to support a new generation of
summarization technology addressing a variety of information management
tasks. SIDE embodies a two stage process in which language processing
technology is used first to impose structure on the stream of discourse
behavior from expository text, on-line chat, email, or newsgroup style
interaction, and then in a second stage patterns of interest are noted,
catalogued, and reported at the interface. One application of this
technology that we are working on is constructing reporting facilities
for on-line group learning facilitators.
Contact: Carolyn Rosé
 |
The
REAP Project
Reader-Specific Lexical Practice
for Improved Reading Comprehension
|
The core ideas of the project are i) a search
engine that finds text passages satisfying very specific lexical
constraints, ii) selecting materials from an open-corpus (the
Web), thus satisfying a wide range of student interests and classroom
needs, and iii) the ability to model an individual's degree of
acquisition and fluency for each word in a constantly-expanding
lexicon so as to provide student-specific practice and remediation.
This combination enables research on a wide range of reading comprehension
topics that were formerly difficult to investigate.
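To make the idea of lexical constraints concrete, the sketch below checks whether a passage contains at least one target word for practice while keeping the share of other unfamiliar words low. The function names, the 20% limit on unknown words and the sample vocabulary are assumptions for illustration; the project's actual constraints and student model are not described here.

```python
PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in words if word]


def satisfies(passage, target_words, known_words, max_unknown_ratio=0.2):
    """Return True if the passage practices a target word without too many unknown words."""
    tokens = tokenize(passage)
    if not tokens:
        return False
    if not set(tokens) & target_words:
        return False  # passage offers no practice on any target word
    unknown = [t for t in tokens if t not in known_words and t not in target_words]
    return len(unknown) / len(tokens) <= max_unknown_ratio


# Hypothetical student model: words the student knows, and a word to practice.
known = {"the", "cat", "sat", "on", "a", "mat", "was", "and"}
targets = {"lethargic"}
passages = [
    "The cat was lethargic and sat on the mat.",
    "The lethargic feline exhibited profound ennui.",
    "The cat sat on a mat.",
]
suitable = [p for p in passages if satisfies(p, targets, known)]
print(suitable)
```

Only the first passage qualifies: the second contains too many unfamiliar words, and the third has no target word.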
Contacts: Jamie
Callan and Maxine
Eskenazi
|
Utility-based Information Distillation
|
We study supervised, unsupervised and semi-supervised learning techniques
for automatically detecting novel events and tracking the new trends for
relevant events from temporally-ordered documents, for dynamically updating
user profiles under context, and for optimizing the utility of passage
selection and summarization based on relevance, novelty,
readability and user cost (e.g., time). Collaborative and adaptive
information filtering among multiple users is also a part of the open
challenge.
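The trade-off between relevance and novelty in passage selection can be sketched with a greedy procedure that repeatedly picks the passage whose relevance to the query, minus a penalty for overlap with passages already chosen, is highest. This is similar in spirit to maximal marginal relevance and is offered only as an assumed illustration; the word-overlap similarity, the trade_off value and the sample passages are not taken from the project, and readability and user cost are left out.

```python
def jaccard(a, b):
    """Word-set overlap between two texts, from 0.0 (disjoint) to 1.0 (identical)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def select_passages(query, passages, k=2, trade_off=0.6):
    """Greedily pick k passages, rewarding relevance and penalizing redundancy."""
    selected = []
    candidates = list(passages)
    while candidates and len(selected) < k:
        def utility(passage):
            relevance = jaccard(query, passage)
            redundancy = max((jaccard(passage, s) for s in selected), default=0.0)
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(candidates, key=utility)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical query and passages; the second passage nearly repeats the first.
query = "earthquake damage report"
passages = [
    "earthquake damage report from the city",
    "earthquake damage report from the town",
    "rescue teams report new earthquake aftershocks",
]
chosen = select_passages(query, passages)
print(chosen)
```

The near-duplicate second passage is skipped in favor of the less relevant but more novel third one, which is the behavior a novelty term is meant to produce.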
Contacts: Yiming
Yang and Jaime
Carbonell
Also see: TagHelper
2.0.
Knowledge Acquisition Projects
|
Dark Matter
Knowledge Acquisition
from Text |
LTI is participating in Project Halo, a research
effort to design and implement a "Digital Aristotle".
Our focus is on the definition of KAL (Knowledge Acquisition Language),
a form of controlled language that can be used to acquire domain
knowledge from subject matter experts in domains such as Chemistry,
Physics and Biology.
Contacts: Eric
Nyberg and Teruko
Mitamura
|
IAMTC
Interlingual Annotation of Multilingual Text Corpora |
IAMTC is a multi-site NSF ITR project focusing
on the annotation of six sizable bilingual parallel corpora for
interlingual content with the goal of providing a significant
data set for improving knowledge-based approaches to machine translation
(MT) and a range of other Natural Language Processing (NLP) applications.
The central goals of the project are: (1) to produce a practical,
commonly-shared system for representing the information conveyed
by a text, or interlingua (IL), (2) to develop a methodology for
accurately and consistently assigning such representations to
texts across languages and across annotators, (3) to annotate
a sizable multilingual parallel corpus of source language texts
and translations for IL content.
Contacts: Lori
Levin and Teruko
Mitamura
|
Scone Symbolic Knowledge Base |
Scone is a symbolic knowledge representation
system designed to run well on a standard workstation. Scone's
primary design goals are ability to represent "common sense" knowledge,
efficiency in performing inference and search, scalability to
several million assertions, and ease of use.
Contact: Scott
Fahlman
Also see:
Tutalk and A
Shared Resource for Robust Semantic Interpretation for Both Linguists
and Non-Linguists.
Educational Applications
|
The Intelligent Writing Tutor (IWT) |
The Intelligent Writing Tutor (IWT) project for ESL learners explores the issue of transfer and long-term retention of acquired knowledge, as part of the PSLC's underlying goal of developing a theory of robust learning. Through a series of learning experiments, we will look at both positive and negative transfer from a student's native language (L1) to English, the effects of an informed knowledge tracer on learning, and the role of level-appropriate feedback in achieving competency.
Contact: Teruko
Mitamura
In this project we explore the impact of tutor
strategy and example selection on student explanation behavior.
The purpose is to identify strategies that make the most productive
use of the time students spend with a tutorial dialogue system.
We are collecting a corpus of tutoring dialogues in the calculus
domain in which students discuss worked out examples, which may
or may not contain an error in them, with a human tutor. The student
reasons through the worked examples, identifying, explaining,
and correcting errors. As part of this project we are experimenting
with automatic approaches to corpus analysis, applying and extending
approaches used previously for text classification, dialogue act
tagging, and automatic essay grading.
Contact: Carolyn
Rosé
We are developing a novel style of tutorial dialogue
system in which students discuss design choices as they are working
on an optimisation problem in the field of thermodynamics. The
purpose of CycleTalk is to engage students in negotiation dialogues
over pros and cons of alternative choices. In this way we hope
to elicit explanation behavior from students that is productive
for learning. The primary language technology research foci of
this project are dialogue management and robust language understanding.
Contact: Carolyn
Rosé
|
TagHelper
2.0
A Semi-Automatic Tool That Facilitates Reliable Content
Analysis of Corpus Data |
The goal of our research is to develop text
classification technology to address concerns specific to classifying
sentences using coding schemes developed for behavioral research.
A wide range of behavioral researchers including social scientists,
psychologists, learning scientists, and education researchers
collect, code, and analyze large quantities of natural language
corpus data as an important part of their research. A particular
focus of our work is developing text classification technology
that performs well on highly skewed data sets, which is an active
area of machine learning research.
Contact: Carolyn
Rosé
The majority of existing authoring tools for
constructing advanced conversational interfaces were designed
for use by computational linguists. Our research goal is to explore
strategies for supporting the development of language understanding
interfaces by non-linguists. In our previous work we have developed
Carmel-Tools, a behavior oriented authoring environment for building
semantic knowledge sources for the CARMEL core understanding engine.
In our recent work, we have begun conducting user studies that
aim to better understand how people process a large amount of
corpus data when faced with a task comparable to programming a
dialogue agent using a data driven methodology. Our preliminary
user study results hint that participants (1) introduce a bias
when processing data sequentially (i.e. primacy effects) and (2)
naturally represent semantic relatedness using spatial proximity.
Based on these observations, we have developed the InfoMagnets
interface that provides a physical metaphor for exploratory data
analysis that is consistent with user conceptions of semantic
relatedness and helps users avoid being biased by primacy effects
by gaining a birds-eye view of their whole inventory of dialogue
topics simultaneously.
Contact: Carolyn
Rosé
The purpose of the Learning Oriented Dialogue
Project is to investigate the reasons behind unproductive patterns
of student use of educational technology and to design interactions
that will successfully bring about an improvement in student behavior.
One aspect of this work focuses on developing a Peer Collaborative
Agent to work with students as they solve math problems. The research
literature investigating the construction of tutorial dialogue
and learning companion environments presents parallel experiences
in attempting to emulate in technology what has been observed
to be effective for learning in human-human scenarios. We argue
that what is needed as a next step is a careful investigation
using controlled experimentation to construct a causal model of
how specific features of an agent’s behavior influence an
individual student’s behavior and learning. A key aspect
of our research agenda is to investigate previous claims about
best practices in learning companion design that have not been
subjected to rigorous evaluation. We do this using a particular
experimental design methodology, which provides a highly controlled
way to examine mechanisms by which one peer learner’s behavior
influences a partner learner’s behavior and learning. Specifically,
we make use of confederate peer learners who are experimenters
acting as peer learners but behaving in a highly scripted way.
This research has a strong empirical focus as well as a technology
development focus.
Contact: Carolyn
Rosé
|
Tutalk
Infrastructure for authoring and experimenting with natural
language dialogue in tutoring systems and learning research |
The focus of our proposed work is to provide
an infrastructure that will allow learning researchers to study
dialogue in new ways and for educational technology researchers
to quickly build dialogue based help systems for their tutoring
systems. Most tutorial dialogue systems that to date have undergone
successful evaluations (CIRCSIM, AutoTutor, WHY-Atlas, the Geometry
Explanation Tutor) represent development efforts of many man-years.
These systems were instrumental in pushing the technology forward
and in proving that tutorial dialogue systems are feasible and
useful in realistic educational contexts, although not always
provably better on a pedagogical level than the more challenging
alternatives to which they have been compared. We are now entering
a new phase in which we as a research community must not only
continue to improve the effectiveness of basic tutorial dialogue
technology but also must find ways of accelerating both the process
of investigating the effective use of dialogue as a learning intervention
as well as development of usable tutorial dialogue systems. We
propose to develop a community resource to address all three of
these problems on a grand scale, building upon our prior work
developing both basic dialogue technology and tools for rapid
development of running dialogue systems.
Contact: Carolyn
Rosé
With its emphasis on high stakes testing upheld
by the No Child Left Behind Act, the standards based education
movement promises to encourage rigor in our nation’s public
school education. We argue that what is needed is an efficient
means of continuous but unobtrusive monitoring of student progress,
consolidation of data, effective reporting, and instruction guided
by strategic assessment data. However, teachers face a fundamental
dilemma in trying to use assessment to guide instruction: assessment
takes time away from instruction and teachers cannot be sure the
time spent assessing will improve instruction enough to justify
the cost of lost instructional time. We are addressing this dilemma
by building and experimentally evaluating the effectiveness of
a web-based "Assistment" system for middle school math
in Massachusetts and Connecticut. On-line testing systems that
grade students and provide reports reduce the demands on the teacher.
However, they do not fundamentally address the assessment dilemma.
In contrast to previous approaches, the Assistment system aims
to 1) quickly predict student scores on standards-based tests,
2) provide timely feedback to teachers about how they can specifically
adapt their instruction to address student knowledge gaps (while
similarly providing reports to parents and administrators), and
3) provide an opportunity for students to get intelligent tutoring
assistance as assessment data is being collected. Assistments
provide more focused instruction than the feedback that is typically
given by on-line multiple-choice systems.
Contact: Carolyn
Rosé
Also see: Fluency,
The REAP project and Project
Listen.
Computational Biology
Pattern recognition from protein sequences and
automated mapping between sequences, folding structures and biological
functions is a new line of research where we actively collaborate
with biologists.
Contacts: Judith Klein-Seetharaman,
Jaime Carbonell,
Roni Rosenfeld,
Yiming Yang and Raj Reddy
|
Statistical-Computational Models of Molecular Evolution |
Molecular evolution is a stochastic computational process that has been running on massively parallel hardware for some 10^17 seconds now, and which has resulted in many amazing local maxima along the way. The rapidly growing DNA and protein databases present a historic opportunity to model evolution at an unprecedented quantitative level, with enormous impact on medicine as well as on our fundamental understanding of life. In this project we combine statistical and computational methods to derive biological explanations and pharmacological predictions.
Contact: Roni Rosenfeld
|
Viruses, Vaccines, and Digital Life |
Viruses are the simplest known self-replicating computational systems. They also happen to be the leading emerging threat to humanity in the 21st century. Fortunately, the new understanding of life in general and viruses in particular as digital programs opens the door to computational methods of defending against these threats. This is a new project launched in collaboration with leading virologists at the University of Pittsburgh whose aim is to combine biological analysis with statistical learning methods to better understand viral evolution and accelerate vaccine development.
Contact: Roni
Rosenfeld
Other Projects
The Informedia project tries to understand video, and enable
search, visualization and summarization in both contemporaneous
and archival content collections. The core technology combines
speech, image and natural language understanding to automatically
transcribe, segment and index linear video for intelligent search
and image retrieval.
Contacts: Howard
Wactlar and Alex
Hauptmann
Project LISTEN's Reading Tutor listens to children
read aloud, and helps them learn to read. Project LISTEN offers
exciting opportunities for interdisciplinary research in speech
technologies, cognitive and motivational psychology, human-computer
interaction, computational linguistics, artificial intelligence,
machine learning, graphic design, and of course reading.
Contact: Jack
Mostow
Machine learning has been developed to the point where it can perform some truly useful tasks. However, much of the learning technology that's currently available requires extensive 'tuning' in order to work for any particular user, in the context of any particular task.
The focus of the RADAR project is to build a cognitive assistant that embodies machine learning technology that is able to function "in the wild" -- by this, we mean that the technology need not be tuned by experts, and that the person using the system that embodies the technology need not be trained in any special way.
Using the RADAR system itself, in the task for which it is designed, should be enough to allow RADAR to learn to improve performance.
RADAR is a joint project between SRI International and Carnegie Mellon University, and is funded by DARPA.
Contacts: Scott
Fahlman and Jaime
Carbonell
LTI-related RADAR components:
Space-Time Planner: Contact: Eugene Fink
Knowledge Representation: Contact: Scott Fahlman
Briefing Assistant: Contact: Alex Rudnicky
NLP/email: Contact: Eric Nyberg
Summarization: Contact: Alex Rudnicky
We are working on techniques for identification of both known and
surprising patterns in massive databases, and on their application to
security challenges. For example, we may use the developed techniques
for identifying and tracking the spread of a new disease based on
medical databases, or for detecting patterns of malicious activity in
the network traffic. This work involves three main directions:
efficient indexing of massive databases; real-time identification of
patterns in a stream of newly incoming data; and search for surprising
changes in data patterns.
Contacts: Jaime Carbonell and Eugene Fink
TalkBank is an interdisciplinary research project
involving Carnegie Mellon University, the University of Pennsylvania,
and 7 other secondary collaborators. The goal of TalkBank is to
foster fundamental research in the study of human and animal communication.
It has constructed sample databases of transcripts linked to audio
and video within each of the 17 subfields studying communication.
We are using these databases to advance the development of standards
and tools for creating, sharing, searching, and commenting upon
primary materials via networked computers.
Contact: Brian MacWhinney (macw@cmu.edu)
|
WebKB
The World Wide Knowledge Base Project |
The World Wide Web is a vast source of information
accessible to computers, but understandable only to humans. The
goal of this research project is to automatically create a
computer understandable knowledge base whose content mirrors
that of the World Wide Web. If successful, this would lead to
much more effective retrieval of information from the web, and to the
use of this information to support new knowledge-based problem
solvers. Our approach is to use machine learning algorithms to
train the system to extract information of the desired types.
Our web page describes the overall approach, plus several new
algorithms we have developed that successfully extract information
from the web.
Contact: Tom
Mitchell
|