LTI Projects

There are currently a number of active projects at the LTI in the fields of machine translation, speech, information retrieval, knowledge acquisition, and educational applications, along with other related projects in SCS/CMU.


Machine Translation Projects


AVENUE

The Avenue project has both social and scientific goals in Machine Translation.

Contact: Jaime Carbonell


Babylon
Two-way speech-to-speech translation on a handheld computer

This project applies our speech-to-speech translation technology to limited consumer hardware. The Speechalator demonstrator offers two-way speech-to-speech translation from English to Arabic and Arabic to English in the domain of medical interviews, running on a standard consumer iPAQ PDA. This project investigates techniques for rapid development of speech and translation support in new languages, as well as ensuring that the results can be used on a truly portable device.

Contacts: Alan W Black, Tanja Schultz and Alex Waibel


KANT
Knowledge-based Machine Translation

The KANT project, Knowledge-based, Accurate Translation for technical documentation, was founded in 1989 for the research and development of large-scale, practical translation systems for technical documentation. KANT uses a controlled vocabulary and grammar for each source language, and explicit yet focused semantic models for each technical domain to achieve very high accuracy in translation. Designed for multilingual document production, KANT has been applied to the domains of electric power utility management and heavy equipment technical documentation.

Contacts: Eric Nyberg and Teruko Mitamura


GALE

The goal of the GALE (Global Autonomous Language Exploitation) program is to develop and apply computer software technologies to absorb, analyze and interpret huge volumes of speech and text in multiple languages. The LTI's machine translation efforts within this program are aimed at generating the highest-quality translation by improving statistical, example-based, and transfer-based machine translation systems and producing a multi-engine combination of their outputs which outperforms any single translation system used in the combination.

Contacts:
Main: Jaime Carbonell
Architecture: Eric Nyberg
Distillation: Yiming Yang
EBMT: Ralf Brown
Statistical MT: Stephan Vogel


Speech Projects


CAMMIA
Dialogue Management System

The CAMMIA project (A Conversational Agent for Multilingual Mobile Information Access) is focused on research and development of a multi-tasking dialog management system that can be used with automatic speech recognition and VoiceXML to provide mobile information access.

Contacts: Teruko Mitamura and Eric Nyberg


Fluency
Foreign language accent correction

Fluency uses speech recognition (SPHINX II) to help users perfect their accents in a foreign language. The system detects pronunciation errors, such as duration mistakes and incorrect phones, and offers visual and aural suggestions as to how to correct them. The user can also listen to himself and to a native speaker.

Contact: Maxine Eskenazi


FestVox: Building Synthetic Voices

This project is designed to provide the tools, scripts and documentation to allow people to build synthetic voices for use with general speech applications. Support for English and other languages is provided. Voices produced by these methods run within Edinburgh University's Festival Speech Synthesis System. We are also developing a small, fast synthesis engine suitable for these voices called Flite. This project involves a number of aspects of speech synthesis research, including prosodic modelling, unit selection synthesis, diphone synthesis, text analysis, lexicon representation, and limited domain synthesis. It also provides a forum for research and development of automatic labelling tools and synthesis evaluation tools. Voices built with these methods have been used in other CMU and external projects, such as the CMU DARPA Communicator spoken dialog system and a Croatian synthesizer for the DIPLOMAT/Tongue project.

Contact: Alan W Black


TRANSFORM
Flexible voice synthesis through articulatory voice transformation

We have always wanted our machines to talk to us, but most people have strong preferences for particular voices. Current techniques in speech synthesis can build voices that sound very close to the original speaker, capturing the style, manner and articulation of the source voice. However, such systems require many hours of carefully recorded speech and expert tuning to reach an acceptable level of quality.

An exciting new alternative method for building synthetic voices is voice transformation. Here we use an existing recorded database and convert it to a target voice using as few as 10-20 sentences. These techniques offer the potential to make speech synthesizers talk in whatever voice we desire, with significantly less effort than previous techniques required.

This project offers a new direction in voice transformation. Current transformation techniques concentrate on a spectral mapping of the voice, i.e. converting the properties of the speech signal. Instead we can use the underlying positions of the vocal tract articulators (i.e. the position of the teeth, tongue, lips, velum) which give rise to the spectral output of the voice.

Using new statistical modeling techniques, we can successfully predict the positions of a speaker's articulators from the speech signal. We then map between speakers in the virtual vocal tract domain and regenerate the speech for the target voice.

This work enables the easy construction of new synthetic voices, allowing personalization of speech output. It increases our knowledge of the speech generation process and characterizes what makes a voice personal.

Contact: Alan W Black


LET'S GO!
A Spoken Dialog System For the General Public

The Let's Go! project is building a spoken dialog system that can be used by the general public. While there has been success in building spoken dialog systems that are able to interact well with people (for example, the CMU Communicator system), these systems often work only for a limited group of people. The system we are developing for Let's Go! is designed to work with a much wider population, including groups which typically have trouble interacting with dialog systems, such as non-native English speakers and the elderly.

The Let's Go! project works in the domain of bus information for Pittsburgh's Port Authority Transit bus system. The system provides a telephone-based interface to bus schedules and route information.

Contacts: Maxine Eskenazi and Alan W Black


Towards Communicating with Dolphins

This project applies aspects of speech technology and machine learning to aid communication with dolphins. Working with Prof Denise Herzing of the Wild Dolphin Project (http://www.wilddolphinproject.com) who has studied, recorded and documented dolphin populations over the last 20 years, we are looking at automatically identifying dolphins by their signature whistle, classifying other signals as well as developing more general techniques to aid communication.

Contacts: Robert Frederking, Tanja Schultz and Alan W Black


Ravenclaw

Ravenclaw is an advanced architecture for dialogue management based on a dynamic representation that captures knowledge about the task that humans perform in given domains. It is the base for a number of dialogue system projects in LTI and elsewhere at Carnegie Mellon. Current dialogue management research centers on two topics: 1) Developing techniques for self-awareness that allow systems to adaptively detect and recover from misunderstandings, 2) Automating the configuration of dialogue systems by inferring task and dialogue structure from human-human interactions in limited domains.

Contact: Alex Rudnicky


TeamTalk

Dialogue systems mostly involve one human talking to one machine. What about interaction between the members of a human-robot team? The TeamTalk project focuses on two specific research issues: the management of multi-participant dialogues (touching on issues such as turn-taking) and the development of grounding strategies that allow humans and robots to agree on mutually understandable descriptions of objects and actions in the context of a treasure hunt.

Contact: Alex Rudnicky


Sphinx

The Sphinx project is an umbrella for research in basic speech technologies. Current activities include real-time adaptive speech recognition in meetings, the exploration of ensemble techniques (creating and using multiple decoders to improve recognition accuracy) and techniques for real-time recognition. Sphinx recognition also supports research in meeting understanding and related activities. The Sphinx recognition code-base is open-source and is used by a number of projects in LTI, elsewhere in the University as well as by a large number of other sites.

Contact: Alex Rudnicky


SPICE
Speech Processing Interactive Creation and Evaluation Toolkit

Speech technology potentially allows everyone to participate in today's information revolution and can bridge the language barrier. Unfortunately, construction of speech processing systems requires significant resources. With some 4500-6000 languages in the world, speech processing has traditionally been prohibitively expensive for all but the most economically viable languages. In spite of recent improvements in speech processing, supporting new languages is a skilled job requiring significant effort from trained individuals. This project aims to overcome both limitations by providing innovative methods and tools for naive users to develop speech processing models, collect appropriate data to build these models, and evaluate the results, allowing iterative improvements. By integrating speech recognition and synthesis technologies into an interactive language creation and evaluation toolkit usable by unskilled users, speech system generation will be revolutionized. Data and components for new languages will become available to everybody, improving the mutual understanding and the educational and cultural exchange between the U.S. and other countries.

Contacts: Tanja Schultz and Alan W Black


TransTac
Spoken Language Communication and Translation System for Tactical Use

In this program we develop technologies that enable robust, spontaneous two-way tactical speech communications between American warfighters and native speakers. In this context we are investigating issues surrounding the rapid deployment of new languages, especially low-resource languages and colloquial dialects. Currently we are working on two-way translation between English and Iraqi Arabic. TransTac builds on our existing speech translation technology.

Contacts: Tanja Schultz and Alan W Black


SUBLIME
Speech and Language Based Information Management Environment

Project SUBLIME develops and tests speech- and language-based interfaces for information management. We seek to develop cognitively palatable methods for people to access, add to, and modify both personal and public information spaces. Whereas most existing speech interfaces provide wide functionality in a given narrow domain, a SUBLIME interface seeks to provide a relatively narrow functionality (information management) in unrestricted domains.

Contact: Roni Rosenfeld


SLT4D: Speech and Language Technologies for International Development

In underserved communities around the world, spoken language systems are potentially more natural, cheaper to deploy, maintain and upgrade, and place fewer requirements on the user (such as literacy) than traditional PC/GUI-based systems, while still offering valuable services such as information access and education. Our first project in this domain, in collaboration with Aga Khan University (Karachi, Pakistan), focuses on creating a speech user interface for accessing health information resources by community health care workers in Pakistan. We are investigating both audio-only and multi-modal interfaces, and aim to continuously ground the research empirically through user studies with the target population. Through this research, we hope to better understand the levels of literacy (if any) at which speech-based information access interfaces are compelling.

Contact: Roni Rosenfeld


Speech Graffiti
Universal Speech Interfaces - USI

Human-machine speech-based communication, especially for mobile speech applications and internet speech portals, is fast becoming a reality. Communication with such machines and information servers does not require the full strength of natural language, nor should it have to cope with its ambiguities. What then is the ideal form of human-machine speech communication? Will there develop a particular style for talking to machines? If so, can we help this process along by developing principles for it? In the Universal Speech Interface (USI) project, we develop and test such principles. In essence, we are trying to do for speech communication what Graffiti(tm) has done for mobile text entry (see also The USI Manifesto).

Contact: Roni Rosenfeld


Information Retrieval Projects

The Information Retrieval projects include a wide range of issues related to finding, organizing, analyzing, and communicating information.


Adaptive Information Filtering

This project automatically monitors a stream of documents (e.g., news stories and newsgroups) to find just those stories that are interesting to you, learning from examples what kinds of documents you find interesting.

Contacts: Jamie Callan and Yiming Yang
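
As a rough illustration of learning from example, the sketch below keeps a bag-of-words profile that is nudged toward documents the reader liked and away from ones the reader disliked, and flags a new document when its cosine similarity to the profile passes a threshold. This is not the project's actual method; the `AdaptiveFilter` class, the feedback weights and the threshold are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class AdaptiveFilter:
    """Hypothetical filter holding a term-weight profile built from feedback."""

    def __init__(self, threshold=0.2):
        self.profile = Counter()    # term -> weight, starts empty
        self.threshold = threshold  # assumed cut-off, not from the source

    def score(self, text):
        """Similarity between a document and the current profile."""
        return cosine(Counter(tokenize(text)), self.profile)

    def is_interesting(self, text):
        return self.score(text) >= self.threshold

    def feedback(self, text, liked):
        """Move the profile toward liked documents and away from disliked ones."""
        sign = 1.0 if liked else -0.5  # assumed weights for the illustration
        for term, count in Counter(tokenize(text)).items():
            self.profile[term] += sign * count


news_filter = AdaptiveFilter()
news_filter.feedback("machine translation of Arabic speech", liked=True)
news_filter.feedback("football scores and match results", liked=False)
print(news_filter.is_interesting("new results in speech translation"))
print(news_filter.is_interesting("football match tonight"))
```

Each call to `feedback` adjusts the profile, so the filter's decisions shift as more judged examples arrive from the stream.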


Distributed Information Retrieval / Federated Search

Hundreds of thousands of specialized search engines are available on the Internet, but the contents of many are hidden from general purpose search engines such as Google. This "hidden" Web is estimated to be at least as large as the more traditional "visible" Web. Distributed Information Retrieval (now often called Federated Search) systems provide a single point of access for documents that are in different formats, in different languages, in different types of search engines, and controlled by other people. This research area also covers large, peer-to-peer networks of heterogeneous digital libraries.

Contact: Jamie Callan


Email Classification and Prioritization

Automatically assigning messages to user-defined folders (classes) based on content, importance, communication threads and users' organization strategies is a new challenge for machine learning. Statistical modeling of multi-type interconnected objects (users, messages, folders, keywords, etc.) is an important step towards the development of a truly useful email classification system.

Contact: Yiming Yang


eRulemaking
Text Mining Techniques for Large Public Comment Databases

Citizens and government administrators need a variety of navigation aids and text analysis tools to help them understand the contents of large public comment databases. These aids and tools include full-text search, automatic construction of browsing hierarchies, frequency analysis of discussion topics, and summarization of similar comments, as well as more complex analysis tools that identify stakeholder communities represented in a set of comments. The underlying technologies are primarily Information Retrieval, Text Datamining, and simple forms of Natural Language Processing.

Contact: Jamie Callan


Lemur Toolkit

Lemur is a collection of search engine algorithms and information retrieval applications used for IR research, development and education. Lemur provides a rich query language that supports search against simple texts, structured (XML) texts, and texts annotated with part-of-speech, named-entity, and other annotations used in NLP and text-mining applications. Lemur's search engines comfortably support collections ranging from a few gigabytes to a few terabytes of text. The software is distributed under open-source license, and is used widely in the IR research community.

Contact: Jamie Callan


Briefing Assistant

The BA project addresses the problem of creating customized summaries based on the preferences and information demands of human report preparers. The BA is learning-based and models both the information selection and feature detection behavior of human summarizers. Current work centers on temporal summarization, creating narrative accounts of events that unfold over time.

Contact: Alex Rudnicky


JAVELIN
Open-Domain Question Answering

Typical IR systems return a set of documents, or perhaps a set of queries. LTI Question Answering software extracts information from documents in large, open-domain corpora to answer questions in subject areas that are not known in advance.

Contacts: Eric Nyberg and Teruko Mitamura


The REAP Project
Reader-Specific Lexical Practice for Improved Reading Comprehension

The core ideas of the project are i) a search engine that finds text passages satisfying very specific lexical constraints, ii) selecting materials from an open-corpus (the Web), thus satisfying a wide range of student interests and classroom needs, and iii) the ability to model an individual's degree of acquisition and fluency for each word in a constantly-expanding lexicon so as to provide student-specific practice and remediation. This combination enables research on a wide range of reading comprehension topics that were formerly difficult to investigate.
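
To make the first idea concrete, here is a minimal sketch of selecting passages under lexical constraints: a passage is kept only if it contains at least one target word and only a small share of words outside the reader's known vocabulary. It is an illustration only, not REAP's implementation; the function `find_passages`, its parameters and the 20% limit are assumptions made for the example.

```python
import re


def find_passages(passages, target_words, known_words, max_unknown_ratio=0.2):
    """Return passages that contain a target word and few unfamiliar words.

    Hypothetical helper: the constraint set and threshold are assumed.
    """
    targets = {word.lower() for word in target_words}
    # Target words are being practised, so they do not count as unknown.
    known = {word.lower() for word in known_words} | targets
    selected = []
    for passage in passages:
        tokens = re.findall(r"[a-z']+", passage.lower())
        if not tokens or not targets.intersection(tokens):
            continue  # passage lacks every target word
        unknown = sum(1 for token in tokens if token not in known)
        if unknown / len(tokens) <= max_unknown_ratio:
            selected.append(passage)
    return selected


candidates = [
    "The cat sat on the mat.",
    "The feline reclined languidly upon the ottoman.",
    "A huge dog slept near the mat.",
]
reader_vocabulary = {"the", "cat", "sat", "on"}
print(find_passages(candidates, ["mat"], reader_vocabulary))
```

In a per-student setting, `known_words` would come from the model of that reader's vocabulary, so the same corpus yields different practice material for different readers.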

Contacts: Jamie Callan and Maxine Eskenazi


Utility-based Information Distillation

We study supervised, unsupervised and semi-supervised learning techniques for automatically detecting novel events and tracking new trends for relevant events from temporally ordered documents, for dynamically updating user profiles under context, and for optimizing the utility of passage selection and summarization based on relevance, novelty, readability and user cost (e.g., time). Collaborative and adaptive information filtering among multiple users is also a part of the open challenge.

Contacts: Yiming Yang and Jaime Carbonell


Also see: TagHelper 2.0.


Knowledge Acquisition Projects


Dark Matter
Knowledge Acquisition from Text

LTI is participating in Project Halo, a research effort to design and implement a "Digital Aristotle". Our focus is on the definition of KAL (Knowledge Acquisition Language), a form of controlled language that can be used to acquire domain knowledge from subject matter experts in domains such as Chemistry, Physics and Biology.

Contacts: Eric Nyberg and Teruko Mitamura


IAMTC
Interlingual Annotation of Multilingual Text Corpora

IAMTC is a multi-site NSF ITR project focusing on the annotation of six sizable bilingual parallel corpora for interlingual content, with the goal of providing a significant data set for improving knowledge-based approaches to machine translation (MT) and a range of other Natural Language Processing (NLP) applications. The central goals of the project are: (1) to produce a practical, commonly shared system for representing the information conveyed by a text, or interlingua (IL), (2) to develop a methodology for accurately and consistently assigning such representations to texts across languages and across annotators, and (3) to annotate a sizable multilingual parallel corpus of source language texts and translations for IL content.

Contacts: Lori Levin and Teruko Mitamura


Scone
Symbolic Knowledge Base

Scone is a symbolic knowledge representation system designed to run well on a standard workstation. Scone's primary design goals are the ability to represent "common sense" knowledge, efficiency in performing inference and search, scalability to several million assertions, and ease of use.

Contact: Scott Fahlman


Also see: Tutalk and A Shared Resource for Robust Semantic Interpretation for Both Linguists and Non-Linguists.


Educational Applications


The Intelligent Writing Tutor (IWT)

The Intelligent Writing Tutor (IWT) project for ESL learners explores the issue of transfer and long-term retention of acquired knowledge, as part of the PSLC's underlying goal of developing a theory of robust learning. Through a series of learning experiments, we will look at both positive and negative transfer from a student's native language (L1) to English, the effects of an informed knowledge tracer on learning, and the role of level-appropriate feedback in achieving competency.

Contact: Teruko Mitamura


Calculategy

In this project we explore the impact of tutor strategy and example selection on student explanation behavior. The purpose is to identify strategies that make the most productive use of the time students spend with a tutorial dialogue system. We are collecting a corpus of tutoring dialogues in the calculus domain in which students discuss worked-out examples, which may or may not contain an error, with a human tutor. The student reasons through the worked examples, identifying, explaining, and correcting errors. As part of this project we are experimenting with automatic approaches to corpus analysis, applying and extending approaches used previously for text classification, dialogue act tagging, and automatic essay grading.

Contact: Carolyn Rose


CycleTalk

We are developing a novel style of tutorial dialogue system in which students discuss design choices as they are working on an optimisation problem in the field of thermodynamics. The purpose of CycleTalk is to engage students in negotiation dialogues over pros and cons of alternative choices. In this way we hope to elicit explanation behavior from students that is productive for learning. The primary language technology research foci of this project are dialogue management and robust language understanding.

Contact: Carolyn Rose


TagHelper 2.0
A Semi-Automatic Tool That Facilitates Reliable Content Analysis of Corpus Data

The goal of our research is to develop text classification technology to address concerns specific to classifying sentences using coding schemes developed for behavioral research. A wide range of behavioral researchers including social scientists, psychologists, learning scientists, and education researchers collect, code, and analyze large quantities of natural language corpus data as an important part of their research. A particular focus of our work is developing text classification technology that performs well on highly skewed data sets, which is an active area of machine learning research.

Contact: Carolyn Rose


A Shared Resource for Robust Semantic Interpretation for Both Linguists and Non-Linguists

The majority of existing authoring tools for constructing advanced conversational interfaces were designed for use by computational linguists. Our research goal is to explore strategies for supporting the development of language understanding interfaces by non-linguists. In our previous work we have developed Carmel-Tools, a behavior-oriented authoring environment for building semantic knowledge sources for the CARMEL core understanding engine. In our recent work, we have begun conducting user studies that aim to better understand how people process a large amount of corpus data when faced with a task comparable to programming a dialogue agent using a data-driven methodology. Our preliminary user study results hint that participants (1) introduce a bias when processing data sequentially (i.e. primacy effects) and (2) naturally represent semantic relatedness using spatial proximity. Based on these observations, we have developed the InfoMagnets interface, which provides a physical metaphor for exploratory data analysis that is consistent with user conceptions of semantic relatedness. It helps users avoid being biased by primacy effects by giving them a bird's-eye view of their whole inventory of dialogue topics simultaneously.

Contact: Carolyn Rose


Learning-Oriented Dialogue in Cognitive Tutors
Towards a Scalable Solution to Performance Orientation

The purpose of the Learning-Oriented Dialogue Project is to investigate the reasons behind unproductive patterns of student use of educational technology and to design interactions that will successfully bring about an improvement in student behavior. One aspect of this work focuses on developing a Peer Collaborative Agent to work with students as they solve math problems. The research literature investigating the construction of tutorial dialogue and learning companion environments presents parallel experiences in attempting to emulate in technology what has been observed to be effective for learning in human-human scenarios. We argue that what is needed as a next step is a careful investigation using controlled experimentation to construct a causal model of how specific features of an agent’s behavior influence an individual student’s behavior and learning. A key aspect of our research agenda is to investigate previous claims about best practices in learning companion design that have not been subjected to rigorous evaluation. We do this using a particular experimental design methodology, which provides a highly controlled way to examine mechanisms by which one peer learner’s behavior influences a partner learner’s behavior and learning. Specifically, we make use of confederate peer learners: experimenters who act as peer learners but behave in a highly scripted way. This research has a strong empirical focus as well as a technology development focus.

Contact: Carolyn Rose


Tutalk
Infrastructure for authoring and experimenting with natural language dialogue in tutoring systems and learning research

The focus of our proposed work is to provide an infrastructure that will allow learning researchers to study dialogue in new ways and for educational technology researchers to quickly build dialogue based help systems for their tutoring systems. Most tutorial dialogue systems that to date have undergone successful evaluations (CIRCSIM, AutoTutor, WHY-Atlas, the Geometry Explanation Tutor) represent development efforts of many man-years. These systems were instrumental in pushing the technology forward and in proving that tutorial dialogue systems are feasible and useful in realistic educational contexts, although not always provably better on a pedagogical level than the more challenging alternatives to which they have been compared. We are now entering a new phase in which we as a research community must not only continue to improve the effectiveness of basic tutorial dialogue technology but also must find ways of accelerating both the process of investigating the effective use of dialogue as a learning intervention as well as development of usable tutorial dialogue systems. We propose to develop a community resource to address all three of these problems on a grand scale, building upon our prior work developing both basic dialogue technology and tools for rapid development of running dialogue systems.

Contact: Carolyn Rose


Facilitating Accountability for Standards-Based Math at All Levels

With its emphasis on high-stakes testing upheld by the No Child Left Behind Act, the standards-based education movement promises to encourage rigor in our nation’s public school education. We argue that what is needed is an efficient means of continuous but unobtrusive monitoring of student progress, consolidation of data, effective reporting, and instruction guided by strategic assessment data. However, teachers face a fundamental dilemma in trying to use assessment to guide instruction: assessment takes time away from instruction and teachers cannot be sure the time spent assessing will improve instruction enough to justify the cost of lost instructional time. We are addressing this dilemma by building and experimentally evaluating the effectiveness of a web-based "Assistment" system for middle school math in Massachusetts and Connecticut. On-line testing systems that grade students and provide reports reduce the demands on the teacher. However, they do not fundamentally address the assessment dilemma. In contrast to previous approaches, the Assistment system aims to 1) quickly predict student scores on standards-based tests, 2) provide timely feedback to teachers about how they can specifically adapt their instruction to address student knowledge gaps (while similarly providing reports to parents and administrators), and 3) provide an opportunity for students to get intelligent tutoring assistance as assessment data is being collected. Assistments provide more focused instruction than the feedback that is typically given by on-line multiple-choice systems.

Contact: Carolyn Rose


Also see: Fluency, The REAP project and Project Listen.

 


Computational Biology


Biological Language Modeling Project

Pattern recognition from protein sequences and automated mapping between sequences, folding structures and biological functions is a new line of research where we actively collaborate with biologists.

Contacts: Judith Klein-Seetharaman, Jaime Carbonell, Roni Rosenfeld, Yiming Yang and Raj Reddy


Statistical-Computational Models of Molecular Evolution

Molecular evolution is a stochastic computational process that has been running on massively parallel hardware for some 10^17 seconds now, and which has resulted in many amazing local maxima along the way. The rapidly growing DNA and protein databases present a historic opportunity to model evolution at an unprecedented quantitative level, with enormous impact on medicine as well as on our fundamental understanding of life. In this project we combine statistical and computational methods to derive biological explanations and pharmacological predictions.

Contact: Roni Rosenfeld


Viruses, Vaccines, and Digital Life

Viruses are the simplest known self-replicating computational systems. They also happen to be the leading emerging threat to humanity in the 21st century. Fortunately, the new understanding of life in general and viruses in particular as digital programs opens the door to computational methods of defending against these threats. This is a new project launched in collaboration with leading virologists at the University of Pittsburgh whose aim is to combine biological analysis with statistical learning methods to better understand viral evolution and accelerate vaccine development.

Contact: Roni Rosenfeld


Other Projects


Informedia

The Informedia project tries to understand video, and enable search, visualization and summarization in both contemporaneous and archival content collections. The core technology combines speech, image and natural language understanding to automatically transcribe, segment and index linear video for intelligent search and image retrieval.

Contacts: Howard Wactlar and Alex Hauptmann


Project Listen
Literacy Innovation that Speech Technology ENables

Project LISTEN's Reading Tutor listens to children read aloud, and helps them learn to read. Project LISTEN offers exciting opportunities for interdisciplinary research in speech technologies, cognitive and motivational psychology, human-computer interaction, computational linguistics, artificial intelligence, machine learning, graphic design, and of course reading.

Contact: Jack Mostow


RADAR

Machine learning has been developed to the point where it can perform some truly useful tasks. However, much of the learning technology that's currently available requires extensive 'tuning' in order to work for any particular user, in the context of any particular task.

The focus of the RADAR project is to build a cognitive assistant that embodies machine learning technology that is able to function "in the wild" -- by this, we mean that the technology need not be tuned by experts, and that the person using the system that embodies the technology need not be trained in any special way.

Using the RADAR system itself, in the task for which it is designed, should be enough to allow RADAR to learn to improve performance.

RADAR is a joint project between SRI International and Carnegie Mellon University, and is funded by DARPA.

Contacts: Scott Fahlman and Jaime Carbonell

LTI related RADAR Components

Space-Time Planner: Contact: Eugene Fink
Knowledge Representation: Contact: Scott Fahlman
Briefing Assistant: Contact: Alex Rudnicky
NLP/email: Contact: Eric Nyberg
Summarization: Contact: Alex Rudnicky

Argus

We are working on techniques for identification of both known and surprising patterns in massive databases, and on their application to security challenges. For example, we may use the developed techniques for identifying and tracking the spread of a new disease based on medical databases, or for detecting patterns of malicious activity in the network traffic. This work involves three main directions: efficient indexing of massive databases; real-time identification of patterns in a stream of newly incoming data; and search for surprising changes in data patterns.

Contacts: Jaime Carbonell and Eugene Fink


TalkBank

TalkBank is an interdisciplinary research project involving Carnegie Mellon University, the University of Pennsylvania, and 7 other secondary collaborators. The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It has constructed sample databases of transcripts linked to audio and video within each of the 17 subfields studying communication. We are using these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary materials via networked computers.

Contact: Brian MacWhinney (macw@cmu.edu)


WebKB
The World Wide Knowledge Base Project

The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of this research project is to automatically create a computer-understandable knowledge base whose content mirrors that of the World Wide Web. If successful, this would lead to much more effective retrieval of information from the web, and to the use of this information to support new knowledge-based problem solvers. Our approach is to use machine learning algorithms to train the system to extract information of the desired types. Our web page describes the overall approach, plus several new algorithms we have developed that successfully extract information from the web.

Contact: Tom Mitchell


 

 




LTI is part of the School of Computer Science at Carnegie Mellon University.
This page is maintained by The LTI Webmaster.