Most approaches to automatic text analysis and processing treat text
as a stream of words or sentences. A typical underlying assumption is
that the use of language in the data is literal and that the data
represent facts. Many genres, however, violate these assumptions.
We are exploring automatic methods for analyzing text in the political
domain, specifically blog posts on topics pertinent to the 2008 United
States Presidential Elections. Political text is often indirect,
sarcastic, repetitive, hyperbolic, emotional, biased, manipulative,
and riddled with unstated assumptions. Our aim is to automatically
separate useful, thoughtful information from redundant "spin," using
statistical natural language processing techniques and a data-driven
methodology that makes use of the insights of political scientists.
The broader impact of this work will consist of a renewed emphasis on
exploiting domain knowledge together with text data for more powerful
natural language understanding technology, as well as software tools
that will promote more informed decision-making among American voters.
In the News
CMU and LTI First To Use Yahoo!'s New Supercomputing Center
Yahoo! Inc. is assisting research at the LTI by providing access to a
4,000-processor supercomputer running
open-source distributed computing software such as Hadoop and the Pig
parallel programming language. The initial group of researchers using
the system includes Jamie Callan (information retrieval), Noah Smith
(natural language processing), and Stephan Vogel (machine translation).
"We are excited about collaborating
with Yahoo! on systems software research, helping to advance the
state-of-the-art, and creating new research possibilities in this
critical area," said Randall E. Bryant, dean of the School of Computer
Science at Carnegie Mellon. For more information, see the Yahoo! press release.
Social Networking Project Emphasizes Compatible Minds
Incoming CMU freshmen will have the chance to try a new social networking site called Mindkin, developed by four SCS graduate students: Ulas Bardak, Betty Cheng, and Vasco Pedro of the LTI, and Jahanzeb Sherwani of the Computer Science Department. Bardak says he and the other students began working on Mindkin two years ago because existing sites seemed superficial, particularly in the emphasis given to photos.
Mindkin’s central feature is “Thought Stream,” a screen on which ideas submitted by users scroll by. A system of credits forces users to be selective in identifying ideas they like or dislike, which makes it impossible for someone to simply “like” all of the ideas scrolling through Thought Stream. If a user likes enough ideas from the same author, that author’s identity is eventually revealed so direct contact can be made.
The Mindkin braintrust has received a provisional patent on the concept and is looking for ways to commercialize it.
The Olympus Project has adopted Mindkin as a PROBE and will feature the social networking site at its next “Show and Tell” for venture capitalists on Sept. 25 in the Collaborative Innovation Center.