Carnegie Mellon University

Graham Neubig

Graham Neubig

Associate Professor, Language Technologies Institute

  • 5409 —Gates & Hillman Centers

Research Area

Machine Learning, Machine Translation, Natural Language Processing and Computational Linguistics, Spoken Interfaces and Dialogue Processing


Master of Science in Intelligent Information Systems


My research is concerned with language and its role in human communication. In particular, my long-term research goal is to break down barriers in human-human or human-machine communication through the development of natural language processing (NLP) technologies. This includes the development of technology for machine translation, which helps break down barriers in communication for people who speak different languages, and natural language understanding, which helps computers understand and respond to human language. Within this overall goal of breaking down barriers to human communication, I have focused on several aspects of language that both make it interesting as a scientific subject, and hold potential for the construction of practical systems.

Advances in core NLP technology: Human language is complex and nuanced, and handling this complexity in a computational way requires sophisticated algorithms or learning techniques, the development of which is far from a solved problem. The first major area of my work focuses on advances in core NLP technology, which improve the accuracy with which we can analyze, translate, generate, or reply to textual inputs. 

Models of language associated with other modalities: Human language does not exist in a vacuum, and is often associated with context from the world around it. The second thread of my research focuses on models of natural language associated with other modalities, tackling the problems, and exploiting the fascinating possibilities presented by having text associated with other types of information. Specifically, I have worked extensively with speech, and am also interested in associations between natural and formal languages such as source code.

Learning from naturally occurring data: Language data exists in abundance, particularly due to the advent of the internet, and it is possible to acquire extremely large amounts of this data for scientific or engineering purposes. A third thread of my research focuses on learning NLP models from naturally occurring data through the use of unsupervised, semi-supervised, or active learning, which allows us to exploit this data to improve models while reducing the necessity for costly manual annotation.