LTI Course Listings

Welcome to the listing and information directory for all courses offered by the LTI. Courses are listed in numerical order, followed by a summary of each course. Selecting a course number will take you directly to the appropriate listing for further information.

This list includes several courses from outside of the LTI that are especially relevant to LTI students. Further information about these courses is available on the web pages of the departments that offer them. 

Depending on a student's interests, electives may be taken from the LTI, other departments within SCS, the Tepper School of Business, the Statistics department, or the University of Pittsburgh.

Undergraduate-Only Courses
Course Title Units Semester
11-344 Machine Learning in Practice 12 Fall
11-390 LTI Minor Project - Juniors 12 All
11-411 Natural Language Processing 12 Spring
11-441 Search Engines and Web Mining 12 Intermittent
11-442 Search Engines 12 Fall
11-443 Scalable Analytics 12 All
11-490 LTI Minor Project - Seniors 12 All
Graduate Courses
Course Title Units Semester
11-590 LTI Minor Project - Advanced 12 All
11-611 Natural Language Processing 12 Spring
11-641 Search Engines and Web Mining 12 Intermittent
11-642 Search Engines 12 Fall
11-643 Scalable Analytics 12 All
11-663 Machine Learning in Practice 12 Fall
11-641 Search Engines and Web Mining 12 Fall
11-683 Biotechnology Outsourcing Growth 6-Mini Spring
11-691 Software Planning & Management 6-Mini Spring
11-693 Software method for Biotechnology 6-Mini Fall
11-695 Competitive Engineering 12 Spring
11-696 MIIS Capstone Planning Seminar 6 Spring
11-697 MIIS Capstone Project 36 Fall
11-700 LTI Colloquium 6 All
11-711 Algorithms for NLP 12 Fall
11-712 Lab in NLP 6 Both
11-713 Advanced NLP Seminar 6 All
11-714 Tools for NLP 6 Fall
11-716 Graduate Seminar on Dialog Processing 6 All
11-717 Language Technologies for Computer Assisted Language Learning 12 Spring
11-718 Conversational Interfaces 12 Fall
11-719 Computational Models 12 Spring
11-721 Grammars and Lexicons 12 All
11-722 Grammar Formalisms 12 Intermittent
11-725 Meaning in Language 12 All
11-726 Meaning in Language Lab (Self-Paced) 6 Spring
11-731 Machine Translation 12 Spring
11-732 Self-Paced Lab: MT 6 All
11-733 Multilingual Speech-to-Speech Translation Lab 6 Fall
11-734 Advanced Machine Translation Seminar 6 Spring
11-736 Graduate Seminar on Endangered Languages 6 Fall
11-741 Information Retrieval 12 Spring
11-742 Self-Paced Lab: IR 6 All
11-744 Experimental Information Retrieval 12 Spring
11-745 Advanced Statistical Learning Seminar 12 Fall
11-748 Information Extraction 12 Spring
11-751 Speech Recognition and Understanding 12 All
11-752 Speech II: Phonetics, Prosody, Perception and Synthesis 12 Spring
11-753 Advanced Laboratory in Speech Recognition 6 Spring
11-754 Project Course: Dialogue Systems 6 All
11-755 Machine Learning for Signal Processing 12 All
11-756 Design and Implementation of Speech Recognition Systems 12 Spring
11-761 Language and Statistics 12 Spring
11-762 Language and Statistics II 12 Intermittent
11-763 Structured Prediction for Language and Other Discrete Data 12 Fall
11-765 Active Learning Seminar 6 Intermittent
11-772 Analysis of Social Media 12 Intermittent
11-773 Text-Driven Forecasting 12 Fall
11-780 Research Design and Writing 12 Fall
11-782 Self-Paced Lab for Computational Biology 6-12 All
11-783 Self-Paced Lab: Rich Interaction in Virtual World 6 Spring
11-791 Software Engineering for Information Systems 12 All
11-792 Intelligent Information Systems Project 12 Spring
11-794 Inventing Future Services 12 Intermittent
11-795 Seminar: Algorithms for Privacy and Security 6 Spring
11-796 Question Answering Lab 6 Spring
11-797 Question Answering 12 Spring
11-899 Summarization and Personal Information Management 12 Intermittent
11-910 Directed Research 1-48 All
11-920 Independent Study: Breadth 1-48 All
11-925 Independent Study: Area 1-48 All
11-928 Masters Thesis I 5-36 All
11-929 Masters Thesis II 5-36 All
11-930 Dissertation Research 1-48 All
11-935 LTI Practicum 1-36 All
Other SCS Courses
Course Title
10-601, 10-701 Machine Learning (can count under only one focus per student). LTI PhD students must register for 10-701 for it to count toward their required 8 courses; LTI Masters students should register for 10-601.
15-750 Algorithms
15-780 Artificial Intelligence
15-883 Computational Models of Neural Systems
LTI Course Directory
11-344 - Machine Learning in Practice
Description Machine Learning is concerned with computer programs that enable the behavior of a computer to be learned from examples or experience rather than dictated through rules written by hand. It has practical value in many application areas of computer science such as on-line communities and digital libraries. This class is meant to teach the practical side of machine learning for applications, such as mining newsgroup data or building adaptive user interfaces. The emphasis will be on learning the process of applying machine learning effectively to a variety of problems rather than emphasizing an understanding of the theory behind what makes machine learning work. This course does not assume any prior exposure to machine learning theory or practice. In the first 2/3 of the course, we will cover a wide range of learning algorithms that can be applied to a variety of problems. In particular, we will cover topics such as decision trees, rule-based classification, support vector machines, Bayesian networks, and clustering. In the final third of the class, we will go into more depth on one application area, namely the application of machine learning to problems involving text processing, such as information retrieval or text categorization.
11-390 - LTI Minor Project - Juniors
11-411 - Natural Language Processing
Description This course will introduce students to the highly interdisciplinary area of Artificial Intelligence known alternately as Natural Language Processing (NLP) and Computational Linguistics. The course aims to cover the techniques used today in software that does useful things with text in human languages like English and Chinese. Applications of NLP include automatic translation between languages, extraction and summarization of information in documents, question answering and dialog systems, and conversational agents. This course will focus on core representations and algorithms, with some time spent on real-world applications. Because modern NLP relies so heavily on Machine Learning, we'll cover the basics of discrete classification and probabilistic modeling as we go. Good computational linguists also know about Linguistics, so topics in linguistics (phonology, morphology, and syntax) will be covered where appropriate. From a software engineering perspective, there will be an emphasis on rapid prototyping, a useful skill in many other areas of Computer Science. In particular, we will introduce some high-level languages (e.g., regular expressions and Dyna) and some scripting languages (e.g., Python and Perl) that can greatly simplify prototype implementation.
Pre-Requisites 15-211 Fundamental Data Structures and Algorithms
Course Site http://www.ark.cs.cmu.edu/NLP/
11-441 - Search Engines and Web Mining
Description This course provides a comprehensive introduction to the theory and implementation of algorithms for organizing and searching large text collections. The first half of the course studies text search engines for enterprise and Web environments; the open-source Indri search engine is used as a working example. The second half studies text mining techniques such as recommender systems, clustering, and categorization. Programming assignments give hands-on experience with document ranking, evaluation, categorizing documents into browsing hierarchies, and related topics.
11-442 - Search Engines
Description This course studies the theory, design, and implementation of text-based search engines. The core components include statistical characteristics of text, representation of information needs and documents, several important retrieval models, and experimental evaluation. The course also covers common elements of commercial search engines, for example, integration of diverse search engines into a single search service ("federated search", "vertical search"), personalized search results, diverse search results, and sponsored search. The software architecture components include design and implementation of large-scale, distributed search engines. 

This is a full-semester lecture-oriented course worth 12 units.

Eligibility This course is open to all students who meet the pre-requisites except students in the LTI's MLT and PhD programs. Students in the LTI's MLT and PhD programs can take 11-741, Information Retrieval, which focuses more on research. This course focuses more on current practice.
Pre-Requisites This course requires good programming skills and an understanding of computer architectures and operating systems (e.g., memory vs. disk trade-offs). A basic understanding of probability, statistics, and linear algebra is helpful. Thus students should have preparation comparable to the following CMU undergraduate courses.
  • 15-210, Parallel and Sequential Data Structures and Algorithms (required)
  • 15-213, Introduction to Computer Systems (required)
  • 15-451, Algorithm Design and Analysis (helpful)
  • 21-241, Matrix Algebra or 21-341, Linear Algebra (helpful)
  • 21-325, Probability (helpful)
  • 36-202, Basic statistics (helpful)
Website http://boston.lti.cs.cmu.edu/classes/11-642/
11-443 - Scalable Analytics
Description This is a full-semester lecture-oriented course (12 units), intended for students in professional master's programs and undergraduates who meet the pre-requisites. Replacing the second half of 11-641/11-441, Search Engines and Web Mining, this new course offers a blend of core theory, implementation, and application of scalable data analytic techniques. Specifically, it covers high-dimensional data representation, dimensionality reduction, clustering, collaborative filtering, large-scale classification, learning to rank, link analysis, temporal information distillation, and statistical significance tests. Six homework assignments give students hands-on experience implementing representative algorithms, conducting empirical evaluations, and exercising the main concepts taught in the course.
Pre-Requisites
  • Data structures & algorithms (e.g. 15-213) (required)
  • Matrix or Linear Algebra (e.g. 21-241 or 21-341) (required)
  • Basic Probability and Statistics (e.g. 21-325) (required)
  • 15-451, Algorithm Design and Analysis (not required but helpful)
  • 10-601 or 10-701, Machine Learning (not required but helpful)

For CMU CS undergraduates, all of the required courses need to be completed before or during the junior year; for MS students, equivalent background is required.

Website http://nyc.lti.cs.cmu.edu/classes/11-643/
11-490 - LTI Minor Project - Seniors
11-590 - LTI Minor Project - Advanced
11-611 - Natural Language Processing
Description Natural language processing is an introductory graduate-level course on the computational properties of natural languages and the fundamental algorithms for processing natural languages. The course will provide an in-depth presentation of the major algorithms used in NLP, including Lexical, Morphological, Syntactic and Semantic analysis, with the primary focus on parsing algorithms and their analysis.
Pre-Requisites 15-211 Fundamental Data Structures and Algorithms
Course Site http://www.ark.cs.cmu.edu/NLP/
11-641 - Search Engines and Web Mining
Description This course provides a comprehensive introduction to the theory and implementation of algorithms for organizing and searching large text collections. The first half of the course studies text search engines for enterprise and Web environments; the open-source Indri search engine is used as a working example. The second half studies text mining techniques such as recommender systems, clustering, and categorization. Programming assignments give hands-on experience with document ranking, evaluation, categorizing documents into browsing hierarchies, and related topics.
11-642 - Search Engines
Description This course studies the theory, design, and implementation of text-based search engines. The core components include statistical characteristics of text, representation of information needs and documents, several important retrieval models, and experimental evaluation. The course also covers common elements of commercial search engines, for example, integration of diverse search engines into a single search service ("federated search", "vertical search"), personalized search results, diverse search results, and sponsored search. The software architecture components include design and implementation of large-scale, distributed search engines. 

This is a full-semester lecture-oriented course worth 12 units.

Eligibility This course is open to all students who meet the pre-requisites except students in the LTI's MLT and PhD programs. Students in the LTI's MLT and PhD programs can take 11-741, Information Retrieval, which focuses more on research. This course focuses more on current practice.
Pre-Requisites This course requires good programming skills and an understanding of computer architectures and operating systems (e.g., memory vs. disk trade-offs). A basic understanding of probability, statistics, and linear algebra is helpful. Thus students should have preparation comparable to the following CMU undergraduate courses.
  • 15-210, Parallel and Sequential Data Structures and Algorithms (required)
  • 15-213, Introduction to Computer Systems (required)
  • 15-451, Algorithm Design and Analysis (helpful)
  • 21-241, Matrix Algebra or 21-341, Linear Algebra (helpful)
  • 21-325, Probability (helpful)
  • 36-202, Basic statistics (helpful)
Website http://boston.lti.cs.cmu.edu/classes/11-642/
11-643 - Scalable Analytics
Description This is a full-semester lecture-oriented course (12 units), intended for students in professional master's programs and undergraduates who meet the pre-requisites. Replacing the second half of 11-641/11-441, Search Engines and Web Mining, this new course offers a blend of core theory, implementation, and application of scalable data analytic techniques. Specifically, it covers high-dimensional data representation, dimensionality reduction, clustering, collaborative filtering, large-scale classification, learning to rank, link analysis, temporal information distillation, and statistical significance tests. Six homework assignments give students hands-on experience implementing representative algorithms, conducting empirical evaluations, and exercising the main concepts taught in the course.
Pre-Requisites
  • Data structures & algorithms (e.g. 15-213) (required)
  • Matrix or Linear Algebra (e.g. 21-241 or 21-341) (required)
  • Basic Probability and Statistics (e.g. 21-325) (required)
  • 15-451, Algorithm Design and Analysis (not required but helpful)
  • 10-601 or 10-701, Machine Learning (not required but helpful)

For CMU CS undergraduates, all of the required courses need to be completed before or during the junior year; for MS students, equivalent background is required.

Website http://nyc.lti.cs.cmu.edu/classes/11-643/
11-663 - Machine Learning in Practice
Description Machine Learning is concerned with computer programs that enable the behavior of a computer to be learned from examples or experience rather than dictated through rules written by hand. It has practical value in many application areas of computer science such as on-line communities and digital libraries. This class is meant to teach the practical side of machine learning for applications, such as mining newsgroup data or building adaptive user interfaces. The emphasis will be on learning the process of applying machine learning effectively to a variety of problems rather than emphasizing an understanding of the theory behind what makes machine learning work. This course does not assume any prior exposure to machine learning theory or practice. In the first 2/3 of the course, we will cover a wide range of learning algorithms that can be applied to a variety of problems. In particular, we will cover topics such as decision trees, rule-based classification, support vector machines, Bayesian networks, and clustering. In the final third of the class, we will go into more depth on one application area, namely the application of machine learning to problems involving text processing, such as information retrieval or text categorization.
11-683 - Biotechnology Outsourcing Growth
Description An especially dangerous time for new ventures is right after the initial product launch. At startup, many ventures run lean with a small headcount and minimal operational overhead. After some success, the startup is compelled to expand headcount, increase capital expansion, and scale up operations. In many cases, what was a promising theoretical business model may fail due to inadequate growth management. Biotechnology companies in particular are increasingly outsourcing key functions to reduce cost and increase efficiency. The capital cost for laboratories and specialized lab technicians is often prohibitive for biotech startups with a clear and narrow focus. Biotech startups are therefore running much leaner but with a distributed organizational structure. Under these circumstances, managing outsourced functions becomes critical and is a focus of this course. This course will introduce students to issues with growth strategy and outsourcing management.
11-691 - Software Planning & Management
Description Software is often delivered late, over budget, and lacking important features. Teams often fail to capture how the customer actually accomplishes work and then to create a realistic project plan. This is especially important in the life sciences, where software development involves creating applications that are relatively new to the industry. The course will introduce students to the "Balanced Framework" project management process, which assists biotechnology organizations in planning and managing software projects that support their product development. It covers the identification, structuring, evaluation, and ongoing management of software projects that deliver the benefits expected from the organization's investments. It focuses on the delivery of the business value initiated by the project, and helps an organization answer the basic question "Are the things we are doing providing value to the business?" In this course, students will learn how to examine and explain customer processes and create requirements that reflect how work is actually done. Students will additionally create a software project plan that incorporates problem framing, customer workflow, planning, project tracking, monitoring, and measurement.
11-693 - Software method for Biotechnology
Description Moore's law describes how processing power continues to become faster, better, and cheaper. It has not only powered the computer industry forward but is also a key driver propelling biotechnology. It is hard to imagine the world of biotechnology without the world of software. Moreover, the future will further underscore software's importance for enabling biotechnology innovations. This course focuses on the relationship between biotechnology processes and information technology. Students will be introduced to business process workflow modeling and how these concepts are applied in large organizations. Through this method, students will learn the key drivers behind information systems and how to identify organizational opportunities and leverage these to create disruptive models. Students will also learn to assess new technology sectors for unsolved problems and commercially viable solutions. By taking this course, students will become conversant with the software technologies that can be applied to commercial life science problems in the present and future.
11-695 - Competitive Engineering
Description In the second core course, students will be tasked with building a software application prototype for a biotech/pharmaceutical firm. Students will be introduced to a particular firm (through one of the program advisors) and will learn how to conduct requirements analysis and convert it into feature definitions. The customer requirements are often a moving target: they are influenced by the emergence of competitive alternatives (e.g. internal consultants, off-the-shelf software) and also by the team members' interactions with one another. Students will learn to create a product that captures the best balance of customer priorities and feasibility and that is distinguished from competitive alternatives. They will then use this learning to develop their respective prototypes. At the conclusion of the term, teams will compete with each other to determine which team's product is superior. In addition to having to apply various aspects of software development and computational learning, the course will help to provide students with some key insights into how biotech/pharmaceutical businesses operate. In addition to concepts regarding market demand, students will learn how to aggregate and synthesize information related to demand, pricing, and competition. They will then apply this learning to define and prioritize market-driven requirements as they relate to a product. This information will then be used to build a product development plan. Students will utilize methods to enhance product quality and customer satisfaction: benchmarking, industry and customer analyses, project metrics, and a range of customer relationship management tools.
11-696 - MIIS Capstone Planning Seminar
Description The MIIS Capstone Planning Seminar prepares students to complete the MIIS Capstone Project in the following semester.  Students are organized into teams that will work together to complete the capstone project.  They define project goals, requirements, success metrics, and deliverables; and they identify and acquire data, software, and other resources required for successful completion of the project. The planning seminar must be completed in the semester prior to taking the capstone project.
11-697 - MIIS Capstone Project
Description The capstone project is a large, group-oriented demonstration of student skill in one or more areas covered by the degree. Typically the result of the capstone project is a major software application. The capstone project is supervised by a member of the faculty who meets with students on a weekly basis to monitor progress and provide guidance.
11-700 - LTI Colloquium
Description The LTI colloquium is a series of talks related to language technologies. The topics include but are not restricted to Computational Linguistics, Machine Translation, Speech Recognition and Synthesis, Information Retrieval, Computational Biology, Machine Learning, Text Mining, Knowledge Representation, Computer-Assisted Language Learning and Intelligent Language Tutoring. To receive credit for the course, students are required to write either a short critique of one of the presentations or a comparison of two.
Course Site http://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-55/lti/Courses/700/2011/
11-711 - Algorithms for Natural Language Processing
Description Algorithms for NLP is an introductory graduate-level course on the computational properties of natural languages and the fundamental algorithms for processing natural languages. The course will provide an in-depth presentation of the major algorithms used in NLP, including Lexical, Morphological, Syntactic and Semantic analysis, with the primary focus on parsing algorithms and their analysis.
Topics Introduction to Formal Language Theory, Search Techniques, Morphological Processing and Lexical Analysis, Parsing Algorithms for Context-Free Languages, Unification-based Grammars and Parsers, Natural Language Generation, Introduction to Semantic Processing, Ambiguity Resolution Methods
Pre-Requisites College-level: course on algorithms/programming skills; Minimal exposure to syntax and structure of Natural Language (English)
Co-Requisites The self-paced Laboratory in NLP (11-712) is designed to complement this course with programming assignments on relevant topics. Students are encouraged to take the lab in parallel with the course or in the following semester.
Course Site http://demo.clab.cs.cmu.edu/fa2013-11711/index.php/Main_Page
11-712 - Lab in NLP
Description The Self-Paced Lab in NLP Algorithms is intended to complement the 11-711 lecture course by providing a chance for hands-on, in-depth exploration of various NLP paradigms. Students will study a set of on-line course materials and complete a set of programming assignments illustrating the concepts taught in the lecture course. Timing of individual assignments is left up to the student, although all assignments must be successfully completed and turned in before the end of the semester for the student to receive credit for the course.
Co-Requisites 11-711 - Algorithms for Natural Language Processing
11-713 - Advanced NLP Seminar
Description This course aims to improve participants' knowledge of current techniques, challenges, directions, and developments in all areas of NLP (i.e., across applications, symbolic formalisms, and approaches to the use of data and knowledge); to hone students' critical technical reading skills, oral presentation skills, and written communication skills; and to generate discussion among students across research groups to inspire new research.

In a typical semester, a set of readings will be selected (with student input) primarily from the past 2-3 years' conference proceedings (ACL and regional variants, EMNLP, and COLING), journals (CL, JNLE), and relevant collections and advanced texts. Earlier papers may be assigned as background reading. In 2010, the readings will primarily be recent dissertations in NLP. The format of each meeting will include a forty-minute, informal, critical student presentation on the week's readings, with presentations rotating among participants, followed by general discussion. Apart from the presentation and classroom participation, each student will individually write a 3-4-page white paper outlining a research proposal for new work extending research discussed in class - this is similar to the Advanced IR Seminar.

Course Site http://www.cs.cmu.edu/%7Enasmith/ANLPS/
11-714 - Tools for NLP
Description This course is designed as a hands-on lab to help students interested in NLP build their own compendium of the open-source tools and resources available online. Ideally taken in the first semester, the course focuses on one basic topic every two weeks, during which each student will download, install, and play with two or three packages, tools, or resources, and compare notes. The end-of-semester assignment will be to compose some of the tools into a system that does something interesting. We will cover a range, from the most basic tools for sentence splitting and punctuation removal through resources such as WordNet and the Penn Treebank to parsing and Information Extraction engines.
11-716 - Graduate Seminar on Dialog Processing
Description Dialog systems and processes are becoming an increasingly vital area of interest both in research and in practical applications. The purpose of this course will be to examine, in a structured way, the literature in this area as well as learn about ongoing work. The course will cover traditional approaches to the problem, as exemplified by the work of Grosz and Sidner, as well as more recent work in dialog, discourse and evaluation, including statistical approaches to problems in the field. We will select several papers on a particular topic to read each week. While everyone will do all readings, a presenter will be assigned to overview the paper and lead the discussion. On occasion, a researcher may be invited to present their own work in detail and discuss it with the group. A student or researcher taking part in the seminar will come away with a solid knowledge of classic work on dialog, as well as familiarity with ongoing trends.
11-717 - Language Technologies for Computer Assisted Language Learning
Description This course studies the design and implementation of CALL systems that use Language Technologies such as Speech Synthesis and Recognition, Machine Translation, and Information Retrieval. After a short history of CALL/LT, students will learn where language technologies (LT) can be used to aid in language learning. From there, the course will explore the specifics of designing software that must interface with a language technology. For each LT, we will explore:
  • what information does the LT require,
  • what type of output does the LT send to the CALL interface,
  • what are the limits of the LT that the CALL designer must deal with,
  • what are the real time constraints,
  • what type of training does the LT require
The goal of the course is to familiarize the student with:
  • existing systems that use LT
  • assessment of CALL/LT software
  • the limitations imposed by the LT
  • designing CALL/LT software
Grading criteria:
  • several short quizzes
  • term project: production of a small CALL/LT system, verbal presentation and written documentation of design of the software.
11-718 - Conversational Interfaces
Description Conversational Interfaces is intended to bring together an interdisciplinary mix of students from the Language Technologies Institute and the Human-Computer Interaction Institute to explore the topic of conversational interfaces from a user-centered, human-impact perspective rather than a heavily technology-centered one. In this course we will explore through readings and project work such questions as (1) What are the costs and benefits of using a speech/language interface? (2) When is it advantageous to use a speech/language interface over an alternative? (3) What are the factors involved in the design of effective speech/language interfaces, and what impact do they have on the user's experience with the system? (4) How do we evaluate the usability of a speech/language interface? (5) What have we learned from evaluations of speech/language interfaces that have already been built? To what extent do the data support the claims that are made about the special merits of conversational interfaces?
11-719 - Computational Models
Description Discourse analysis is the area of linguistics that focuses on the structure of language above the clause level. It is interesting both in the complexity of structures that operate at that level and in the insights it offers about how personality, relationships, and community identification are revealed through patterns of language use. A resurgence of interest in topics related to modeling language at the discourse level is in evidence at recent language technologies conferences. This course is designed to help students get up to speed with foundational linguistic work in the area of discourse analysis, and to use these concepts to challenge the state of the art in language technologies for problems that have a strong connection with those concepts, such as dialogue act tagging, sentiment analysis, and bias detection. This is meant to be a hands-on and intensely interactive course with a heavy programming component. The course is structured around three-week units, all but the first of which have a substantial programming assignment structured as a competition. Grades will not be assigned based on ranking within the competition; rather, they will be assigned based on demonstrated comprehension of course materials and methodology.
Course Site http://www.cs.cmu.edu/%7Ecprose/discourse-course.html
11-721 - Grammars and Lexicons
Description Grammars and Lexicons is an introductory graduate course on linguistic data analysis and theory, focusing on methodologies that are suitable for computational implementations. The course covers major syntactic and morphological phenomena in a variety of languages. The emphasis will be on examining both the diversity of linguistic structures and the constraints on variation across languages. Students will be expected to develop and defend analyses of data, capturing linguistic generalizations and making correct predictions within and across languages. The goal is for students to become familiar with the range of phenomena that occur in human languages so that they can generalize the insights into the design of computational systems. The theoretical framework for syntactic and lexical analysis will be Lexical Functional Grammar. Grades will be based on problem sets and take-home exams.
Pre-Requisites Introductory linguistics course or permission of instructor
11-722 - Grammar Formalisms
Description The goal of this course is to familiarize students with grammar formalisms that are commonly used for research in computational linguistics, language technologies, and linguistics. We hope to have students from a variety of disciplines (linguistics, computer science, psychology, modern languages, philosophy) in order to cover a broad perspective in class discussions. Comparison of formalisms will lead to a deeper understanding of human language and natural language processing algorithms. The formalisms will include: Head Driven Phrase Structure Grammar, Lexical Functional Grammar, Tree Adjoining Grammar and Categorial Grammar. If time permits, we will cover Penn Treebank, dependency grammar, and Construction Grammar. We will cover the treatment of basic syntactic and semantic phenomena in each formalism, and will also discuss algorithms for parsing and generating sentences for each formalism. If time permits, we may discuss formal language theory and generative capacity.
11-725 - Meaning in Language
Description This course provides a survey of the many different ways in which meaning is conveyed in spoken languages, and of the different types of meaning which are conveyed. We will introduce various theoretical frameworks for the description of these phenomena. Topics to be covered will include: word meaning (lexical semantics); structure and meaning (compositional semantics); information structure (foregrounding and backgrounding); verb argument structure and thematic roles; intonational meaning and focus; presupposition; context dependency; discourse markers and utterance modifiers; and the role of inference in interpretation. The topics to be addressed bring together a variety of fields: linguistics; philosophy of language; communication studies and rhetoric; and language technologies. The course may be taken as either a 9-unit (80-306) or 12-unit (80-606/11-725) course. The 12-unit course will include an additional component, which will relate the content of the course to issues in computational linguistics, with an emphasis on methods of implementation. (The computational component will be taught by faculty from the Language Technologies Institute.)
11-726 - Meaning in Language Lab (Self-Paced)
Description The self-paced Meaning in Language Lab is intended to follow up on the 11-725 lecture course (Meaning in Language) by providing a chance for hands-on, in-depth, computational exploration of various semantics and pragmatics research topics. The course is self-paced and there will be no scheduled lecture times; however, students are welcome to set up meetings with the instructor as desired, and students who prefer to have a weekly or bi-monthly regularly scheduled meeting with the instructor are welcome to arrange for that. If there is sufficient interest, an informal reading group may be formed to supplement the lab work. Students will design their own project, which they will discuss with the instructor for approval. Students are encouraged to select a topic from semantics, pragmatics, or discourse analysis, such as entailment, evidentiality, implicature, information status, or rhetorical structure, and a topic from language technologies, such as sentiment analysis or summarization, and explore how the linguistic topic applies to some aspect of the chosen language technology. Students are encouraged to contrast symbolic, formal, and knowledge-based approaches with empirical approaches. Each student will work independently. If multiple students work as a team on a particular topic, each should choose an approach that is different from the approaches used by the other students working on the same problem. Students will be responsible for setting up a web page, blog, or wiki to post progress reports and other supporting documents, data, and analyses. The web space will be checked by the instructor periodically, and thus should be kept updated to reflect ongoing progress. The web space will also serve as a shared project space in the case that students are working in a team for the project.
11-731 - Machine Translation
Description Machine Translation is an introductory graduate-level course surveying history, techniques, and research topics in the field. The main objectives of the course are:
-Obtain a basic understanding of MT systems and MT-related issues.
-Learn about theory and approaches in Machine Translation.
-Learn about basic techniques for MT development, in preparation for the MT Lab course and real-world MT system project development.
-Obtain in-depth knowledge of one current topic in MT, or perform an analysis of a given MT problem, matching it with the most suitable techniques (includes research, report and presentation).
Pre-Requisites 11-721 - Grammars and Lexicons or equivalent background is recommended. 
11-711 - Algorithms for NLP or equivalent background is recommended.
Course Site http://www-2.cs.cmu.edu/afs/cs/project/cmt-55/lti/Courses/731/www/
11-732 - Self-Paced Lab: MT
Description The Self-Paced Lab in MT is intended to complement the 11-731 lecture course by providing a chance for hands-on, in-depth exploration of various MT paradigms. MT faculty will present a set of possible topics to the students enrolled in the course. The students will indicate their first and second choices for lab projects, and will then be matched to a lab project advisor. At the end of the semester, the students will present the results of their projects in class, and submit a short paper describing them.
Pre-Requisites 11-731 - Machine Translation
11-733 - Multilingual Speech-to-Speech Translation Lab
Description Building speech-to-speech translation systems (S-2-S) is an extremely complex task, involving research in Automatic Speech Recognition (ASR), Machine Translation (MT), Natural Language Understanding (NLU), as well as Text-to-Speech (TTS), and doing this for many languages doesn't make it easier. Although substantial progress has been made in each of these areas over the last years, the integration of the individual ASR, MT, NLU, and TTS components to build a good S-2-S system is still a very challenging task. The seminar course on Multilingual Speech-to-Speech Translation will cover important recent work in the areas of ASR, MT, NLU, and TTS with a special focus on language-portable approaches and discuss solutions for rapidly building state-of-the-art speech-to-speech translation systems. In the beginning sessions the instructors and other invited lecturers will give a brief introduction to the broad field. We will select papers on particular topics to read each week. While everyone will do all readings and participate in the discussions, one person is assigned per session to present the basic ideas of the topic-specific papers and lead the concluding discussion.
11-734 - Advanced Machine Translation Seminar
Description The Advanced Machine Translation Seminar is a graduate-level seminar on current research topics in Machine Translation. The seminar will cover recent research on different approaches to Machine Translation (Statistical MT, Example-based MT, Interlingua and rule-based approaches, hybrid approaches, etc.). Related problems that are common to many of the various approaches will also be discussed, including the acquisition and construction of language resources for MT (translation lexicons, language models, etc.), methods for building large sentence-aligned bilingual corpora, automatic word alignment of sentence-parallel data, etc. The material covered will be mostly drawn from recent conference and journal publications on the topics of interest and will vary from year to year. The course will be run in a seminar format, where the students prepare presentations of selected research papers and lead in-class discussion about the presented papers.
Pre-Requisites 11-731 - Machine Translation, or instructor approval.
11-736 - Graduate Seminar on Endangered Languages
Description The purpose of this seminar is to allow students to better understand the linguistic, social and political issues that arise when working with language technologies for endangered languages. Often in LTI we concentrate on issues of modeling with small amounts of data, or designing optimal strategies for collecting data, but ignore many of the wider practical issues that appear when working with endangered languages. This seminar will consist of reading books and papers, and having participants give presentations; a few invited talks (e.g. from field linguists and language advocates) will also be included. It will count for 6 units of LTI course credit. It may be possible for interested students to also carry out a related 6-unit project as a lab.
Course Site http://www.cs.cmu.edu/%7Eref/sel/
11-741 - Information Retrieval
Description This course studies the theory, design, and implementation of text-based information systems. The Information Retrieval core components of the course include statistical characteristics of text, representation of information needs and documents, several important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling), clustering algorithms, automatic text categorization, and experimental evaluation. The software architecture components include design and implementation of high-capacity text retrieval and text filtering systems. A variety of current research topics are also covered, including cross-lingual retrieval, document summarization, machine learning, topic detection and tracking, and multi-media retrieval.
Pre-Requisites Programming and data-structures at the level of 15-211 or higher; 
Algorithms comparable to the undergraduate CS algorithms course (15-451) or higher; 
Basic linear algebra (21-241 or 21-341); 
Basic statistics (36-202) or higher.
Course Site http://boston.lti.cs.cmu.edu/classes/11-741/
11-742 - Self-Paced Lab: IR
Description The Self-Paced Lab for Information Retrieval (IR Lab) is intended to complement the 11-741 lecture course (IR Core) by providing a chance for hands-on, in-depth exploration of various IR research topics. Students will design their own projects (project examples) and discuss them with the instructor for approval. Each student will work independently. If multiple students work as a team on a particular topic, each should choose an approach that is different from the approaches used by the other students working on the same problem. Each student must make a Web page for progress reports and communication. The Web page will be checked by the instructor periodically and should therefore be updated in a timely manner to reflect ongoing progress and work organization. The Web pages will also serve as a means of sharing data and tools among students.
11-744 - Experimental Information Retrieval
Description This seminar studies the experimental evaluation of information retrieval systems in community-wide evaluation forums such as TREC, CLEF, NTCIR, INEX, TAC, and other annual research evaluations. The content will change from year to year, but the general format will be an in-depth introduction to the evaluation forum; its tracks or tasks, test collections, evaluation methodologies, and metrics; and several of the most competitive or interesting systems in each track or task. Class discussions will explore and develop new methods that might be expected to be competitive. The seminar includes a significant project component in which small teams develop systems intended to be competitive with the best recent systems. Students are not required to participate in actual TREC, CLEF, etc., evaluations; however, some students may wish to do so. A specific goal of the seminar is to prepare students to compete effectively in such evaluations. The course meets twice a week during the first half of the semester. This part of the course is a combination of seminar-style presentations and brainstorming sessions about how to build competitive systems. The course meets once a week during the second half of the semester, when students are doing their projects. This part of the class consists essentially of weekly progress reports on student projects.
Pre-Requisites 11-741 - Information Retrieval or consent of the instructor.
Course Site http://boston.lti.cs.cmu.edu/classes/11-744/
11-745 - Advanced Statistical Learning Seminar
Description This course emphasizes the theoretical foundation of statistical learning and its applications to many challenging problems. The objective is to enhance the understanding of statistical methods that graduate students learned from different courses, including Machine Learning (10-601 or 10-701) and Information Retrieval (11-741), and to integrate scattered pieces of knowledge into a more comprehensive formulation. For Fall, we choose the topics in the book "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Specifically, the topics include both supervised learning and unsupervised learning, various linear regression methods, linear classification methods, basis expansions and regularization, kernel methods, model assessment and selection, model inference and averaging, boosting and additive trees, neural networks, support vector machines, nearest-neighbor methods, and unsupervised clustering. Additional topics may include computational geometry applied to machine learning problems and other issues. The course will take the form of a seminar. We will go through the book, one chapter per class, except that a heavy chapter may be split across two classes. Each class starts by collecting questions from all the participants about the current chapter, followed by a presentation (lecture) on that chapter, and then classroom discussions about collected and new questions. Students will be grouped into teams of two or three; each team is assigned two chapters, which it will analyze, present in a lecture, and lead classroom discussions on. All students are required to read every chapter before it is discussed in class, and to present their questions at the start of the class. Grading: There will be no exams or homework. The grading is based on class participation, quality of the seminar presentations delivered by each team, and questions submitted at the start of each class.
Pre-Requisites 11-741 - Information Retrieval; 
10-601 or 10-701 - Machine Learning or consent of the instructor.
11-748 - Information Extraction
Description Information extraction is finding names of entities in unstructured or partially structured text, and determining the relationships that hold between these entities. More succinctly, information extraction is the problem of deriving structured factual information from text. This course considers the problem of information extraction from a machine-learning perspective. We will survey a variety of learning methods that have been used for information extraction, including rule-learning, boosting, and sequential classification methods such as hidden Markov models, conditional random fields, and structured support vector machines. We will also look at experimental results from a number of specific information extraction domains, such as biomedical text, and discuss semi-supervised "bootstrapping" learning methods for information extraction. Readings will be based on research papers. Grades will be based on class participation, paper presentations, and a project. A rather out-of-date syllabus (not yet updated since spring 2007, the last time the course was taught) is posted on the course site.
Pre-Requisites A machine learning course (e.g., 10-701, 10-601) or consent of the instructor.
11-751 - Speech Recognition and Understanding
Description The technology to allow humans to communicate by speech with machines or by which machines can understand when humans communicate with each other is rapidly maturing. This course provides an introduction to the theoretical tools as well as the experimental practice that has made the field what it is today. We will cover theoretical foundations, essential algorithms, major approaches, experimental strategies and current state-of-the-art systems and will introduce the participants to ongoing work in representation, algorithms and interface design. This course is suitable for graduate students with some background in computer science and electrical engineering, as well as for advanced undergraduates. Prerequisites: Sound mathematical background, knowledge of basic statistics, good computing skills. No prior experience with speech recognition is necessary. This course is primarily for graduate students in LTI, CS, Robotics, ECE, Psychology, or Computational Linguistics. Others by prior permission of instructor.
11-752 - Speech II: Phonetics, Prosody, Perception and Synthesis
Description The goal of the course is to give the student basic knowledge from several fields that is necessary in order to pursue research in automatic speech processing. The course will begin with a study of the acoustic content of the speech signal. The students will use the spectrographic display to examine the signal and discover its variable properties. Phones in increasingly larger contexts will be studied with the goal of understanding coarticulation. Phonological rules will be studied as a contextual aid in understanding the spectrographic display. The spectrogram will then serve as a first introduction to the basic elements of prosody. Other displays will then be used to study the three parts of prosody: amplitude, duration, and pitch. Building on these three elements, the student will then examine how the three interact in careful and spontaneous speech. Next, the students will explore perception. Topics covered will be: physical aspects of perception, psychological aspects of perception, testing perception processes, and practical applications of knowledge about perception. The second part of this course will cover all aspects of speech synthesis. Students need only have a basic knowledge of speech and language processing. Some degree of programming and statistical modelling will be beneficial, but not required. Taught every other year.
11-753 - Advanced Laboratory in Speech Recognition
Description The technology to allow humans to communicate by speech with machines or by which machines can understand when humans communicate with each other is rapidly maturing. While the 11-751 speech course focused on an introduction to the theoretical foundations, essential algorithms, major approaches, and strategies for current state-of-the-art systems, the 11-753 speech lab complements the education by concentrating on the experimental practice in developing speech recognition and understanding speech-based systems, and by providing hands-on experience with relevant research questions using state-of-the-art tools. Possible problem sets include both core speech recognition technology, and the integration of speech-based components into multi-modal, semantic, learning, or otherwise complex systems and interfaces.
11-754 - Project Course: Dialogue Systems
Description This course will teach participants how to implement a complete spoken language system while providing opportunities to explore research topics of interest in the context of a functioning system. The course will produce a complete implementation of a system to access and manipulate email through voice only, for example to allow users to interact with the mail system over a telephone while away from their computer. In doing so the class will address the component activities of spoken language system building. These include, but are not limited to, task analysis and language design, application-specific acoustic and language modeling, grammar design, task design, dialog management, language generation and synthesis. The course will place particular emphasis on issues in task design and dialog management and on issues in language generation and synthesis. For Fall, we will implement a simple telephone-based information access application. The domain is bus schedules (see http://www.speech.cs.cmu.edu/BusLine for a web-based interface to this domain) and the goal will be to create one or more usable applications that can provide a real service and can be deployed for actual use by the University community. Participants will choose individual components of the system to concentrate on and will collaborate to put together the entire system. It is perfectly acceptable for several individuals to concentrate on a single component, particularly if their work will exemplify alternative approaches to the same problem.
Pre-Requisites Speech Recognition or permission of the instructor.
11-755 - Machine Learning for Signal Processing
Description Signal Processing is the science that deals with extraction of information from signals of various kinds. This has two distinct aspects -- characterization and categorization. Traditionally, signal characterization has been performed with mathematically-driven transforms, while categorization and classification are achieved using statistical tools. 

Machine learning aims to design algorithms that learn about the state of the world directly from data. 

An increasingly popular trend has been to develop and apply machine learning techniques to both aspects of signal processing, often blurring the distinction between the two. 

This course discusses the use of machine learning techniques to process signals. We cover a variety of topics, from data-driven approaches for characterization of signals such as audio (including speech), images and video, to machine learning methods for a variety of speech and image processing problems.

11-756 - Design and Implementation of Speech Recognition Systems
Description Voice recognition systems invoke concepts from a variety of fields including speech production, algebra, probability and statistics, information theory, linguistics, and various aspects of computer science. Voice recognition has therefore largely been viewed as an advanced science, typically meant for students and researchers who possess the requisite background and motivation. In this course we take an alternative approach. We present voice recognition systems through the perspective of a novice. Beginning from the very simple problem of matching two strings, we present the algorithms and techniques as a series of intuitive and logical increments, until we arrive at a fully functional continuous speech recognition system. Following the philosophy that the best way to understand a topic is to work on it, the course will be project oriented, combining formal lectures with required hands-on work. Students will be required to work on a series of projects of increasing complexity. Each project will build on the previous project, such that the incremental complexity of projects will be minimal and eminently doable. At the end of the course, merely by completing the series of projects, students will have built their own fully functional speech recognition systems. Grading will be based on project completion and presentation.
Pre-Requisites Mandatory: Linear Algebra. Basic Probability Theory. 
Recommended: Signal Processing. 
Coding Skills: This course will require significant programming from the students. Students must be able to program fluently in at least one language (C, C++, Java, Python, LISP, Matlab are all acceptable).
Course Site http://www.cs.cmu.edu/afs/cs/user/bhiksha/WWW/courses/11-756.asr/spring2011/
11-761 - Language and Statistics
Description The goal of "Language and Statistics" is to ground the data-driven techniques used in language technologies in sound statistical methodology. We start by formulating various language technology problems in both an information theoretic framework (the source-channel paradigm) and a Bayesian framework (the Bayes classifier). We then discuss the statistical properties of words, sentences, documents and whole languages, and the various computational formalisms used to represent language. These discussions naturally lead to specific concepts in statistical estimation. 

Topics include: Zipf's distribution and type-token curves; point estimators, Maximum Likelihood estimation, bias and variance, sparseness, smoothing and clustering; interpolation, shrinkage, and backoff; entropy, cross entropy and mutual information; decision tree models applied to language; latent variable models and the EM algorithm; hidden Markov models; exponential models and the maximum entropy principle; semantic modeling and dimensionality reduction; probabilistic context-free grammars and syntactic language models.

Course Site http://www.cs.cmu.edu/%7Eroni/11761
11-762 - Language and Statistics II
Description This course will cover modern empirical methods in natural language processing. It is designed for language technologies students who want to understand statistical methodology in the language domain, and for machine learning students who want to know about current problems and solutions in text processing. Students will, upon completion, understand how statistical modeling and learning can be applied to text, be able to develop and apply new statistical models for problems in their own research, and be able to critically read papers from the major related conferences (EMNLP and ACL). A recurring theme will be the tradeoffs between computational cost, mathematical elegance, and applicability to real problems. The course will be organized around methods, with concrete tasks introduced throughout. The course is designed for SCS graduate students. 

This course is taught intermittently. Students interested in this topic may also wish to consider 11-763 - Structured Prediction for Language and Other Discrete Data, which covers similar material.

Pre-Requisites Mandatory: 11-761 - Language and Statistics, or permission of the instructor. 
Recommended: 11-711 - Algorithms for Natural Language Processing; 10-601 or 10-701 - Machine Learning; or 11-745 - Advanced Statistical Learning Seminar
Course Site http://www.cs.cmu.edu/~nasmith/LS2/
11-763 - Structured Prediction for Language and Other Discrete Data
Description This course seeks to cover statistical modeling techniques for discrete, structured data such as text. It brings together content previously covered in Language and Statistics 2 (11-762) and Information Extraction (10-707 and 11-748), and aims to define a canonical set of models and techniques applicable to problems in natural language processing, information extraction, and other application areas. Upon completion, students will have a broad understanding of machine learning techniques for structured outputs, will be able to develop appropriate algorithms for use in new research, and will be able to critically read related literature. The course is organized around methods, with example tasks introduced throughout.
Pre-Requisites 10-601 or 10-701 - Machine Learning or instructors' permission.
Course Site http://www.cs.cmu.edu/%7Enasmith/SPFLODD/
11-765 - Active Learning Seminar
Description Participants will read and present papers, including analyzing comparative strengths and weaknesses of various algorithms. Meetings will take place once a week for about two hours in the fall.
Pre-Requisites A graduate-level machine learning course.
11-772 - Analysis of Social Media
Description The most actively growing part of the web is "social media" (wikis, blogs, bboards, and collaboratively-developed community sites like Flickr and YouTube). This course will review selected papers from recent research literature that address the problem of analyzing and understanding social media. Topics to be covered include: 
-Text analysis techniques for sentiment analysis, analysis of figurative language, authorship attribution, and inference of demographic information about authors (age or sex). 
-Community analysis techniques for detecting communities, predicting authority, assessing influence (in viral marketing), or detecting spam. 
-Visualization techniques for understanding the interactions within and between communities. 
-Learning techniques for modeling and predicting trends in social media, or predicting other properties of media (user-provided content tags).
Pre-Requisites 10-601 or 10-701 - Machine Learning or instructors' permission.
11-773 - Text-Driven Forecasting
Description Text-driven forecasting is an emerging collection of problems in which text documents or document collections are automatically analyzed to make specific, testable predictions about the future. Well-known examples include predictions about stock or market behavior, product sales patterns, government elections, legislative activities, or public opinion polls. While a research community focusing on these problems has yet to form, this course is based on the following observations: Forecasting provides a new driving force for research in natural language processing. What level of "understanding" is needed for predictions to be accurate? Forecasting is a unique machine learning problem involving discrete non-IID data, time series, and very natural evaluation against real-world events (i.e., did the model correctly predict what would happen today?). The rise of social media (and non-news text more generally) and their availability on the web will inspire many new forecasting problems and datasets. Focusing on tangible real-world predictions will provide a nexus for computer scientists to come together with domain experts to reason about language use and how it should be modeled. Because people can never be expected to read all of the content relevant to a particular question about the future, intelligent text processing methods may be the only way such content can be fully exploited. This twelve-credit seminar-project hybrid course aims to begin identifying challenge problems and testing some solutions to them.
Pre-Requisites Instructor's permission.
Course Site http://www.cs.cmu.edu/%7Enasmith/TDF/
11-780 - Research Design and Writing
Description In an increasingly competitive research community within a rapidly changing world, it is essential that our students formulate research agendas that are of enduring importance, with clean research designs that lead to generalizable knowledge, and with high likelihood of yielding results that will have impact in the world. However, even the best research, if not communicated well, will fail to earn the recognition that it deserves. Even more seriously, the most promising research agendas, if not argued in a convincing and clear manner, will fail to secure the funding that would give them the chance to produce those important results. Thus, in order to complement the strong content-focused curriculum in LTI, we are proposing a professional skills course that targets the research and writing methodology that our students will need to excel in the research community, both during their degree at LTI and in their career beyond. This course focuses specifically on general experimental design methodology and corresponding writing and reporting skills. Grades will be based on a series of substantial writing assignments in which students will apply principles from experimental design methodology, such as writing an IRB application, a research design, a literature review, and a conference paper with data analysis and interpretation. A final exam will test skills and concepts related to experimental design methodology, and will include short answer questions and a critique of a research paper.
11-782 - Self-Paced Lab for Computational Biology
Description Students will choose from a set of projects designed by the instructor. Students will also have the option of designing their own projects, subject to instructor approval. Students who completed a project in the 10-810 course may either switch to another project or continue working on the previous project, aiming for significant further progress (subject to instructor approval). Each student will work independently. If more than one student works on a particular topic, each should choose an approach that is different from the approaches used by the other students working on the same problem. Students need to begin with a project proposal that outlines the high-level ideas, tasks, and goals of the problem, along with a plan of experiments and/or analysis. The instructor will consult with you on your ideas, but the final responsibility to define and execute an interesting piece of work is yours.

Your project will have two final deliverables: 
1. a writeup in the form of a NIPS paper (8 pages maximum in NIPS format, including references), worth 60% of the project grade, and 
2. a research seminar presentation of your work at the end of the semester, worth 20% of the project grade. 

In addition, you must turn in a midway progress report (5 pages maximum in NIPS format, including references) describing the results of your first experiments, worth 20% of the project grade. Note that, as with any conference, the page limits are strict! Papers over the limit will not be considered. The grading of your project is based on overall scientific quality, novelty, writing, and clarity of presentation. We expect your final report to be of conference-paper quality, and you are also expected to deliver a software implementation of your algorithmic results.

Pre-Requisites 10-810 - Advanced Algorithms and Models for Computational Biology
Co-Requisites 10-810 - Advanced Algorithms and Models for Computational Biology
11-783 - Self-Paced Lab: Rich Interaction in Virtual World
Description Massively Multi-player Online Role-Playing Games have evolved into Virtual Worlds (VWs), and are creating ever richer environments for experimentation on all aspects of human-to-human or human-to-machine communication, as well as for information discovery and access. So far, interaction has been constrained by the limited capabilities of keyboards, joysticks, or computer mice. This creates an exciting opportunity for explorative research on speech input and output, speech-to-speech translation, or any aspect of language technology. Of particular interest will be a combination with other novel "real world" (RW) input or output devices, such as mobile phones or portable games consoles, because they can be used to control the VW or make it accessible everywhere in RW. Language technologies in particular profit from "context awareness", because domain adaptation can be performed. For scientific experimentation in that area, Virtual Worlds offer the opportunity to concentrate on algorithms, because context sensors can be written with a few lines of code, without the need for extra hardware sensors. Algorithms can also run "continuously", without the need for specific data collection times or places, because the VW is "always on". In this lab, we will enhance existing clients to virtual worlds so that they can connect to various speech and language related research systems developed at LTI and CMU's Silicon Valley campus. The lab will be held jointly at CMU's Pittsburgh and Silicon Valley campuses. We will "eat our own dog food", so the goal will be to hold the last session entirely in a virtual classroom, which will by that time include speech control of virtual equipment, speech-to-speech translation, and some devices that can be controlled using non-PC type equipment, like mobile phones.
Pre-Requisites 11-751/18-781 - Speech Recognition and Understanding; 18-799 - Special Topics in Signal Processing
11-791 - Software Engineering for Information Systems
Description The Software Engineering for IT sequence combines classroom material and assignments in the fundamentals of software engineering (11-791) with a self-paced, faculty-supervised directed project (11-792). The two courses cover all elements of project design, implementation, evaluation, and documentation. For students intending to complete both courses, it is recommended that the project design and proof-of-concept prototype be completed and approved by the faculty advisor before the start of 11-792, if possible. Students may elect to take only 11-791; however, if both parts are taken, they should be taken in proper sequence.
Course Site http://www.cs.cmu.edu/%7Eehn/seit.html
11-792 - Intelligent Information Systems Project
Description The Software Engineering for IS sequence combines classroom material and assignments in the fundamentals of software engineering (11-791) with a self-paced, faculty-supervised directed project (11-792). The two courses cover all elements of project design, implementation, evaluation, and documentation. Students may elect to take only 11-791; however, if both parts are taken, they should be taken in proper sequence. Prerequisite: 11-791. The course is required for VLIS students.
Pre-Requisites 11-791 - Software Engineering for Information Systems (required for VLIS students).
Course Site http://www.cs.cmu.edu/%7Eehn/seit.html
11-794 - Inventing Future Services
Description Inventing the Future of Services is a course that focuses on the development of innovative thinking in a business environment. CMU graduates should not be waiting for their employers to tell them what to do – they should be driving radical innovation in their businesses. Drawing on 17 years of experience directing applied research at Accenture Technology Labs, the instructor teaches students systematic approaches to technology-driven business innovation in services industries.
Course Site http://www.cs.cmu.edu/~anatoleg/Inventing%20the%20Future%20of%20Services%20Course%20descr%20Fall%202011.htm
11-795 - Seminar: Algorithms for Privacy and Security
Description Alice wants an answer from Bob. But she does not want Bob to know the question! Charlie puts up pictures on the web. Bob downloads one of them from Flickr. How can he be sure the picture was Charlie's and not a counterfeit from Mallory? How can a secret be distributed among N people so that at least T of them must pool their knowledge in order to learn anything about it? Answers to questions such as the above lie in a variety of computational fields such as cryptography, secure multi-party computation, watermarking, and secret sharing. In this course we will cover a variety of topics related to privacy and security, including basic cryptography, secret sharing, privacy-preserving computation, data-hiding and steganography, and the latest algorithms for data mining with privacy. This will be a participatory course. Students will be required to present 1-3 papers during the semester. Papers must be analysed and presented in detail. Discussion and questions will be encouraged. Grading will be based on participation and presentation.
Pre-Requisites Recommended: Abstract Algebra, Number Theory.
Course Site http://www.cs.cmu.edu/afs/cs/user/bhiksha/WWW/courses/11-795.privacy/
11-796 - Question Answering Lab
Description The Question Answering Lab course provides a chance for hands-on, in-depth exploration of core algorithmic approaches to question answering (QA). Students will work independently or in small teams to extend or adapt existing QA modules and systems to improve overall performance on known QA datasets (e.g. TREC, CLEF, NTCIR, Jeopardy!), using best practices associated with the Open Advancement of Question Answering initiative. Projects will utilize existing components and systems from LTI (JAVELIN, Ephyra) and other open source projects (UIMA-AS, OAQA) running on a 10-node distributed computing cluster. Each student project will evaluate one or more component algorithms on a given QA dataset and produce a conference-style paper describing the experimental setup and results. Format: The course will require weekly in-class progress meetings with the instructors, in addition to individual self-paced work outside the classroom.
Pre-Requisites Intermediate Java programming skills.
11-797 - Question Answering
Description The Question Answering course provides a chance for hands-on, in-depth exploration of core algorithmic approaches to question answering (QA). Students will work independently or in small teams to extend or adapt existing QA modules and systems to improve overall performance on known QA datasets (e.g. TREC, CLEF, NTCIR), using best practices associated with the Open Advancement of Question Answering initiative. Each student project will evaluate one or more component algorithms on a given QA dataset and produce a conference-style paper describing the system design, experimental setup and results.
Pre-Requisites Intermediate Java programming skills.
11-899 - Summarization and Personal Information Management
Description The problem of information overload in personal communication media such as email, instant messaging, and on-line forums is a well-documented phenomenon. Much work addressing this problem has been conducted separately in the human-computer interaction (HCI) community, the information sciences community, and the computational linguistics community. However, in each case, while important advancements in scientific knowledge have been achieved, the work suffers from an "elephant complex", where each community focuses mainly on just the part of the problem most visible from its own perspective. The purpose of this course is to bring these threads together to examine the issue of managing personal communication data from an integrated perspective.
11-910 - Directed Research
Description This course number documents the research being done by Masters and pre-proposal PhD students. Every LTI graduate student will register for at least 24 units of 11-910 each semester, unless they are ABD (i.e., they have had a thesis proposal accepted), in which case they should register for 48 units of 11-930. The student will be expected to write a report and give a presentation at the end of the semester, documenting the research done. The report will be filed by either the faculty member or the LTI graduate program administrator.
Pre-Requisites Consent of Instructor.
11-920 - Independent Study: Breadth
Description This course number is intended for individual study with faculty other than a student's intended thesis advisor.
Pre-Requisites Consent of advisor. Special Permission is required to register.
11-925 - Independent Study: Area
Description This course number is intended for individual study with the intended thesis advisor prior to acceptance of a student's thesis proposal.
Pre-Requisites Consent of advisor. Special Permission is required to register.
11-928 - Masters Thesis I
Description This course number is intended for last-semester MLT students who wish to do an optional Masters Thesis. Please see the description of the optional Masters Thesis for more details.
Pre-Requisites Consent of advisor.
11-929 - Masters Thesis II
Description This course number is intended for last-semester Masters students who wish to do an optional Masters Thesis. The student will normally have taken 11-925 - Independent Study: Area of Concentration for 12 units in the preceding semester, to produce an MS Thesis Proposal.
Pre-Requisites Consent of advisor.
11-930 - Dissertation Research
Description This course number is intended for PhD dissertation research after acceptance of a student's PhD thesis proposal.
Pre-Requisites Consent of advisor.
11-935 - LTI Practicum
Description This course is intended as an internship course for students who are doing Curricular Practical Training (CPT) as part of their graduate degree.