Course Title: Machine Learning for Text Mining (11-747)
Department: Language Technologies Institute (LTI) and CALD
Units: 12
Semester: Fall
Instructor: Tom Mitchell, Jon Baxter, William Cohen, Andrew McCallum, Fernando Pereira
Prerequisites: a previous course in Machine Learning (e.g., 15-681 or 15-781)
This course is cross-listed as CALD 10-683.
Course Description:
Extracting useful knowledge from large amounts of text and hypertext has become a topic of great interest, in part because of the huge volume of information now available on the web. This course surveys a variety of text-mining problems and recent methods for solving them. We will consider machine learning approaches to problems such as document classification, information extraction, wrapper induction, reference matching, and combining existing symbolic databases with other text databases. We will cover a variety of learning methods, including nearest neighbor, Bayesian methods, hidden Markov models, active learning, and semi-supervised learning. The course format will include team-taught lectures, reading and discussion of research papers, and projects.