
Course Title: Speech: Phonetics, prosody, perception and synthesis (11-752)

Department: Language Technologies Institute (LTI)
Units: 12
Semester: Spring
Instructors: Maxine Eskenazi, Alan W Black

Prerequisite: Knowledge of basic statistics, good computing skills. No prior experience with speech recognition is necessary. This course is primarily for graduate students in LTI, CS, Robotics, ECE, Psychology, or Computational Linguistics. Others by prior permission of instructor.

Course Description:

The goal of the course is to give students the basic knowledge, drawn from several fields, needed to pursue research in automatic speech processing. The course will begin with a study of the acoustic content of the speech signal. Students will use the spectrographic display to examine the signal and discover its variable properties. Phones will be studied in increasingly large contexts, with the goal of understanding coarticulation. Phonological rules will be studied as a contextual aid in understanding the spectrographic display.

The spectrogram will then serve as a first introduction to the basic elements of prosody. Other displays will then be used to study the three parts of prosody: amplitude, duration, and pitch. Building on these three elements, the student will then examine how the three interact in careful and spontaneous speech.

Next, the students will explore perception. Topics covered will be:

  • physical aspects of perception
  • psychological aspects of perception
  • testing perception processes
  • practical applications of knowledge about perception

The second part of this course will cover all aspects of speech synthesis. The whole synthesis process will be covered from both a theoretical and a practical viewpoint. Subsections of the course will cover:

  • Synthesis in general
  • Text analysis
  • Lexicons and letter-to-sound rules
  • Prosodic modelling (intonation, duration, etc.)
  • Waveform generation (diphone and general unit selection)

Each section will describe the problems abstractly and cover what are considered current solutions, with their advantages and disadvantages highlighted. Practical exercises cast within the Festival Speech Synthesis System framework will be set to give experience in solving actual synthesis problems.

Students need only have a basic knowledge of speech and language processing. Some experience with programming and statistical modelling will be beneficial, but is not required.

Course notes:

Course notes and exercise workpages will be provided, although some reading of papers may also be required.

Grading:

Students will have practical exercises throughout the course as well as one larger project (possibly a group project).

LTI is part of the School of Computer Science at Carnegie Mellon University.
This page is maintained by Stacey Young.