The program consists of seven modules on speech and language technologies and three modules on this year's research projects. Each module will be covered in one day and will consist of two lectures and a lab session.

All lectures will be held in Newell Simon Hall, room 1305. All labs will be held in 300 S. Craig room 172. 

Participation in the morning lectures is open to the public, but space in the afternoon laboratories is limited. Anyone interested should contact Jae Cho and send their CV/Resume. Registration is CLOSED.

Monday, June 19, 2017 - Machine Learning (Host: Matt Gormley)

8:30 Continental Breakfast (Provided)

9:00 Welcome, Practicalities (Florian Metze)

  • Welcome
  • Connect laptops to Wi-Fi

9:15-10:45 Machine Learning (Speaker: Matt Gormley)

  • What is Machine Learning?
  • Supervised learning, classification, regression, ERM
  • Linear regression, logistic regression
  • Regularization

10:45-11:00 Break

11:00-12:30 Machine Learning (Speaker: Matt Gormley)

  • Stochastic gradient descent
  • Neural networks
  • Backpropagation

12:30-1:30 Lunch (On your own)

1:30-2:00 Computer Setup Time (Florian Metze)

  • Get access to cluster for exercises, verify login

2:00-2:45 Introduction to ML Lab (Speaker: Matt Gormley)

  • Students will work in small teams (2-3 people)
  • We will provide starter code for a simple supervised classifier and ask teams to improve its accuracy via three possible changes to the code: better optimization, improvements to the model, or feature engineering

2:45-3:45 Snack Break (Provided)

3:45-5:30 ML Lab (Instructor: Matt Gormley)

Tuesday, June 20, 2017 - Natural Language Processing (Host: Alan Black)

8:30 Continental Breakfast (Provided)

9:00 - 10:30 NLP (Speaker: Alan Black)

10:30 - 10:45 Break

10:45 - 12:15 NLP (Speaker: Alan Black)

12:15 - 1:45 Lunch (On your own)

1:45 - 2:30 Introduction to NLP Lab (Speaker: Alan Black)

2:30 - 2:45 Snack Break (Provided)

2:45 - 5:00 NLP Lab (Instructor: Alan Black)

Wednesday, June 21, 2017 - Deep Learning/Representation Learning (Hosts: Raman Arora and Kevin Duh)

8:30 Continental Breakfast (Provided)

9:00 - 10:30 ML (Speaker: Raman Arora)

Introduction to representation learning

Multiview representation learning

10:30 - 10:45 Break

10:45 - 12:15 Representation Learning for Text (Speaker: Kevin Duh)

Word Representations: Neural language model, word2vec

Sentence Representations: LSTM, CNN, Attention

Representations from multiple views

12:15 - 1:45 Lunch (On your own)

1:45 - 2:30 Introduction to multiview representation learning lab (Speaker: Raman Arora)

2:30 - 2:45 Snack Break (Provided)

2:45 - 5:00 ML Lab (Instructor: Poorya Mianjy)

UPDATED: Thursday, June 22, 2017 - Low Resource Techniques in NLP (Host: Yulia Tsvetkov/David Mortensen)

8:30 Continental Breakfast (Provided)

9:00 - 10:30 Opportunities and Challenges in Working with Low-Resource Languages (Speaker: Yulia Tsvetkov)

State of the art in low-resource NLP

Why is low-resource NLP hard?

Social impact

An overview of approaches to low-resource NLP:

  • Unsupervised and semi-supervised learning
  • Cross-lingual transfer of resources and models
  • Joint resource-rich and resource-poor learning using language universals

10:30 - 10:45 Break

10:45 - 11:45 Case Studies in Cross-Lingual Knowledge Transfer from High- to Low-Resource Languages (Speaker: Yulia Tsvetkov)

Cross-lingual transfer of linguistic annotations via lexical correspondences

Cross-lingual bridging via transliteration, cognates, borrowing

Projection of features: syntactic features, semantic features, multilingual embeddings

Polyglot models: joint multilingual learning using universal linguistic knowledge

Case studies in language modeling, dependency parsing, and MT

11:45 - 12:15 Phonology and Low Resource NLP (Speaker: David Mortensen)

12:15 - 1:00 Lunch (On your own)

1:00 - 2:30 OIE Orientation for J1 Visa Holders

2:30 - 2:45 Snack Break (Provided)

2:45 - 5:00 Feature Induction in Low-Resource Settings (Tutorial and Lab) (Instructor: David Mortensen)

  • Leveraging linguistic representations in low resource NLP
  • Linguistic representations lab

Friday, June 23, 2017 - Social Media/Dialog Processing (Host: Carolyn Rose)

8:30 Continental Breakfast (Provided)

9:00 - 10:30 Currency: Cultural Symbols in Language (Speaker: Carolyn Rose)

Linguistic Agency: Social Meaning as Arbitrary but not Random

Norms and Intertextuality

Synchronous vs Asynchronous Conversation and Speech vs Text

10:30 - 10:45 Break

10:45 - 12:15 Economy: Conversational Strategy (Speaker: Carolyn Rose)

Politeness Theory as an example of Conversational Strategy

Grice’s Maxims and Myers-Scotton’s Markedness Model

Roles and Social Positioning

12:15 - 1:45 Lunch (On your own)

1:45 - 2:30 Wikipedia Discussion Analysis Lab (Speaker: Carolyn Rose)

English Wikipedia talk page corpus and Role Modeling work

Arabic Wikipedia talk page corpus and Codeswitching work

Prediction task: Predicting editor success in English Wikipedia and Arabic Wikipedia

2:30 - 2:45 Snack Break (Provided)

2:45 - 5:00 Wikipedia Discussion Analysis Lab (Instructors: Keith Maki and Michael Yoder)

Monday, June 26, 2017 - Machine Translation (Host: Philipp Koehn)

8:30 Continental Breakfast (Provided)

9:00 - 10:30 Machine Translation (Speaker: Philipp Koehn)

A deeper look at deep learning: computation graphs, training

Implementation of deep learning toolkits

10:30 - 10:45 Break

10:45 - 12:15 Machine Translation (Speaker: Philipp Koehn)

Introduction to machine translation

Neural translation models

Current challenges

12:15 - 1:45 Lunch (On your own)

1:45 - 2:30 Introduction to NLP Lab (Speaker: NN)

2:30 - 2:45 Snack Break (Provided)

2:45 - 5:00 NLP Lab (Instructor: NN)

Tuesday, June 27, 2017 - Automatic Speech Recognition (Host: Florian Metze)

8:30 Continental Breakfast (Provided)

9:00 - 10:30 Speech-to-Text Basics (Speaker: Florian Metze)

Speech as a communication medium

Speech signal processing

Problem formulation and evaluation

Why is it hard - variability and robustness

Hidden Markov model approach

10:30 - 10:45 Break

10:45 - 12:15 Modern Speech Recognition (Speaker: Florian Metze)

Problems with HMM approach

End-to-end approaches with neural networks

12:15 - 1:45 Lunch (On your own)

1:45 - 3:30 Introduction to Speech Lab (Speaker: Florian Metze)

Kick off CTC acoustic model and RNN language model training in VM/cluster

Encoder-decoder models

3:30 - 3:45 Snack Break (Provided)

3:45 - 5:00 Speech Lab (Instructor: Florian Metze)

Decode, evaluate, and compare results

Wednesday, June 28, 2017 - Neural Machine Translation (Host: Colin Cherry)

8:30 Continental Breakfast (Provided)

9:00 - 10:30 Monolingual Data in NMT (Speaker: Colin Cherry)

Introduction to “Neural Machine Translation with Minimal Parallel Resources”

Neural MT Refresher

Monolingual data to prime NMT components

  • Word vectors
  • NMT as a target language model
  • Initialization versus dual objectives

Creating bilingual data

  • Dictionary replacement
  • Back translation
  • Round-trip methods

10:30 - 10:45 Break

10:45 - 12:15 Syntax and Semantics in NMT (Speaker: Colin Cherry)

Syntactic Structures Refresher

Dynamic versus static networks

Survey of recent syntactic approaches in NMT 

12:15 - 1:45 Lunch (On your own)

1:45 - 2:30 Introduction to the Sockeye NMT framework (Speaker: Michael Denkowski)

NMT, MXNet, and Sockeye

How to train NMT models

Sockeye design and implementation details

2:30 - 2:45 Snack Break (Provided)

2:45 - 5:00 NMT Lab (Instructors: Daniel Beck, Gaurav Kumar and Vu Hoang)

Train complete NMT models on small data

Extend Sockeye code to add new features

Thursday, June 29, 2017 - Enhancement & Analysis of Conversational Speech (Host: Ken Church)

8:30 Continental Breakfast (Provided)

9:00 - 10:30 Conversational Speech (Speaker: Mark Liberman)

10:30 - 10:45 Break

10:45 - 12:15 Conversational Speech (Speaker: Mark Liberman)

12:15 - 1:45 Lunch (On your own)

1:45 - 2:30 Introduction to Conversational Speech Lab (Speaker: Mark Liberman)

2:30 - 2:45 Snack Break (Provided)

2:45 - 5:00 Conversational Speech Lab (Instructor: NN)

Friday, June 30, 2017 - Rosetta Team Day (Host: Odette Scharenborg)

8:30 Continental Breakfast (Provided)

9:00 - 11:00 Introduction to Rosetta (Speaker: Odette Scharenborg)

Introduction to the project

What are linguistic units (+ a bit of articulatory phonetics)

Native and non-native human speech processing

11:00 - 11:15 Break

11:15 - 12:15 Introduction to speech2img (Speaker: Mark Hasegawa-Johnson)

Image retrieval from speech signals

Convolutional neural networks for speech and images

TensorFlow basics: Graph, Session, Variables

12:15 - 1:30 Lunch (On your own)

1:30 - 2:45 Lab speech2img, part 1 (Instructor: Mark Hasegawa-Johnson; TAs: Lucas Ondel and Liming Wang)

VGG16 neural network

Speech CNN neural network

Cosine similarity and dot product

Alternatives, e.g., one-hot encoding of acoustic unit discovery

2:45 - 4:00 Lab speech2img, part 2 (Instructor: Mark Hasegawa-Johnson; TAs: Lucas Ondel and Liming Wang)

VGG16 neural network

Speech CNN neural network

Cosine similarity

Alternatives, e.g., one-hot encoding of acoustic unit discovery

4:00 - 5:30 Wrap up (Florian Metze) and ICE CREAM SOCIAL