The program consists of seven modules on speech and language technologies and three modules on this year's research projects. Each module will be covered in one day and will consist of two lectures and a lab session.
All lectures will be held in Newell Simon Hall, room 1305. All labs will be held at 300 S. Craig Street, room 172.
Participation in the morning lectures is open to the public, but space in the afternoon labs is limited. Anyone interested should contact Jae Cho and send a CV/resume. Registration is CLOSED.
Monday, June 19, 2017 - Machine Learning (Host: Matt Gormley)
8:30 Continental Breakfast (Provided)
9:00 Welcome, Practicalities (Florian Metze)
- Welcome
- Connect laptops to Wi-Fi
9:15-10:45 Machine Learning (Speaker: Matt Gormley)
- What is Machine Learning?
- Supervised learning, classification, regression, ERM
- Linear regression, logistic regression
- Regularization
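As a rough illustration of this lecture's topics (a regularized linear classifier), here is a minimal scikit-learn sketch; the synthetic data and settings are invented for illustration and are not course materials:

```python
# Minimal sketch (not the course's code): L2-regularized logistic regression
# on synthetic data, illustrating classification + regularization.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                  # 200 examples, 5 features
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) > 0).astype(int) # toy labels

# C is the inverse regularization strength; smaller C = stronger L2 penalty.
clf = LogisticRegression(C=0.1, penalty="l2").fit(X, y)
print("training accuracy:", clf.score(X, y))
```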
10:45-11:00 Break
11:00-12:30 Machine Learning (Speaker: Matt Gormley)
- Stochastic gradient descent
- Neural networks
- Backpropagation
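For the second lecture's topics, a minimal numpy sketch of a one-hidden-layer network trained by stochastic gradient descent with hand-written backpropagation (toy data only, not from the course):

```python
# Minimal sketch: one-hidden-layer network, SGD, manual backprop (numpy only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)        # XOR-like target

W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.1

for epoch in range(50):
    for i in rng.permutation(len(X)):            # one example at a time = SGD
        x, t = X[i:i+1], y[i:i+1, None]
        h = np.tanh(x @ W1 + b1)                 # forward pass
        p = 1 / (1 + np.exp(-(h @ W2 + b2)))     # sigmoid output
        dz2 = p - t                              # backward pass (cross-entropy)
        dW2 = h.T @ dz2; db2 = dz2.sum(0)
        dz1 = (dz2 @ W2.T) * (1 - h ** 2)        # tanh derivative
        dW1 = x.T @ dz1; db1 = dz1.sum(0)
        W2 -= lr * dW2; b2 -= lr * db2           # SGD updates
        W1 -= lr * dW1; b1 -= lr * db1

h = np.tanh(X @ W1 + b1)
p = 1 / (1 + np.exp(-(h @ W2 + b2)))
print("training accuracy:", ((p[:, 0] > 0.5) == y).mean())
```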
12:30-1:30 Lunch (On your own)
1:30-2:00 Computer Setup Time (Florian Metze)
- Get access to cluster for exercises, verify login
2:00-2:45 Introduction to ML Lab (Speaker: Matt Gormley)
- Students will work in small teams (2-3 people)
- We will provide starter code for a simple supervised classifier and ask teams to improve its accuracy via three possible changes to the code: better optimization, improvements to the model, or feature engineering
2:45-3:45 Snack Break (Provided)
3:45-5:30 ML Lab (Instructor: Matt Gormley)
Tuesday, June 20, 2017 - Natural Language Processing (Host: Alan Black)
8:30 Continental Breakfast (Provided)
9:00 - 10:30 NLP (Speaker: Alan Black)
10:30 - 10:45 Break
10:45 - 12:15 NLP (Speaker: Alan Black)
12:15 - 1:45 Lunch (On your own)
1:45 - 2:30 Introduction to NLP Lab (Speaker: Alan Black)
2:30 - 2:45 Snack Break (Provided)
2:45 - 5:00 NLP Lab (Instructor: Alan Black)
Wednesday, June 21, 2017 - Deep Learning/Representation Learning (Hosts: Raman Arora and Kevin Duh)
8:30 Continental Breakfast (Provided)
9:00 - 10:30 ML (Speaker: Raman Arora)
Introduction to representation learning
Multiview representation learning
10:30 - 10:45 Break
10:45 - 12:15 Representation Learning for Text (Speaker: Kevin Duh)
Word Representations: Neural language model, word2vec
Sentence Representations: LSTM, CNN, Attention
Representations from multiple views
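As a rough illustration of the sentence-representation topic, a minimal PyTorch sketch of an LSTM sentence encoder (toy vocabulary and inputs, not taken from the lecture materials):

```python
# Minimal sketch: encode a token-ID sequence into a fixed-size sentence vector.
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # word representations
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        emb = self.embed(token_ids)            # (batch, seq, emb_dim)
        _, (h_n, _) = self.lstm(emb)           # final hidden state
        return h_n.squeeze(0)                  # (batch, hidden_dim)

encoder = SentenceEncoder(vocab_size=5000)
batch = torch.randint(0, 5000, (2, 7))         # two toy "sentences" of 7 tokens
print(encoder(batch).shape)                    # torch.Size([2, 128])
```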
12:15 - 1:45 Lunch (On your own)
1:45 - 2:30 Introduction to Multiview Representation Learning Lab (Speaker: Raman Arora)
2:30 - 2:45 Snack Break (Provided)
2:45 - 5:00 ML Lab (Instructor: Poorya Mianjy)
UPDATED: Thursday, June 22, 2017 - Low Resource Techniques in NLP (Hosts: Yulia Tsvetkov and David Mortensen)
8:30 Continental Breakfast (Provided)
9:00 - 10:30 Opportunities and Challenges in Working with Low-Resource Languages (Speaker: Yulia Tsvetkov)
State of the art in low-resource NLP
Why is low-resource NLP hard?
Social impact
An overview of approaches to low-resource NLP:
Unsupervised and semi-supervised learning
Cross-lingual transfer of resources and models
Joint resource-rich and resource-poor learning using language universals
10:30 - 10:45 Break
10:45 - 11:45 Case Studies in Cross-Lingual Knowledge Transfer from High- to Low-Resource Languages (Speaker: Yulia Tsvetkov)
Cross-lingual transfer of linguistic annotations via lexical correspondences
Cross-lingual bridging via transliteration, cognates, borrowing
Projection of features: syntactic features, semantic features, multilingual embeddings
Polyglot models: joint multilingual learning using universal linguistic knowledge
Case studies in language modeling, dependency parsing, and MT
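One concrete form of cross-lingual transfer via multilingual embeddings is learning an orthogonal projection between two embedding spaces from a small seed dictionary (the Procrustes solution). The sketch below uses random vectors as stand-ins for real embeddings and is not part of the lecture materials:

```python
# Illustrative sketch: map source-language word vectors into a target-language
# embedding space with an orthogonal projection learned from a seed dictionary.
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs = 50, 200
X = rng.normal(size=(n_pairs, d))    # source-language vectors (seed dictionary)
Y = rng.normal(size=(n_pairs, d))    # corresponding target-language vectors

# W = argmin ||X W - Y||_F subject to W orthogonal, via SVD of X^T Y
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

projected = X @ W                    # source vectors in the target space
print(projected.shape)
```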
11:45 - 12:15 Phonology and Low Resource NLP (Speaker: David Mortensen)
12:15 - 1:00 Lunch (On your own)
1:00 - 2:30 OIE Orientation for J1 Visa Holders
2:30 - 2:45 Snack Break (Provided)
2:45 - 5:00 Feature Induction in Low-Resource Settings (Tutorial and Lab) (Instructor: David Mortensen)
Leveraging linguistic representations in low-resource NLP
Linguistic representations lab
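A toy illustration of phone-level linguistic representations, using a hypothetical, highly simplified articulatory feature inventory (not the lab's actual resources or tools):

```python
# Toy illustration: phones as binary articulatory feature vectors,
# compared by the number of features on which they differ.
PHONE_FEATURES = {           # hypothetical, simplified feature inventory
    "p": {"voiced": 0, "labial": 1, "nasal": 0, "stop": 1},
    "b": {"voiced": 1, "labial": 1, "nasal": 0, "stop": 1},
    "m": {"voiced": 1, "labial": 1, "nasal": 1, "stop": 0},
}

def feature_distance(a, b):
    """Count the articulatory features on which two phones differ."""
    fa, fb = PHONE_FEATURES[a], PHONE_FEATURES[b]
    return sum(fa[k] != fb[k] for k in fa)

print(feature_distance("p", "b"))    # 1 (voicing only)
print(feature_distance("p", "m"))    # 3
```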
Friday, June 23, 2017 - Social Media/Dialog Processing (Host: Carolyn Rose)
8:30 Continental Breakfast (Provided)
9:00 - 10:30 Currency: Cultural Symbols in Language (Speaker: Carolyn Rose)
Linguistic Agency: Social Meaning as Arbitrary but not Random
Norms and Intertextuality
Synchronous vs Asynchronous Conversation and Speech vs Text
10:30 - 10:45 Break
10:45 - 12:15 Economy: Conversational Strategy (Speaker: Carolyn Rose)
Politeness Theory as an example of Conversational Strategy
Grice’s Maxims and Myers-Scotton’s Markedness Model
Roles and Social Positioning
12:15 - 1:45 Lunch (On your own)
1:45 - 2:30 Introduction to Wikipedia Discussion Analysis Lab (Speaker: Carolyn Rose)
English Wikipedia talk page corpus and Role Modeling work
Arabic Wikipedia talk page corpus and Codeswitching work
Prediction task: Predicting editor success in English Wikipedia and Arabic Wikipedia
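As a stand-in for the prediction task, a hypothetical scikit-learn baseline that classifies talk-page text; the example posts and labels below are invented, and the lab's real corpora and features will differ:

```python
# Hypothetical baseline: TF-IDF features + logistic regression on toy
# talk-page posts with made-up "editor success" labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

talk_page_posts = [
    "Thanks for the careful edit, the sources look solid.",
    "This revision removes cited material without discussion.",
    "I added references; please review before reverting.",
    "Stop vandalizing the article.",
]
editor_success = [1, 0, 1, 0]        # toy labels

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(talk_page_posts, editor_success)
print(model.predict(["Please discuss changes on the talk page first."]))
```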
2:30 - 2:45 Snack Break (Provided)
2:45 - 5:00 Wikipedia Discussion Analysis Lab (Instructors: Keith Maki and Michael Yoder)
Monday, June 26, 2017 - Machine Translation (Host: Philipp Koehn)
8:30 Continental Breakfast (Provided)
9:00 - 10:30 Machine Translation (Speaker: Philipp Koehn)
A deeper look at deep learning: computation graphs, training
Implementation of deep learning toolkits
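A minimal sketch of the computation-graph idea: build an expression, then let reverse-mode automatic differentiation compute gradients (shown here with PyTorch autograd; the lecture may use a different toolkit):

```python
# Minimal sketch: a tiny computation graph and backpropagation through it.
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

y = w * x ** 2 + 1        # building this expression records a graph
y.backward()              # reverse-mode differentiation over the graph

print(x.grad)             # dy/dx = 2*w*x = 12
print(w.grad)             # dy/dw = x^2   = 4
```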
10:30 - 10:45 Break
10:45 - 12:15 Machine Translation (Speaker: Philipp Koehn)
Introduction to machine translation
Neural translation models
Current challenges
12:15 - 1:45 Lunch (On your own)
1:45 - 2:30 Introduction to MT Lab (Speaker: NN)
2:30 - 2:45 Snack Break (Provided)
2:45 - 5:00 MT Lab (Instructor: NN)
Tuesday, June 27, 2017 - Automatic Speech Recognition (Host: Florian Metze)
8:30 Continental Breakfast (Provided)
9:00 - 10:30 Speech-to-Text Basics (Speaker: Florian Metze)
Speech as a communication medium
Speech signal processing
Problem formulation and evaluation
Why it is hard: variability and robustness
Hidden Markov model approach
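As a pointer to the HMM material, a minimal numpy sketch of the forward algorithm on a toy two-state model (illustrative only, not the lecture's code):

```python
# Minimal sketch: the HMM forward algorithm, the dynamic program
# underlying classic HMM-based speech recognizers.
import numpy as np

A = np.array([[0.7, 0.3],      # state-transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # emission probabilities P(obs | state)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution
obs = [0, 1, 1, 0]             # a toy observation sequence

alpha = pi * B[:, obs[0]]      # forward probabilities at time 0
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]

print("P(observations) =", alpha.sum())
```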
10:30 - 10:45 Break
10:45 - 12:15 Modern Speech Recognition (Speaker: Florian Metze)
Problems with HMM approach
End-to-end approaches with neural networks
12:15 - 1:45 Lunch (On your own)
1:45 - 3:30 Introduction to Speech Lab (Speaker: Florian Metze)
Kick off CTC acoustic model and RNN language model training in the VM/cluster
Encoder-decoder models
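A minimal sketch of a CTC loss computation using PyTorch's nn.CTCLoss; the lab's actual toolkit, model, and data pipeline may differ:

```python
# Minimal sketch: CTC loss over random "acoustic model" outputs.
import torch
import torch.nn as nn

T, N, C = 50, 4, 30                  # frames, batch size, output labels (incl. blank)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)
targets = torch.randint(1, C, (N, 10), dtype=torch.long)   # label 0 = blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                      # gradients would drive acoustic-model training
print(loss.item())
```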
3:30 - 3:45 Snack Break (Provided)
3:45 - 5:00 Speech Lab (Instructor: Florian Metze)
Decode, evaluate, and compare results
Wednesday, June 28, 2017 - Neural Machine Translation (Host: Colin Cherry)
8:30 Continental Breakfast (Provided)
9:00 - 10:30 Monolingual Data in NMT (Speaker: Colin Cherry)
Introduction to “Neural Machine Translation with Minimal Parallel Resources”
Neural MT Refresher
Monolingual data to prime NMT components
Word vectors
NMT as a target language model
Initialization versus dual objectives
Creating bilingual data
Dictionary replacement
Back translation
Round-trip methods
10:30 - 10:45 Break
10:45 - 12:15 Syntax and Semantics in NMT (Speaker: Colin Cherry)
Syntactic Structures Refresher
Dynamic versus static networks
Survey of recent syntactic approaches in NMT
12:15 - 1:45 Lunch (On your own)
1:45 - 2:30 Introduction to the Sockeye NMT framework (Speaker: Michael Denkowski)
NMT, MXNet, and Sockeye
How to train NMT models
Sockeye design and implementation details
2:30 - 2:45 Snack Break (Provided)
2:45 - 5:00 NMT Lab (Instructors: Daniel Beck, Gaurav Kumar and Vu Hoang)
Train complete NMT models on small data
Extend sockeye code to add new features
Thursday, June 29, 2017 - Enhancement & Analysis of Conversational Speech (Host: Ken Church)
8:30 Continental Breakfast (Provided)
9:00 - 10:30 CS (Speaker: Mark Liberman)
10:30 - 10:45 Break
10:45 - 12:15 CS (Speaker: Mark Liberman)
12:15 - 1:45 Lunch (On your own)
1:45 - 2:30 Introduction to CS Lab (Speaker: Mark Liberman)
2:30 - 2:45 Snack Break (Provided)
2:45 - 5:00 CS Lab (Instructor: NN)
Friday, June 30, 2017 - Rosetta Team Day (Host: Odette Scharenborg)
8:30 Continental Breakfast (Provided)
9:00 - 11:00 Introduction to Rosetta (Speaker: Odette Scharenborg)
Introduction to the project
What are linguistic units? (plus a bit of articulatory phonetics)
Native and non-native human speech processing
11:00 - 11:15 Break
11:15 - 12:15 Introduction to speech2img (Speaker: Mark Hasegawa-Johnson)
Image retrieval from speech signals
Convolutional neural networks for speech and images
Tensorflow basics: Graph, Session, Variables
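A minimal sketch of the TensorFlow 1.x Graph / Session / Variable workflow named above (written against the 1.x API current in 2017; TensorFlow 2.x defaults to eager execution instead):

```python
# Minimal sketch: build a graph, initialize a Variable, run it in a Session.
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    w = tf.Variable(3.0, name="w")             # a trainable variable
    x = tf.placeholder(tf.float32, name="x")   # fed in at run time
    y = w * x                                  # a node in the computation graph
    init = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    sess.run(init)                             # assign initial values
    print(sess.run(y, feed_dict={x: 2.0}))     # 6.0
```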
12:15 - 1:30 Lunch (On your own)
1:30 - 2:45 Lab speech2img, part 1 (Instructor: Mark Hasegawa-Johnson; TAs: Lucas Ondel and Liming Wang)
VGG16 neural network
Speech CNN neural network
Cosine similarity and dot product
Alternatives, e.g., one-hot encoding of acoustic unit discovery
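A minimal sketch of the cosine-similarity scoring used to match a speech embedding against candidate image embeddings; random toy vectors stand in here for the speech CNN and VGG16 features:

```python
# Minimal sketch: rank candidate images by cosine similarity to a speech embedding.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
speech_vec = rng.normal(size=512)            # stand-in for the speech CNN output
image_vecs = rng.normal(size=(10, 512))      # stand-ins for VGG16 image features

scores = [cosine_similarity(speech_vec, v) for v in image_vecs]
print("best matching image:", int(np.argmax(scores)))
```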
2:45 - 4:00 Lab speech2img, part 2 (Instructor: Mark Hasegawa-Johnson; TAs: Lucas Ondel and Liming Wang)
VGG16 neural network
Speech CNN neural network
Cosine similarity
Alternatives, e.g., one-hot encoding of acoustic unit discovery
4:00 – 5:30 Wrap up (Florian Metze) and ICE CREAM SOCIAL