July 29
Gideon Borensztajn and Stefan Frank
Institute for Logic, Language and Computation
University of Amsterdam
Computational Tools for Cognitive Research into Linguistic Phenomena
Our research group at the Institute for Logic, Language and Computation (ILLC)
at the University of Amsterdam uses techniques from computational linguistics
for psychological investigations in human language acquisition and
processing. In this talk, we present two examples of this research.
In the first part of the talk, Gideon Borensztajn presents research showing
that children's grammars grow more abstract with age. A method was
developed for automatic identification of the most probable multi-word
constructions used in children's utterances, given syntactically annotated
utterances from the Brown corpus of CHILDES (Sagae et al., 2007). The
constructions that were found cover many interesting linguistic phenomena from
the language acquisition literature, and show a progression from very concrete
towards abstract constructions. For all children of the Brown corpus,
grammatical abstraction, defined as the percentage of variable slots in the
productive units of their grammar, increases globally with age. This research
was presented at this year's Cognitive Science Conference.
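As an illustration of the abstraction measure defined above, the following
minimal Python sketch computes the percentage of variable slots in a set of
constructions. The representation (each construction as a list of
(label, is_variable) pairs), the function name, and the example constructions
are assumptions made for illustration; they are not taken from the research
presented in the talk, which may count slots differently.

```python
def abstraction_percentage(constructions):
    """Return the percentage of slots that are variable.

    Each construction is a list of (label, is_variable) pairs.
    Returns 0.0 when there are no slots at all.
    """
    total_slots = 0
    variable_slots = 0
    for construction in constructions:
        for _label, is_variable in construction:
            total_slots += 1
            if is_variable:
                variable_slots += 1
    if total_slots == 0:
        return 0.0
    return 100.0 * variable_slots / total_slots


# Hypothetical grammar: one construction with an open NP slot,
# and one fully concrete two-word construction.
example_grammar = [
    [("want", False), ("NP", True)],
    [("more", False), ("juice", False)],
]
print(abstraction_percentage(example_grammar))
```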
In the second part of the talk, Stefan Frank discusses how the psychological
validity of sentence-processing models can be evaluated using so-called
"surprisal theory" (Hale, 2001; Levy, 2008). According to this theory, the
time required to read a word increases linearly with the word's surprisal,
that is, the negative logarithm of the word's probability of occurrence
given its sentence context. Although such
probabilities can be estimated from large text corpora, the question remains
whether these "objective" probabilities resemble the "subjective"
probabilities assigned to words by readers. Surprisal theory can only be
falsified if we assume that subjective probabilities are indeed similar to
objective probabilities. If both this assumption and surprisal theory are
correct, an objectively more accurate probability model should also provide
more accurate predictions of word-reading times. A comparison of word
probabilities - as estimated by different models - and experimental
reading-time data indicates that such a relation does not necessarily hold. If
we nevertheless hold on to surprisal theory, these findings provide insight
into the nature of the human sentence-processing system.
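The surprisal of a word is the negative logarithm of its probability given
the preceding context. The following minimal Python sketch computes surprisal
in bits and the mean surprisal that a model assigns to a sentence; a lower
mean indicates an objectively more accurate model. The function names and the
per-word probabilities of the two models are hypothetical, chosen only for
illustration, and do not come from the models compared in the talk.

```python
import math


def surprisal(probability):
    """Surprisal in bits of a word with the given conditional probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


def mean_surprisal(probabilities):
    """Average surprisal over the words of a sentence."""
    if not probabilities:
        raise ValueError("at least one word probability is required")
    return sum(surprisal(p) for p in probabilities) / len(probabilities)


# Hypothetical conditional probabilities that two language models
# assign to the same three-word sentence.
model_a = [0.5, 0.25, 0.125]
model_b = [0.5, 0.5, 0.25]

print(mean_surprisal(model_a), mean_surprisal(model_b))
```

Under surprisal theory, and assuming subjective and objective probabilities
are similar, the model with the lower mean surprisal would be expected to
predict reading times better; the talk reports that this relation does not
necessarily hold.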
BIOGRAPHICAL SKETCHES
Stefan Frank is a post-doctoral researcher in Rens Bod's research group at the
ILLC at the University of Amsterdam. He holds a PhD degree from Tilburg
University and the Max Planck Institute for Psycholinguistics in Nijmegen.
Between 2004 and 2007, he was a post-doctoral researcher at the Nijmegen
Institute for Cognition and Information, where he was awarded a 3-year
research grant on computational modeling of sentence comprehension. He has
been at ILLC since November 2007.
Gideon Borensztajn is currently a PhD student in Rens Bod's research group at
the ILLC at the University of Amsterdam. He received a Master's degree in
Cognitive Science from the University of Amsterdam in 2006. Prior to that,
between 1996 and 2004, he worked as a system and software developer at Camera
Obscura, School of the Arts in Tel Aviv, Israel.