Rob Malouf

Rob Malouf of San Diego State University will speak at the UCSD Linguistics Department Colloquium on January 31, 2011, at 2:00 pm in AP&M 4301.

Information and Evolution in Morphology

Cross-linguistically, inflectional morphology exhibits a spectacular range of complexity in both the structure of individual words and the organization of systems that words participate in. One approach to understanding the evident variety of this syntagmatic and paradigmatic complexity is to develop taxonomies of the attested strategies for organizing words and paradigms. One can identify common, uncommon, and unattested strategies, generating speculation about why what occurs occurs and why what doesn't doesn't.

In this talk, I will discuss the consequences of an alternative approach to morphological complexity which directly quantifies the difficulty that a system poses for language users (rather than lexicographers) in information-theoretic terms. The average, or expected, entropy of a paradigm is the uncertainty in guessing the realization for a particular cell of the paradigm of a particular lexeme (given knowledge of the possible exponents). This gives one measure of the complexity of a morphological system -- systems with more exponents and more inflectional classes will in general have higher expected entropy -- but it presupposes a problem that speakers will never encounter. In order to know that a lexeme exists, the speaker must have heard at least one wordform, so in the worst case a speaker will be faced with predicting a wordform based on knowledge of one other wordform of that lexeme. Thus a better measure of morphological complexity is the expected conditional entropy, the average uncertainty in guessing the realization of one randomly selected cell in the paradigm of a lexeme given the realization of one other randomly selected cell. Viewed from this information-theoretic perspective, languages which appear to differ greatly in their morphological complexity -- the number of exponents, inflectional classes, and principal parts -- can actually be quite similar in terms of the challenge they pose for a language user who already knows how the system works.
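
As a rough illustration of the two measures described above, the following sketch computes the expected entropy and the expected conditional entropy of a tiny invented paradigm table. The data, the function names, and the choice to weight every lexeme equally are assumptions made for the example; they are not taken from the talk.

```python
from collections import Counter
from itertools import permutations
from math import log2

# Hypothetical toy data: one dict per lexeme, mapping each paradigm cell
# to the exponent that realizes it. Every lexeme counts equally (an assumption).
PARADIGMS = [
    {"sg": "-a", "pl": "-i"},
    {"sg": "-a", "pl": "-i"},
    {"sg": "-o", "pl": "-e"},
    {"sg": "-o", "pl": "-u"},
]


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def conditional_entropy(paradigms, target, given):
    """H(target | given) = H(given, target) - H(given)."""
    joint = entropy((p[given], p[target]) for p in paradigms)
    return joint - entropy(p[given] for p in paradigms)


def expected_entropy(paradigms):
    """Average uncertainty about a cell when no other wordform is known."""
    cells = list(paradigms[0])
    return sum(entropy(p[c] for p in paradigms) for c in cells) / len(cells)


def expected_conditional_entropy(paradigms):
    """Average uncertainty about one cell given one other, distinct cell."""
    pairs = list(permutations(paradigms[0], 2))
    total = sum(conditional_entropy(paradigms, t, g) for g, t in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    print(expected_entropy(PARADIGMS))              # 1.25
    print(expected_conditional_entropy(PARADIGMS))  # 0.25
```

In this toy table, guessing a cell with no other information costs 1.25 bits on average, but knowing one other wordform of the same lexeme reduces that to 0.25 bits: the plural fully predicts the singular, and the singular leaves only the choice between "-e" and "-u" open for half of the lexemes.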

This line of inquiry immediately poses two challenges: we need methods for measuring and comparing the entropy of paradigms, and we need to account for the mechanism by which languages come to have uniformly low paradigm entropy. This talk will describe a series of computational simulations aimed at addressing these two problems. The hypothesis is that speakers' need to produce previously unseen wordforms serves as a strong evolutionary pressure on language, which in turn leads morphological systems to develop in the direction of low paradigm entropy. To test this, I constructed a simple agent-based computational simulation, along the lines of Kirby's (2002) Iterated Learning model. Given a few basic assumptions, I trace the evolution of (simulated) languages as they are transmitted through successive generations of speakers and learners until a stable system is reached. Beginning with a language with very high paradigm entropy, much simpler systems develop quite quickly. While these systems are simple in the sense of having low paradigm entropy (typically very close to 0), they are generally complex in the ways that real morphological systems are. Thus, general constraints on language learning and use, coupled with the view that language is a complex adaptive system, provide an indirect explanation of a typological regularity found across many languages.