Database of clusters
One of the main purposes of this project is to produce an interactive database of English word-final consonant clusters – structures that are phonologically highly marked but nevertheless abundant in English. Our aim is to enrich phonological research with hitherto unexplored diachronic aspects of cluster development, showing how coda clusters emerged and why they have remained historically stable despite their marked status.
A preliminary version of the database (consisting of a list of more than 300,000 morphonotactically and phonologically analyzed tokens ending in two consonants) can now be downloaded here!
The corpora
The data for compiling the database will be derived from two corpora – the Penn-Helsinki Parsed Corpus of Middle English and the Penn-Helsinki Parsed Corpus of Early Modern English. These substantial diachronic corpora allow us to examine both pre- and post-schwa-loss data. A further advantage is that both corpora are parsed, which will facilitate the data analysis.
Philological determination of schwa loss
The process of schwa loss (i.e. the reduction and eventual deletion of unstressed vowels) can be seen as the driving force behind cluster production in English. However, the exact time span of schwa loss varies considerably across regional and dialectal varieties, text genres, etc. These variables are taken into account within the database.
Diffusion of schwa loss
Based on the gathered philological information and assumptions about linguistic diffusion, we use methods from network dynamics and mathematical epidemiology to obtain probabilistic estimates of when schwa loss took place. A distinctive feature of our database is its flexibility: researchers can enter their own estimates of when schwa loss was implemented, which yields different waves of cluster emergence.
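As an illustration of how such a probabilistic estimate can be parameterised, the following minimal sketch uses the simplest epidemiological diffusion model (an SI model, whose solution is a logistic curve). It is not the project's actual model. The function name, the scenario labels and all parameter values (midpoint years, rates) are hypothetical and stand in for estimates a researcher might supply.

```python
import math


def schwa_loss_probability(year, midpoint, rate):
    """Probability that schwa loss has applied by `year`.

    Closed-form solution of the SI equation dp/dt = rate * p * (1 - p),
    anchored so that p(midpoint) = 0.5.
    """
    return 1.0 / (1.0 + math.exp(-rate * (year - midpoint)))


# Hypothetical researcher-supplied estimates: (midpoint year, rate per year).
# These numbers are placeholders, not philological findings.
estimates = {
    "early": (1300.0, 0.04),
    "late": (1400.0, 0.04),
}


def probabilities_for(year):
    """Evaluate every scenario at the given year."""
    return {
        name: schwa_loss_probability(year, midpoint, rate)
        for name, (midpoint, rate) in estimates.items()
    }


if __name__ == "__main__":
    for year in (1250, 1350, 1450):
        print(year, probabilities_for(year))
```

Changing the midpoint shifts the wave of cluster emergence in time, while changing the rate makes the diffusion more abrupt or more gradual.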
Simulating hypothetical language states
In order to detect ‘therapeutic’ actions such as cluster repair mechanisms or sound changes in general, we need to know what English would have looked like if only schwa loss had occurred. Thus, actual post-schwa-loss data (schwa loss + potential further processes) will be compared to a hypothetical language state (only schwa loss) to evaluate differences in cluster frequency and cluster distribution.
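The comparison can be sketched on a toy scale as follows. This is only an illustration under strong assumptions: each character stands for one segment, schwa loss is modelled as deletion of a word-final schwa only, and the word forms are invented for the example rather than taken from the corpora. All function names are hypothetical.

```python
from collections import Counter

# Assumption: one character per segment; these count as vowels.
VOWELS = set("aeiouə")


def apply_schwa_loss(form):
    """Delete a word-final schwa and change nothing else."""
    return form[:-1] if form.endswith("ə") else form


def final_cluster(form):
    """Return the last two segments if both are consonants, else None."""
    if len(form) >= 2 and form[-1] not in VOWELS and form[-2] not in VOWELS:
        return form[-2:]
    return None


def cluster_counts(forms):
    """Count word-final two-consonant clusters over a list of forms."""
    return Counter(c for c in map(final_cluster, forms) if c is not None)


# Invented toy data, not corpus material.
pre_loss = ["handə", "landə", "lambə", "namə"]
attested = ["hand", "land", "lam", "nam"]

# Hypothetical state: only schwa loss has applied.
hypothetical = cluster_counts(apply_schwa_loss(f) for f in pre_loss)
# Actual state: schwa loss plus any further processes.
observed = cluster_counts(attested)

# Negative values mark clusters that are rarer than schwa loss alone predicts.
difference = {
    c: observed[c] - hypothetical[c]
    for c in hypothetical.keys() | observed.keys()
}
```

In this toy data the cluster `mb` is predicted by schwa loss alone but absent from the attested forms, which is the kind of gap that would point to a repair process.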
Empirical testing of the SMH
The Strong Morphonotactic Hypothesis (SMH) asserts an interdependency of phonology and morphology and figures centrally in this project. It predicts, for instance, that homophonous clusters in lexical and morphologically complex items will be treated differently (e.g. clusters are reduced only in lexical items), thereby decreasing their homophony and, at the same time, their ambiguity.
Eco-evolutionary modelling of cluster dynamics
In order to test the SMH from a diachronic perspective, we model the coupled dynamics of morpheme-internal and morphonotactic clusters, for which we will use methods from mathematical ecology and evolutionary game theory. This will allow us to investigate the impact of analogical and semiotic effects on the development of cluster-token populations.
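To give an idea of what coupled dynamics of two cluster populations can look like, the sketch below integrates a textbook competitive Lotka-Volterra system with a simple Euler scheme. This is a generic model from mathematical ecology, chosen for illustration only; it is not the model used in the project. The variables `x` (relative frequency of morpheme-internal tokens of a cluster) and `y` (relative frequency of its morphonotactic tokens), as well as all parameter values, are assumptions.

```python
def step(x, y, dt, r_x, r_y, a_xy, a_yx):
    """One Euler step of the competitive Lotka-Volterra equations.

    Carrying capacities are normalised to 1. a_xy is the pressure that the
    morphonotactic population y exerts on the morpheme-internal population x,
    and a_yx is the pressure in the other direction.
    """
    dx = r_x * x * (1.0 - x - a_xy * y)
    dy = r_y * y * (1.0 - y - a_yx * x)
    return x + dt * dx, y + dt * dy


def simulate(x0, y0, steps, dt=0.01, r_x=1.0, r_y=1.0, a_xy=1.2, a_yx=0.8):
    """Integrate from (x0, y0) for `steps` Euler steps and return the end state."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = step(x, y, dt, r_x, r_y, a_xy, a_yx)
    return x, y


if __name__ == "__main__":
    # Placeholder starting frequencies; 20000 steps of 0.01 cover 200 time units.
    print(simulate(0.5, 0.5, steps=20000))
```

With the placeholder values above (a_xy > 1 and a_yx < 1), the morphonotactic population outcompetes the morpheme-internal one, loosely mirroring a scenario in which a cluster is reduced only in lexical items. Other parameter choices lead to coexistence or to the opposite outcome.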