Corpora are an all-important resource in linguistics, as they constitute the primary source for large-scale examples of language usage. This has become even more evident in recent years, as the increasing availability of texts in digital format has pushed corpus linguistics more and more toward a “big data” approach. As a consequence, the quantitative methods adopted in the field are becoming more sophisticated and more varied. When it comes to morphology, corpora represent a primary source of evidence for describing morpheme usage, and in particular how often a particular morphological pattern is attested in a given language. There is hence a tight relation between corpus linguistics and the study of morphology and the lexicon. This relation, however, can be considered bidirectional. On the one hand, corpora are used as a source of evidence to develop metrics and train computational models of morphology: corpus data make it possible to characterize morphological notions such as productivity quantitatively, and they are fed to computational models to capture morphological phenomena at different levels of description. On the other hand, morphology has also been applied as an organizing principle for corpora. Annotations of linguistic data often adopt morphological notions as guidelines. The resulting information, whether obtained from human annotators or from automatic systems, makes corpora easier to analyze and more convenient to use in a number of applications.
Thomas W. Stewart
Segment-level alternations that realize morphological properties or that have other morphological significance stand either at an interface or along a continuum between phonology and morphology. The typical source for morphologically correlated sound alternations is the automatic phonology, interacting with discrete morphological operations such as affixation. Traditional morphophonology depends on the association of an alternation with a distinct concatenative marker. The rise of stem changes that are in themselves morphological markers, be they inflectional or derivational, stems instead from the fading of phonetic motivation in the conditioning environment, and thus from an increase in independence from historical phonological sources. The clearest cases are sole-exponent alternations, such as English man~men or slide~slid, but it is not necessary that the remainder of an earlier conditioning affix be entirely absent, only that the synchronic conditioning be fully opaque. Once a sound-structural pattern escapes the unexceptional workings of a language’s general phonological patterning, yet reliably serves a signifying function for one or more morphological properties, the morphological component of the grammar bears primary if not sole responsibility for accounting for the pattern’s distribution. It is not uncommon for the transition of analysis from (morpho)phonology into morphology to be a fitful one. There is an established tendency for phonological theory to hold sway in matters of sound generally, even at the expense of challenging learnability through the introduction of remote representations, ad hoc triggering devices, or putative rules of phonology of very limited generality.
On the morphological side, a bias in favor of separable morpheme-like units and syntax-like concatenative dynamics has relegated relations like stem alternations to the margins, no matter how regular, productive, or distinct from the general phonological patterns of the language in question they may be. This parallel focus of each component on its own "specialization," as it were, has left precisely the morphologically significant stem alternations, such as Germanic Ablaut and Celtic initial-consonant mutation, poorly served. In both families, these robust sound patterns generally lack reliable synchronic phonological conditioning. Instead, one must crucially refer to grammatical structure and morphological properties in order to account for their distributions. It is no coincidence that such stem alternations look phonological, just as fossils resemble the forms of the organisms that left them. The work of morphology does not depend on alternant segments sharing aspects of sound, but the salience of the system may benefit from perceptible coherence of form. One may observe what sound relations exist between stem alternants, but it is neither necessary nor realistic to oblige a speaker/learner to generate established stem alternations anew from remote underlying representations, as if the alternations were always still arising. To do so is to graft the technique of internal reconstruction onto the synchronic grammar as a recapitulating simulation.