Corpora are an all-important resource in linguistics, as they constitute the primary source for large-scale examples of language usage. This has been even more evident in recent years, with the increasing availability of texts in digital format leading more and more corpus linguistics toward a “big data” approach. As a consequence, the quantitative methods adopted in the field are becoming more sophisticated and various. When it comes to morphology, corpora represent a primary source of evidence to describe morpheme usage, and in particular how often a particular morphological pattern is attested in a given language. There is hence a tight relation between corpus linguistics and the study of morphology and the lexicon. This relation, however, can be considered bi-directional. On the one hand, corpora are used as a source of evidence to develop metrics and train computational models of morphology: by means of corpus data it is possible to quantitatively characterize morphological notions such as productivity, and corpus data are fed to computational models to capture morphological phenomena at different levels of description. On the other hand, morphology has also been applied as an organization principle to corpora. Annotations of linguistic data often adopt morphological notions as guidelines. The resulting information, either obtained from human annotators or relying on automatic systems, makes corpora easier to analyze and more convenient to use in a number of applications.
Throughout the 20th century, structuralist and generative linguists have argued that the study of the language system (langue, competence) must be separated from the study of language use (parole, performance), but this view of language has been called into question by usage-based linguists who have argued that the structure and organization of a speaker’s linguistic knowledge is the product of language use or performance. On this account, language is seen as a dynamic system of fluid categories and flexible constraints that are constantly restructured and reorganized under the pressure of domain-general cognitive processes that are not only involved in the use of language but also in other cognitive phenomena such as vision and (joint) attention. The general goal of usage-based linguistics is to develop a framework for the analysis of the emergence of linguistic structure and meaning. In order to understand the dynamics of the language system, usage-based linguists study how languages evolve, both in history and language acquisition. One aspect that plays an important role in this approach is frequency of occurrence. As frequency strengthens the representation of linguistic elements in memory, it facilitates the activation and processing of words, categories, and constructions, which in turn can have long-lasting effects on the development and organization of the linguistic system. A second aspect that has been very prominent in the usage-based study of grammar concerns the relationship between lexical and structural knowledge. Since abstract representations of linguistic structure are derived from language users’ experience with concrete linguistic tokens, grammatical patterns are generally associated with particular lexical expressions.