Phonetic transcription represents the phonetic properties of an actual or potential utterance in written form. This first requires an understanding of what the phonetic properties of speech are. It is the role of phonetic theory to provide that understanding by constructing a set of categories that can account for the phonetic structure of speech at both the segmental and suprasegmental levels; how far it does so is a measure of its adequacy as a theory. Second, a set of symbols is needed to stand for these categories, together with a set of conventions that tell the reader what the symbols stand for. A phonetic transcription, then, can be said to represent a piece of speech in terms of the categories denoted by the symbols. Machine-readable phonetic and prosodic notation systems can be implemented in electronic speech corpora, where multiple tiers of linguistic information, such as text and phonetic transcriptions, are mapped to the speech signal. Such corpora are essential resources for automatic speech recognition and speech synthesis.
Claire Brierley and Barry Heselwood
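The idea of several annotation tiers mapped to one speech signal can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Interval` class, the tier names `words` and `phones`, and the timings for the word "cat" are invented and do not come from any particular corpus or annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A labelled stretch of the recording, in seconds from its start."""
    start: float
    end: float
    label: str


# Hypothetical two-tier annotation of one spoken word (invented timings).
tiers = {
    "words": [Interval(0.00, 0.42, "cat")],
    "phones": [
        Interval(0.00, 0.09, "k"),
        Interval(0.09, 0.30, "æ"),
        Interval(0.30, 0.42, "t"),
    ],
}


def labels_at(tiers, time):
    """Return the label active on each tier at a given time point."""
    result = {}
    for name, intervals in tiers.items():
        for interval in intervals:
            if interval.start <= time < interval.end:
                result[name] = interval.label
                break
    return result


def phones_in_word(tiers, word):
    """Return the phone labels whose intervals lie inside a word interval."""
    return [
        phone.label
        for phone in tiers["phones"]
        if word.start <= phone.start and phone.end <= word.end
    ]


print(labels_at(tiers, 0.15))
print(phones_in_word(tiers, tiers["words"][0]))
```

Because every tier shares the same time axis, a query at a single time point, or over a single word interval, retrieves the corresponding labels on every other tier. This shared axis is what lets orthographic text and phonetic transcription be related to each other through the signal.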
Corpus phonology is an approach to phonology that places corpora at the center of phonological research. Some practitioners of corpus phonology see corpora as the only object of investigation; others use corpora alongside other available techniques (for instance, intuitions, psycholinguistic and neurolinguistic experimentation, laboratory phonology, or the study of the acquisition of phonology or of language pathology). Whatever version of corpus phonology one advocates, corpora have become part and parcel of the modern research environment, and their construction and exploitation have been modified by multidisciplinary advances made within various fields. Indeed, for the study of spoken usage, the term ‘corpus’ should nowadays be applied only to bodies of data meeting certain technical requirements, even though corpora of spoken usage are by no means new and date back to the birth of recording techniques. It is therefore essential to understand what criteria must be met by a modern corpus (quality of recordings, diversity of speech situations, ethical guidelines, time-alignment with transcriptions and annotations, etc.) and what tools are available to researchers. Once these requirements are met, the way is open to varying and possibly conflicting uses of spoken corpora by phonological practitioners. A traditional stance in theoretical phonology sees the data as a degenerate version of a more abstract underlying system, but more and more researchers within various frameworks (e.g., usage-based approaches, exemplar models, stochastic Optimality Theory, sociophonetics) are constructing models that tightly bind phonological competence to language use, rely heavily on quantitative information, and attempt to account for intra-speaker and inter-speaker variation. This renders corpora essential to phonological research and not a mere adjunct to the phonological description of the languages of the world.