Computational semantics is concerned with the automatic analysis of meaning in natural language. Research in computational semantics designs meaning representations and develops mechanisms for automatically assigning those representations and reasoning over them. Computational semantics is not a single monolithic task but consists of many subtasks, including word sense disambiguation, multi-word expression analysis, semantic role labeling, the construction of sentence semantic structure, coreference resolution, and the automatic induction of semantic information from data. Manually constructed resources such as WordNet, PropBank, FrameNet, VerbNet, and TimeBank have been vital in driving the field forward. These resources specify the linguistic structures to be targeted in automatic analysis, and they provide high-quality human-generated data that can be used to train machine learning systems; supervised machine learning based on such resources is a widely used technique. A second core strand has been the induction of lexical knowledge from text data. For example, words can be represented through the contexts in which they appear (as distributional vectors or embeddings), such that semantically similar words have similar representations. Alternatively, semantic relations between words can be inferred from the patterns of words that link them. Wide-coverage semantic analysis always needs more data, in the form of both lexical knowledge and world knowledge, and automatic induction at least partially alleviates this problem. Compositionality is a third core theme: the systematic construction of structural meaning representations of larger expressions from the meaning representations of their parts. These representations typically use logics of varying expressivity, which makes them well suited to automatic inference with theorem provers. Manual specification and automatic acquisition of knowledge are closely intertwined.
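The distributional idea can be illustrated with a minimal sketch. It assumes a tiny hypothetical corpus, a symmetric context window of two tokens, and raw co-occurrence counts compared by cosine similarity; the function names and corpus are illustrative only. Practical systems usually reweight counts (for example with positive pointwise mutual information) or learn dense embeddings instead, partly because raw counts are dominated by frequent function words such as "the".

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words within +/- `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "cat" and "dog" occur in similar contexts, "car" does not.
corpus = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the food",
    "the dog ate the food",
    "she drove the car to work",
    "he parked the car at work",
]

if __name__ == "__main__":
    vectors = cooccurrence_vectors(corpus, window=2)
    print(round(cosine(vectors["cat"], vectors["dog"]), 3))  # 1.0
    print(round(cosine(vectors["cat"], vectors["car"]), 3))  # 0.544
```

Here "cat" and "dog" share all their contexts and so receive identical vectors, while "car" overlaps with them only through "the". This shows, in miniature, how similarity of contexts is turned into similarity of representations.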
Manually created resources are automatically extended or merged, and the automatic induction of semantic information is guided and constrained by manually specified information, which is much more reliable. For restricted domains, the construction of logical representations is also learned from data. Some of the larger current questions in computational semantics lie at the intersection of manual specification and machine learning. For instance, should we build general-purpose semantic representations, or is lexical knowledge simply too domain-specific, so that we would be better off learning task-specific representations every time? When performing inference, is it more beneficial to rely on the solid ground of a human-generated ontology, or is it better to reason directly with text snippets, allowing more fine-grained and gradual inference? Do we obtain a better and deeper semantic analysis as we use better and deeper manually specified linguistic knowledge, or does the future lie in powerful learning paradigms that learn to carry out an entire task from natural language input and output alone, without pre-specified linguistic knowledge?
Lexical Acquisition and the Structure of the Mental Lexicon
Eve V. Clark
The words and word parts children acquire at different stages offer insights into how the mental lexicon might be organized. Children first identify ‘words,’ recurring sequences of sounds, in the speech stream, attach some meaning to them, and later analyze such words further into parts, namely stems and affixes. These are the elements they store in memory in order to recognize them on subsequent occasions, and they also serve as target models when children try to produce those words themselves. When children coin words, they make use of bare stems, combine certain stems with each other, and sometimes add affixes as well. The options they choose depend on how much they need to add to coin a new word, which familiar elements they can draw on, and how productive each option is in the language. Children’s uses of stems and affixes in coining new words also reveal that they must rely on one representation in comprehension and a different one in production. For comprehension, they need to store information about the acoustic properties of a word, taking into account different occasions, different speakers, and different dialects, not to mention second-language speakers. For production, they need to work out which articulatory plan to follow in order to reproduce the target word, and it takes them time to align their production of a word with the representation they have stored for comprehension. There is, in fact, a general asymmetry here: comprehension is ahead of production in children, and it is far more extensive than production in both children and adults. Finally, as children add more words to their repertoires, they organize and reorganize their vocabulary into semantic domains. In doing so, they make use of pragmatic directions from adults that help them link related words through a variety of semantic relations.