Article

A computational learner needs three things: data to learn from, a class of representations to acquire, and a way to get from one to the other. Language acquisition is a very particular learning setting that can be defined in terms of the input (the child’s early linguistic experience) and the output (a grammar capable of generating a language very similar to the input). The input is infamously impoverished. As it relates to morphology, the vast majority of potential forms are never attested in the input, and those that are attested follow an extremely skewed frequency distribution. Learners nevertheless manage to acquire most details of their native morphologies after only a few years of input. That said, acquisition is neither instantaneous nor error-free. Children do make mistakes, and they do so in predictable ways that provide insights into their grammars and learning processes. From a linguist's perspective, the most elucidating computational model of morphology learning is one that learns morphology like a child does, that is, on child-like input and along a child-like developmental path. This article focuses on clarifying those aspects of morphology acquisition that should go into such an elucidating computational model. Section 1 describes the input, with a focus on child-directed speech corpora and input sparsity. Section 2 discusses representations, with a focus on productivity, developmental paths, and formal learnability. Section 3 surveys the range of learning tasks that guide research in computational linguistics and NLP, with special focus on how they relate to the acquisition setting. The conclusion in Section 4 presents a summary of morphology acquisition as a learning problem, with Table 4 highlighting the key takeaways of this article.
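The two properties of the input described above, sparsity and skew, can be made concrete with a small measurement. The following is only an illustrative sketch: the function names `paradigm_coverage` and `hapax_share`, the four-cell paradigm, and the toy observations are all invented for this example and do not come from the article or from any child-directed speech corpus.

```python
from collections import Counter


def paradigm_coverage(observed, lemmas, cells):
    """Fraction of possible (lemma, cell) forms attested at least once."""
    possible = {(lemma, cell) for lemma in lemmas for cell in cells}
    attested = {pair for pair in observed if pair in possible}
    return len(attested) / len(possible)


def hapax_share(observed):
    """Share of attested forms that occur exactly once in the input."""
    counts = Counter(observed)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Toy input (invented): each item is one token, given as (lemma, paradigm cell).
lemmas = ["walk", "go", "sing", "eat"]
cells = ["PRS", "PST", "PTCP", "3SG"]
observed = (
    [("go", "PST")] * 6      # a few forms account for most tokens
    + [("go", "PRS")] * 3
    + [("walk", "PST")]      # the rest are seen once
    + [("eat", "3SG")]
)

coverage = paradigm_coverage(observed, lemmas, cells)
hapax = hapax_share(observed)
print(f"attested cells: {coverage:.0%}, forms seen only once: {hapax:.0%}")
```

In this toy input only 4 of the 16 possible forms are attested, and half of those occur a single time, which is the kind of pattern a child-like learner would have to generalize from.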

Article

The application of digital technologies within interdisciplinary environments is enabling the development of more efficient methods and techniques for analyzing historical corpora at scales that were not feasible before. The project “Digging into Early Colonial Mexico” is an example of cooperation among archaeologists, historians, computer scientists, and geographers engaged in designing and implementing methods for text mining and large-scale analysis of primary and secondary historical sources. Specifically, the project targets the automated identification of key analytical concepts linked to locational references, revealing the spatial and geographic context of the historical narrative. As a case study, the project focuses on the Relaciones Geográficas de la Nueva España (Geographic Reports of New Spain, or RGs). This is a corpus of textual and pictographic documents produced in 1577–1585 CE that provides one of the most complete and extensive accounts of Mexico and Guatemala’s history and the social situation at the time. The research team is developing digital tools and datasets, including (a) a comprehensive historical gazetteer containing thousands of georeferenced toponyms integrated within a geographical information system (GIS); (b) two digital versions of the RGs corpus, one fully annotated and ready for information extraction, and another suitable for further experimentation with machine learning (ML), natural language processing (NLP), and corpus linguistics (CL) methods; and (c) software tools that support a research method called geographical text analysis (GTA). GTA applies natural language processing based on deep learning algorithms for named entity recognition, disambiguation, and classification. This enables the parsing of texts and the automatic mark-up of words referring to place names, which are later associated with analytical concepts through a technique called geographic collocation analysis.
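The collocation step described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes the place names have already been marked up by an earlier recognition step (here simply given as token positions), it uses plain window-based counting, and the function name `geographic_collocates`, the `window` parameter, the concept list, and the example sentence are all hypothetical.

```python
from collections import Counter, defaultdict


def geographic_collocates(tokens, place_indices, concepts, window=5):
    """Count concept words that occur within `window` tokens of each place name.

    tokens        -- list of word tokens
    place_indices -- positions of tokens already tagged as place names (assumed input)
    concepts      -- set of lowercase concept words of analytical interest
    """
    result = defaultdict(Counter)
    for i in place_indices:
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            word = tokens[j].lower()
            if j != i and word in concepts:
                result[tokens[i]][word] += 1
    return result


# Invented example sentence; positions 3 and 11 are treated as tagged place names.
tokens = "the town of Tepoztlan has a market and a church near Cuernavaca".split()
collocates = geographic_collocates(
    tokens, place_indices=[3, 11], concepts={"market", "church"}, window=3
)
for place, counts in collocates.items():
    print(place, dict(counts))
```

With a window of three tokens, "market" is associated with the first place name and "church" with the second. A real analysis would work over the annotated corpus and link each place name to its gazetteer entry, so that the counts can be mapped in a GIS.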
Using the GTA methodology and resources, the research team is investigating questions related to the landscape and territorial transformations experienced during the colonization of Mexico, and is seeking to uncover social, economic, political, and religious patterns in the way of life of Indigenous and Spanish communities of New Spain toward the last quarter of the 16th century. All datasets and research products will be released under an open-access license for the free use of scholars engaged in Latin American studies or interested in computational approaches to history.