Digital Resources: LLILAS Benson Latin American Studies and Collections, University of Texas at Austin
- Kent Norsworthy, Digital Scholarship Coordinator, University of Texas at Austin
LLILAS Benson maintains one of the world’s largest collections of digital assets designed to support Latin American studies. These vast digital holdings, all of which reside on open-source platforms and are freely available to a global audience via the Internet, trace their roots back to the early 1990s, before the advent of the World Wide Web. Since that time, LLILAS Benson has forged partnerships with a broad array of researchers and content producers throughout the Americas in order to bring vital Latin American studies content online while at the same time helping to build local capacity in areas such as digitization, metadata, and preservation throughout the region. These digital collections include materials useful to scholars in a broad array of disciplines, particularly in the humanities and social sciences.
One of the main strengths of the collections is in the area of archival and historical sources, with extensive digitized materials spanning more than five centuries and all countries in Latin America and the Caribbean. The digital collections are particularly strong in terms of Mexican history. Major holdings in the digital collections that include material of interest to those conducting historical research are the following:
• PLA—The Primeros Libros de las Américas project brings together twenty-one libraries and archives in a collaborative initiative that seeks to digitize all surviving copies of books printed in the New World prior to 1601.
• AHPN—The Archivo Histórico de la Policía Nacional contains more than twelve million pages of digitized Guatemalan police records from the late 19th century through 1996.
• AILLA—The Archive of the Indigenous Languages of Latin America is a digital archive of recordings and texts in and about the indigenous languages of Latin America.
• Archivo de Lucas Alamán is a digital archive of more than 350 manuscripts from the personal papers of this influential Mexican statesman. The papers cover the period 1589–1853.
• Archivo de José María Luis Mora—This digital archive contains scanned copies of more than 600 documents, both manuscripts and printed works from the first half of the 19th century, as well as an exhaustive guide describing the collections.
• LANIC—The Latin American Network Information Center is a collection of subject- and country-based resource guides containing more than ten thousand links to Web-based Latin American studies content.
• HRDI—The Human Rights Documentation Initiative is committed to the long-term preservation of fragile and vulnerable records of human rights struggles worldwide and includes important partnerships in Latin America.
• Web archives that are of use to historians include the Latin American Government Documents Archive, or LAGDA, which contains copies of the Websites of more than 250 governmental ministries since 2005, and a collection of human rights–related Websites curated under the auspices of the HRDI, among others.
Collectively, the LLILAS Benson portfolio of digital initiatives includes more than ten million pages of digitized archival records; several hundred thousand pages of digitized full text and images, including monographs, journals, scholarly papers, manuscripts, ephemera, and so on; thousands of hours of digital audio and video recordings; and more than a hundred million Web-archived files. The collection of curated resource guides for Latin American studies contains more than ten thousand outbound links. Taken as a whole, the Websites holding these digital assets generate more than three million pageviews per year. The vast majority of the digital holdings consist of unique items, thus filling an important void for scholarship left by mass digitization efforts, such as Google Books and the Internet Archive’s Million Books Project.
LLILAS Benson is committed to promoting open access to scholarly resources. In contrast to the unique digitized materials hosted by database vendors and aggregators, such as Gale’s “World Scholar Archive: Latin America and the Caribbean” or EBSCO’s “Academic Search Complete,” nearly all the digital content that LLILAS Benson hosts is on the open Internet, available to any and all users regardless of location or affiliation, and without any type of registration. The one exception is AILLA, where no-cost registration is required to open or download media files.
Origins and Development of the LLILAS Benson Digital Collections
The digital collections described in detail are today part of LLILAS Benson Latin American Studies and Collections at the University of Texas (UT) at Austin, though many of them trace their origins to an earlier period as part of either LLILAS or the Benson Collection. The LLILAS Benson partnership, which began in 2011, represents a new approach to globalized higher education. Libraries in major research universities are changing profoundly, propelled by the rise of scholarly digital resources, and buffeted by the prohibitive cost of conventional acquisitions. The field of Latin American studies has been changing for some time, requiring an end to the previous paradigm—benevolent study of our “southern neighbors” from an unreflectively northern perspective—and replacing it with the principles of horizontal collaboration among sister institutions across the hemisphere and critical theoretical engagement from a true diversity of perspectives, including those rooted in the south. Most important, this new approach must be capable of advancing an ethos that challenges parochialism, replacing it with a vigorous emphasis on research and education through firsthand experience in the region, hands-on engagement with the problems under study, and a deep commitment to scholarship for the common good.
The genesis of many of the LLILAS Benson digital collections lies in the two constituent elements that make up the partnership. The first is the Teresa Lozano Long Institute of Latin American Studies (LLILAS), which since 1940 has grown to become a vibrant center for the interdisciplinary study of Latin America and for the dissemination of this research and creative production through diverse means, from teaching to publications and digital archives to scholarly exchange. Our faculty—some 150 strong—teach courses on an astounding array of topics and train students for Bachelor of Arts, Master of Arts, and PhD degrees in Latin American Studies. In addition, these faculty train students in their respective disciplines in more than thirty academic departments across the university, melding a Latin American focus with their particular areas of expertise. Our faculty carry out research through hands-on engagement with the region—fieldwork, archival study, quantitative data collection, and other methods. LLILAS disseminates the results of this work in diverse media and forums, including conferences, research working groups, professional development seminars, and workshops with community partners, as well as film, digital, and print publications.
By means of a vibrant and nationally recognized public engagement program, we make this expertise available to broad audiences beyond the university, from schoolteachers to private-sector leaders to Latin America–born residents of Central Texas. In all activities we make a special effort to practice the UT principle—“What starts here changes the world”—addressing the hemisphere’s pressing problems at their roots and empowering successive generations of students to become agents of change in their own right. Students who leave UT with a degree in Latin American Studies gain a combination of hands-on and theoretical training from a multidisciplinary perspective, with appreciation for the historical depth and cultural context of the specific problem under study. Whether their career paths lead toward academia, the private sector, government, or civil society, we equip them with skills of critical analysis and a keen sense of ethical commitment to use these skills for the social good.
The second constituent element, the Nettie Lee Benson Latin American Collection, has since 1926 grown to become a mecca of scholarly resources for research, teaching, and public engagement related to Latin American and U.S. Latino/a populations. The Benson’s vast holdings of Latin American and Latino/a materials—second only to the Library of Congress—came of age under the leadership of the renowned historian and librarian Dr. Nettie Lee Benson, who established the bold principle that the collection should hold a copy of every published work of scholarly interest on Latin America in Spanish, Portuguese, or English. The result, many decades later, is a collection exceeding a million volumes of astounding scope and depth that attracts scholars from across the globe and provides a solid bibliographic foundation for nearly any Latin American topic of study.
The Rare Books and Manuscripts division is one of four UT Austin sites with extensive holdings of rare materials, ranging from the “Primeros Libros” published in the Americas in 16th-century Mexico and Peru to the personal libraries and papers of significant figures in the contemporary history of Latin and Latino/a America. Access to these print and digital collections, combined with expert guidance from library staff, enables a steady stream of groundbreaking research on campus and, thanks to the digital resources offered as open access, around the globe. The Benson also features an open door of access and assistance for students learning to use libraries for the first time, as well as for the community at large. Periodic public engagement events (exhibitions, lectures, cultural happenings) turn the library into a gathering place and send the message that the Benson’s larger purpose is to serve the communities whose histories we preserve.
Following is an in-depth look at four of the main LLILAS Benson digital collections that we consider to be of most use to historians; in addition, a list is provided that gives brief descriptions of a larger number of significant digital resources at LLILAS Benson.
International Collaborative Digitization: Primeros Libros de las Américas
Primeros Libros de las Américas: Impresos Americanos del Siglo XVI en las Bibliotecas del Mundo, or PLA, is a digital library initiative that brings together libraries, archives, and other cultural institutions in Mexico, the United States, Chile, Peru, and Spain.1 The project models international collaboration and showcases the potential of digital technologies to facilitate the preservation and dissemination of, open access to, and scholarship on historical and cultural patrimony, in this case as found in the earliest imprints from the New World.
As a corpus, the primeros libros were printed in Mexico and Peru between 1539 and 1601. The imprints cover a broad range of subject areas, including math, law, linguistics, literature, medicine, military science, music, navigation, philosophy, religion, and rhetoric. The corpus is also a valuable source for the study of languages, with volumes written in Huastec (Chuchona, Otomi, Zotzil), Latin, Mixtec, Nahuatl, Spanish, Tarascan (Michuacana, Purépecha), and Zapotec. In many cases exemplars are bilingual, and a few are even trilingual. Given more than four hundred years of use, these books typically show signs of wear, such as missing or worn pages, damaged bindings, warping, water and fire damage, wormholes, and pages trimmed into the text block from rebinding. Not surprisingly for books from this early period of printing, there are significant variations in printings, typography, engravings, and gatherings, as well as marginalia among multiple exemplars of the same work, thus creating rich and unique opportunities for bibliographic scholarship.
Of the 220 works thought to have been produced, approximately 135 survive in institutions around the world today. Ideally, the Primeros Libros project seeks at least one exemplar of each surviving title. But we also seek to digitize and provide access via the Web to as many duplicate copies of these works as possible. Such duplicate exemplars enable different lines of inquiry since marginalia, typographical variants and engravings, ex libris, and other copy-specific attributes are often critical for interpretation and other scholarly purposes.
The project Website hosted at the University of Texas holds digital copies of these 16th-century works in both JPG and PDF formats. The Website also includes ancillary materials, such as bibliographies, that provide background and context about the primeros libros and their publishers. Long-term digital preservation is one of the goals of the project. The Texas Digital Library’s (TDL) preservation network infrastructure stores the archival master files of each digitized exemplar, and the full array of archival and Web-ready files are archived in multiple locations around the world, including Mexico and Texas. As specified in the project’s Partner Agreement, any project members, even if they contribute only one book, are entitled to a copy of all of the digitized books in all of the archival and derivative formats, and they may choose to host a copy of the entire collection locally, so long as each partner’s books maintain their institutional branding.
Given the variety of possible uses of the digitized books, several file formats in different resolutions are produced for archival or Web use. Archival Masters are saved in both TIFF and JPF/JP2 file formats.2 As the Archival Master is considered the master file for preservation purposes, quality control is performed at this stage to ensure the accurate reproduction of the primer libro as a physical object. The Archival Master is duplicated—retaining the original scan’s specifications—but cropped to produce an intermediate file, termed the “Split.” The Split serves as the basis for the subsequent Web-ready files—JPG, JPF/JP2, and PDF—used in image viewers and thumbnails. These Web-ready files retain the 24-bit color of the Archival Masters, but the image resolution has been changed to 100 ppi from 400 ppi.
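The relationship between the 400 ppi masters and the 100 ppi Web-ready files can be illustrated with a short calculation. The following is a minimal sketch rather than the project's actual production tooling: the `ScanSpec` class, the `web_derivative` function, and the 6 x 8 inch page are hypothetical, and the cropping performed at the Split stage is ignored. It shows only that reducing resolution by a factor of four in each dimension, at the same 24-bit color depth, reduces the uncompressed image data by a factor of sixteen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanSpec:
    """Pixel dimensions and resolution of one page image (hypothetical model)."""
    width_px: int
    height_px: int
    ppi: int
    bits_per_pixel: int = 24  # 24-bit color, as stated for masters and Web files

    def uncompressed_bytes(self) -> int:
        """Size of the raw pixel data, before any TIFF/JP2/JPG compression."""
        return self.width_px * self.height_px * self.bits_per_pixel // 8


def web_derivative(master: ScanSpec, target_ppi: int = 100) -> ScanSpec:
    """Scale pixel dimensions so the physical page size stays unchanged."""
    scale = target_ppi / master.ppi
    return ScanSpec(
        width_px=round(master.width_px * scale),
        height_px=round(master.height_px * scale),
        ppi=target_ppi,
        bits_per_pixel=master.bits_per_pixel,
    )


# Assumed example page: 6 x 8 inches scanned at 400 ppi (not a project figure).
master = ScanSpec(width_px=2400, height_px=3200, ppi=400)
web = web_derivative(master)

print(web.width_px, web.height_px)  # 600 800
print(master.uncompressed_bytes() // web.uncompressed_bytes())  # 16
```

Actual file sizes will differ from these raw figures because each of the formats named above applies its own compression.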
Because the technical requirements of a digital project at this scale can be quite extensive, the project’s partner institutions share technical expertise and materials, in keeping with its model of international cooperation. In order to lower the barriers to entry for new partner institutions, project members share technical resources, the lack of which might otherwise preclude institutions holding primeros libros from participating in the project.
A “V-shaped” cradle scanner, an aerial scanner (for larger and/or fragile materials), and an archival scanner small enough to transport are housed at the Biblioteca Histórica José María Lafragua in Puebla, Mexico. Since the small scanner, which was purchased by Texas A&M University (TAMU), can be moved from location to location, primeros libros that cannot be sent to the production center at the Lafragua library for digitization are instead digitized on site at their home library, and the resulting digitized images are sent to Lafragua for postprocessing.
A bilingual project Website is hosted by the Benson Collection and the University of Texas Libraries. The site design process was a collaborative effort between the University of Texas, Texas A&M University, and Lafragua.
Operators and Supporters
The metadata schema was developed jointly by UT and TAMU, with substantial contributions from Lafragua and the Universidad de las Américas Puebla regarding technical and descriptive metadata.
Overall project coordination and administration, as well as derivative file production, are handled jointly by TAMU, Lafragua, and UT.
The Centro de Recursos Académicos Informáticos Virtuales (CREATIVA) at the Universidad Autónoma de San Luis Potosí in Mexico is responsible for the design and hosting of a digital repository that will serve as a preservation and distribution system for the array of Primeros Libros image files, both archival and Web ready, where project partners in Mexico can download the images for their use.
Primeros Libros partner staff at UT Austin, Texas A&M, and the Biblioteca Lafragua in Mexico continue to actively seek out new partner institutions for the project, and the number of books and partners that are part of the collaboration is continually expanding. As of September 2014, there were eleven partner institutions located in Mexico, six in the United States, two in Spain, and one each in Chile and in Peru. Collectively, these twenty-one institutions hold 349 exemplars of Primeros Libros.
Content and Coverage
To give a sense of the breadth of content within the Primeros Libros corpus, included here are sample pages from three different books. Figure 1 shows a page from Psalmodia cristiana y sermonario de los santos del año en lengua mexicana, published in 1583 and authored by the Franciscan missionary Bernardino de Sahagún.3 The Psalmodia is the first book of vernacular sacred song in the Americas. It contains 333 songs that Sahagún composed in Nahuatl as part of the broader Christian evangelization efforts in the New World. Figure 2 shows a page from the Opera Medicinalia by Francisco Bravo, published in 1570 and the first medical treatise published in the Americas. The Opera Medicinalia consists of four treatises, written in Latin, covering medical topics including epidemiology, archaic treatments, and medicinal herbs. The volume also features many engravings, including a rudimentary diagram of the human circulatory system. Figure 3 is a page from Antiphonarium, an imprint that consists of plainsong notes for use in liturgical service. The Antiphonarium, published in 1589, is printed in heavy Gothic characters.
Other types of texts published in 16th-century Mexico and Peru include glossaries, dictionaries, and legal texts, as well as works that cover such disciplines as medicine and surgery.
The Primeros Libros corpus has been used for many different types of research and scholarship in disciplines ranging from anthropology to linguistics, religious studies, philosophy, cultural studies, and the history of the book. Following are a few specific examples of published research that use exemplars from the corpus:
Another major goal of the Primeros Libros project is to serve as a platform to promote collaborative software development among project partners. As of 2016, there are two major initiatives under way in this regard. The first seeks to develop Optical Character Recognition (OCR) routines that can be used to transform the digitized images from these early print books, including those published in indigenous languages, into searchable text. The second is the development, led by Texas A&M University, of a comparative book reader named Cobre that seeks to extend the functionality of existing page viewers in ways that will enable new kinds of research with the Primeros Libros digital corpus. For example, Cobre’s comparison view allows the user to examine multiple exemplars of the same work simultaneously.
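To suggest the kind of research that searchable text from multiple exemplars could support, the following minimal sketch compares two transcriptions word by word and reports where they differ. It is an illustration only and does not describe how Cobre or the project's OCR routines work; the function `variant_words` and both sample strings are invented for the example and are not transcribed from any exemplar. The comparison relies on `difflib.SequenceMatcher` from the Python standard library.

```python
import difflib
from typing import List, Tuple


def variant_words(a: str, b: str) -> List[Tuple[str, str]]:
    """Return (words_in_a, words_in_b) pairs wherever two transcriptions differ."""
    words_a, words_b = a.split(), b.split()
    # autojunk=False keeps the matcher from discarding frequent words on long texts.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            diffs.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return diffs


# Invented sample lines standing in for OCR output from two exemplars.
exemplar_1 = "en la ciudad de mexico año 1583"
exemplar_2 = "en la cibdad de mexico año 1583"

print(variant_words(exemplar_1, exemplar_2))  # [('ciudad', 'cibdad')]
```

A word-level comparison of this kind can flag candidate typographical variants, but it cannot by itself distinguish a genuine variant printing from an OCR error, so any difference it reports would still need to be checked against the page images.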
Collaborative Digital Collection Building: The Guatemalan National Police Historical Archive Project
LLILAS Benson is proud to be an essential player in the collaborative venture that in December 2011 resulted in the public launch of the digital archive of the Archivo Histórico de la Policía Nacional, or AHPN. The culmination of years of diligent work by archive staff in Guatemala, this unhindered access to the AHPN digital archive has opened up many new avenues of research into the country’s past by scholars, human rights activists, prosecutors, and family members of those killed or disappeared during Guatemala’s internal armed conflict, from 1960 to 1996.
The Police Archive was discovered fortuitously in 2005 by investigators looking for the source of explosions in an abandoned portion of a sprawling military base in Guatemala City. The existence of such an archive had long been denied by police, military, and civilian government officials, particularly during truth commission investigations in the 1990s. The investigators stumbled upon a series of rat- and cockroach-infested buildings with rooms piled floor to ceiling with immense bundles of moldy, rotting, and decaying documents. By the time AHPN archivists had concluded their calculations, nearly 8,000 linear meters of documents had been accounted for, in total more than 80 million folios of records stretching from 1882, when the police force was founded, to 1997, when the force was disbanded under the Guatemalan Peace Accords.
The archive contains many types of documents, including logbooks, identification cards, case files, photographs, memoranda, correspondence, and reports. It also includes loose files on kidnappings, murders, and assassinations created during nearly four decades of intense civil conflict beginning in the 1960s, a conflict that claimed 250,000 lives and displaced more than a million people.
After this initial discovery, the Human Rights Ombudsman office assumed custody of the archive under an order issued by the nation’s Civil Court. In 2009, responsibility for the AHPN was transferred to the Ministry of Culture, where it is under the direction of the Archivo General de Centro América (AGCA), Guatemala’s national archive. With its more than eighty million pages of documents, the AHPN represents the largest single repository of documents ever made available to human rights investigators. The AHPN is an integral part of the documentary patrimony of the Guatemalan people, with the mandate to ensure the preservation, safekeeping, and custody of the documentary record of the disbanded Guatemalan National Police in order to make this record accessible to the public at large.
Operators and Supporters
Working under very challenging conditions, the AHPN has built a professional archive that serves as an international example of development and implementation of best practices, policies, and procedures. The AHPN has worked closely with a broad array of leading archivists, as well as important national and international actors in the area of memory, human rights, and justice, including Dr. Trudy Huskamp Peterson, the Swiss Federal Archives, the Archivo General de Centro América, the Fundación de Antropología Forense, Benetech, Archiveros sin Fronteras, the University of Oregon, and the National Security Archive. The tireless efforts of a dedicated staff of more than a hundred at the AHPN in Guatemala, in conjunction with these institutional partnerships, have allowed the archive to move into uncharted territory in terms of preservation and access to these valuable historical records.
Following years of painstaking work to clean, identify, classify, organize, describe, and digitize the documents, in 2009 the AHPN opened a professionally staffed public reading room to provide access to the digitized documents to anyone able to visit the archive in Guatemala City in person. Reading room staff also accept requests for specific documents from prosecutors, human rights investigators, families of the disappeared, scholars, and journalists.
The Guatemalan Attorney General’s office has several staff members assigned full time to work at the archive researching ongoing cases of criminal human rights violations by police officials, primarily during the most intense period of the armed internal conflict in the 1970s and 1980s. Numerous such cases are working their way through the courts, and several convictions already have been obtained, some relying substantially on documentary evidence from the archive. Among these is the case of retired National Police Director Héctor Rafael Bol de la Cruz, arrested and charged in 2011 for his command role in the forced disappearance of labor leader Fernando García in 1984.
Content and Coverage
In December 2010, a delegation from the University of Texas at Austin met in Guatemala with AHPN officials, human rights groups, and Guatemalan scholars to explore areas of collaboration. In a subsequent Letter of Understanding between the AHPN and UT Austin (represented by the Bernard and Audre Rapoport Center for Human Rights and Justice, LLILAS Benson, and the University of Texas Libraries), the parties agreed to exchange technical expertise, cooperate in research, engage in capacity building through legal and academic networks, and organize an international academic conference in conjunction with these activities. As a cornerstone of this collaboration, UT Austin committed to developing online public access to the AHPN’s digitized records.
In April 2011, the UT Libraries received hard drives containing copies of millions of digital images scanned by the AHPN in Guatemala. UT Libraries partnered with UT’s Texas Advanced Computing Center (TACC) to create access derivatives from the initial set of master files. As digitization of the 80 million total physical pages proceeds in Guatemala, LLILAS Benson and UT Libraries will receive additional files for inclusion in the online archive.
With the December 2011 public launch of the digital archive, the AHPN and UT Austin took the bold and unprecedented step of putting the entirety of the digitized collection, totaling more than ten million pages of documents, online for universal access. In a departure from traditional practice, none of the records were screened, redacted, or access-restricted in any manner. Thus, an important part of the nation’s historical patrimony has been preserved and opened up for all citizens to consult as they work to discover and make sense of their own history. In the first seventy-two hours following the site launch at the conference, the digital archive received more than 10,000 pageviews. As of late 2014, total pageviews for the digital archive were close to half a million.
To underscore the collaboration among archivists, academics, and human rights activists, the digital archive was launched as part of a full-day conference, Politics of Memory: Guatemala’s National Police Archive, sponsored by the Rapoport Center and LLILAS Benson, and held at the UT School of Law. During her keynote address, Archivo General de Centro América Director Anna Carla Ericastilla summed up the significance of the conference: “Today’s event is important because it gives the Guatemalan people greater access to documents from an archive that has been instrumental in the processes of historical clarification and justice. This opportunity for Guatemalans to be able to examine the documents directly, without intermediation of any kind, contributes to their ability to form their own opinions and to take their understanding of what happened and corroborate it, juxtapose it, with other versions.”4
Since the scanned documents, many of which are handwritten, cannot be full-text searched, users of the digital archive must rely heavily on browsing the hierarchical structure of the archive to locate relevant records. As the archives profession has long recognized, item-level metadata is impractical, even impossible, for collections of this size and scope. The AHPN created an arrangement and description system that implements international standards and best practices, and the Benson Collection and UT Libraries were able to translate those standards and practices into an online environment to provide seamless access to the more than ten million and growing digitized records. The online archive reflects and maintains the archival principles of respect des fonds, provenance, and original order that AHPN staff so diligently excavated from the rotting piles of paper and painstakingly implemented. The online archive provides access through archival arrangement of record groups, record series, and subseries, as well as description in finding aids created and published using the General International Standard Archival Description, or ISAD(G), standard. The goal of the Benson Collection and UT Libraries’ technical team was to recreate in the digital environment the experience of using the AHPN in person. The online archive recognizes the unique value of archival arrangement and description, and serves as a model for providing broad digital access to an extremely large historical collection with truly archival metadata.
The screenshot in Figure 4 demonstrates how users of the digital archive can browse archival hierarchy; note that each record group and series includes links to the finding aids with detailed descriptions and to the digitized documents that are part of that group.
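The browsing model described above, in which record groups contain series and series contain subseries, each linked to a finding aid and to digitized pages, can be sketched as a simple tree. What follows is a hypothetical illustration and not the data model used by the AHPN or by UT Libraries: the `Node` class, the `find_path` helper, and every title, identifier, and page count are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One level of archival arrangement: record group, series, or subseries."""
    level: str
    title: str
    finding_aid: Optional[str] = None  # hypothetical ID of an ISAD(G) description
    page_count: int = 0                # digitized pages attached directly here
    children: List["Node"] = field(default_factory=list)

    def total_pages(self) -> int:
        """Pages at this node plus all pages beneath it."""
        return self.page_count + sum(c.total_pages() for c in self.children)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Node"]]:
        """Yield (depth, node) pairs in the order a browse screen would list them."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def find_path(root: Node, title: str) -> Optional[List[str]]:
    """Return the titles from the root down to the named node, or None."""
    if root.title == title:
        return [root.title]
    for child in root.children:
        sub = find_path(child, title)
        if sub is not None:
            return [root.title] + sub
    return None


# Invented hierarchy; titles and counts are placeholders, not AHPN data.
archive = Node("fonds", "Example archive", children=[
    Node("record group", "Record group A", finding_aid="FA-A", children=[
        Node("series", "Series A1", page_count=120),
        Node("series", "Series A2", children=[
            Node("subseries", "Subseries A2a", page_count=30),
        ]),
    ]),
    Node("record group", "Record group B", page_count=50),
])

for depth, node in archive.walk():
    print("  " * depth + f"{node.title} ({node.total_pages()} pages)")
```

Because description is attached to groups and series rather than to individual items, a user locates a document by descending the tree, which is why preserving original order in the online environment matters for a collection that cannot be full-text searched.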
The archive includes many different types of documents, from log books and photographs, to letters, memoranda, and informant reports. One of the most prevalent and useful types of documents is the registration card from the Registro Maestro de Fichas, an internal National Police database that was used to track and cross-reference records associated with individual citizens and organizations. Reproduced in Figure 5 is one of the cards in the archive for Fernando García, a student and labor leader who was forcibly disappeared by government security forces in Guatemala City in 1984.5
The AHPN digital archive collaboration has resulted in a number of innovative initiatives both in the United States and in Guatemala involving scholars and human rights activists. For example, in the 2013 spring semester, Dr. Virginia Garrard-Burnett, a UT Austin history professor, taught the graduate seminar “Guatemalan History Through the National Police Historical Archive.”6 Students in the course conducted original research projects that relied heavily on source material they discovered in the digital archive.
The AHPN has generated an unprecedented collaboration among archivists, academics, and human rights activists in Guatemala, and its agreement with UT Austin has yielded an innovative plan for broadening these relations of collaboration. In the words of AHPN national coordinator Gustavo Meoño, “This alliance secures the perpetual public availability of the archive, which is so important for Guatemala. The University of Texas at Austin’s prestige and commitment to academic inquiry gives us an opportunity to guarantee the right to information in the most democratic and permanent manner possible.”7
Without moving the physical archives outside of the country, LLILAS Benson and the UT Libraries have made the collection universally accessible online. The activities contemplated in this partnership will improve upon and use that accessibility to anchor a world-class archival, research, and transitional justice community all directed toward greater understanding of the conflict in Guatemala and committed to preventing anything like it from happening again. In short, with its unique collaboration between entities in Guatemala and the United States, between academics and activists, and between digital library and archive experts in Austin and the relatives of victims of a genocidal war in the highlands of Guatemala, this project marries peace building to the digital 21st century. It encourages, across generations and geographical locations, the creation and animation of the intellectual capital essential for underpinning efforts for lasting peace—in Guatemala and elsewhere.
Born Digital Collection Development: Web Archiving and the LAGDA Project
Web archiving is a relatively new practice in the area of digital librarianship that seeks to gather, preserve, and provide access to born digital content from the World Wide Web. Born digital refers to content that was originally produced in digital format, as opposed to having been converted from an analog format. While mandates vary widely from country to country, and among different types of memory institutions, preservation of Web content has as a matter of practice largely fallen to national libraries and archives, research libraries, state libraries, museums, and similar institutions whose mission includes stewardship of cultural resources.8
In an organizational sense, Web archiving has an important role to play both externally and internally. Externally, Web archive collections have a clear and demonstrated value vis-à-vis future research, akin to the role played by books and other artifacts of the cultural record in the predigital era. Like other forms of ephemeral content, Web resources have value as evidence about what was happening at the time they were created or modified. Such evidence can be of value not only for historical research but also for such activities as marketing or legal work. Internally, Web archiving can play a vital role in an organization’s records-management process.
Operators and Supporters
As an institutional practice, Web archiving began in 1996. In that year, Brewster Kahle launched the Internet Archive with the lofty goal of creating a universally accessible digital library. While the Internet Archive has diversified its service offerings significantly since then, preserving Web content remains at the core of its work. It was also in 1996 that the National Library of Australia and the National Library of Sweden launched what would become the first efforts by such institutions to capture part of their national domain, in this case the .au and .se domains. Today, many national libraries from around the world are engaged in such efforts.
In the United States, the Library of Congress launched the Minerva Project in 2000, an ongoing program for building focused collections of Web content in different areas of U.S. history, politics, and culture. Minerva was eventually renamed the Library of Congress Web Archives.
Web archiving as a field reached a major milestone in 2003 with the birth of the International Internet Preservation Consortium (IIPC). The IIPC, which as of 2016 has forty-four members, primarily national and major research libraries from twenty-five countries, is dedicated to “improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage.”9
One of the main software tools used for gathering content into a Web archive is the Internet Archive’s Heritrix crawler, first released in 2002. In 2006, the Internet Archive launched its Archive-It Web archiving service, a subscription-based initiative that helps partner organizations to harvest, build, manage, and preserve born digital collections. In 2009, the Heritrix file output format, the WARC file, was adopted as an ISO (International Organization for Standardization) standard for Web archiving. As of July 2013, Archive-It had 225 partners in forty-five U.S. states and fifteen countries worldwide. To date, these partners have created more than two thousand public collections containing more than six billion URLs using Archive-It.10
In 2013, the Internet Archive and the Archive-It team released the Web Archiving Life Cycle Model as part of broader efforts aimed at establishing a set of best practices for Web archiving and at increasing awareness of the central role that Web archiving should play as part of broader digital preservation strategies. The goal of the model is to present what a typical Web archiving workflow might look like and to put forth a measurable model that organizations can use when they are building or upgrading their Web archiving programs.
The model is based on a series of steps and phases that together represent the process of Web archiving in a series of circular bands. The outermost band corresponds to policy, since as a practical matter nearly all aspects of Web archiving are based on a policy decision of one type or another. Another band corresponds to metadata and description, activities that, like policy, are generally ongoing and are present throughout the entirety of the life cycle. The model then brings together a series of steps that represent high-level decisions with which any group embarking on a Web archiving activity will need to grapple. These steps are vision and objectives, resources and workflow, access/use/reuse, preservation, and risk management. The model then breaks down the lower-level tasks that are part of a Web archiving program. These tasks are appraisal and selection, scoping, data capture, storage and organization, and quality assurance and analysis. Finally, at the very center of the model is the collection, which comprises the actual archived Web content. The data that comprise the collection are the end product of all the preceding activities and the actual focus of the preservation efforts.
Content and Coverage
To gain a deeper appreciation of how Web archiving works at LLILAS Benson, we look in depth at one initiative as a case study, the Latin American Government Documents Archive, or LAGDA.11 LAGDA, which is conducted as a partnership between the University of Texas Libraries and LLILAS Benson’s Latin American Network Information Center, seeks to apply Web archiving technology to a specific collecting area: born digital Latin American government documents. The LAGDA effort was the immediate product of a 2003 planning grant at LLILAS designed to explore methodologies for harvesting and preserving Web content. Since 2005, LAGDA has been a partner with the Internet Archive’s Archive-It Web archiving service.
The project grew out of a collecting challenge faced by the Benson Collection. Historically, the library had systematically collected Latin American official government documents, including annual State of the Union reports, or Mensajes Presidenciales, as well as annual reports that individual government ministries are required by law to produce. Traditionally, such reports were published and collected in print format. But beginning in the late 1990s, increasing numbers of Latin American government entities ceased paper publication of these official documents and reports, opting instead to publish them in digital format directly to the Web.
Initially, the Benson library hoped to be able to continue fulfilling its collecting responsibilities by directly linking to these born digital versions of the reports on the Web and integrating those links in the existing library catalog records for these serial publications. However, this approach soon ran into a serious obstacle: Although creating the initial set of links was fairly straightforward, over time the number of reports remaining online at the original address declined significantly as part of the process known as “link rot.” Typically, when a new annual report or State of the Union is produced and uploaded, the publishing entity deletes the previous year’s version from the Website. Critical gaps in the coverage of these reports historically provided by the Benson, in some cases stretching back to the 19th century, began to appear.
LAGDA was launched with a view toward plugging these gaps by providing for systematic capture of Latin American government Websites. Benson library staff identified close to three hundred major sites from eighteen countries in Latin America and the Caribbean, primarily government ministries and presidential sites, where such documents were published directly to the Web. Four times per year, LAGDA uses the Archive-It service to crawl the entire contents of these Websites. The resulting collection totals millions of URLs, or discrete documents/files, amounting to more than seven terabytes of data. Users can consult the archived sites in two different ways, both available through the LAGDA Website. First, they can conduct a full-text search across all contents of the archive, where the search result list consists of direct links to the archived content. Second, they can browse the archived content starting at a list of links, ordered by country, of the nearly three hundred ministries and presidencies targeted by librarians at the Benson Collection.
Digital librarians at LLILAS and the Benson play an essential role in the LAGDA partnership, both in coordinating overall project activities and in taking lead responsibility for managing the Archive-It application. This includes precrawl tasks, using the application to manage the list of target URLs and configure the Web crawl settings, as well as postcrawl tasks, such as reviewing crawl reports and applying a quality-control protocol to the archived sites. Librarians also use the Archive-It application to manage metadata associated with each archived Website. In sum, all of the tasks that are described here are part of the Web Archiving Life Cycle Model.
A systematic review conducted by LLILAS Benson staff has confirmed that LAGDA contains thousands of official documents and speeches from Latin American governments that have long since disappeared from the live Web, including not only text documents but also audio and video files. For example, in June of 2009, the Honduran army deposed elected President Manuel Zelaya. During the coup d’état, the entirety of the Zelaya administration’s Web presence was deleted by the new regime. The LAGDA collection, as demonstrated in this archived copy of Zelaya’s Website, contains much of the public documentation that the Zelaya government had previously held, including dozens of speeches, government plans and reports, and details about the administration’s achievements.
Another advantage LAGDA provides for researchers is that all documents and other types of Web content are preserved in their full original context, that is, the entire Website where such documents were originally housed. This is vital, because in surveys and other venues, scholars have repeatedly stressed the importance of preserving the entire intellectual context, including the “look and feel” of born digital materials. In addition to the annual reports and State of the Union addresses, the archive holds large numbers of speeches delivered by Latin American presidents and their cabinet ministers, sectorial reports, economic indicators, survey results, and other data gathered by government entities.
Discussion of Related Research Tools
In addition to the LAGDA collection, LLILAS Benson curates several other Web archives. The Benson-based Human Rights Documentation Initiative, or HRDI, runs quarterly crawls of the Websites of numerous human rights groups around the world. There are also several smaller collections of sites, for example, one on political discourse in Venezuela, one on political parties in Latin America, and one on Mexico’s 2010 celebrations of the 200th anniversary of its independence and the 100th anniversary of the Mexican Revolution.
Web archiving is a good example of a mechanism for mainstreaming special collections within research libraries. In this era of ever-growing budget constraints in libraries and in higher education, in order for special collections to thrive they must have a strong presence on campus in terms of cutting-edge digital initiatives. And these initiatives must be structured in a way that integrates them into the mainstream of library activity. In the case of the Web archiving projects at UT Austin, which originated out of special collections, processes have been integrated, to the extent possible, into existing library workflows, including in the areas of acquisition, cataloging, metadata, and long-term preservation.12
Archive of the Indigenous Languages of Latin America, or AILLA
The Archive of the Indigenous Languages of Latin America is a digital repository for indigenous language materials run by LLILAS Benson and hosted by the University of Texas Libraries. AILLA’s primary mission is the preservation of irreplaceable linguistic and cultural resources in and about the indigenous languages of Latin America, most of which are endangered. Most of the materials in the archive are primary field data that were collected and deposited (donated) by linguists and anthropologists for whom audio and video recordings are a central part of their research methodology. Many indigenous organizations have also donated the results of their investigations to AILLA. The majority of AILLA’s collection consists of audio and video recordings of discourse in a wide range of genres, including conversations, various types of narratives, songs, political oration, traditional myths, curing ceremonies, and so on. Many recordings are accompanied by transcriptions and translations of the speech event. Other textual resources include dictionaries, grammars, ethnographic sketches, field notes, articles, handouts and PowerPoint presentations. The collection also contains hundreds of photographs.
AILLA’s secondary mission is to make these valuable resources widely accessible via the Internet, while simultaneously protecting personally, culturally, and politically sensitive materials from inappropriate use and supporting the intellectual property rights of the creators. AILLA’s system of access levels allows creators and depositors to have finely grained control over their materials, enabling them to restrict their entire collections or only certain files within the collections. For example, recordings might be public while transcriptions might be restricted, or vice versa. Sensitive materials are protected; however, AILLA’s directors, managers, and depositors believe strongly that access is equally important. Historically, very little of the fruit of linguistic and anthropological research has actually been available to the indigenous communities in which the research was done; AILLA aims to rectify that imbalance. Restrictions tend to keep speakers out, whereas researchers can generally gain access to archival materials through the academic network. Resources that are publicly accessible can be heard and read by all speakers. Our policy is that if a resource can be made public, it should be made public, but that if it is sensitive, it should be protected. Our goal is to ensure that the unique and wonderful resources preserved at AILLA can be used to maintain, revitalize, and enrich the communities from which they arise.
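The idea of restricting either an entire collection or only certain files within it can be sketched as a simple data structure. The following Python example is a minimal illustration under stated assumptions, not AILLA’s actual implementation: the two access levels, the class and method names, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Access(Enum):
    # Hypothetical levels; the archive's real scheme may have more.
    PUBLIC = "public"
    RESTRICTED = "restricted"


@dataclass
class Collection:
    name: str
    # Level applied to every file unless a per-file override exists.
    default_access: Access = Access.PUBLIC
    # Per-file overrides chosen by the depositor.
    file_access: Dict[str, Access] = field(default_factory=dict)

    def access_for(self, filename: str) -> Access:
        """Return the file's own level, falling back to the collection's."""
        return self.file_access.get(filename, self.default_access)

    def can_open(self, filename: str, authorized: bool) -> bool:
        """Public files are open to all; restricted ones need authorization."""
        return self.access_for(filename) is Access.PUBLIC or authorized


# Recordings stay public while one transcription is restricted.
narratives = Collection(
    name="example narratives",
    default_access=Access.PUBLIC,
    file_access={"story01_transcript.pdf": Access.RESTRICTED},
)
```

Setting `default_access` to `Access.RESTRICTED` instead would restrict the whole collection, and individual files could then be opened up through `file_access`, which mirrors the “or vice versa” case mentioned above.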
AILLA was intended from the outset to function as a partner with its depositors, providing them with a means of both preserving and sharing, under appropriate terms, the fruits of their work with the indigenous peoples of Latin America. The archive accepts any legitimate resources that can be housed in a digital format.
LLILAS Benson Digital Collections include many other resources that are used by historians in their research, including the following:
MUPI-Radio Venceremos. The Museo de la Palabra y la Imagen (MUPI) in San Salvador holds some of the most important archival collections for the study of Salvadoran history. One such collection is the Radio Venceremos archive, composed of more than 1,200 cassette tapes documenting El Salvador’s brutal civil war and human rights violations from the perspective of campesinos and the Farabundo Martí National Liberation Front (FMLN) rebels. Radio Venceremos, the clandestine radio station that traveled alongside the FMLN forces during the war, played a crucial role in the struggle by broadcasting the on-the-ground situation to the country and to an international audience. The resulting cassette archive from the twice-daily broadcasts provides rich insight into the FMLN, the effects of war, and the power of radio. Through a partnership between the Human Rights Documentation Initiative (HRDI) at LLILAS Benson and the MUPI, the Radio Venceremos collection is being digitized and made globally accessible online. Portions of the Radio Venceremos archive can be heard in the “Tejiendo la Memoria” collection online.
This collection contains the full text of English-language translations of speeches, interviews, and press conferences by Cuban leader Fidel Castro. The collection originated with the records of the Foreign Broadcast Information Service (FBIS), a U.S. government agency responsible for monitoring broadcast and print media in countries throughout the world. The records are in the public domain. Users can search or browse the entire collection, which consists of more than two thousand texts covering the thirty-seven-year period from 1959 to 1996.
The digital Archive of José María Luis Mora contains scanned copies of more than six hundred documents, both manuscripts and printed works, from the personal papers of this prominent Mexican political figure and historian. This digital archive, which primarily covers the first half of the 19th century, includes an exhaustive guide and finding aids describing the collections. In the digital archive, the guide and finding aids are linked directly to the digital images of the described documents. The archive records can be browsed by following the archival organization and structure, while the catalog can be browsed by section or searched in full text. The documents include correspondence, literary production, legal documents, and lists. The original archive is part of the Rare Books and Manuscripts division of the Benson Collection.
This digital archive consists of an inventory with descriptions of 364 manuscripts from the papers of Mexican statesman and historian Don Lucas Alamán. Scanned, high-resolution images of all 1,886 pages of the manuscripts from the digital archive can be browsed at this Website. Manuscripts in the archive span the period 1589 to 1853. Both the inventory and the manuscripts are in the original Spanish. The collection includes biographical notes on Alamán, as well as an introduction to the inventory. The original papers from the collection are housed at the Rare Books and Manuscripts division of the Benson Collection.
This is a collection of Mexican and Argentine presidential speeches from the 19th and 20th centuries. The speeches were digitized and put online at LLILAS Benson as an initiative of the Latin Americanist Research Resources Project (LARRP), which seeks to broaden the array of Latin Americanist resources available to students and scholars. The LARRP Presidential Messages collection contains more than seventy-five thousand pages of digitized full text of “state of the union” speeches from Mexican presidents between 1821 and 1970, and from Argentine presidents between 1810 and 1999.
The Voces project seeks to document and create a better awareness of the contributions of Latinos and Latinas of the World War II, Korean War, and Vietnam War generations. The project Website includes digital photos and digital audio from, and related to, the oral histories.
1. For the text in this section about the Primeros Libros project, the author is indebted to his colleague Anton DuPlessis, Curator of the Colonial Mexican Collection at Texas A&M University.
2. File specifications include 400 pixels per inch (ppi) resolution, minimum; 24-bit color; lossless compression; single page capture; presence of the Kodak Color Separation Guide with ruler (or equivalent) and, if not legible, a high contrast, 2 inch/5 cm photo documentation ruler; imaging should be done on a nonreflective black or dark background. Additional details are specified in the project’s Partnership Agreement.
4. “Collaborative Digital Collection Building: The Guatemalan National Police Historical Archive,” Kent Norsworthy, in Portal: LLILAS Annual Review, Issue Number 7, 2011–2012, p. 19.
5. For detailed information and analysis on the Fernando Garcia case, including links to some of the AHPN documents used in prosecuting those responsible for the crime, see “Guatemalan Court Convicts National Police Chief,” National Security Archive.
6. For more information on the course, see “Toward ‘Conciliation’ in Guatemala: Two Guatemalan Perspectives.”
7. “Collaborative Digital Collection Building: The Guatemalan National Police Historical Archive,” Kent Norsworthy, in Portal: LLILAS Annual Review, Issue Number 7, 2011–2012, p. 20.
8. See this good introduction to the history and practice of Web archiving, including an exhaustive bibliography, by Brenda Reyes Ayala.
9. See “About IIPC.”
10. Up-to-date statistics can be found here.
12. For more on this topic, see Web Archiving and Mainstreaming Special Collections.