Archive and Library

Summary and Keywords

Archives and libraries operate within a complex web of social, political, and economic forces. The explosion of digital technologies, globalization, economic instability, consolidation within the publishing industry, increasing corporate control of the scholarly record, and the shifting copyright landscape are just some of the myriad forces shaping their evolution. Libraries and archives in turn have shaped the production of knowledge, participating in transformations in scholarship, publishing, and the nature of access to current and historical materials. Librarians and archivists increasingly recognize that they exist within institutional systems of power. Questioning long-held assumptions about library and archival neutrality and objectivity, they are working to expand access to previously marginalized materials, to educate users about the social and economic forces shaping their access to information, to raise awareness about bias in information tools and systems, and to empower disenfranchised communities.

New technologies are transforming the practices of librarians and archivists as they restructure bibliographic systems for collecting, storing, and accessing information. Digitization has vastly expanded the volume of material libraries and archives make available to their communities. It has enabled the creation of tools to read or decipher material thought to have been damaged beyond repair, as well as tools to annotate, manipulate, map, and mine a wide variety of textual and visual resources. Digitization has enhanced scholarship by expanding opportunities for collaboration and by altering the scale of potential research. Scholars have the ability to perform computational analyses on immense numbers of images and texts. Nevertheless, new technologies have also contributed to a greater commodification of information, a worsening of the crisis in scholarly communication, and the creation of platforms rife with hidden bias, fake news, plagiarism, surveillance, harassment, and security breaches. Moreover, the digital record is less stable than the printed record, complicating the development of systems for organizing and preserving information. Archivists and librarians are addressing these issues by acquiring new technical competencies, by undertaking a range of social and materialist critiques, and by promoting new information literacies to enable users to think critically about the political and social contexts of information production.

In most 21st-century archives and libraries, traditional systems for stewarding analog materials coexist with newly developing methods for acquiring and preserving a range of digital formats and genres. Libraries provide access to printed books, journals, magazines, e-books, e-journals, databases, data sets, audiobooks, streaming audio and video files, as well as various other digital formats. Archives and special collections house rare and unique books and artifacts and paper and manuscript collections, as well as their digital equivalents. Archives focus on permanently valuable records, including accounts, reports, letters, and photographs that may be of continuing value to the organizations that have created them or to other potential users.

Keywords: scholarly communication, politics of information, critical information studies, library neutrality, archival theory, digital humanities, digital materiality

Defining Archive and Library

Archives and libraries occupy a privileged space in the social imaginary. In technologized and data-driven environments, they play a symbolic role as guardians of history, culture, and memory. These institutions embody, both literally and figuratively, the challenges of assembling, organizing, transmitting, and preserving vast quantities of information. The terms “archive” and “library” have become productive metaphors whose flexibility and ambiguity lend themselves to theoretical speculation and manipulation. Archives and libraries are thus at the center of contemporary debates about the production of knowledge and the role of memory institutions in authorizing historical narratives and legitimating political power. Library and archival metaphors, as well as theorizations within archival and library science, are integral to discussions of historical memory, the evolution of the disciplines, the creation of national identity, the consolidation of institutional power, the deficiencies of the colonial record, the underrepresentation of women and minorities, the collection and misuse of personal data, and the subjective nature of tools for organizing, searching, and displaying information.1

Libraries are conventionally defined as repositories of published materials such as books, journals, and other media, while archives are defined as repositories of unpublished and unique materials such as manuscripts, letters, and official documents. Since the late 20th century, however, writers outside library and information science (LIS) have increasingly used the terms “archive” and “library” to designate any collection of objects in analog or digital form, whether privately or publicly held. Those writing about “the archive” may be referring to the entire extant historical record, to everything currently available in print or digital formats, or to some subset of material on a particular subject or assembled by a particular person, group, or institution. The literary archive has been defined as “all forms of storytelling, fiction, poetry, song, drama and their offshoots,” but it may also be defined to include secondary works that describe or critique these materials.2 All scholarship may be understood as an interpretation of and a contribution to the archival record.

Not surprisingly, archives have a more specific meaning for archivists. The Society of American Archivists defines archives as “[m]aterials created or received by a person, family or organization, public or private, in the conduct of their affairs and preserved because of the[ir] enduring value . . . or as evidence of the functions and responsibilities of their creators, especially those materials maintained using the principles of provenance, original order, and collective control. . . .”3 The concept of provenance has been fundamental to archival practice since the late 19th century and pertains to the maintenance of original context.4 The primary objects of concern to archivists are not records or archives in themselves but “records in their contexts, records as part of processes of attribution and communication of meanings.”5

Historical and Theoretical Background

Literary scholars have always depended upon libraries and archives for their objects of study, but the turn to history in many academic disciplines in the 1980s and 1990s entailed increased dependence upon library and archival sources. It thus brought heightened scrutiny to the role that memory institutions play in shaping understanding of the past. It prompted questions about how libraries and archives are assembled and organized. It also raised questions about what constitutes archival evidence and the ways in which that evidence is deployed.6 This expanding historical focus prompted a concern with omissions in the documentary record and silences in the archive. Many writers and artists have developed a heightened interest in information that has been lost, distorted, or suppressed, while others have explored a variety of ways to highlight archival absence.7

Scholarly understanding of the historical record has derived in part from the work of Jacques Derrida and Michel Foucault. Derrida’s Archive Fever describes how the structure of the archive determines what can be archived; history and memory are shaped by the technical methods of what he calls “archivization.”8 Derrida’s work has informed scholarly recognition of the contingent nature of the archive and the understanding that it is shaped by social, political, and technological forces. If the archive cannot or does not accommodate a particular kind of information or mode of scholarship, then that information or scholarship is effectively excluded from the historical record. Library and archival technologies determine what can be recorded and preserved and thus transmitted to future generations. Derrida’s claim that “archivization produces as much as it records the event” has reinforced the notion that archivists and librarians play a role in creating as well as stewarding the historical record.9

Foucault describes the archive as “the system of discursivity” that defines the boundaries of what can be thought and said.10 He argues that systems of power and influence determine what counts as knowledge and what questions are worth asking. Through professional organizations, conferences, journals, and specialized vocabularies, disciplines are constituted as discursive formations that define their own truth criteria and thus authorize and legitimate particular forms of knowledge. In similar fashion, libraries and archives confer legitimacy and authority on the materials they have acquired, organized, and made available, and in so doing they define what constitutes the historical and scholarly record. Foucault traces how, in the 19th century, the library came to be conceived as a “place of all times that is itself outside of time and inaccessible to its ravages.”11 He thus illuminates, and encourages readers to question, the once common assumption that library and archival collections are somehow unconstrained by historical forces and able to provide transparent and unmediated access to the past. By extension, Foucault’s work also prompts questions about the ways in which the Internet is embedded in systems of power and thus shapes and constrains access to knowledge and information. Internet search engines, in particular, confer legitimacy upon whatever their algorithmic ranking systems determine is most relevant. Foucault helps scholars to see that algorithms may function as mechanisms of power designed to pitch products, ideologies, or particular versions of events.

Theorizing Archival and Library Neutrality

For several decades, members of the library and archival communities, influenced by the work of Foucault, Derrida, and other cultural theorists, have grown increasingly skeptical about the ideals of neutrality and objectivity that once defined their professional practice. Library historian Michael Harris, writing in 1986, claimed that it is essential to expose the historical embeddedness of library work and to unmask the claim to autonomy “founded on a nonexistent neutrality.”12 Archival theorist Brien Brothman argued in 1991 that archives “have been too much regarded as culturally transparent” and that archival practice “reflects its time, and can only be grasped within the context of its historical culture.”13 Since the turn of the century, archives and libraries have increasingly been seen as “active sites where social power is negotiated, contested, confirmed.”14 Many information professionals have come to understand that power and knowledge are inextricably linked and that, as Derrida has noted, “there is no political power without control of the archive.”15

Foucault’s description of the establishment of classification systems and taxonomies within the field of natural history has influenced contemporary understanding of library and archival systems of organization and arrangement. Foucault argues that the development of taxonomies signals a belief in their objectivity and transparency.16 Following Foucault, there has been a widespread rejection of the notion that knowledge can be neatly organized into categories that reflect some underlying and inherent structure. Building in part on the work of Foucault, Geoffrey C. Bowker and Susan Leigh Star illuminate the ways in which categories and standards, established through generalization and abstraction, function to limit and constrain the users’ understanding of the world. Classification systems always privilege particular perspectives and render others invisible.17 Targeting long-held assumptions about the objectivity of library classification and controlled vocabularies, Hope A. Olson has shown that hierarchical systems that purport to codify and name the universe of knowledge are inevitably biased; highly structured and standardized systems are by definition exclusionary.18 Olson’s subversive reading of the cultural construction of classification systems and controlled vocabularies as well as Sanford Berman’s trenchant critiques of Library of Congress subject headings helped spark a re-evaluation of the ostensible impartiality and neutrality of librarianship and library systems.19 Within library and archival studies, a growing body of literature draws on this earlier work. Incorporating new critical and theoretical perspectives, librarians and archivists are addressing the contexts of information production and consumption as well as their connection to broader issues of social justice.

Silences of the Archive

This decades-long discussion of library and archival neutrality parallels the increasing recognition among scholars outside these fields that the contents of libraries and archives are not an objective representation of the past. They are rather a collection of objects gathered and preserved for any of a multitude of reasons. The subjective nature of selection, organization, and preservation means that libraries and archives can provide neither transparent nor comprehensive access to past events. Michael Lynch has argued that the archive is never “raw” or “primary” because it is always assembled so as to lead later investigators in a particular direction.20 Ann Stoler explains that historians should think of archives less as sources of information and more as subjects of inquiry.21 Archival theory has brought increased attention to the process of archival construction and its role in shaping historical narratives, especially those of imperial regimes. As Carolyn Steedman acknowledges, “historians read for what is not there: the silences and the absences of the documents always speak to us.”22 This recognition of the failures and omissions of the archive has been a central feature of certain disciplines. In women’s studies, for example, scholars are intent upon uncovering the work of those previously excluded from the official record. Similarly, historians investigating the institution of slavery or the effects of colonial power have sought to locate missing voices or, if that is not possible, to draw attention to what has not been represented and why. New theories and strategies “to counter, mend, repair or simply acknowledge various forms of exclusion and loss” have emerged from an expanded interest in the silences built into the archival record.23

Discoveries of lost or suppressed material, as well as reconstructions of previously indecipherable documents, are playing an important role in filling gaps in the historical record. Computational imaging is enabling researchers to decipher historical sources previously thought to be damaged beyond repair. One such example is a 1639 volume titled The Great Parchment Book of the Honourable Irish Society. Burned, crumpled, and exceedingly fragile, it has become readable again after 200 years. It was recovered through a complex multistep process involving conservators at the London Metropolitan Archives (LMA) and researchers at the University College London (UCL) Centre for Digital Humanities.24 Melissa Terras, a key participant in this process, describes how “leading-edge computer graphics” have become important tools for reading what has long been unreadable and thus restoring access to material previously lost to history.25

Researchers are also devising ways of broadening the archival record by highlighting historical omissions. Modeling a strategy to make archival silences speak, Michelle Caswell tells the story of thousands of anonymous mug shots of prisoners executed at Tuol Sleng prison in Cambodia.26 The mug shots represent only a small percentage of the perhaps two million who perished at the hands of the Khmer Rouge, most without leaving even a photographic trace. A community of archivists, survivors, and victims’ relatives has deployed these photos in legal testimony and documentary films as a way of holding perpetrators accountable and memorializing the dead. These photos speak to the silence of the vast number of victims whose stories, unknown and unrecoverable, are forever lost to Cambodian history.

In some cases, humanists work with computational methods and other new technologies to reveal phenomena and patterns that could not otherwise be foregrounded or studied. One such project is a cartography of the Jamaican slave revolt of 1760 and 1761. Using a variety of sources, including maps and diagrams as well as information culled from diaries, letters, newspapers, and other contemporary accounts, Vincent Brown created a cartographic visualization that provides clues to the strategies and tactics of the rebellion.27 The project illustrates how Jamaican topography influenced the nature and course of the revolt. Brown’s work showcases new techniques that augment and clarify aspects of the historical record that could not otherwise be addressed. He is one of many researchers engaged in projects to expand the archival record by locating, gathering, mining, mapping, reconstituting, or simply highlighting aspects of history that have previously escaped historical notice or scrutiny.

Libraries, Archives, and Historical Narratives

Jennifer Summit provides insight into the historical role of libraries and archives in shaping understanding of both past and present. Focusing on the Renaissance and Middle Ages, Summit describes the shuttering of monastic libraries in England in the 1530s and the dispersion and loss of much of their content.28 She writes that after the Reformation, newly established libraries aimed to create a historical narrative that would bolster the monarchy and advance religious reform. These libraries constructed “versions of the Middle Ages by ordering and shaping its textual remains.”29 Summit claims that through systems of selection, arrangement, ordering, storage, and retrieval, libraries created contexts through which their contents could be understood. One example she cites is the decision whether to catalog a life of a medieval saint as a work of fiction or a work of history. Scholars increasingly recognize the need to take into account “the complex layering of motives and methods by which the sources crucial to later historical research were preserved, organized and transmitted down to us.”30

Summit claims that “[i]f libraries reveal that the Middle Ages were a creation of the Renaissance, they also make it possible to see the Renaissance as a creation of the Middle Ages.”31 Only through the recasting of earlier documents were Renaissance scholars able to define their own work as a rational corrective to the superstition and intolerance of previous centuries. These scholars conceived a notion of the medieval to highlight their own triumphant emergence as defenders of rationality, liberty, and tolerance.32 The transmission and transformation of medieval texts were necessary elements in defining and describing the Renaissance.

Much as Summit argues that the Middle Ages is an invention of the Renaissance, an argument may be made that the age of print is an invention of the digital age. The development of digital media and the Internet has shaped understanding of print culture and the history of the book. In an increasingly digital environment, the printed text has become visible as a specific technology developed at a particular historical moment, one whose centrality has waned with the proliferation of digital content. The growth of interest in the history of the book is in part the result of this new self-consciousness. N. Katherine Hayles claims that digital media have allowed us “to see print with new eyes.”33 Emerging information technologies have encouraged us to understand the evolution of the book as part of a larger historical process. But if the digital turn provides a new perspective on print technology, it is also true that, like conceptions of the Renaissance, notions of the digital era have been shaped in opposition to the period that preceded it. Whereas print culture is understood as linear, fixed, stable, and based on authority and hierarchy, digital culture is seen as fluid, nonlinear, hybrid, and remixable. Because digital objects are more easily created, altered, copied, stored, and disseminated, they are said to transcend the physical and financial limits of a paper archive.34 Much contemporary thinking about print culture has emerged retrospectively as a corollary of claims about the transformative power of the digital and in concert with the techno-utopian vision of Silicon Valley.

Digital Transformations of the Past

New technologies have vastly expanded access to the historical record. Massive databases of textual, visual, and numeric data allow researchers to ask new kinds of questions, establish new forms of collaboration, and engage with historical artifacts in new ways. In the words of archivist Charles Jeurgens, “digitization is fundamentally changing the relationship between the archive, the archivist and the researcher.”35 No discipline has been untouched by this phenomenon. But despite the fact that computational technologies are often seen as a radical break with earlier modes of representation, these same technologies have fostered recovery and recirculation of vast numbers of historical texts and artifacts. As Neil Rhodes and Jonathan Sawday argue, the relative novelty of the computer helps to disguise “the fact that this machine also perpetuates the old.”36 Digital archives offer extraordinary access to books, journals, newspapers, film, television, music, art, and popular culture from earlier eras. Digitization of critically neglected works has fostered a rethinking of the literary canon and raised questions about what constitutes canonicity. The Women Writers Project, for example, was one of the first to authenticate and digitize a wide array of previously inaccessible and out-of-print early modern writing, making it available for teaching as well as scholarly research.37 Many new digitization projects have sought to foreground race and gender and to reinterpret, augment, and/or revise the literary canon. Others have prompted an expansion and rethinking of the colonial archive and imperial history. This is not to suggest that digitization has magically erased the problem of the underrepresentation of women, people of color, and other marginalized groups. Still, there has been progress as well as growing recognition of the need to address historical omissions and exclusions.38

The proliferation of digital versions of both popular and elite artifacts from previous eras has created an appetite and a market for their aggregation and repackaging. Will Straw has described the many ways in which digital culture refashions and refocuses our historical perspective. “Reinvigorating the past, and slowing down processes of obsolescence, new technologies have consistently rendered the past more richly variegated and dense.”39 Much as Jennifer Summit claims that Renaissance scholars reshaped medieval texts to create a narrative of a newly rational era, Will Straw argues that by gathering together historical artifacts, producers of new media “are perpetually producing the past in various forms of coherence.”40 In addition to this recirculation of historical content, there is a profusion of adaptations, remakes, reboots, revivals, sequels, and prequels that avidly recycle themes and characters. Popular music, television programs, films, and fashion of previous generations are integrated into new forms of cultural production that transform them into elements of the contemporary zeitgeist. As Straw argues, this both muddies our sense of historical time and increases the weight and presence of the past.41

In the face of this massive electronic record, archivist Kate Theimer stresses the importance of distinguishing among different uses of the term “digital archives.” As she points out, humanities scholars use the phrase “digital archives” in a variety of contexts, but perhaps most often to refer to digital surrogates of analog originals such as the Blake Archive and the Rossetti Archive.42 For archivists, however, the phrase has a much more specific meaning. For them, “digital archives” are groupings of born-digital material that come from a single source. They consist of “records created or received by an organization in the course of business.” These records are eventually transferred to a designated archival repository and housed “within archives and special collections repositories.”43 One example of this kind of digital archive would be electronic records created by a branch of the US government and transferred to the National Archives. A slightly different example would be the documents of artists or writers, including files on hard drives and personal computers, donated or sold to a university archive or special collection. Theimer emphasizes the importance of understanding the differences between types of digital collections. She also stresses attending to the specific contexts in which they were created: who assembled them, for what purpose, and using what criteria.44 Looser, less precise definitions of the terms “archive” and “digital archive” have been useful in facilitating interdisciplinary discussions of the function and meaning of the historical record, enriching theoretical work both within and outside of the archival profession.45 Nevertheless, in professional archival parlance, the terms “archive” and “digital archive” have very specific meanings.

It is often assumed that electronic collections offer broader and more convenient archival access than print collections. But the Ian McEwan archive at the Harry Ransom Center in Texas calls this assumption into question. Among a wealth of material, that archive includes seventeen years of McEwan’s email.46 As Lise Jaillant explains, providing access to archived email is significantly more complex than providing access to letters on paper.47 Copyright, privacy concerns, and technical issues each present a set of obstacles. For these reasons, institutions that do furnish access to digital versions of email typically require researchers to make on-site visits in order to consult them.48 Jaillant notes that the McEwan archive, despite its large volume of fascinating correspondence, has only been able to supply researchers with paper printouts of a small subset of McEwan’s 80,000 emails. Thus she observes that “[i]ronically, literary archives still rely on print at a time when most records are born digital.”49

Social and Material Implications of a Digital Archive

Much as post-Reformation libraries, through their collection policies and ordering systems, shaped interpretations of the Middle Ages, more recent systems for assembling, searching, storing, and providing access to historical material produce the frames through which earlier historical periods are understood. Computers increase but also modify forms of access. They reshape objects of study, alter research possibilities, and even begin to redefine what constitutes research. The structure and organization of the archive, in whatever medium, influence the choices available to researchers. Working with a digital archive requires engagement with layers of technologies that have been determined by designers and engineers and thus reflect their judgments and theoretical assumptions. Multiple mutually conditioning factors determine the following properties of databases and digital collections:

- comprehensiveness
- accuracy
- cost
- availability
- interface design
- searchability
- interoperability
- manipulability

These factors may include the intentions of document creators, the professional practices of librarians and archivists, the accessibility granted by corporate content owners, the intentions of project designers, the judgment and skills of engineers and software developers, and the constraints imposed by the physical and material properties of digital infrastructure. Digital archives are best understood as technocultural artifacts constituted by the convergence and entanglement of multiple social and material elements.50

Providing an example of the intersection of social, political, and material factors in shaping historical scholarship, Natalie Zemon Davis tells the story of how the Federal Bureau of Investigation (FBI) prevented her from conducting dissertation research in French archives. In 1952 her passport was revoked because she had contributed to a pamphlet that was critical of the US House Committee on Un-American Activities.51 Davis could not go to France to consult the government lists, church records, criminal proceedings, and marriage contracts that would have let her reconstruct the lives of workers in the context of the Reformation. She writes that she therefore took an alternative path, undertaking research in various US rare book collections, including the Folger Shakespeare Library and the New York Public Library. She responded to political and material constraints by finding new areas of interest that could be supported by research in the United States. Had she been doing this work fifty or sixty years later, she might well have been able to consult digital versions of documents relevant to her original research project. The material constraints of the archive and the FBI’s power to deny access both helped determine Davis’s research trajectory. After her passport was returned in 1960, Davis continued to work with early printed books as well as with the archives that she could now visit.52 Her contributions to both social and cultural history can thus be seen as the product of multiple forces. These include the forces that led to the alternative research path she was obliged to pursue.

In the 1990s and early 2000s, critics were exploring the social and political implications of the digital medium. Early theorists made much of the ostensible immateriality of digital objects and only later acknowledged the myriad ways in which they are as dependent upon material instantiation as printed books.53 Electronic texts and data are typically accessed with machines made of metal, plastic, and polymers. Networks composed of fiber-optic cables, copper wires, switches, and routers make this possible. Any content, whether print or digital, is subject to the physical limitations of the technology used to produce and distribute it. The storage of digital information requires space, machines, and energy. The cloud optimizes physical infrastructure, but does not replace it. Instead, it relocates it to remote energy-devouring data farms.54 Digital objects are dependent on complex configurations of hardware and software that not only have significant environmental impacts but also influence how data may be searched, saved, and manipulated. In addition, the electronic record masks the significant labor required to produce the content and the infrastructure that support digital access.

Digital Ephemerality and Digital Preservation

Investigation of the materiality of the digital record has focused in part on its instability and impermanence. To a much greater degree than print, electronic data is vulnerable to alteration, erasure, and loss. Finding ways to preserve and guarantee future access to content created on continually evolving hardware and software remains an ongoing challenge. Born-digital records are therefore at particular risk. As reported in The Washington Post, for example, records of the US wars in Iraq and Afghanistan are less complete and less accessible than records of World War I. Original analog collections allowed for the digitization of World War I battlefield maps, orders from commanding officers, and soldiers’ letters and diaries.55 But records created on digital devices on and off the battlefield in more recent wars have been more difficult to organize and are more easily lost or erased.

This is not to say that the digitization and preservation of analog artifacts is a simple matter. It is important to recognize the ways in which the materiality of historical artifacts enables or constrains their use, retention, and possible remediation through new technologies. Highlighting the physical processes involved in this work, Marcelle Cinq-Mars and Sophie Dazé describe a complex five-year project to digitize the service files of the WWI Canadian Expeditionary Force.56 The project began with a team that “painstakingly reviewed 640,000 files, page by page, removing pins, clips and staples of all sorts. Conservators then carefully removed the adhesive from thousands of pages, separating each one to make it easier to digitize. This step took 18 months, time enough to remove roughly 260 kilograms of metal fasteners and to treat more than 80,000 pages.”57 Another step involved modifying and adjusting high-speed scanners so as not to damage fragile hundred-year-old material. The scanning itself took several years and was completed in time for the November 2018 commemoration of the centenary of the end of World War I.58 The material characteristics of both analog and digital artifacts powerfully determine the kinds of labor involved in disseminating and preserving them.

Content or data that exists only in digital formats presents a particular set of challenges, but even digital surrogates of analog documents require ongoing archival intervention. Media scholars Jennifer Gabrys and Wendy Chun argue that acknowledging the impermanence of digital media is crucial to understanding the nature of the digital record. Gabrys explains not only that digital preservation demands regular processes of recovery, transformation, and retransmission but also that electronic technologies are themselves designed to produce impermanence: “[o]bsolescence appears to be ‘built-in’ on multiple levels, from the actual decay of hardware, software, and content; to the economic requirement for continued innovation; to the way in which the pastness and the newness of electronic media is narrated.”59

Wendy Chun describes digital media as “degenerative, forgetful, eraseable”—the source of new archival challenges rather than the technical solution many had hoped for.60 She locates the obstacles to fixity and stability in the physics of the digital storage medium. She explains that dynamic regenerative processes are required to support machine memory and that if computer memory provides a sense of permanence, it is because it is constantly being refreshed. The methods for maintaining and preserving digital media are a function of their material properties. Digital media are in constant need of updates, but “updates often ‘save’ things by literally destroying–that is, writing over–the things they resuscitate.”61 Chun’s description of the “material transience” of digital media complicates scholars’ understanding of what it means to steward a digital record.62

A related vulnerability of the digital medium, one that affects both research and scholarship, is the unreliability of citations to online sources. Hyperlinked citations, which often supply vital evidence and support, are susceptible to two kinds of decay. Link rot occurs when a link or URL points to a web page that no longer exists. Reference rot, or content drift, occurs when a hyperlink continues to connect to a web page, but the page no longer displays the referenced material. A study conducted at Harvard Law School found that more than 70 percent of the URLs within the Harvard Law Review and other journals, and 50 percent of the URLs in United States Supreme Court opinions, no longer linked to the originally cited information.63 It is unlikely that the record for humanities citations is much more stable. This lack of fixity jeopardizes scholarship and makes digital sources far more vulnerable than analog ones.
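The distinction between the two kinds of decay can be illustrated with a minimal Python sketch that classifies a single citation. It assumes the caller has already requested the URL and recorded the HTTP status code and the page text as served today. The function name `classify_citation`, the rule that any error status or failed connection counts as link rot, and the simple substring test for drift are all hypothetical simplifications. They are not the method used in the Harvard study.

```python
from typing import Optional


def classify_citation(status: Optional[int], page_text: str, cited_passage: str) -> str:
    """Classify one hyperlinked citation (hypothetical helper, for illustration).

    status: HTTP status code returned when the URL is requested today,
            or None if no response was received at all.
    page_text: text of the page as currently served (empty if none).
    cited_passage: the passage the citing author relied on.
    """
    # Link rot: the URL no longer resolves to a working page
    # (assumption: any 4xx/5xx status or failed connection counts).
    if status is None or status >= 400:
        return "link rot"

    def normalize(text: str) -> str:
        # Collapse whitespace and ignore case; a crude stand-in for
        # the human comparison a real citation audit would require.
        return " ".join(text.split()).lower()

    # Content drift: a page still loads, but the cited material is gone.
    if normalize(cited_passage) not in normalize(page_text):
        return "content drift"
    return "intact"
```

Detecting drift reliably requires a record of the page as it appeared when it was cited, such as an archived snapshot. Otherwise there is nothing trustworthy to compare the current page against.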

Library and Archival Transformations

Libraries continue to provide printed materials, and many have offered a selection of electronic resources since at least the 1970s. Still, the massive shift from analog to digital collections has radically transformed the structure and function of library organizations. New technologies demand a rethinking of the nature of cultural artifacts as well as the methods for describing and preserving them. Digital objects are less stable and less susceptible to bibliographic control and stewardship than printed texts. They resist integration into traditional bibliographic and material structures for preserving and securing the past.

Modern libraries were designed to accommodate discrete physical objects that could be selected, purchased, cataloged, shelved, circulated, and preserved according to systems developed over many generations. In the case of printed books and journals, microforms, and even CDs, DVDs, and Blu-ray Discs, librarians were dealing with objects that could be held in one’s hand, that could exist in only one place at a time, and that, whatever their physical manifestation, had distinct boundaries. Before the 2000s, library buildings, stacks, reserve rooms, circulation desks, and reference areas were designed to house, service, and circulate discrete physical artifacts. Since that time, library collections have become accessible via computer terminal anywhere in the world. These library resources may reside on servers nowhere near the institutions that purchased and made them available.64 When libraries acquire new digital content, there is typically no object to unpack and inventory and no object to work its way through an acquisition or cataloging department. Library and archival labor have been transformed by this radical shift in the nature of many cultural artifacts.

Having redirected the bulk of their collections budgets away from print material, libraries have joined forces with other institutions through partnerships and consortia to share content and to pursue joint storage and preservation initiatives. They have also been advocating for transformation in scholarly publishing and working to promote open access. A common definition of open access is “the free, immediate, online availability of research articles coupled with the rights to use these articles fully in the digital environment.”65 Despite some differences of opinion over what open access means and how it may best be implemented, there is considerable agreement within the library and archival communities that the current system of academic publishing is increasingly dysfunctional. Ann Wolpert, former director of the MIT Libraries, declares “the inevitability of open access.”66 She describes the advent of the Internet and of digital formats as a disruption that was initially greeted with enthusiasm. But the transition “did not play out as all the stakeholders anticipated or would have liked. As publishers introduced restrictive contractual business models, raised prices (often disproportionally), experimented with digital rights management, and advocated for federal legislation favorable to their own business interests,” many shed their utopian dreams of digital transformation.67 The concentration of power in the hands of a few scholarly journal publishers has enriched corporate content owners at the expense of libraries and researchers.68 Over time, prices and profits have risen dramatically while library purchasing power has declined. Wolpert laments the “extent to which access to knowledge is constrained and controlled by publishers’ business models.”69 Open access has yet to provide the panacea many had hoped for.

Corporate content owners have seized the opportunity to transform open access into a profitable new business model, requiring authors or their institutions to pay to make their work available free of charge. This model has not reduced subscription fees for existing journals, although a growing number of new open access titles are available free of charge. Libraries and other stakeholders continue to seek more progressive ways to achieve open access. They have also developed initiatives such as HathiTrust, the Digital Public Library of America (DPLA), and Europeana. These collaborative projects aim to increase the volume of material freely available to all and to address the challenges of digital fragility and the consolidation and shrinkage of print collections.

Digital environments require that cultural heritage institutions, including archives, become more proactive. Archival educator and theorist Jeannette Bastian explains the basics of what she and her coauthors refer to as “digital stewardship of archival records.”70 They describe key components of digital cultural heritage curricula and declare that in digital environments archivists and curators can no longer wait for important records to come to them. Instead, they must take the initiative to negotiate with content creators, urging the adoption of open, well-supported standard formats.71 They must take strong measures to ensure digital preservation, such as capturing sufficient metadata, including technical metadata documenting the characteristics and requirements that enable management and preservation. They must provide backups and redundancies. It is likewise imperative that they create new copies before software applications and hardware become obsolete. They must address complex intellectual property rights issues. They must ensure data security and monitor for viruses, corrupt files, system hacks, and unauthorized data modification.72 Complicating these processes is the evolution of social media and other dynamic and interactive forms that archivist Joanne Evans describes as “born networked, rather than just born digital.”73 These formats demand new conceptual frameworks, new competencies, and new archival infrastructure. As the digital environment evolves, so too do the challenges facing library and archival institutions.
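Monitoring for corrupt files and unauthorized modification is often handled through fixity checking. A cryptographic checksum is recorded for each file at ingest and then recomputed during later audits. The following Python sketch uses SHA-256 from the standard `hashlib` module to illustrate the idea under simplifying assumptions. The function names, the in-memory manifest, and the demo files are hypothetical. Working repositories generally rely on established tools and packaging formats rather than ad hoc scripts.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (done at ingest)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Recompute checksums and compare them with the stored manifest."""
    current = build_manifest(root)
    return {
        "missing": [k for k in manifest if k not in current],
        "altered": [k for k in manifest if k in current and current[k] != manifest[k]],
        "unexpected": [k for k in current if k not in manifest],
    }


def demo() -> dict[str, list[str]]:
    """Simulate one silent alteration and one loss, then audit (hypothetical files)."""
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "letter.txt").write_text("Dear mother,")
        (root / "diary.txt").write_text("diary page")
        manifest = build_manifest(root)
        (root / "letter.txt").write_text("Dear mother!")  # simulated alteration
        (root / "diary.txt").unlink()                      # simulated loss
        return audit(root, manifest)


if __name__ == "__main__":
    print(demo())
```

A checksum can reveal that a file has changed or vanished, but it cannot restore the file. For that reason, fixity checks are typically paired with backups and redundant copies.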

Mitigating Archival Loss

Since the introduction of computers in the 1960s, tremendous amounts of information have been lost to superseded platforms, just as large volumes of information have disappeared and continue to disappear from the Web. The Internet Archive, a nonprofit established in 1996, is playing an important role in countering this ephemerality by preserving and providing access to websites, games, music, videos, and books in the public domain. In some cases, the Internet Archive, through its Wayback Machine, maintains the only copies of web pages produced since the 1990s. Reflecting new concerns about a potential threat to the historical record, Brewster Kahle, founder of the Internet Archive, wrote that after the 2016 US election, the organization decided to create a duplicate copy of its digital collections to reside in Canada.74 This move to preemptively secure documentation against possible threats from a US presidential administration reflects a heightened awareness of the potential for archival manipulation and erasure. Echoes of a similar concern can be heard in a joint statement by the Council on Library and Information Resources (CLIR) and the Digital Library Federation (DLF). Issued in February 2017, this document declares that these organizations stand with their communities “in determined opposition to any political policies, actions, and divisive ideologies–like those we have observed during the current transition of power in Washington, DC–that contravene our shared, core values . . . and threaten our mission to create just, equitable, and sustained global cultures of accessible information.”75

But even without government interference and with several organizations focused on protecting digital data, numerous obstacles remain. No technical means exist to preserve a comprehensive record of the Internet. Web archiving relies on automated crawlers programmed to run according to set priorities and at set intervals. These web crawlers cannot capture every version of, and every change to, every web page. They typically follow links only a few clicks deep. The software that captures and ingests web pages is constrained by paywalls as well as “cookie-mediated metered access.”76 A further challenge is the proliferation of dynamic and interactive content. Clifford Lynch argues that currently existing models for “preserving some kind of ‘canonical’ digital artifacts are increasingly inapplicable in a world of pervasive, unique, personalized, non-repeatable performances.”77 He affirms the claims of Brewster Kahle and others that the nature of the Web has been changing and that traditional archival models that involve capturing copies of at least semi-static web pages are no longer sufficient.78 Lynch goes on to describe the complexity of documenting an “Age of Algorithms” in which content is a constantly moving target that may be impossible to recreate or preserve.79
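A breadth-first traversal that stops at a fixed depth illustrates why crawlers capture only pages a few clicks from where they start. In this minimal Python sketch, the site is a hypothetical in-memory link graph and `max_depth` is an assumed parameter. Actual web archiving crawlers must fetch and store each page, and they apply far more elaborate scoping, politeness, and scheduling rules.

```python
from collections import deque


def crawl(links: dict[str, list[str]], seed: str, max_depth: int) -> dict[str, int]:
    """Breadth-first crawl of a link graph that stops after max_depth clicks.

    links: hypothetical site map, page -> pages it links to.
    Returns each captured page with its click distance from the seed.
    """
    captured = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        depth = captured[page]
        if depth == max_depth:
            continue  # page is captured, but its links are not followed
        for target in links.get(page, []):
            if target not in captured:  # capture each page only once
                captured[target] = depth + 1
                queue.append(target)
    return captured


# Hypothetical site: the comment page sits three clicks from the home page.
site = {
    "home": ["about", "news"],
    "news": ["2016/story"],
    "2016/story": ["2016/story/comments"],
}

print(sorted(crawl(site, "home", max_depth=2)))
```

With a limit of two clicks, the comment page is never captured. Each page is also recorded only once per crawl, so any change made between scheduled crawls goes unrecorded.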

Web archiving currently “relies upon a multi-stakeholder model. It is the prerogative of foundations such as the Internet Archive; national institutions; transnational organizations such as IIPC (International Internet Preservation Consortium); civil society . . . [and] the private sector. . . .”80 While it is important to acknowledge that these institutions cannot provide comprehensive coverage, they nevertheless constitute the best hope for maintaining and protecting major portions of the online record. Still, as noted by Valérie Schafer and her coauthors, there are likely to be even larger gaps in the long-term digital record of the Global South.81 Although the Internet is rapidly expanding in India, Africa, the Middle East, and Latin America, the authors claim that lack of infrastructure and insufficient technical skills hinder archival storage and preservation. These difficulties are compounded by the fact that stewardship of online material requires software that has mostly been created in and for Western alphabets and so is of limited use for digital preservation in Asian and Middle Eastern countries.82

Archival Manipulation and Erasure

Different kinds of obstacles to maintaining a digital record arise when content owners or autocrats suppress material they want withheld from the public. Maria Bustillos describes online censorship in various countries, including North Korea, China, Turkey, and Egypt, where governments engage in wholesale suppression of both content and journalists.83 She also describes how, in the United States and elsewhere in the West, corporate content owners may be just as effective as autocrats in thwarting archival efforts. Peter Thiel provided the funds to put the online news site Gawker out of business and has sought to purchase its digital archive, which would put him in a position to suppress it. Bustillos enumerates some of the more important scoops and stories covered by Gawker that would be lost were Thiel to succeed. She also describes a joint initiative of the Internet Archive and the Freedom of the Press Foundation to develop the means to protect materials threatened by what they call “the billionaire problem.”84 Its creators hope to find a way to preserve sites purchased by wealthy content owners with the express purpose of eliminating them from the archival record.

There are thus multiple threats to historical preservation. Of equal concern, however, are the biases and distortions that plague the contemporary digital record. A growing number of scholars have described the potential for bias built into databases, search engines, and digital collections. These technologies are drawing increased scrutiny for what they conceal and render invisible. Scholars in a number of fields are confronting the silences created by the complexity and opacity of so-called black-box technologies.85 This focus emerges from the sense that computational processes are often designed to thwart understanding of how they function. Within the literary profession, Jonathan Freedman warned in 2007 that “the more we are freed to experience and construct our own world of knowledge through Google searches and Web crawling, the more dependent we become on the ways in which the searches and databases are constructed for us.”86 Martha Nell Smith describes the problem of digital archives that do not promote transparency and ease of use.87 Acknowledging the tremendous potential for digital archives to enhance scholarship, Janine Solberg nevertheless warns of the limitations of digital search protocols and how they may “serve to ‘black-box,’ or obscure, the underlying structures that enable their (and our) work.”88 Solberg argues that scholars must develop a better understanding of the ways in which new technologies have transformed the nature of historical research. She insists that “digital search and discovery” are neither transparent nor neutral.89

Discussions of the opacity of digital archives and search technologies often center on the power and invisibility of computer algorithms. Bethany Nowviskie explains how “the design of an algorithm—the composition of code—is inherently subjective and, at its best, critical. Even the most clinically perfect and formally unambiguous algorithmic processes embed their designers’ aesthetic judgments and theoretical stances toward problems, conditions, contexts, and solutions.”90 As research and scholarship, not to mention daily life, become more dependent upon invisible algorithmic systems that determine what we read, search for, buy, hear, watch, and play, and how we connect with one another, there has been a growing sense of alarm. Corporate owners of the platforms that dominate the digital sphere design them to serve their own interests. Revelations that the personal data of millions of users has been compromised have created growing apprehension about Silicon Valley’s domination of social media, big data, and Internet search. Misinformation, fake news, harassment, trolling, foreign intervention, and surveillance are increasingly recognized as fundamental features of these platforms. Silicon Valley entrepreneurs promote free use of their products as they monetize and exploit user data for the benefit of the advertisers who are their paying customers.

Many literary scholars have voiced concerns about the biases and deficiencies of digital databases and tools in humanities fields. These issues cannot always be separated from broader questions about the social impacts of technology and the effects of monopoly control of digital platforms. A growing number of scholars are scrutinizing the effects of algorithmic decision-making, surveillance, and tracking. These analyses are crucial for understanding the systems that govern access to information that supports daily life as well as the research enterprise. Safiya Noble has done important work demonstrating how Internet search reflects and shapes social and cultural norms and prejudices. She has also more generally explored the ways in which particular values are prioritized and promulgated through artificial intelligence.91 In 2012 she wrote about how the top page of results for Google searches on the phrase “black girls” primarily displayed pornographic and racist websites.92 She found similar results conducting searches on “Latinas” and “Asian girls.” Since that time, Google has adjusted its algorithms to alter the rankings for these particular searches, but as Noble argues in her 2018 book Algorithms of Oppression, the problem goes much deeper. Search engine results on a wide array of topics “reflect historically uneven distributions of power in society.”93 They reproduce and reinforce racial, sexual, and ethnic stereotyping and discrimination. They provide simple answers to complex and difficult questions, short-circuiting deeper engagement with multifaceted problems and issues. Since search engines have become a primary source of knowledge acquisition, Google’s international predominance means that it exercises tremendous power to authorize or delegitimate particular concepts, images, and people.

Frank Pasquale has helped illuminate the inner workings of the digital information marketplace. He explains the dangers of surveillance and the highly profitable use and abuse of private information by corporate giants.94 Virginia Eubanks, in Automating Inequality, describes the damage inflicted on poor and working-class families by the use of algorithms, data mining, and predictive analytics to determine access to employment, housing, health services, and financial services.95 As experts in various fields scrutinize search engine algorithms and automated decision-making for their cultural, social, and material impacts, various proposals have emerged to rein in Silicon Valley Goliaths. These include shrinking them through antitrust law or tax policy, creating new regulatory regimes for data collection, and limiting private ownership of digital infrastructure. There is also a growing recognition of the importance of ethical training as a component of education in computer science and engineering disciplines.96 Safiya Noble makes a compelling case for more interdisciplinary research into algorithmic decision-making and for a more diverse and critically minded workforce.97 The nature and character of the digital archive will be determined by the skills and values of its engineers and content creators as well as by the regulatory systems that govern its operations.

Archival Threats and Promises

The dramatic expansion in the size and functionality of the archive has magnified concerns about information organization, exploitation, manipulability, and loss. But it has also vastly expanded the tools and content available to researchers. And it has created an explosion of archival discourse, bringing scholars into conversation across disciplinary boundaries. Search engines are biased. Taxonomies and standards inevitably create exclusions. Library collections and classification systems reproduce existing forms of power and influence. Archival records are often incomplete and privilege those in power. Digital platforms subject users to surveillance, fake news, trolling, and manipulation. Nevertheless, more people than ever are able to access and harness vast quantities of information. As the digital fosters the expansion of the archive, archival metaphor and archival theory proliferate. These formulations are implicitly framed in terms of continuity. To theorize the archive is to acknowledge a range of forms that for centuries have served archival functions, providing a medium in which to create, store, and transmit culture and information.

The archive has always been vulnerable. Libraries have been burned, bombed, and flooded. Public and governmental records have been lost, redacted, kept secret, and used to reinforce state power and silence minorities. The artifacts and records of entire populations have been targeted as acts of ethnic cleansing. The digital archive is subject to its own unique forms of risk, threatened by the obsolescence of hardware and software and by the sheer mass of material disappearing from networks on a daily basis. The notion of the archive is useful because it carries within it both the ideal of preserving a comprehensive record and the reality that this is impossible.

Discussion of the Literature

Library and archival studies each have an extensive and distinct professional literature, neither of which can be summarized in this space. Both literatures have historically focused on practical issues and been heavily reliant on technical and managerial language. Since the late 20th century, however, library and archival studies have each developed more theoretical and politically informed perspectives. Much of this work centers on issues of power, privilege, and social justice. Another major focus in this period has been the challenges of the evolving digital environment. This has meant an expansion of the literature on information systems, electronic resource management, digital repositories, data curation, and the philosophy of information as well as analyses of the implications of new technologies for research and the historical record.98 Additionally, collaborations of archivists and librarians on digital humanities projects have fostered a newly emerging literature at the intersection of librarianship, archival studies, and digital humanities.99

Important contributions to the theoretical turn in archival studies include the work of Terry Cook, Joan Schwartz, Tom Nesmith, Brien Brothman, Eric Ketelaar, Jeannette Bastian, Verne Harris, Sue McKemmish, Anne Gilliland, and Wendy Duff. In library studies, theoretical approaches have been opened up by Michael Harris, Wayne Wiegand, John Buschman, Gloria Leckie, Hope Olson, Ronald Day, John Budd, Christine Pawley, Gary Radford, and Marie Radford. These authors and others have helped redirect the literatures of library and archival studies away from their earlier positivism and advocated for greater recognition of the sociocultural embeddedness of library and archival institutions. Building on this theoretical foundation, there has been a profusion of new work. This literature, sometimes referred to as critical library and information studies, encompasses the literature of critical information literacy.100 It attempts to promote greater understanding of global media cultures and the social, political, and economic factors shaping the contemporary information environment. It incorporates insights from feminist pedagogy, critical race studies, postcolonial studies, and feminist and queer theory. A similar impetus has led to the development of what has sometimes been called critical archival studies.101 This literature, also emerging from earlier theoretical work, aims to initiate a more liberatory archival practice and to empower disenfranchised communities. It seeks to identify and counter power inequities in the construction and dissemination of the archival record, to think beyond national boundaries, and to find ways to mitigate the effects of patriarchy and white supremacy. Since the 1990s, archival literature has increasingly addressed the ethics of archival practice, especially in relation to social justice.102 It has also focused on the politics of archival location. This includes a concern with community and participatory archives;103 diasporic and displaced archives;104 and archival repatriation linked to decolonization.105 Embracing interdisciplinary and intersectional perspectives, both archivists and librarians are grappling with the impact of neoliberalism and its subversion of notions of the public good, its reliance on market-based and technologized solutions, and its ahistorical and apolitical accounts of inequities in the distribution of power and resources.106

