For Rebecca Moore Howard’s CCR 635: Textual Research in Composition and Rhetoric
In creating a literature review for doing digital humanities work, I had to eventually take pause—DH is active, and in looking for it, I seem to find it everywhere. Even employing methods of reading distantly, that is using digital tools to allow me to read a large number of texts I have collected for the keywords that are illustrative of their focus, I still couldn’t cover enough of the smart work that is being done, the critical questions being raised, the collaboration and conversation taking place. This synthesis of works is by no means comprehensive or perhaps even representative of work being done under the umbrella of DH. I have attempted to immerse myself in the flow of conversations and materials circulating on the web—bookmarking every crumb that makes mention of tools, programs, articles, arguments, definitions, and projects. In a meeting, you (RMH) asked “is it more accurate to talk about tools than methods?”, to which I couldn’t give an answer. I’m not sure I can now either, but I will work to provide coverage of responses that both answer and complicate this question in doing research that employs DH methods. I will discuss some approaches to framing DH work; how texts become sets of data; creating visualizations of data; interactions of scale in creating and reading visualizations; ethical, social, political, economic and cultural considerations; and curation of DH research.
Framing DH: Definitions, Justifications, Orientations
I thought a reasonable place to begin my research into DH methods would be to define what they are. However, I quickly learned that DH is difficult to define as a uniformly accepted term. From my reading, I attribute this to variations in the definition of what it means to do digital humanities work (as influenced by disciplinary affinity: computer science, artificial intelligence, social science, etc.); how tightly one aligns oneself with the “digital” or the “humanities” (both rich terms in themselves); or how one envisions the joining of the two terms: what work is possible and what work should be of concern. Often DH is referred to as a large, overarching structure: a tent, an umbrella, something under which different alignments, investments, and affinities can find space. DH isn’t singular in its material base, projects, methods, or scholarly sphere. What seems to surface with frequency is a conceptualization of research that works to answer the call of the humanities to increase the use of evidence in work, conducting RAD research that is replicable, aggregable, and data supported. This suggests a move that engages the materials of the humanities (texts) in ways that can yield more than anecdotal accounts of experience or interpretations of a small scope, such as a close reading of a text or a small handful of texts. The forms this research takes (how it is designed, what data it works to find patterns within, and how its findings are presented through methods of data analysis and visualization) are what make DH so varied: the textbase (which allows for the creation of textual data), the emergent questions and curiosities, and the ways of illustrating what is of interest vary from inquiry to inquiry. The forms and the content develop in relation to one another to best see something interesting.
In Graphesis, a work by Johanna Drucker that works to articulate the creation of visualizations of data in humanities research, Drucker explains that how we know what we know about any given concept, is based on our models of knowing—our models, our visuals, “mediate our experience by providing conceptual schema or processing experience into form” (15). I think this is a provocative and durable statement to hold on to in thinking about DH methods because it both captures the essence of intrigue in the work—the desire to look at something differently to look for things we have not yet seen—as well as the relationship between how we represent data as visual constructions of patterns that exist within the materials we care for and research from.
A line of tension relative to these visuals (our forms of what we know and how we know) exists in DH as issues of disciplinary alignment, often vis-à-vis tensions in thinking of research method or methodology. This is perhaps too tidy an explanation for something I have neither the history nor the breadth of examples to represent more robustly, but it is something in need of articulating nonetheless. DH can be oversimplified at times as “tools”, or as a fixation on technologies for collecting, reading, and visualizing that are treated as neutral or ahumanistic and more akin to the work of the (computer) sciences. While these tools are essential to DH work, there are other accounts of research that care more for who designs the tools, who has access to the tools, and what they afford and constrain, not just in findings, but in the context of people, institutions, and social, economic, cultural, political, and philosophical factors. Johanna Drucker, in “Humanistic Theory and Digital Scholarship”, asks humanities scholars what impact the humanities have had on the digital environment, and raises the possibility of digital platforms and interfaces that are created from humanistic methods instead of borrowing methods from outside of the discipline, which she describes as at odds with the cares and concerns of humanities work. She explains that humanities work has encountered digital tools, but what of humanities tools in digital contexts? I see this, again maybe too simply, as a deep concern with methodology: how and what researchers are doing, for what reasons, and for whom or what. A humanistic approach, she explains,
“means that the premises are rooted in the recognition of the interpretative nature of knowledge, that the display itself is conceived to embody qualitative expressions, and that the information is understood as graphically constituted”.
While I will continue to refrain from defining DH too tightly or narrowly, or as too constrained by one epistemology over another, I can say from my readings that DH is not just tools; or rather, DH is not about the uncritical creation and use of tools without care for matters of concern in the tools’ interaction with contextual and textual networks of relations (person, place, time, and so on).
Text as Data: Mining, Encoding, and Method
For many, data has numerical connotations. In the sciences, data begins as quantified observations, which visualization methods are selected to highlight; so what does data mean in a discipline where texts are what are created as observations? Some scholars wish to distance data in the humanities from data in the sciences because it functions differently. Johanna Drucker contrasts data with capta, explaining that capta is “taken” actively while data is assumed to be a “given” that is able to be recorded and observed. The difference Drucker sees arising is that humanistic inquiry acknowledges that its knowledge is “situated, partial, and constitutive”; this is the recognition of knowledge as a construction, “not simply given as a natural representation of pre-existing fact” (“Humanities Approaches to Graphical Display”). While Drucker calls for a rethinking of data as capta that better expresses its ambiguity over certainty (which gets at what she describes as interpretative complexity), DH data needs to acknowledge the lens that it is constructing to look at its texts. I will provide a gloss of how data is garnered from humanities texts along with some methods for collecting it: data mining, textual analysis, and differentiated reading. To begin, data in the humanities does not have the same aims as data in the sciences. In the sciences, data is used to attempt to arrive at a truth, but in the humanities data is used to arrive at the questions we desire to ask (Stephen Ramsay, Reading Machines, 68). The digital humanities exist to collect and create data, to notice something of interest, to formulate questions, to work to create visualizations that will allow us to better see patterns of interest, and to postulate all over again: data transforms theory, and theory transforms data into interpretation for further theorizing (Cathy Davidson, “Humanities 2.0: Promise, Perils, Predictions”).
Data is the locus of digital humanities work because it functions as a representation of information that can be interpreted and reinterpreted, in a manner (or form) that functions as something to communicate, something to interpret, or something to process; it is what research works to articulate, make visible, and let articulate (Alex Poole, “Now is the Future Now?”). Texts are data, or have the potential to become sets of data, because they are addressable as a thing of many parts; one can query a position within the text at a certain level of abstraction, like that of a character, word, phrase, line, etc. (Michael Witmore, “Text: A Massively Addressable Object”). Data are created from text based on what is of interest to look for, and can come from singular texts or from large corpora of texts, the size of the base relating to what is of interest to the researcher. Texts are used to create textbases from which various methods and tools might be implemented to draw out patterns and examine trends, which serve as a catalyst for further questioning and research (Lang and Baehr, “Data Mining”). A textbase is a “coherent collection of semi or unstructured digital documents” that “can come from any written discourse” which “all cohere in some manner”, existing as a corpus of documents assembled around a specific unifying principle that is either thematically or generically similar (Cooney et al., “The Notion of the Textbase”). A data set might come from a collection of romantic poetry in which the researcher wishes to see what adjectives are used to describe love; those adjectives from that set of texts would be the data. This data is created through a process called data or text mining.
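To make Witmore's idea of addressability a little more concrete, here is a minimal sketch in Python. It is my own illustration, not anything drawn from Witmore: the two-line sample text is invented, and the `address` function and its three levels (line, word, character) are hypothetical names chosen only to show that the same text can be queried at different levels of abstraction.

```python
import re

# Invented two-line sample standing in for a text in a textbase.
text = (
    "The gentle love we keep is bright\n"
    "A tender love outlasts the night"
)

def address(text, level):
    """Split a text into units at one level of abstraction.

    The level names are hypothetical; a real project might also
    address phrases, sentences, stanzas, or chapters.
    """
    if level == "line":
        return text.splitlines()
    if level == "word":
        return re.findall(r"[A-Za-z']+", text)
    if level == "character":
        return list(text)
    raise ValueError(f"unknown level: {level}")

lines = address(text, "line")
words = address(text, "word")
chars = address(text, "character")

# The same text, queried by position at three different levels.
print(lines[1])   # second line
print(words[2])   # third word
print(chars[0])   # first character
```

Each call returns a list, so any unit can be reached by its position; which level is useful depends on what the researcher is looking for.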
Text (or data) mining is, to put it simply, knowledge discovery; creating data to which DH tools can be applied is not just a step in a process toward results, but a practice of curiosity and inquiry into what might result. This process can be exploratory, descriptive, and even predictive in its function (Lang and Baehr, “Data Mining”, 176-179).
Data mining works in this way. First, the features of the text that are of interest are determined; this is often done as texts are collected, because something of interest surfaces to help determine how the texts will be read. The texts are then rendered into plain text format so that they can be read as text only (this means they are stripped of the style and layout of their publications). A number of tools can be used to perform mining on texts, but in each case what must be determined is which parts will be looked for as differentiated from other parts of the text; this means that a script must be created that informs the tool (a computer program) how and what to read. Directives for reading the texts (for their particles, their pronouns, their adjectives, etc.) are created along with lists of words to not get hung up on, more commonly referred to as stop words. An algorithm is run on the texts to bring to the surface words that fit the criteria set and to exclude words that do not. What results will vary widely based on the topic of inquiry, the tool being used, and how the inquiry was framed. Distant reading and text analysis are not, to my understanding, conceptually different from data or text mining. All of these terms work to describe a shift in the scale at which we read texts: a move from closely reading one text from beginning to end as a whole, to ways of reading parts of a text or collection of texts that can shift the scale of interaction based on what is driving inquiry. If I had to define each of these concepts to better frame this concept of texts as data, I would explain distant reading as a concept that drives DH work: a way of looking at texts differently, with distance or changes in scale, that seeks patterns that can become visible with this differentiated reading. Data or text mining is more procedural in that it is what is done to texts so that they can become data sets for inquiry.
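The mining steps described above (plain text in, directives for what to read, stop words to skip, a count of what surfaces) can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the two-document textbase, the file names, and the stop-word list are invented, and simple word frequency stands in for whatever criteria a real inquiry or tool would set.

```python
import re
from collections import Counter

# Hypothetical mini textbase, already rendered as plain text.
textbase = {
    "poem_a.txt": "Love is a gentle flame, and love is patient.",
    "poem_b.txt": "The flame of love is bright; the night is long.",
}

# Hypothetical stop-word list: words the reading should not get hung up on.
STOP_WORDS = {"a", "and", "is", "of", "the"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def mine(textbase, stop_words):
    """Count every token across the textbase that is not a stop word."""
    counts = Counter()
    for text in textbase.values():
        counts.update(t for t in tokenize(text) if t not in stop_words)
    return counts

counts = mine(textbase, STOP_WORDS)
for word, n in counts.most_common(2):
    print(word, n)
```

Changing the stop-word list or the tokenizing rule changes what surfaces, which is one small way to see that the results depend on how the inquiry was framed.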
Text analysis is like text mining in that it looks at parts of a text; however, instead of singular terms, text analysis is interested in concordance, or the association or proximity of terms (Geoffrey Rockwell, “What is Text Analysis, Really?”).
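As a small illustration of what attending to proximity can look like, here is a sketch of a keyword-in-context listing, one common form of concordance. It is my own example rather than Rockwell's: the sample sentence is invented, and the `concordance` function and its `window` parameter are hypothetical names.

```python
import re

def concordance(text, keyword, window=3):
    """List each occurrence of keyword with `window` words on either side."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

# Invented sample sentence.
sample = "The flame of love is bright, and love is patient in the night."

for left, kw, right in concordance(sample, "love", window=2):
    print(f"{left:>12} [{kw}] {right}")
```

Where a frequency count reports only how often a term appears, a listing like this keeps each occurrence beside its neighbors, so associations between terms stay visible.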
Visualizing Data: Making Visible the Form of Content
Visualizations are as diverse and varied as the imagination can conjure: pie charts, line graphs, scatter plots, bar graphs, word clouds, bubble diagrams, maps, and so on. Visualizations are what are created from the data to represent findings. While I can’t account for all of the types of visualizations (but will include links to some resource lists at the end of this review), I would like to discuss what thinking goes into creating a visual of textual data. Visuals can work by offering a visual analogy, providing a visual image of non-visible phenomena, and providing visual conventions to structure operations (Johanna Drucker, Graphesis, 5). Because visualizations are articulations of patterns found in research, their design, or form, matters a great deal. A visualization brings attention to patterns, and thus needs to accommodate a mix of evidence and argumentation. Visualizations are intended to show patterns of interest and should provide a way to find possible patterns to investigate and a means to locate those patterns against a baseline provided by a set of other relevant works (Radzikowska et al., “Information Visualization for Humanities Scholars”).
In deciding what type of visualization should be used, experimentation with forms is encouraged to develop a perspective in which to situate the interpretations (Radzikowska et al.). Tanya Clement, quoting computer scientist Ben Shneiderman, describes visualizations as providing “a window into research results but have an inherent limitation of space that results in the ‘occlusion of data, disorientation, and misinterpretation’” (“Text Analysis, Data Mining, and Visualizations in Literary Scholarship”). Visualizations are the lens through which the digital humanities see their texts differently and at different scales.
Scales of Seeing
Scale is a matter of concern in creating visualizations of data because it determines what is seen. Franco Moretti, in his oft-cited work “Graphs, Maps, Trees: Abstract Models for Literary History” (which, based on frequency of citation, might be a seminal work in digital humanities), coined the term distant reading, which names a difference in scale in how texts are “read”: encountered from a distance, within a large corpus, with the assistance of computational tools. He describes the significance of scale:
“What do literary maps do … First, they are a good way to prepare a text for analysis. You choose a unit–walks, lawsuits, luxury goods, whatever–find its occurrences, place them in space … or in other words: you reduce the text to a few elements, and abstract them from the narrative flow, and construct a new, artificial object like the maps that I have been discussing. And with a little luck, these maps will be more than the sum of their parts: they will possess ‘emerging’ qualities, which were not visible at the lower level” (53).
Differential reading, or reading at scales, defamiliarizes texts, making them unrecognizable in a way (putting them at a distance or, conversely, in close proximity) that helps identify features otherwise unseen, make hypotheses, generate questions, and figure out patterns and how to read them (Clement, “Text Analysis, Data Mining, and Visualizations in Literary Scholarship”). Scales of interaction share a common objective, detail in our data: “The objective is much the same: to restore to our field of view precisely that which is right beneath our nose but too ubiquitous to be synthesized in the human mind” (Flanders and Jockers, “A Matter of Scale”). Reading close and distant isn’t necessarily how we think of close and distant as binaries in proximity to an object. Rather, scale operates along a continuum that shifts as attention to particular parts of texts, or across texts, shifts, which allows us to “pay attention to databases, data flow, data architectures and the human element behind them” (Clement). Scale can take place through the construction of the data set, or textbase, the number of materials in the corpus, or the tools being used to read the data. This notion of scale as both close and distant, micro and macro, surfaced in how scholars discussed tools as supportive of that kind of shuttling between different levels of scale: seeing patterns, seeing outliers, zooming in and zooming out (Flanders and Jockers, “A Matter of Scale”). “The computer revolutionizes, not because it proposes an alternative to the basic hermeneutical procedure, but because it reimagines that procedure at new scales, with new speeds, and among new sets of conditions” (Stephen Ramsay, Reading Machines, 31).
“We need to acknowledge how much the massive computational abilities that have transformed the sciences have also changed our field in ways writ large and small and hold possibilities for far greater transformation in—research, writing, and teaching—that matter most” (Cathy Davidson, “Humanities 2.0: Promise, Perils, Predictions”).
Criticisms and cautions in DH work are complex negotiations of context, resources, and cultural value. What emerged from many texts was the notion that DH work is done to advance the goals of humanist scholarship; what differs, though, are the affordances and constraints of working in the digital. Jamie Skye Bianco asks “does DH need an ethical turn?”, to which she responds yes, because it operates through webs of people, institutions, and politics in uneven networks of relation. People and institutions are a part of DH work: they have/n’t access to texts to research, are/n’t represented in texts, have/n’t access to tools for research, and have/n’t access or representation in what is created. Texts are contextual; they are heterogeneous and dynamic. But reading them for their semantic parts and rendering them as visualizations of selected parts, often without situating those parts in the whole, can run the risk of de-emphasizing the human element of the humanities. This risk may come from separating the methods of doing DH work (the tools) from the theories that give impetus to the work. This separation of theory and method risks flattening context by not revealing difference; “the constellation of context, affect, and embodiment must remain viably dynamic and collaborative in digital and computational work” (Bianco, “This Digital Humanities Which Is Not One”). Because digital and computational work “documents, establishes, and affectively produces an iteration of real worlds” that are “multimodally layered” (Bianco), not losing context (and its embedded elements) becomes a matter of concern. The challenge is to shift humanistic study from attention to the effects of technology to a humanistically informed theory of the making of technology: considerations of affect, and the constructivist force of knowledge as observer dependent and emergent (Drucker, “Humanistic Theory and Digital Scholarship”).
Digital work needs to consider the realms of the digital, and the contexts that are digitized and situated around digital materials need to be envisioned as “shared knowledge, culture, and semantic content” (Bianco).
Sustainability, Durability, and Curationability of DH
In reading about the tremendous labor that goes into this work (from digitizing and collecting texts in searchable databases with flexible metadata, to inventing and maintaining the tools, to creating and housing projects), I could not help but question who cares for DH and according to what protocols. For work that is necessarily digital, I wondered about the durability, even the lifespan, of such projects. While some work is being done to curate DH research, the uptake is, at this time, thin. Matters of concern in terms of accessibility and availability of data seem to be of highest priority. Work is being done, by a variety of institutions and organizations, devoted to the preservation and representation of DH research to promote research on texts as cultural artifacts (Cooney et al., “The Notion of the Textbase: Design and Use of Textbases in the Humanities”). The goal is to move beyond issues of just aggregating data toward managing DH content as knowledge (Graban et al.), which requires larger dialogues about access, proprietary rights, the boundaries of technologies, and conflicts between personal and communal interest (Graban et al., “In, Through, and About the Archive: What Digitization (Dis)Allows”). Key issues that affect curation include the size of the data set (digital files of large corpora are tremendous in size), the number of objects to be curated and their complexity, the interventions needed to care for the data, ethical and legal concerns, policies, practices, standards, and economic incentives (Poole, “Now is the Future Now? The Urgency of Digital Curation in the Digital Humanities”). Aside from standards that would need to be set in care of each of these issues, much would have to be created in terms of infrastructure to take on such content; it would need to be flexible, scalable, and economically and technologically sustainable. Standardized interfaces for both human and machine curators would have to be created for managing this content.
Additionally, in order to create such a system and an interface to that system, metadata standards would have to be created and agreed upon for content to be identifiable and retrievable so that it is useful (Poole). While digital content and some tools and services exist, they are currently not necessarily useful or usable (Poole).
Treading Water in DH Flows
This literature review, as is to be expected from a beginner orienting myself in new texts and ideas, barely scratches the surface. Because I’m talking about DH more generally, instead of focusing on any particular tool or method, this work is more like a survey that establishes traces of work to re-immerse myself in, orienting as interest or use dictates. Until then, I pause on this handful of sources to establish connections, figures, concepts, ways of doing, and ways of navigating DH not as a singular discipline, but as an assemblage of many.
This is in no way comprehensive (but on the web it can be amended and tended to, maybe in a separate location on my blog). Here is a (small) handful of lists and links to explore DH tools, concepts, and projects.
Glossary of terms from MLA Commons by Daniel Powell, Constance Crompton and Ray Siemens
HASTAC (Humanities, Arts, Sciences, and Technology Alliance and Collaboratory) Digital Humanities Resource Guide
Digital Humanities Research Tools and Resources Guide by University of Illinois Urbana-Champaign
Digital Humanities tool list built by Alan Liu
Tutorials for DH Tools and Methods list built by Alan Liu
Journal of Digital Humanities: “comprehensive, peer-reviewed, open access journal that features the best scholarship, tools, and conversations produced by the digital humanities community”
In conversations in our Rhetoric, Composition and Digital Humanities seminar with Collin Gifford Brooke, CGB has described an interest in creating bibliographies that assign and visualize weight in the use of texts in a written work. I rather liked this idea and attempted to make visible how much I used each text in my bibliography. Texts used the least are in 12 point font (one citation), while texts used the most are in 24 point font (five citations) —with a range in between to capture the distributed attention the text received.
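For anyone wanting to reproduce this weighting, the mapping from citation count to font size can be worked out with a short calculation. The sketch below assumes the sizes step evenly between the two endpoints given above (one citation at 12 point, five citations at 24 point), which works out to 3 points per additional citation; the even spacing is my assumption for illustration, and the function name `font_size` is hypothetical.

```python
def font_size(citations, min_cites=1, max_cites=5, min_pt=12, max_pt=24):
    """Map a citation count onto a font size by linear interpolation.

    Assumes evenly spaced steps between the smallest and largest size.
    """
    if not min_cites <= citations <= max_cites:
        raise ValueError("citation count outside the expected range")
    step = (max_pt - min_pt) / (max_cites - min_cites)  # points per citation
    return min_pt + (citations - min_cites) * step

# Sizes for each possible citation count, from least to most cited.
for cites in range(1, 6):
    print(cites, font_size(cites))
```

A different scheme (for instance, scaling by each text's share of all citations) would distribute the sizes differently; the linear version is simply the most direct reading of a 12 to 24 point range.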
Bianco, Jamie “Skye.” “This Digital Humanities Which Is Not One.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: Univ. of Minnesota, 2012. Print.
Clement, Tanya. “Text Analysis, Data Mining, and Visualizations in Literary Scholarship.” Literary Studies in the Digital Age. MLA Commons. Web.
Cooney, Charles, Mark Olsen, and Glenn Roe. “The Notion of the Textbase: Design and Use of Textbases in the Humanities.” Literary Studies in the Digital Age. MLA Commons. Web.
Davidson, Cathy N. “Humanities 2.0: Promise, Perils, Predictions.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: Univ. of Minnesota, 2012. Print.
Drucker, Johanna. “Graphesis: Visual Knowledge Production and Representation.” Poetess Archive Journal 2.1 (2010): 1-50. Web.
Drucker, Johanna. “Humanities Approaches to Graphical Display.” Digital Humanities Quarterly 5.1 (2011). Web.
Drucker, Johanna. “Humanistic Theory and Digital Scholarship.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: Univ. of Minnesota, 2012. Print.
Graban, Tarez Samra, Alexis Ramsey-Tobienne, and Whitney Myers. “In, Through, and About the Archive: What Digitization (Dis)Allows.” Rhetoric and the Digital Humanities (forthcoming).
Jockers, Matthew L., and Julia Flanders. “A Matter of Scale.” UNL Digital Commons. Web.
Lang, Susan, and Craig Baehr. “Data Mining: A Hybrid Methodology for Complex and Dynamic Research.” College Composition and Communication 64.1 (2012): 172-194.
Moretti, Franco. “Graphs, Maps, Trees: Abstract Models for Literary History.” New Left Review 28 (2003): 67-93.
Poole, Alex H. “Now is the Future Now? The Urgency of Digital Curation in the Digital Humanities.” Digital Humanities Quarterly 7.2 (2013). Web. 30 Jan. 2014.
Radzikowska, Milena, Stan Ruecker, and Stéfan Sinclair. “Information Visualization for Humanities Scholars.” Literary Studies in the Digital Age. MLA Commons. Web.
Ramsay, Stephen. Reading Machines: Toward an Algorithmic Criticism. Urbana: University of Illinois Press, 2011. Print.
Rockwell, Geoffrey. “What is Text Analysis, Really?” LLC 18.2 (2003): 209-219.
Witmore, Michael. “Text: A Massively Addressable Object.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: Univ. of Minnesota, 2012. Print.