Working DH Literature Review

For Rebecca Moore Howard’s CCR 635: Textual Research in Composition and Rhetoric

In creating a literature review for doing digital humanities work, I eventually had to pause: DH is active, and in looking for it, I seem to find it everywhere. Even when employing methods of reading distantly (that is, using digital tools to read a large number of collected texts for the keywords that illustrate their focus), I still couldn’t cover enough of the smart work being done, the critical questions being raised, and the collaboration and conversation taking place. This synthesis of works is by no means comprehensive, or perhaps even representative, of the work being done under the umbrella of DH. I have attempted to immerse myself in the flow of conversations and materials circulating on the web, bookmarking every crumb that makes mention of tools, programs, articles, arguments, definitions, and projects. In a meeting, you (RMH) asked “is it more accurate to talk about tools than methods?”, to which I couldn’t give an answer. I’m not sure I can now either, but I will work to cover responses that both answer and complicate this question for research that employs DH methods. I will discuss some approaches to framing DH work; how texts become sets of data; creating visualizations of data; interactions of scale in creating and reading visualizations; ethical, social, political, economic, and cultural considerations; and curation of DH research.

Framing DH: Definitions, Justifications, Orientations

I thought a reasonable place to begin my research into DH methods would be to define what they are. However, I quickly learned that DH has no uniformly accepted definition. From my reading, I attribute this to variations in what it means to do digital humanities work (as influenced by disciplinary affinity: computer science, artificial intelligence, social science, etc.); how tightly one aligns oneself with the “digital” or the “humanities” (both rich terms in themselves); and how one envisions the joining of the two terms, that is, what work is possible and what work should be of concern. Often DH is referred to as a large, overarching structure: a tent, an umbrella, something under which different alignments, investments, and affinities can find space. DH isn’t singular in its material base, projects, methods, or scholarly sphere. What seems to surface with frequency is a conceptualization of research that answers the call for the humanities to increase the use of evidence in its work by conducting RAD research: research that is replicable, aggregable, and data supported. This suggests a move that engages the materials of the humanities (texts) in ways that can yield more than anecdotal accounts of experience or interpretations of small scope, such as a close reading of one text or a small handful of texts. The forms this research takes are what make DH so varied: how it is designed, what data it searches for patterns, and how its findings are presented (which methods of data analysis and visualization). The textbase (which allows for the creation of textual data), the emergent questions and curiosities, and the means of illustrating what is of interest vary from inquiry to inquiry. The forms and the content develop in relation to one another to best see something interesting.
In Graphesis, a work by Johanna Drucker that articulates the creation of visualizations of data in humanities research, Drucker explains that how we know what we know about any given concept is based on our models of knowing: our models, our visuals, “mediate our experience by providing conceptual schema or processing experience into form” (15). I think this is a provocative and durable statement to hold on to in thinking about DH methods because it captures both the essence of intrigue in the work (the desire to look at something differently, to look for things we have not yet seen) and the relationship between how we represent data as visual constructions and the patterns that exist within the materials we care for and research from.

A line of tension relative to these visuals (our forms of what we know and how we know) exists in DH as issues of disciplinary alignment, often vis-à-vis tensions in thinking about research method or methodology. This is perhaps too tidy an explanation for something I have neither the history nor the breadth of examples to represent more robustly, but it needs articulating nonetheless. DH can be oversimplified at times as “tools”, or as a fixation on technologies for collecting, reading, and visualizing that are treated as neutral or ahumanistic and more akin to the work of the (computer) sciences. While these tools are essential to DH work, there are other accounts of research that care more for who designs the tools, who has access to them, and what they afford and constrain, not just in findings, but in the context of people, institutions, and social, economic, cultural, political, and philosophical factors. Johanna Drucker, in “Humanistic Theory and Digital Scholarship”, asks humanities scholars what impact the humanities have had on the digital environment, and raises the possibility of digital platforms and interfaces created from humanistic methods rather than from methods borrowed from outside the discipline, which she describes as at odds with the cares and concerns of humanities work. She explains that humanities work has encountered digital tools, but what of humanities tools in digital contexts? I see this, again maybe too simply, as a deep concern with methodology: how and what researchers are doing, for what reasons, and for whom or what. A humanistic approach, she explains,

“means that the premises are rooted in the recognition of the interpretative nature of knowledge, that the display itself is conceived to embody qualitative expressions, and that the information is understood as graphically constituted”.

While I will continue to refrain from defining DH too tightly or narrowly, or from constraining it to one epistemology over another, I can say from my readings that DH is not just tools; or rather, DH is not about the uncritical creation and use of tools without care for matters of concern in the tools’ interaction with contextual and textual networks of relations (person, place, time, and so on).

Text as Data: Mining, Encoding, and Method

For many, data has numerical connotations: in the sciences, data begins as quantified observations, which visualization methods are selected to highlight. So what does data mean in a discipline where texts are what are created as observations? Some scholars wish to distance data in the humanities from data in the sciences because it functions differently. Johanna Drucker contrasts data with capta, explaining that capta is actively “taken” while data is assumed to be a “given” that can be recorded and observed. The difference Drucker sees is that humanistic inquiry acknowledges that its knowledge is “situated, partial, and constitutive”; this is the recognition of knowledge as a construction, “not simply given as a natural representation of pre-existing fact” (“Humanities Approaches to Graphical Display”). While Drucker calls for a rethinking of data as capta that better expresses ambiguity over certainty (which gets at what she describes as interpretative complexity), DH data needs to acknowledge the lens that it constructs to look at its texts. I will provide a gloss of how data is garnered from humanities texts along with some methods for collecting it: data mining, textual analysis, and differentiated reading. To begin, data in the humanities does not have the same aims as data in the sciences. In the sciences, data is used to attempt to arrive at a truth, but in the humanities data is used to arrive at the questions we desire to ask (Stephen Ramsay, Reading Machines 68). The digital humanities exist to collect and create data, to notice something of interest, to formulate questions, to create visualizations that allow us to better see patterns of interest, and to postulate all over again: data transforms theory, and theory transforms data into interpretation for further theorizing (Cathy Davidson, “Humanities 2.0: Promise, Perils, Predictions”).
Data is the locus of digital humanities work because it functions as a representation of information that can be interpreted and reinterpreted, in a manner (or form) that serves as something to communicate, something to interpret, or something to process; it is what research works to articulate, make visible, and let articulate (Alex Poole, “Now is the Future Now?”). Texts are data, or have the potential to become sets of data, because they are addressable as a thing of many parts; one can query a position within the text at a certain level of abstraction, like that of a character, word, phrase, line, etc. (Michael Witmore, “Text: A Massively Addressable Object”). Data are created from text based on what is of interest to look for, and may come from a single text or from a large corpus of texts, the size of the base relating to what is of interest to the researcher. Texts are used to create textbases to which various methods and tools might be applied to draw out patterns and examine trends, which serve as a catalyst for further questioning and research (Lang and Baehr, “Data Mining”). A textbase is a “coherent collection of semi or unstructured digital documents” that “can come from any written discourse” and whose documents “all cohere in some manner”, existing as a corpus assembled around a specific unifying principle, whether thematic or generic (Cooney et al., “The Notion of the Textbase”). A data set might come from a collection of romantic poetry in which the researcher wishes to see what adjectives are used to describe love; those adjectives from that set of texts would be the data. This data is created through a process called data or text mining.
Text (or data) mining is, to put it simply, knowledge discovery; creating data to apply DH tools to is not just a step in a process toward results, but a practice of curiosity and inquiry into what might result. This process can be exploratory, descriptive, and even predictive in its function (Lang and Baehr, “Data Mining”, 176-179).
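
To make the idea of an addressable text a little more concrete, here is a minimal sketch in Python. It treats a short passage as lines, words, and characters and queries a position at each level of abstraction. The sample text and the function name `address` are my own hypothetical choices for illustration; they are assumptions, not anything drawn from Witmore or the other sources.

```python
# Hypothetical sample text; any plain text would do.
sample = "Love is not love\nWhich alters when it alteration finds"

def address(text, line=None, word=None, char=None):
    """Return the part of `text` at the requested level of abstraction.

    Indices are zero-based. Giving only `line` returns a whole line;
    adding `word` narrows to a word in that line; adding `char`
    narrows to a character in that word.
    """
    if line is None:
        return text
    unit = text.splitlines()[line]
    if word is not None:
        unit = unit.split()[word]
        if char is not None:
            unit = unit[char]
    return unit

print(address(sample, line=1))                  # a whole line
print(address(sample, line=1, word=1))          # a word within that line
print(address(sample, line=1, word=1, char=0))  # a character within that word
```

The same passage answers differently depending on the level at which it is queried, which is one small way of seeing a text as a thing of many parts.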

Data mining works in this way. First, the features of the text that are of interest to examine are determined; this is often done as texts are collected, because something of interest surfaces to help determine how the texts will be read. The texts are then rendered into plain text format so that they can be read as text only (this means they are stripped of the style and layout of their publications). A number of tools can be used to perform mining on texts, but in each case what must be determined is which parts will be looked for as differentiated from other parts of the text; this means that a script must be created that informs the tool (a computer program) how and what to read. Directives for reading the texts (for their particles, their pronouns, their adjectives, etc.) are created along with lists of words to not get hung up on, more commonly referred to as stop words. An algorithm is run on the texts to bring to the surface words that fit the criteria set and to exclude words that do not. What results will vary widely based on the topic of inquiry, the tool being used, and how the inquiry was framed. Distant reading and text analysis are not, to my understanding, conceptually different from data or text mining. All of these terms describe a shift in the scale at which we read texts: a move from closely reading one text from beginning to end as a whole, to ways of reading parts of a text or collection of texts that can shift the scale of interaction based on what is driving inquiry. If I had to define each of these concepts to better frame this notion of texts as data, I would explain distant reading as a concept that drives DH work: a way of looking at texts differently, with distance or changes in scale, that seeks patterns that can become visible with this differentiated reading. Data or text mining is more procedural in that it is what is done to texts so that they can become data sets for inquiry.
Text analysis is like text mining in that it looks at parts of a text; however, instead of singular terms, text analysis is interested in concordance, or the association or proximity of terms (Geoffrey Rockwell, “What is Text Analysis, Really?”).
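
The mining steps and the idea of concordance described above can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions, not the procedure of any particular tool named in the sources: the stop-word list, the sample sentence, and the function names (`tokenize`, `frequent_terms`, `keyword_in_context`) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real projects use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}

def tokenize(plain_text):
    """Lowercase the plain text and split it into word tokens."""
    return re.findall(r"[a-z']+", plain_text.lower())

def frequent_terms(plain_text, n=5):
    """Count tokens that are not stop words and return the n most common."""
    tokens = [t for t in tokenize(plain_text) if t not in STOP_WORDS]
    return Counter(tokens).most_common(n)

def keyword_in_context(plain_text, keyword, window=3):
    """List each occurrence of `keyword` with `window` words on either side."""
    tokens = tokenize(plain_text)
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append(" ".join(left + [token.upper()] + right))
    return hits

# Hypothetical sample standing in for a textbase.
sample = (
    "Writing is an activity. Writing is social, and the writing "
    "of a text engages the systems that surround writing."
)

print(frequent_terms(sample, n=3))
for hit in keyword_in_context(sample, "social"):
    print(hit)
```

The first function corresponds to mining for single terms once stop words are set aside; the second gestures at text analysis in Rockwell’s sense, since it keeps each term alongside its neighbors rather than counting it alone.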

Visualizing Data: Making Visible the Form of Content

Visualizations are as diverse and varied as the imagination can conjure: pie charts, line graphs, scatter plots, bar graphs, word clouds, bubble diagrams, maps, and so on. Visualizations are what are created from the data to represent findings. While I can’t account for all of the types of visualizations (but will include links to some resource lists at the end of this review), I would like to discuss what thinking goes into creating a visual of textual data. Visuals can work by offering a visual analogy, providing a visual image of non-visible phenomena, and providing visual conventions to structure operations (Johanna Drucker, Graphesis, 5). Because visualizations are articulations of patterns found in research, their design, or their form, matters a great deal. A visualization brings attention to patterns, and thus needs to accommodate a mix of evidence and argumentation. Visualizations are intended to show patterns of interest and should provide a way to find possible patterns to investigate and a means to locate those patterns against a baseline provided by a set of other relevant works (Radzikowska et al., “Information Visualization for Humanities Scholars”).

In deciding what type of visualization should be used, experimentation with forms is encouraged to develop a perspective in which to situate the interpretations (Radzikowska et al.). Tanya Clement, quoting computer scientist Ben Shneiderman, writes that visualizations provide “a window into research results but have an inherent limitation of space that results in the ‘occlusion of data, disorientation, and misinterpretation’” (“Text Analysis, Data Mining, and Visualizations in Literary Scholarship”). Visualizations are the digital humanities’ lens for seeing its texts differently and at different scales.

Scales of Seeing

Scale is a matter of concern in creating visualizations of data because it determines what is seen. Franco Moretti coined the term distant reading, which names a difference in scale in how texts are “read”: encountered from a distance, within a large corpus, with the assistance of computational tools. In his oft-cited work “Graphs, Maps, Trees: Abstract Models for Literary History” (which, based on frequency of citation, might be a seminal work in digital humanities), he describes the significance of scale:

“What do literary maps do … First, they are a good way to prepare a text for analysis.  You choose a unit–walks, lawsuits, luxury goods, whatever–find its occurrences, place them in space … or in other words: you reduce the text to a few elements, and abstract them from the narrative flow, and construct a new, artificial object like the maps that I have been discussing.  And with a little luck, these maps will be more than the sum of their parts: they will possess ‘emerging’ qualities, which were not visible at the lower level” (53).

Differential reading, or reading at scales, defamiliarizes texts, making them unrecognizable in a way (putting them at a distance or, conversely, in close proximity) that helps to identify features otherwise unseen, make hypotheses, generate questions, and figure out patterns and how to read them (Clement, “Text Analysis, Data Mining, and Visualizations in Literary Scholarship”). Scales of interaction share a common objective, detail in our data: “The objective is much the same: to restore to our field of view precisely that which is right beneath our nose but too ubiquitous to be synthesized in the human mind” (Flanders and Jockers, “A Matter of Scale”). Reading close and distant isn’t necessarily a matter of close and distant as binaries of proximity to an object. Rather, scale operates along a continuum that shifts as attention to particular parts of texts or across texts shifts, which allows us to “pay attention to databases, data flow, data architectures and the human element behind them” (Clement). Scale can be set through the construction of the data set or textbase, the number of materials in the corpus, or the tools being used to read the data. This notion of scale as both close and distant, micro and macro, surfaced in how scholars discussed tools as supportive of shuttling between different levels of scale: seeing patterns, seeing outliers, zooming in and zooming out (Flanders and Jockers, “A Matter of Scale”). “The computer revolutionizes, not because it proposes an alternative to the basic hermeneutical procedure, but because it reimagines that procedure at new scales, with new speeds, and among new sets of conditions” (Stephen Ramsay, Reading Machines, 31).

Critical Considerations

“We need to acknowledge how much the massive computational abilities that have transformed the sciences have also changed our field in ways writ large and small and hold possibilities for far greater transformation in—research, writing, and teaching—that matter most” (Cathy Davidson “Humanities 2.0: Promise, Peril, Predictions”).

Criticisms and cautions in DH work are complex negotiations of context, resources, and cultural value. What emerged from many texts was the notion that DH work is done to advance the goals of humanist scholarship; what differs, though, are the affordances and constraints of working in the digital. Jamie Skye Bianco asks “does DH need an ethical turn?”, to which she responds yes, because it operates through webs of people, institutions, and politics in uneven networks of relation. People and institutions are a part of DH work: they have/n’t access to texts to research, are/n’t represented in texts, have/n’t access to tools for research, and have/n’t access or representation in what is created. Texts are contextual; they are heterogeneous and dynamic. Reading them for their semantic parts and rendering them as visualizations of selected parts, which often neglect to situate those parts in the whole, can run the risk of de-emphasizing the human element of the humanities. This risk may come from separating the methods of doing DH work (the tools) from the theories that give impetus to the work. This separation of theory and method risks flattening context by not revealing difference; “the constellation of context, affect, and embodiment must remain viably dynamic and collaborative in digital and computational work” (Bianco, “This Digital Humanities Which Is Not One”). Because digital and computational work “documents, establishes, and affectively produces an iteration of real worlds” that are “multimodally layered” (Bianco), not losing context (and its embedded elements) becomes a matter of concern. The challenge is to shift humanistic study from attention to the effects of technology to a humanistically informed theory of the making of technology, with considerations of affect and the constructivist force of knowledge as observer dependent and emergent (Drucker, “Humanistic Theory and Digital Scholarship”).
Digital work needs to consider the realms of the digital, and the contexts that are digitized and situated around digital materials need to be envisioned as “shared knowledge, culture, and semantic content” (Bianco).

Sustainability, Durability, and Curationability of DH

In reading about the tremendous labor that goes into this work (from digitizing and collecting texts in searchable databases with flexible metadata, to inventing and maintaining the tools, to creating and housing projects), I could not help but question who cares for DH and according to what protocols. For work that is necessarily digital, I wondered about the durability, even the lifespan, of such projects. While some work is being done to curate DH research, the uptake is, at this time, thin. Matters of concern in terms of accessibility and availability of data seem to be of highest priority. Work is being done, by a variety of institutions and organizations, devoted to the preservation and representation of DH research to promote research on texts as cultural artifacts (Cooney et al., “The Notion of the Textbase: Design and Use of Textbases in the Humanities”). The goal is to move beyond issues of just aggregating data toward managing DH content as knowledge, which requires larger dialogues about access, proprietary rights, the boundaries of technologies, and conflicts between personal and communal interest (Graban et al., “In, Through, and About the Archive: What Digitization (Dis)Allows”). Key issues that affect curation include the size of the data set (digital files of large corpora are tremendous in size), the number of objects to be curated and their complexity, the interventions needed to care for the data, ethical and legal concerns, policies, practices, standards, and economic incentives (Poole, “Now is the Future Now? The Urgency of Digital Curation in the Digital Humanities”). Aside from standards that would need to be set in care of each of these issues, much would have to be created in terms of infrastructure to take on such content: it would need to be flexible, scalable, and economically and technologically sustainable. Standardized interfaces for both human and machine curators would have to be created for managing this content.
Additionally, in order to create such a system and an interface to that system, metadata standards would have to be created and agreed upon so that content is identifiable and retrievable, and therefore useful (Poole). While digital content and some tools and services exist, they are currently not necessarily useful or usable (Poole).

Treading Water in DH Flows

This literature review, as is to be expected from a beginner orienting myself among new texts and ideas, barely scratches the surface. Because I’m talking about DH more generally, instead of focusing on any particular tool or method, this work is more like a survey that establishes traces of work to re-immerse myself in, orienting as interest or use dictates. Until then, I pause to think through this handful of sources to establish connections, figures, concepts, ways of doing, and ways of navigating DH not as a singular discipline, but as an assemblage of many.


Resources

This is in no way comprehensive (but on the web it can be amended and tended to, maybe in a separate location on my blog). Here is a (small) handful of lists and links to explore DH tools, concepts, and projects.

Glossary of terms from MLA Commons by Daniel Powell, Constance Crompton and Ray Siemens

DH tools for beginners 

Getting Started in the Digital Humanities from Lisa Spiro – executive director of Digital Scholarship Services at Rice University’s Fondren Library

CUNY Digital Humanities Resource Guide

HASTAC (Humanities, Arts, Sciences, and Technology Alliance and Collaboratory) Digital Humanities Resource Guide

Digital Humanities at Princeton Resource Guide

Digital Humanities Research Tools and Resources Guide by University of Illinois Urbana-Champaign

Digital Humanities tool list built by Alan Liu

Tutorials for DH Tools and Methods list built by Alan Liu

Journal of Digital Humanities: “comprehensive, peer-reviewed, open access journal that features the best scholarship, tools, and conversations produced by the digital humanities community”

The International Directory of Digital Humanities Centers

 

Bibliography

In conversations in our Rhetoric, Composition and Digital Humanities seminar with Collin Gifford Brooke, CGB has described an interest in creating bibliographies that assign and visualize weight in the use of texts in a written work. I rather liked this idea and attempted to make visible how much I used each text in my bibliography. Texts used the least are in 12 point font (one citation), while texts used the most are in 24 point font (five citations), with a range in between to capture the distributed attention each text received.
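
As a small worked sketch of this weighting, the Python below maps a citation count onto a point size. The two endpoints (one citation at 12 point, five citations at 24 point) come from the description above; the assumption that the sizes in between rise in even, linear steps is mine, and the function name `font_size` is hypothetical.

```python
def font_size(citations, min_pt=12, max_pt=24, min_cites=1, max_cites=5):
    """Map a citation count onto a point size by linear interpolation.

    Assumes evenly spaced steps between the two endpoints.
    """
    # Clamp so that counts outside the range still get a valid size.
    citations = max(min_cites, min(citations, max_cites))
    share = (citations - min_cites) / (max_cites - min_cites)
    return min_pt + share * (max_pt - min_pt)

for n in range(1, 6):
    print(n, font_size(n))
```

Under that linear assumption, each additional citation adds three points, so a text cited three times would sit at 18 point, midway between the least and most used.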

Bianco, Jamie “Skye”. “This Digital Humanities Which Is Not One.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: University of Minnesota Press, 2012. Print.

Clement, Tanya. “Text Analysis, Data Mining, and Visualizations in Literary Scholarship.” Literary Studies in the Digital Age. MLA Commons. Web.

Cooney, Charles, Mark Olsen, and Glenn Roe. “The Notion of the Textbase: Design and Use of Textbases in the Humanities.” Literary Studies in the Digital Age. MLA Commons. Web.

Davidson, Cathy N. “Humanities 2.0: Promise, Perils, Predictions.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: University of Minnesota Press, 2012. Print.

Drucker, Johanna. “Graphesis: Visual Knowledge Production and Representation.” Poetess Archive Journal 2.1 (2010): 1-50. Web.

Drucker, Johanna. “Humanities Approaches to Graphical Display.” Digital Humanities Quarterly 5.1 (2011). Web.

Drucker, Johanna. “Humanistic Theory and Digital Scholarship.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: University of Minnesota Press, 2012. Print.

Graban, Tarez Samra, Alexis Ramsey-Tobienne, and Whitney Myers. “In, Through, and About the Archive: What Digitization (Dis)Allows.” Rhetoric and the Digital Humanities (forthcoming).

Jockers, Matthew L., and Julia Flanders. “A Matter of Scale.” UNL Digital Commons. Web.

Lang, Susan, and Craig Baehr. “Data Mining: A Hybrid Methodology for Complex and Dynamic Research.” College Composition and Communication 64.1 (2012): 172-194.

Moretti, Franco. “Graphs, Maps, Trees: Abstract Models for Literary History.” New Left Review 28 (2003): 67-93.

Poole, Alex H. “Now is the Future Now? The Urgency of Digital Curation in the Digital Humanities.” Digital Humanities Quarterly 7.2 (2013). Web. 30 Jan. 2014.

Radzikowska, Milena, Stan Ruecker, and Stéfan Sinclair. “Information Visualization for Humanities Scholars.” Literary Studies in the Digital Age. MLA Commons. Web.

Ramsay, Stephen. Reading Machines: Toward an Algorithmic Criticism. Urbana: University of Illinois Press, 2011. Print.

Rockwell, Geoffrey. “What is Text Analysis, Really?” Literary and Linguistic Computing 18.2 (2003): 209-219.

Witmore, Michael. “Text: A Massively Addressable Object.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: University of Minnesota Press, 2012. Print.

 


Methodologies of Ecology

Ecological Methodologies for Composing

“Environments are not just containers, but are processes that change the content totally.” Marshall McLuhan

 Living Material

In Unframing Models of Public Distribution, Jenny Rice works to destabilize the frame of the rhetorical situation by proposing that situations operate in more ecologically complex networks of “lived practical consciousness or structures of feelings” (7). This is not a move to dissolve all boundaries for looking at the context of a situation, but one that destabilizes them enough to account for the limitations of borders that are too discrete. Rhetoric is better understood and practiced (enacted) as the complex art that we describe it to be. Ecological metaphors for conceptualizing the growing and evolving structure of the field of rhetoric and composition are not new. Marilyn Cooper, Jenny Edbauer Rice, [fill in others] have worked to move beyond metaphors toward ways of accounting for rhetoric that leaks beyond the static rhetorical situation and into the complexly articulated social. (Lloyd Bitzer’s rhetorical situation, 1968, is an iconic and valuable text in the field that examined rhetorical discourse as called into existence by a situation. Despite its age, it is still of use. However, I am interested in more complex models to better articulate discourse as unfolding through media: in space/place, time, and power to invoke through the connectivity of web linking and metadata.) While I draw from these works to serve as theory for my own project, I take interest in employing ecology as methodology for engaging texts. Ecology as methodology seeks to take the structure and nature of ecologies and enact them as ways of doing. This isn’t a singular methodology, or a direct translation from ecology as theory to ecology as method, but an attempt to render this work visible.
If ecology as theory is interested in dynamic models of discourse that seek to articulate the complexity of what comprises a text (and we account for text in an expanded notion of article, book, dialogue, image, graphic, and so on) with an emphasis on the material assemblage, then ecology as methodology is a means of accounting for the materials that compose these dynamic models of text. Methodology as ecology seeks to make the invisible intermediaries that compose text visible, accountable, and articulable. Accounting for the articulation of materials may allow for the articulation of interaction with texts based on the collection of materials individually, and on highlighting different relationships between the materials. Through employing distant reading methods, data mining, visualization, and pattern recognition as methodologies of ecology, represented as tagclouds, graphs, timelines, and maps, I will create ecological representations of Marilyn Cooper’s “The Ecology of Writing” (1986) as a means of making visible the ecological nature of text. “The Ecology of Writing” was chosen as the textual sample because it is arguably the piece of scholarship that introduced ecologies into rhetoric and composition; other pieces referenced here can be traced back to Marilyn Cooper’s text through bibliometric and citation information. Additionally, exploring the creation of ecologies of “Ecology” is meant to be playful. Cooper’s text is a response to the cognitive process model dominant in the field at the time, which framed writing as entirely produced in the head of the author. Cooper’s critique was that such a model idealized the notion of a solitary author, which isolated the author from the social world; she remarked that “Such changes in writing pedagogy indicate that the perspective allowed by the dominant model has again become too confining” (366).
Language, to Cooper, was social; ushering in a social turn, Cooper proposed an ecological model of writing “whose fundamental tenet is that writing is an activity through which a person is continually engaged with a variety of socially constituted systems” (367). Cooper distributed the material constituents of writing across more complex social systems comprised of the characteristics of other writers and writings:

“Writing, thus, is seen to be both constituted by and constitutive of these ever-changing systems, systems through which people relate as complete, social beings, rather than imagining each other as remote images: an author, an audience” (373).

To further ecological work, I take up the task of making models of systems (what constitutes materials and how they connect) to better understand ecological relationships.

In 2005, the National Humanities Center granted the Richard W. Lyman Award, a recognition of scholars who have advanced the humanities through the use of information technology, to John Unsworth, Vice Provost, University Librarian, Chief Information Officer, and Professor of English at Brandeis University. In his lecture, Unsworth made a call to action for more work in the humanities that employs text-mining, data-mining, visualization, modeling, and pattern recognition in a large corpus of texts, explaining that “the goal of data-mining (including text-mining) is to produce new knowledge by exposing similarities or differences, clustering or dispersal, co-occurrence and trends.” Unsworth described the work of the NORA Project, MONK (Metadata Offer New Knowledge), and WordHoard. The NORA project was “a multi-institutional, multi-disciplinary Mellon-funded project to apply text mining and visualization techniques to large humanities-oriented digital library collections. The goal of the Nora project was to produce software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries.” MONK “is a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study”; all code from the project is made available as open source. MONK describes the ongoing value in this mining and visualizing work by stating that

“the scholarly use of digital texts must progress beyond treating them as book surrogates and move towards the exploration of the potential that emerges when you put many texts in a single environment that allows a variety of analytical routines to be executed across some or all of them.”

Like MONK, WordHoard is an open source program that can be downloaded. WordHoard applies corpus linguistics to a collection of texts to assign tags “according to morphological, lexical, prosodic, and narratological criteria”; the project explains that “Deeply tagged corpora of course support more finely grained inquiries at a verbal or stylistic level. But more importantly, access to the words of a text at such microscopic levels also lets you look in new ways at the imaginative worlds created by those words.” While these projects operate at a much larger scale (and with many more pairs of eyes), similar data mining and visualizing methods can be applied to create an ecological reading on a much smaller (and lower-budget) scale.

Ecological Reading as Invention: Method

In The Future of Invention, John Muckelbauer states that it is

“not always the case that an inventive inquiry works best when it responds to a problem by seeking its solution, or responds to the question with an answer; it may be well that by turning away from the question, we can uncover a different kind of trajectory, even a different kind of relentless directness with which to engage the problem.” (149-150)

As a newcomer to data mining, and one who works primarily through trial and error without the aid of purchased software, I find that much of this work is scrappy: I make do within the limits of what I have, and the reading is done by eye, but at an altered scale. The emphasis, whether or not one is using software, is on altering the reading of a text by changing the scale, or trajectory in Muckelbauer’s words, of how the text communicates, drawing out keywords and phrases. To begin, after locating the full-text PDF of the article through the JSTOR database, I used the note-taking application Evernote as a stand-in for OCR (optical character recognition) software. Creating a new note, I dragged and dropped the article into it. I then copied the text and pasted it into a text editing program to render it as plain text. This plain text was used to generate the tagclouds at Tagcrowd.com. Parameters can be set within the site to ignore stop words, to list frequency values, and to control how many keywords are represented in a cloud. Reading across the three clouds, I also created a list of the ten most frequently used words in the article, which served as the basis for an infographic.

Ten most frequently used words with their numerical frequency in the article:

  • Audience 37
  • Model 34
  • Process 27
  • Systems 26
  • Ideas 24
  • Forms 24
  • Purposes 22
  • Texts 22
  • Interactions 17
  • Structures 16
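
A frequency list like the one above could also be produced with a few lines of Python instead of a tagcloud site. What follows is a minimal sketch, assuming the article has already been rendered as plain text. The function name top_words and the short stop-word list are my own illustrative assumptions; they are not TagCrowd’s list or method, so counts from this sketch may differ from the numbers reported above.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this
# sketch; TagCrowd's own list is not reproduced here).
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "that", "it",
    "as", "for", "with", "by", "on", "are", "this", "be", "or",
}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text as (word, count) pairs.

    Words are lowercased before counting, and stop words are ignored.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Writing is an activity. Writing systems shape writers, and writers shape systems."
    for word, count in top_words(sample, n=3):
        print(word, count)
```

Changing n mirrors the tagcloud setting that controls how many keywords appear in a cloud, and swapping in a different stop-word set changes which words are allowed to surface.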

To create the data for the other visualizations, I used the computer’s search feature to locate keywords, which I then tallied in an inventory list. I searched for “Cooper,” “ecology,” and “ecological” within each of the articles I collected that cited “Ecology” in their bibliographies. I used these keywords to locate what was quoted and summarized from Cooper’s article, as well as to generate data on which elements of her text are being referenced. I located 14 quotes from her article and examined which passages each article used in in-text citations. These quotations served as data for two infographics based on citation patterns. For the last data set, I searched the NCTE (National Council of Teachers of English) database for articles that cited Cooper’s article, which returned the following:

  • Aviva Freedman “Show and Tell? The Role of Explicit Teaching in the Learning of New Genres” – Research in the Teaching of English 1993
  • Amber Buck “Examining Digital Literacy Practices on Social Network Sites” – Research in the Teaching of English 2012
  • Bruce Horner “Students, Authorship, and the Work of Composition” – College English 1997
  • Anish M. Dave and David R. Russell “Drafting and Revision Using Word Processing by Undergraduate Student Writers: Changing Conceptions and Practices” – Research in the Teaching of English 2010
  • Sidney I. Dobrin and Christian R. Weisser “Breaking Ground in Ecocomposition: Exploring Relationships between Discourse and Environment” – College English 2002
  • Jeremiah Dyehouse, Michael Pennell, and Lida K. Shamoon “‘Writing in Electronic Environments’: A Concept and a Course for the Writing and Rhetoric Major” – CCC 2009
  • Richard C. Freed and Glenn J. Broadhead “Discourse Communities, Sacred Texts, and Institutional Norms” – CCC 1987
  • Lester Faigley “Competing Theories of Process: A Critique and a Proposal” – College English 1986
  • Judy Kirscht, Rhonda Levine, and John Reiff “Evolving Paradigms: WAC and the Rhetoric of Inquiry” – CCC 1994
  • Lucille Parkinson McCarthy “A Stranger in Strange Lands: A College Student Writing Across the Curriculum” – Research in the Teaching of English 1987
  • Matthew Newcomb “Sustainability as a Design Principle for Composition: Situational Creativity as a Habit of Mind” – CCC 2012
  • Richard Fulkerson “Composition Theory in the Eighties: Axiological Consensus and Paradigmatic Diversity” – CCC 1990
  • Nathaniel A. Rivers and Ryan P. Weber “Ecological, Pedagogical, Public Rhetoric” – CCC 2011
  • Kristie S. Fleckenstein, Clay Spinuzzi, Rebecca J. Rickly, and Carole Clark Papper “The Importance of Harmony: An Ecological Metaphor for Writing Research” – CCC 2008
  • Trish Roberts-Miller “Discursive Conflict in Communities and Classrooms” – CCC 2003
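
The keyword search described before this list, which I did by hand with the computer’s search feature, could be approximated in code. This is a minimal sketch, assuming each collected article is already available as plain text. The function name tally_keywords and the labels-to-text dictionary are hypothetical conveniences, and the sample sentence is invented for illustration rather than quoted from any article.

```python
import re

# The three search terms named in the paragraph before the list.
KEYWORDS = ["Cooper", "ecology", "ecological"]


def tally_keywords(texts, keywords=KEYWORDS):
    """Build an inventory of keyword counts for each text.

    texts maps a label (for example, an article title) to its plain text.
    Matches are whole-word and case-insensitive, so "ecology" does not
    also count occurrences of "ecological".
    """
    inventory = {}
    for label, text in texts.items():
        inventory[label] = {
            keyword: len(
                re.findall(r"\b" + re.escape(keyword) + r"\b", text, flags=re.IGNORECASE)
            )
            for keyword in keywords
        }
    return inventory


if __name__ == "__main__":
    # Invented sample text, for illustration only.
    sample = {"sample article": "Cooper's ecology of writing is an ecological model."}
    for label, counts in tally_keywords(sample).items():
        print(label, counts)
```

A tally like this only locates where the keywords appear; deciding what counts as a quotation or a summary of Cooper’s article still requires reading the surrounding passage by eye.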

These data sets, gathered by altering the scales at which the texts were read, served as the basis for the visualizations. The data could have been focused differently for different ends, and could be represented in any number of ways; what interests me at this time is seeing what becomes visible in pulling out patterns and communicating that information graphically. Based on the patterns I noticed, I decided to construct:

  • a short video on the process of mining text
  • two animated gifs:
    • one that represents the process of mining an article to render plain text to work with
    • one of the tagclouds, to look at patterns across the clouds [NOTE: Chirag Mehta developed a script, which I have used in the past, that can display tagclouds along an interactive timeline. However, the composition must be hosted on a web domain.]
  • two infographics that represent:
    • the ten most used words in Cooper’s article
    • the most quoted line in publications that cite Cooper’s article
  • two graphs that represent:
    • journals that cite the article
    • the ratio of most quoted lines from Cooper’s article
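
The counts behind the graph of journals that cite the article can be tallied directly from the NCTE search results listed earlier. The sketch below transcribes the journal of each of the fifteen articles, in the order they appear in that list. The abbreviations RTE and CE and the function name journal_shares are my own shorthand, introduced only for this example.

```python
from collections import Counter

# Journal of each article in the NCTE list, in the order listed.
# RTE = Research in the Teaching of English, CE = College English,
# CCC = College Composition and Communication.
CITING_JOURNALS = [
    "RTE", "RTE", "CE", "RTE", "CE", "CCC", "CCC", "CE",
    "CCC", "RTE", "CCC", "CCC", "CCC", "CCC", "CCC",
]


def journal_shares(journals):
    """Return (journal, count, share) tuples, most frequent journal first."""
    counts = Counter(journals)
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common()]


if __name__ == "__main__":
    # Print a plain-text bar for each journal.
    for name, n, share in journal_shares(CITING_JOURNALS):
        print(f"{name:<4}{'#' * n} {n} ({share:.0%})")
```

Run against that list, the tally comes to eight articles from CCC, four from Research in the Teaching of English, and three from College English.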

Data Mining and Visualization

Franco Moretti, to whom the concept of distant reading can be traced, described the impetus for his work this way: “a field this large cannot be understood by stitching together separate bits of knowledge about individual cases, because it isn’t a sum of individual cases: it’s a collective system, that should be grasped as such, as a whole” (Graphs, Maps, Trees 68). To Moretti, graphs provide data handles, not interpretation (72); “What graphs make us see…are the constraints and the inertia of the literary field—the limits of the imaginable” (82). The scholarship of Derek Mueller works to create visualizations that alter the scales at which we encounter texts. In his work Views from a Distance: A Nephological Model of the CCCC Chairs’ Addresses, 1977-2011, Mueller argues that

“As the scholarly record grows there is an escalating value in realizing connections. This is exceedingly important for newcomers to the field who must make inroads however they can, by conversation and conventional reading and writing, of course, but also by pattern-finding, by nomadically exploring conceptual interplay across abstracts and abstractive variations, and by finding and tracing linkages among materials and ideas, new and old.”

Mueller’s work takes an interest in what Moretti termed “distant reading,” which seeks to gather large textual datasets from which visualizations may be composed. While Moretti’s work focuses on charting entire literary genres, Mueller applies distant reading methods and data visualization more locally, altering the scales at which readers can encounter texts by generating word clouds: a data-mining process draws the most frequently used terms from full-text versions of the addresses. He notes that a radical reduction occurs in this process, allowing “selected parts to stand out from the thick, ecologically entangled whole.” These clouds are not written with the goal of summary, or even of coherence as a cohesive whole; instead they attempt to shift what is available for a reader to wonder and wander about. Compositions are not self-contained; their materials are assembled from spaces beyond their frame. Yet attention to those materials is usually confined to the works cited page, a microscopic view that inhibits visualizing and tracing the materials that have been assembled and constructed. In Grasping Rhetoric and Composition by Its Long Tail, Mueller describes data mining as a pre-inquiry methodology, explaining that “these [data mining] methods catalyze questions and begin to provide a means of addressing such questions more systematically than is otherwise available” (200). Visualizing data affords new perspectives and reveals patterns that might otherwise go unnoticed in materials (207). For the discipline, the benefit of quantitative methods is not limited to noticing singular or isolated patterns; these methods also make available “the stabilization of reusable, interoperable, field-wide datasets” (200). This work is meant to be recreated, reflecting its ecological nature. Mueller describes this as “heuretic disciplinography,” the

“writing and rewriting the field by exploring the intersections across different scholars’ bodies of work as well as the associated pedagogical, theoretical, and methodological approaches they mobilize.” (201)

Distant reading can aid in establishing and tracing the structure of scholarship through the relations of its materials, letting us view our scholarship “in relation to the complex and highly distributed processes involved in the production, distribution, and valuation of those products” (Shipka 51), because texts are not singular or isolated in composition. In Toward a Composition Made Whole, Jody Shipka argues that

“in requiring that we trace the highly distributed processes associated with the production of texts, the framework also militates against text-dependent conceptions of multimodality by foregrounding the variety of tools, participants, and actions supported (or may even have thwarted) the production of a particular text.” (52)

Johanna Drucker describes data visualization in Graphesis as a “concern with the creation of methods of interpretation that are generative and iterative, capable of producing new knowledge through the aesthetic provocation of graphical expressions” (41). In other words, data visualization is concerned with the affordances of graphical form for structuring and producing knowledge. These visualizations are dynamic in that they are models for generating new knowledge through visual means, not re-presentations of knowledge that already exists; they are a way of seeing what is usually unseen. They are knowledge in the making. Data visualization of the field does not write histories or counter-histories to remember back or progress forward; it makes visible what is both available and unavailable to us in terms of materiality. Drucker explains, “Our ideas of what something should be—a house, an airplane, an automobile—constrains our ability to design these things within an abstract model. Breakthroughs in knowledge come from changing the model, or by innovative expressions.” Our reading of texts is based on traditions of reading texts in a certain manner, at a certain distance or proximity: “How we know what we know is predicated on the models of knowing that mediate our experience by providing conceptual schema or processing experience into form” (15). What we compose is in part informed by how we have previously done composition, a process that makes certain favored materials available while leaving others unavailable or unnoticed. She goes on to argue that

“Graphic schema create syntactic structures within which semantic values can be assigned and maintained. We can read the organizing syntax of these graphic structures. The structured relations among information elements is as much an expression of a way of thinking as any other intellectual form. To put it another way, graphical structures are rhetorical arguments.” (17)

Like Drucker’s work, an ecological reading that creates word- and sentence-level data sets is concerned with generative and iterative methods of interpretation (41). These ecological visualizations are premised on the idea that an image, like a text, is an aesthetic provocation, a field of potentialities, in which a viewer intervenes. Knowledge is not transferred, revealed, or perceived, but is created through a dynamic process (36).

Data Terrariums: Ecological Readings 

Let your ears and eyes wonder/wander across the text at differing scales and representations. What do you notice? First, think of how you’ve engaged with the text up until this point: at what distance do you read it? Up close? Across sections? What might become visible in reading the following?

 

Mining an article to render plain text for data visualization work

Frequently Quoted Lines and Keywords

Tagcloud renderings of the most frequently used words in the article, in groups of 15, 25, and 50 words

Journal Citations

Space to Grow

These are only a few ways of reading ecologies, based on the materials and connections I chose to make visible by pulling them out of the rest. The same data sets I created can be represented differently, or the data sets can be expanded. What is visible is different from what is visible when reading blocks of alphanumeric text in a journal format. What can and can’t be noticed is different. Reading Marilyn Cooper’s “The Ecology of Writing” would surely draw attention to certain materials, such as keywords and citations, but there is something different in allowing these materials to shift out of the confines of the rectangular field. They are able to move between relationships. What can be noticed is ecologically evocative: one reader is likely to notice something different from another. One word has greater resonance, one relationship makes a line of thinking more available; or one might notice what isn’t present, what we might have taken for granted or forgotten. These should not be the only representations; they should not be left to settle and decompose. Ecologies are dynamic and need relationships across materials to persist. Exploring what ecological methodologies make conceptually available from the materials we exist within raises a matter of concern: such materials must be rendered visible for conceptual use. In order to construct ecologies as methodology, data is necessary. Much of our data is text-based scholarship that exists in an expanse of spaces in paper and digital print. Some is accessible only with membership in institutions or associations; some is open to circulate via Twitter, blogs, and the passing on of tags and links; some is available only through the mail or library reserves. We exist in an abundance of materials, but much of it isn’t available as data, or as material from which to construct data for reworking. Data sets from textual corpora can be mined, rendered, and shared for different eyes to look over, notice, and collaborate with.
Compositions can be more mindful of one another, and the materials assembled to make them possible.