As someone interested in visualizations of information and composing image texts, I have been thinking about what I would create to illustrate (make visible) the cancer that consumed/s the women in my family. It seems morbid, or at least uncomfortable, to want to depict the disease without emphasizing narratives of overcoming or resilience, as if letting it be seen as it is disembodies the bodies that have nurtured it. I have watched videos of surgeries on women who exist only as torsos, of cartoon monster cells sneaking throughout the body, of illustrations of tumors forcing tissue into distorted asymmetries, and of photographs that look like alien fruit. I can see my own diagnosis as typeface and an exercise of balance and white space on the page, as calendar tickmarks taking inventory of days and anomalies in patterns of pain, and as Rorschach bloodblots that I am too fearful to interpret. I could show my family tree with attention drawn to deep bark carved, extending back, to the bough my mother and I now share. I could show each type of cancer with its corresponding woman/body: breast ___________, ovarian _____________, uterine ______________, cervical _________________. Not to forget the nodal tissues connected to these networks of disease as they thrived and spread: pectoralis major, kidney, colon, liver, fallopian tubes—trace the interactions. I could create charts that depict the age of diagnosis, comparisons of treatment undergone, or the duration of the disease. Or perhaps an archive of the women (of which I am living materiality), or poems and paintings of the affective dimensions of the rhetorics of silence and pain and disembodiment. Of strength and resilience. Or faces of women I love.
Ecological Methodologies for Composing
“Environments are not just containers, but are processes that change the content totally.” Marshall McLuhan
In Unframing Models of Public Discourse, Jenny Rice works to destabilize the frame of the rhetorical situation by proposing that situations operate in more ecologically complex networks of “lived practical consciousness or structures of feelings” (7) – not a move to dissolve all boundaries for looking at the context of a situation, but to destabilize them enough to account for the limitations of borders that are too discrete. Rhetoric is better understood and practiced (enacted) as the complex art that we describe it to be. Ecological metaphors for conceptualizing the growing and evolving structure of the field of rhetoric and composition are not new. Marilyn Cooper, Jenny Edbauer Rice, [fill in others] have worked to move beyond the static rhetorical situation toward metaphors that account for rhetoric as it leaks into the complexly articulated social. (Lloyd Bitzer’s “The Rhetorical Situation,” 1968, is an iconic and valuable text in the field that examined rhetorical discourse as called into existence by a situation. Despite its age, it is still of use; however, I am interested in more complex models that better articulate discourse as unfolding through media – in space/place, time, and the power to invoke through the connectivity of web linking and metadata.) While I draw from these works as theory for my own project, I take interest in employing ecology as methodology for engaging texts. Ecology as methodology seeks to take the structure and nature of ecologies and enact them as ways of doing. This isn’t a singular methodology, or a direct translation from ecology as theory to ecology as method, but an attempt to render this work visible.
If ecology as theory is interested in dynamic models of discourse that seek to articulate the complexity of what comprises a text (where text is understood in an expanded sense: article, book, dialogue, image, graphic, and so on) with an emphasis on the material assemblage, then ecology as methodology is a means of accounting for the materials that compose these dynamic models of text. Ecology as methodology seeks to make the invisible intermediaries that compose a text visible, accountable, and articulable. Accounting for the articulation of materials may allow for the articulation of interaction with texts based on the collection of materials individually, and may highlight different relationships among the materials. By employing distant reading methods, data mining, visualization, and pattern recognition as methodologies of ecology, represented as tagclouds, graphs, timelines, and maps, I will create ecological representations of Marilyn Cooper’s “The Ecology of Writing” (1986) as a means of making visible the ecological nature of text. “The Ecology of Writing” was chosen as the textual sample because it is arguably the piece of scholarship that introduced ecologies into rhetoric and composition – the other pieces referenced here can be traced back to Cooper’s text through bibliometric and citation information. Additionally, exploring the creation of ecologies of “Ecology” is meant to be playful. Cooper’s text is a response to the cognitive process model dominant in the field at the time, which framed writing as entirely produced in the head of the author. Cooper’s critique was that such a model idealized the notion of a solitary author, isolating the author from the social world; she remarked that “Such changes in writing pedagogy indicate that the perspective allowed by the dominant model has again become too confining” (366).
Language, to Cooper, was social; ushering in a social turn, Cooper proposed an ecological model of writing “whose fundamental tenet is that writing is an activity through which a person is continually engaged with a variety of socially constituted systems” (367). Cooper distributed the material constituents of writing across more complex social systems comprised of, among other characteristics, other writers and writings:
“Writing, thus, is seen to be both constituted by and constitutive of these ever-changing systems, systems through which people relate as complete, social beings, rather than imagining each other as remote images: an author, an audience” (373).
To further ecological work, I take up the task of making models of systems (what constitutes materials and how they connect) to better understand ecological relationships.
In 2005, the National Humanities Center granted the Richard W. Lyman Award – a recognition of scholars who have advanced the humanities through the use of information technology – to John Unsworth, Vice Provost, University Librarian, Chief Information Officer, and Professor of English at Brandeis University. In his lecture, Unsworth issued a call to action for more work in the humanities that employs text mining, data mining, visualization, modeling, and pattern recognition across large corpora of texts, noting that “the goal of data-mining (including text-mining) is to produce new knowledge by exposing similarities or differences, clustering or dispersal, co-occurrence and trends.” Unsworth described the work of the NORA Project, MONK (Metadata Offer New Knowledge), and WordHoard. The NORA project was “a multi-institutional, multi-disciplinary Mellon-funded project to apply text mining and visualization techniques to large humanities-oriented digital library collections. The goal of the Nora project was to produce software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries.” MONK “is a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study”; all code from the project is made available as open source. MONK describes the ongoing value of this mining and visualizing work by stating that
“the scholarly use of digital texts must progress beyond treating them as book surrogates and move towards the exploration of the potential that emerges when you put many texts in a single environment that allows a variety of analytical routines to be executed across some or all of them.”
Like MONK, WordHoard is open source and available as a downloadable program. WordHoard applies corpus linguistics to a collection of texts, tagging them “according to morphological, lexical, prosodic, and narratological criteria”; the project explains that “Deeply tagged corpora of course support more finely grained inquiries at a verbal or stylistic level. But more importantly, access to the words of a text at such microscopic levels also lets you look in new ways at the imaginative worlds created by those words.” While these projects are of a much larger scale (and involve many more pairs of eyes), similar data mining and visualizing methods can be applied to create an ecological reading on a much smaller (and more budget-conscious) scale.
Ecological Reading as Invention: Method
In The Future of Invention, John Muckelbauer states that it is
“not always the case that an inventive inquiry works best when it responds to a problem by seeking its solution, or responds to the question with an answer; it may well be that by turning away from the question, we can uncover a different kind of trajectory, even a different kind of relentless directness with which to engage the problem.” (149-150)
As a newcomer to data mining, and one who is primarily working through trial and error without the aid of purchased software, much of this work is scrappy – making do within the limitations of what I have. Reading is still done by eye, but at an altered scale. The emphasis, whether or not one is using software, is to alter the reading of a text by changing the scale – the trajectory, in Muckelbauer’s words – of how the text communicates, drawing out keywords and phrases. To begin, after locating the full-text PDF of the article through the JSTOR database, I used the note-taking application Evernote to function as OCR (optical character recognition) software: creating a new note, I dragged and dropped the article into the space. The recognized text was then copied and pasted into a text-editing program to render it as plain text. This plain text was used to generate the tagclouds at Tagcrowd.com. Parameters can be set within the site to ignore stop words, to list frequency values, and to control how many keywords are represented in a cloud. Reading across the three clouds, I also created a list of the ten most frequently used words in the article, which served as the basis for an infographic. The ten most frequently used words, with their frequencies in the article:
- Audience 37
- Model 34
- Process 27
- Systems 26
- Ideas 24
- Forms 24
- Purposes 22
- Texts 22
- Interactions 17
- Structures 16
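The tallying that Tagcrowd performs behind the scenes can be sketched in a few lines of Python. This is a hypothetical stand-in for the site’s processing, not its actual implementation; the stop-word list here is a small illustrative sample, and `keyword_frequencies` is a name introduced for this sketch:

```python
import re
from collections import Counter

# A small, illustrative stop-word list; Tagcrowd's actual list is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "as"}

def keyword_frequencies(plain_text, top_n=10):
    """Tokenize plain text, drop stop words, and return the top_n keywords
    with their counts, most frequent first."""
    words = re.findall(r"[a-z]+", plain_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# e.g., keyword_frequencies(cooper_plain_text) would yield pairs like
# ("audience", 37), ("model", 34), ... for the list above.
```

Feeding the plain-text version of the article to a function like this would reproduce the frequency list, and the same counts could be passed to any cloud or chart generator.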
To create the data for the other visualizations, the computer’s search feature was used to locate keywords, which were then tallied in an inventory list. I searched for “Cooper,” “ecology,” and “ecological” within each of the articles I collected that cited “Ecology” in their bibliographies. I used these keywords to locate what was quoted and summarized of Cooper’s article, as well as to generate data on which elements of her text are being referenced. I located 14 quotes from her article and examined which pieces each article used in in-text citations. These quotations served as data to construct two infographics based on citation patterns. For the last data set, I searched the NCTE (National Council of Teachers of English) database for articles that cite Cooper’s article, which yielded the following:
- Aviva Freedman “Show and Tell? The Role of Explicit Teaching in the Learning of New Genres” – Research in the Teaching of English 1993
- Amber Buck “Examining Digital Literacy Practices on Social Network Sites” –Research in the Teaching of English 2012
- Bruce Horner “Students, Authorship, and the Work of Composition” – College English 1997
- Anish M. Dave and David R. Russell “Drafting and Revision Using Word Processing by Undergraduate Student Writers: Changing Conceptions and Practices” – Research in the Teaching of English 2010
- Sidney I. Dobrin and Christian R. Weisser “Breaking Ground in Ecocomposition: Exploring Relationships between Discourse and Environment” – College English 2002
- Jeremiah Dyehouse, Michael Pennell, and Linda K. Shamoon “‘Writing in Electronic Environments’: A Concept and a Course for the Writing and Rhetoric Major” – CCC 2009
- Richard C. Freed and Glenn J. Broadhead “Discourse Communities, Sacred Texts, and Institutional Norms” – CCC 1987
- Lester Faigley “Competing Theories of Process: A Critique and a Proposal” – College English 1986
- Judy Kirscht, Rhonda Levine, and John Reiff “Evolving Paradigms: WAC and the Rhetoric of Inquiry” – CCC 1994
- Lucille Parkinson McCarthy “A Stranger in Strange Lands: A College Student Writing Across the Curriculum” – Research in the Teaching of English 1987
- Matthew Newcomb “Sustainability as a Design Principle for Composition: Situational Creativity as a Habit of Mind” – CCC 2012
- Richard Fulkerson “Composition Theory in the Eighties: Axiological Consensus and Paradigmatic Diversity” – CCC 1990
- Nathaniel A. Rivers and Ryan P. Weber “Ecological, Pedagogical, Public Rhetoric” – CCC 2011
- Kristie S. Fleckenstein, Clay Spinuzzi, Rebecca J. Rickly, and Carole Clark Papper “The Importance of Harmony: An Ecological Metaphor for Writing Research” – CCC 2008
- Trish Roberts-Miller “Discursive Conflict in Communities and Classrooms” – CCC 2003
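The hand-tally described above – searching each citing article for “Cooper,” “ecology,” and “ecological” – could also be approximated programmatically once each article has been rendered as plain text. A minimal sketch, where `keyword_inventory` is a name introduced here and any filenames in the comment are hypothetical:

```python
import re
from collections import Counter

# The three search terms used to build the inventory list.
KEYWORDS = ("cooper", "ecology", "ecological")

def keyword_inventory(article_text):
    """Count occurrences of each search keyword in one article's plain text.
    Whole-word tokenizing keeps "ecology" and "ecological" as separate counts."""
    words = re.findall(r"[a-z]+", article_text.lower())
    counts = Counter(words)
    return {k: counts[k] for k in KEYWORDS}

# e.g., keyword_inventory(open("freedman_1993.txt").read()) would return a
# dict of counts for that article, one row of the inventory list.
```

Running this over every collected article would rebuild the inventory table that the citation infographics draw on.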
These data sets, gathered by altering the scales at which the texts were read, served as the basis for the visualizations. The data focus could have been different for different ends, and could be represented in any number of ways; what interests me in this work at this time is seeing what becomes visible in pulling out patterns and communicating that information graphically. Based on the patterns I noticed, I decided to construct:
- a short video on the process of mining text
- two animated gifs to represent:
- the process of mining an article to render plain text to work with
- the tagclouds, to look at patterns across the clouds [NOTE: there is a script, developed by Chirag Mehta, that can display tagclouds along an interactive timeline, which I have used in the past. However, the composition must be hosted on a web domain.]
- two infographics that represent:
- the ten most used words in Cooper’s article
- the most quoted line in publications that cite Cooper’s article
- two graphs that represent:
- journals that cite the article
- the ratio of most quoted lines from Cooper’s article
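The first of the graphs listed above – journals that cite the article – amounts to a simple aggregation over the NCTE list gathered earlier. A sketch of that count in Python, using the journal labels from the list (the variable names are my own):

```python
from collections import Counter

# Journal of each citing article, from the NCTE database list above.
citing_journals = [
    "Research in the Teaching of English",  # Freedman 1993
    "Research in the Teaching of English",  # Buck 2012
    "College English",                      # Horner 1997
    "Research in the Teaching of English",  # Dave & Russell 2010
    "College English",                      # Dobrin & Weisser 2002
    "CCC",                                  # Dyehouse, Pennell, & Shamoon 2009
    "CCC",                                  # Freed & Broadhead 1987
    "College English",                      # Faigley 1986
    "CCC",                                  # Kirscht, Levine, & Reiff 1994
    "Research in the Teaching of English",  # McCarthy 1987
    "CCC",                                  # Newcomb 2012
    "CCC",                                  # Fulkerson 1990
    "CCC",                                  # Rivers & Weber 2011
    "CCC",                                  # Fleckenstein et al. 2008
    "CCC",                                  # Roberts-Miller 2003
]

# Tally citations per journal; these counts become the bars of the graph.
journal_counts = Counter(citing_journals)
```

The resulting counts (CCC leading, followed by Research in the Teaching of English and College English) are what the journal graph visualizes.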
Data Mining and Visualization
Franco Moretti, to whom the concept of distant reading can be traced, described the impetus for his work this way: “a field this large cannot be understood by stitching together separate bits of knowledge about individual cases, because it isn’t a sum of individual cases: it’s a collective system, that should be grasped as such, as a whole” (Graphs, Maps, Trees 68). To Moretti, graphs provide data handles, not interpretation (72); “What graphs make us see…are the constraints and the inertia of the literary field—the limits of the imaginable” (82). The scholarship of Derek Mueller works to create visualizations that alter the scales at which we encounter texts. In his work Views from a Distance: A Nephological Model of the CCCC Chairs’ Addresses, 1977-2011, Mueller argues that
“As the scholarly record grows there is an escalating value in realizing connections. This is exceedingly important for newcomers to the field who must make inroads however they can, by conversation and conventional reading and writing, of course, but also by pattern-finding, by nomadically exploring conceptual interplay across abstracts and abstractive variations, and by finding and tracing linkages among materials and ideas, new and old.”
Mueller’s work takes an interest in what Moretti termed “distant reading,” which seeks to gather large textual datasets from which visualizations may be composed. While Moretti’s work focuses on charting entire literary genres, Mueller applies distant reading methods and data visualization more locally, altering the scales at which readers can encounter texts by generating word clouds that turn to data-mining processes to draw the most frequently used terms from full-text versions of the addresses. He notes that a radical reduction occurs in this process, allowing “selected parts to stand out from the thick, ecologically entangled whole.” These clouds are not composed with the goal of summary, or even coherence as a cohesive whole; instead, they attempt to shift what is available to a reader to wonder and wander about. Compositions are not self-contained; their materials are assembled from spaces beyond their frame. And while attention is focused on a works cited page, that remains a microscopic view, one that inhibits the visualization and tracing of the materials that have been assembled, that have been constructed. In Grasping Rhetoric and Composition by Its Long Tail, Mueller describes data mining as a pre-inquiry methodology, explaining that “these [data mining] methods catalyze questions and begin to provide a means of addressing such questions more systematically than is otherwise available” (200). Visualizing data affords new perspectives and patterns that might otherwise go unnoticed in materials (207). For the discipline, the benefit of quantitative methods is not limited to the noticing of singular or isolated patterns, but includes making available “the stabilization of reusable, interoperable, field-wide datasets” (200) – this work is meant to be recreated, reflecting its ecological nature. Mueller describes this as “heuretic disciplinography,” the
“writing and rewriting the field by exploring the intersections across different scholars’ bodies of work as well as the associated pedagogical, theoretical, and methodological approaches they mobilize.” (201)
Distant reading can aid in establishing and tracing the structure of scholarship through the relations of its materials – viewing our scholarship “in relation to the complex and highly distributed processes involved in the production, distribution, and valuation of those products” (Shipka 51) – because compositions are not singular or isolated. In Toward a Composition Made Whole, Jody Shipka argues that
“in requiring that we trace the highly distributed processes associated with the production of texts, the framework also militates against text-dependent conceptions of multimodality by foregrounding the variety of tools, participants, and actions [that] supported (or may even have thwarted) the production of a particular text.” (52)
Johanna Drucker describes data visualization in Graphesis as a “concern with the creation of methods of interpretation that are generative and iterative, capable of producing new knowledge through the aesthetic provocation of graphical expressions” (41). Data visualization is concerned with the affordances of structuring/producing knowledge through graphical form – creating methods of interpretation that are generative in their expressive provocation. These visualizations are dynamic in that they are models for generating new knowledge through visual means, not re-presentations of knowledge that already exists – a way of seeing what is usually unseen. These are knowledge in the making. Data visualization of the field is not writing histories or counter-histories to remember back or progress forward, but making visible what is both available and unavailable to us in terms of materiality. Drucker explains, “Our ideas of what something should be—a house, an airplane, an automobile—constrains our ability to design these things within an abstract model. Breakthroughs in knowledge come from changing the model, or by innovative expressions.” Our reading of texts is based on traditions of reading texts in a certain manner, at a certain distance or proximity; “How we know what we know is predicated on the models of knowing that mediate our experience by providing conceptual schema or processing experience into form” (15). What we compose is in part informed by how we have previously composed, a process that makes available certain favored materials while excluding others as unavailable, or unnoticed. She goes on to argue that
“Graphic schema create syntactic structures within which semantic values can be assigned and maintained. We can read the organizing syntax of these graphic structures. The structured relations among information elements is as much an expression of a way of thinking as any other intellectual form. To put it another way, graphical structures are rhetorical arguments.” (17)
Creating word- and sentence-level data sets is, like Drucker’s work, “concerned with the creation of methods of interpretation that are generative and iterative, capable of producing new knowledge through the aesthetic provocation of graphical expressions” (41). These ecological visualizations are premised on the idea that an image, like a text, is an aesthetic provocation, a field of potentialities, in which a viewer intervenes. Knowledge is not transferred, revealed, or perceived, but is created through a dynamic process (36).
Data Terrariums: Ecological Readings
Let your ears and eyes wonder/wander across the text at differing scales and representations. What do you notice? First, think of how you’ve engaged with the text up until this point – at what distance did you read it? Up close? Across sections? What might become visible in reading the following?
Frequently Quoted Lines and Keywords
Space to Grow
These are only a few ways of reading ecologies based on materials and connections I chose to make visible – pulling them out of the rest. The same data sets I created can be represented differently, or the data set can be expanded. What is visible is different from reading blocks of alphanumeric text in a journal format. What can and can’t be noticed is different. Reading Marilyn Cooper’s “The Ecology of Writing” would surely draw attention to certain materials – keywords, citations – but there is something different in allowing these materials to shift themselves out of the confines of the rectangular field. They are able to move between relationships. What can be noticed is ecologically evocative – one reader is likely to notice something different from another. One word may have greater resonance, one relationship may make a line of thinking more available; or one might notice what isn’t present, what we might have taken for granted or forgotten. These should not be the only representations; they should not be left to settle and decompose. Ecologies are dynamic and need relationships across materials to persist. Exploring what ecological methodologies make conceptually available from the materials we exist within raises a matter of concern: such materials must be rendered visible for conceptual use. In order to construct ecologies as methodology, data is necessary. Much of our data is text-based scholarship that exists across an expanse of spaces in paper and digital print. Some is accessible only with membership in institutions or associations, some is open to circulate via Twitter, blogs, and the passing on of tags and links, and some is available only through the mail or library reserves. We exist in an abundance of materials, but much of it isn’t available as data, or to construct data for reworking. Data sets from textual corpora can be mined, rendered, and shared for different eyes to look over, notice, and collaborate with.
Compositions can be more mindful of one another, and the materials assembled to make them possible.
I presented “with” Joe Torok (I say “with” because Joe was actually in Vermont for his summer job and was there in the form of a video he created) on slices of our MA projects that share an interest in distant reading, data visualization, the materials of the field, data mining, “the” field, access, connectivity/networks, disciplinarity, and academic activity as graduate students/newcomers to the field. Our session, #h6 “Regionalism, Heterotopoi, and Circumferences: Rethinking Distant Reading,” was better attended than I expected, and I’m glad for it because I think this might have been my best presentation to date (no small feat for someone trying to craft an academic identity for themself(s)). While I’m not sure these venues will ever be comfortable for me based on my personality, I can see growth in the structuring of my talk, in timing/interplay/juxtaposition with my slidedeck, in retaining some consciousness in the blackout that always seems to shut down my brain in presentations, and in discussion/exchanges. I think this might be attributed to me accepting myself and my position in the field: I don’t know what I’m doing, and that’s okay. Curiosity, exploration, and enthusiasm are working in my favor. I hope this doesn’t read as negligence or defiance, because the mindfulness to what I’m doing/not doing is there; it’s just a (silly perhaps) realization that I can’t compare myself to individuals who have been doing this longer than I have – I’m not there (yet).
In attendance (among other wonderful individuals) were Jason Palmeri (!), Doug Eyman (!), and Ben Miller (!) who are also interested in distant reading, data mining, and visualization, as well as Amanda Wall and Gwendolynne Reid (happy to see some other ladies interested in this stuff). Much of our session was left to conversation, which was awesome. I left humming with electricity. An aside: I would really like to record my future presentations so I can sift back through them later. Here is a reduction of the conversation:
- a question about this focus (I’m assuming the size of my data set or my project’s scope) as fractal in relation to what could be done with this methodology
- a question of whether Moretti’s distant reading is the proper term/lens for this work (perhaps because of the set sizes or because of the sliding scale of interaction)
- concerns with the tools available to do this work: cost (money, time, and labor)
- making this work public – making data available for others to work with
- questions of what can be “given away” through these data sets (like the entirety of a journal article as plain text) – to which Doug had an answer I wish I could recall that dealt with the XML file itself, I believe
- Google APIs
- connecting/collective collaboration on this work
- making visualizations that move/interact
There was much more, and this is where I wish I had the aid of an external brain in the form of AV equipment. For instance, there was a gentleman who works more in industry/software/programming, with an interest in sentiment analysis, who suggested I write a Python script for a visualization I ultimately imagined for my data set (which I so eloquently called a “blip map”: the fading/prominence of keywords over time from my set of Braddock essays that might make more visible trends in the care/questioning of the field) – I had questions for him but not the language to ask them. (this feels like a missed connection…)
I’m left thinking:
- I want to learn more about programming/coding
- I want to discover more free tools for building visualizations
- I want to find other people doing this work and find ways to put my work into conversation with the field
- I want these visualizations to be interactive/animated to live up to their use(fulness)
- I want this to keep going
Reading: “Graphs” from Graphs, Maps, Trees: Abstract Models for Literary History by Franco Moretti (New Left Review 28 (2003): 67-93); “Data Mining: A Hybrid Methodology for Complex and Dynamic Research” by Susan Lang and Craig Baehr (College Composition and Communication 64.1 (2012): 172-194); and “Grasping Rhetoric and Composition by Its Long Tail: What Graphs Can Tell Us about the Field’s Changing Shape” by Derek Mueller (College Composition and Communication 64.1 (2012): 195-223)
quoting Polish philosopher and historian Krzysztof Pomian: the old historical paradigm “directed the gaze of the historian towards extraordinary events…historians resembled collectors: both gathered only rare and curious objects, disregarding whatever looked banal, everyday, normal” (67)
what would happen if focus was shifted from exceptional texts to the “mass of facts”? (67) – “what a minimal fraction of the literary field we all work on…” Moretti proposes a move from close reading to what he conceptualizes as distant reading.
“a field this large cannot be understood by stitching together separate bits of knowledge about individual cases, because it isn’t a sum of individual cases: it’s a collective system, that should be grasped as such, as a whole” (68) – poses the question of “knowing” the field on the basis of its texts: there are far too many to ever read closely, and even when they are read at a close distance, how can we gain perspective on their relationships within the larger history/discipline?
the “novel is an unreliable commodity” (70) – I believe he is referring to its form as in flux based on its relations to political/social/etc. status (makes me wonder – rhetorical situation (Bitzer) or rhetorical ecology (Rice)?)
“But graphs are not models; they are not simplified versions of a theoretical structure [in the ways maps and (especially) evolutionary trees will be in the next two articles]. Quantitative research provides a type of data which is ideally independent of interpretations, I said earlier, and that is of course also its limit: it provides data, not interpretation” (72) – a data handle, a way of beginning to see
multiplicity of time: patterns in event, cycle, longue duree
- event: circumscribed domain of the event and the individual case
- longue duree: very long span of nearly unchanging structures
- cycle: temporary structures within historical flow
- so, all flow and no structure (event), temporary structures for some time (cycle), all structure and no flow (longue duree) (76)
cluster (80) – appearance and disappearance of genres “punctuated by brief bursts of invention” – does he imagine the data graphing/mapping/arranging/assembling in clusters? Are there no outliers? No strange texts? No anomalies?
“What graphs make us see, in other words, are the constraints and the inertia of the literary field – the limits of the imaginable” (82). When we see, we also don’t see – this is potential.
“I began this article by saying that quantitative data are useful because they are independent of interpretation; then, that they are interesting because they demand an interpretation; and now, most radically, we see them challenge existing interpretations, and ask for a theory, not so much of ‘the’ novel, but a whole family of novelistic forms. A theory—of diversity” (91). Does Moretti view these texts as heterogeneous? Heterogeneous as a collective whole vs. homogeneous as a whole? Or does he see more grouping/categorization through these graphs?
Does Moretti see the novel in flux due to rhetorical situations – “an uncertain relation to politics and social movements” (73)? – “The causal mechanism must thus be external to the genres, and common to all: like a sudden, total change of their ecosystem. Which is to say: a change of their audience. Books survive if they are read and disappear if they aren’t: and when an entire generic system vanishes at once, the likeliest explanation is that its readers vanished at once” (82). What does the difference between a rhetorical situation and a rhetorical ecology look like? Do graphs fit? Or are they not distributed enough? Are graphs too singular, set apart from the network? Do they need to be connected? Are they a step toward composing networks? And what of metanoia? kairos? chronos? – in these graphs, rhetorical situations, rhetorical ecologies, and actor-networks?
Lang and Baehr
opens with quote from John Naisbitt “We are drowning in information but starved for knowledge” – taking stock/inventorying our field
as a field, [composition] “we’ve often relied on lore, anecdotal evidence, or studies relying on small sample sizes to defend our assertions. Data mining won’t provide an instant or simple answer, since we still need to determine what data we have available, examine the data to see if any trends emerge, and then, most likely, ask more questions and turn again to the data, which offer us a new set of tools and strategies for research” (174) – calling for data mining to justify what composition does, and consequently, doesn’t do
compositionists “have rejected quantification and any attempts to reach Truth about our business by scientific means, just as we long ago rejected ‘truth’ as derivable by deduction from unquestioned first principles. . . .” (174) – we won’t reach “truth” by critique, but we won’t reach “knowledge” (of) by neglecting collecting/counting/sifting materials that constitute our field
working from Richard Haswell’s article, “NCTE/CCCC’s Recent War on Scholarship,” in which he “tracks the publishing trends of RAD (replicable, aggregable, and data supported) research in flagship journals.” Haswell defines such research as that which may or may not employ statistics, but is “explicitly enough systematicized in sampling, execution, and analysis to be replicated; exactly enough circumscribed to be extended; and factually enough supported to be verified” (174) – an attempt at bringing “hard” data into the “soft” humanities? (we’re not viewed as a science, not even a social science…). RAD data-driven inquiries (174):
- Data results from a set procedure of observation, elicitation, and analysis – illustration that it doesn’t lose its human-centric tendencies as illustrated by our common methodology of collecting data – ethnography?
- Description of a system of text analysis or a research method or a research tool, application, and report of results
- Establishment of a descriptive or validation system and then application to text, course, or program
- Textual analysis with report of application, using a systematic scheme of analysis that others can apply to different texts and directly compare
quoting Chris Anson from his piece “The Intelligent Design of Writing Programs: Reliance on Belief or a Future of Evidence”, “Ultimately, changing the public discourse about writing from belief to evidence, from felt sense to investigation and inquiry, may help to move us all beyond a culture of ‘unrelenting contention’ (Tannen) and toward some common understandings based on what we can know, with some level of certainty, about what we do” (175). – the point is to be able to discuss what WPA and WP do with the public, motivation to turn to data-driven methodology/inquiry
“Data mining is loosely defined as the process of finding interesting information in large amounts of data” (176) – not “numbers” out of context
“It can also help us conduct research of a more exploratory nature, providing windows into the data that we can use to determine what questions to ask of that data” (176) – working against the bias that numerical data comes from a pre-determined hypothesis – or what is expected to be found
“Data and text mining, then, can be exploratory and, consequently, more descriptive, or they can serve a predictive function” (177) – working to explore, or working to illustrate (these aren’t mutually exclusive either)
“knowledge to be gained is implicit in the data. Data mining might be predictive, in that it seeks to forecast future actions or behaviors through examining patterns in the data, or descriptive, in that it attempts to explain those patterns and the implications thereof. It can be used to classify information or cluster it into groups according to similar characteristics and represent that information in more concise ways. Data mining can also be used to detect anomalies or outliers in a data set” (177-178) – separates it from other statistical methods
another mention of “cluster,” as in Moretti – “Clustering involves the grouping together of similar data items; unlike classification, the labels of the clusters are not preset” (178) – what do these clusters look like? How do they surface in the data?
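To make this distinction concrete for myself, here is a minimal sketch (in Python, with invented data) of what unsupervised clustering could look like: no labels are preset, and the groups surface from similarity alone. The “articles” and their two measures below are entirely hypothetical, and the algorithm (a bare-bones k-means) is just one common way of clustering, not necessarily what Lang and Baehr have in mind.

```python
# A minimal clustering sketch: unlike classification, we never pre-label
# the groups -- k-means discovers them from similarity alone.
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters by repeatedly assigning each point
    to its nearest centroid, then recomputing the centroids."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c
            else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


# Hypothetical example: each point stands in for an article described by
# two rescaled measures (say, citations per page and year of publication).
articles = [(1, 1), (1.2, 0.9), (0.8, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
clusters = kmeans(articles, k=2)
print([len(c) for c in clusters])  # two groups of three surface from the data
```

What interests me is that the two groups were never named in advance; they answer, in miniature, the question of how clusters “surface in the data.”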
“Associations and patterns further assist with the understanding of data. Associations refer to relationships among data items that might predict behavior of a user group” (179) – determined from the clusters? That would be a logical progression – clusters to associations to patterns
“Data mining can also be inductive, unlike data analysis that often begins with a hypothesis that is to be proven or disproven by examining the data. Data mining allows for the fact that the relations between factors that will tell the user that the most interesting, nontrivial information may come from variables that do not initially seem to have any distinct relationship” (179) – what happens in these associations that we don’t otherwise see (?)
RAD based study types according to Chris Anson: foundational research and syntheses, replications and extensions, graduate research, connections with the general public, increased scrutiny and critique, and improved research communities (180) – the first two apply to data mining, while the other four, with some work, can as well according to Lang and Baehr (180-183)
toward foundational research and syntheses and replications and extensions, data mining makes available:
- revisiting these foundational works can help validate their findings or uncover how these accepted or “given” approaches have changed over time
- the ability to enable researchers to examine the ever-increasing number of studies published and posted online and build connections and syntheses from those that can expand or contract in scope as most appropriate to answer a particular query
- In addition to including larger sample sizes, artifacts could be examined through a variety of different lenses that might shed additional light on the core research questions of the study
- Follow-up studies
Lang and Baehr connect the remaining four, graduate research, connections with the general public, increased scrutiny and critique, and improved research communities, to the public by examining “key questions or issues of popular interest or sustained issues over a period of time” (183). They quote from Kelly Ritter’s article “Extra-Institutional Agency and the Public Value of the WPA” to point out that:
First-year composition is a public enterprise historically. It’s no secret to WPAs that their necessary public defense of student writing—and the myths that require such defenses to be launched—are a result of this perceived communal ownership. Composition, unlike other academic disciplines, is perpetually at the mercy of cultural conceptions of literacy, whether through various levels of community “sponsorship” and that sponsorship’s accompanying costs (Brandt, “Sponsors of Literacy”) or complicit institutional structures, fueled by culturally skewed notions of “correctness” in discourse, all of which keep composition at the bottom of the academic hierarchy (Crowley) (183) – the exigence for their data mining. If we can show processes, theories, habits, ideas, practices in a more “concrete” (as in what the sciences show/do), then we can better account and demonstrate (prove?) what we do in composition studies. Or, what we have done, have done poorly, have done better, and what we have yet to do.
They call for a “stronger culture of research” by using the Graduate Research Network (online) as an example: “the Graduate Research Network (GRN) has thrived as a part of the Computers and Writing Conference since 2000. However, its listserv (founded February 2007) and its blog (founded February 2011) have met with less success. The listserv currently has thirty-five members and has circulated forty messages since its founding, and the blog has a single entry” (184) – yikes! Part of data mining, finding/connecting, is not only looking for these spaces, but looking at how they are living on the web. It is not enough that they exist.
“The process of data mining, an often iterative one, involves identifying problems, data sources, and heuristics; establishing a formal procedure; and interpreting results” (185). The process, in detail, is outlined below:
Selecting information resources…may be based on a number of factors, such as access, relevance, availability, or possibly the nature of inquiry. Once problems and sources are selected, heuristics, or what measures or criteria should be used to systematically probe data, must be decided. This may involve developing categories (clustering), classification (or classes), variables, or other methods that will be used to sort data. When these methods have been selected, developing a formal, systematic procedure, a repeatable process, should be used to sift through the data. A process involves a number of logistical details such as data collection, reduction, formatting, and storage. In developing a documented process, the researcher can help ensure replicability, so in future studies, methods used can be transferred elsewhere… Finally, addressing results, interpreting data, and identifying trends and conclusions are the last part… (185) – what data mining “looks” like (perhaps this is just making a case for the methodology?). One table from their study explores the cause of the “seventeen sections out of the sixty-four sections offered by our first-year writing program that had a failure rate of 30 percent or more (defined as students earning a D or F in the course or withdrawing from the course).” Access to sources of data captures my attention, particularly because if this isn’t a widely adopted practice, and if its goal is to find/locate/collect data, where does one start? This is where my interest in Latour’s networks comes in, but perhaps the process of assembling networks can happen at the same time as data mining, meaning that there aren’t necessarily delineated steps in a chicken-or-the-egg scenario. Can these explorations/discoveries happen simultaneously?
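To make the collect → reduce → interpret loop visible to myself, here is a minimal sketch of the procedure Lang and Baehr outline, using their failure-rate example as the documented, repeatable measure. The section names and grade records below are invented, not their data; the point is only that each step is explicit enough to be replicated.

```python
# A bare-bones, repeatable data-mining procedure: collect records,
# reduce them to a documented measure, then interpret against a threshold.
from collections import defaultdict

# Step 1: data collection (hypothetical (section, grade) records).
records = [
    ("ENG101-01", "A"), ("ENG101-01", "D"), ("ENG101-01", "F"), ("ENG101-01", "W"),
    ("ENG101-02", "B"), ("ENG101-02", "C"), ("ENG101-02", "A"), ("ENG101-02", "B"),
]

# Step 2: reduction -- a documented, repeatable measure: the share of
# students earning a D or F or withdrawing (the "DFW rate") per section.
by_section = defaultdict(list)
for section, grade in records:
    by_section[section].append(grade)

dfw_rates = {
    section: sum(g in ("D", "F", "W") for g in grades) / len(grades)
    for section, grades in by_section.items()
}

# Step 3: interpretation -- flag sections at or above the 30 percent
# threshold the study used; these are the starting points for asking why.
flagged = sorted(s for s, rate in dfw_rates.items() if rate >= 0.30)
print(flagged)  # ['ENG101-01']
```

Because every step (the records gathered, the measure computed, the threshold applied) is written down, another researcher could rerun or extend it, which is exactly the replicability the quoted passage emphasizes.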
“as a methodology, data mining typically requires longitudinal data collection in order to ensure stronger validity in findings. Without an existing source of longitudinal data, a significant time investment may be required to collect data, to front-load the project” (191) – Bold added by me. This ties into the previous long quotation about the process of data mining. It is my hunch that the data is assembled based on inquiry/exploration. A holy grail of longitudinal data doesn’t seem likely/possible/desirable. I still question what a network looks like through data mining and visualizing – something like CUNY’s Writing Studies Tree? Where does this data go? How can it be found? And by whom?
“Composition studies, not unlike other humanities research, must continue to re-examine and re-evaluate foundational studies and findings, as part of its evolution and body of knowledge” (191) – what is composed can decompose, but it still must be accounted for for future compositions.
Gives a short list of data mining software currently available (but how many are free? This seems to impact who is/can do this work…) in a handy appendix – my annotations added as a curious/interested graduate student:
- Any Count: Character, Word, and Line Count software: http://www.anycount.com/ – a purchase of 49 euros. Produces “automatic word counts, character counts, line counts, and page counts for all common file formats”.
- Clarabridge Text Analytics: http://www.clarabridge.com/ – advertises “free trial” which then leads to a purchase. Does “High-fidelity NLP with Semantic Analysis, Advanced Classification, Advanced Sentiment Analysis, Sentiment Tuning & Scoring, Standard Reports & Visualizations, Automatic Structured Data Linking”.
- Crawdad Text Analysis Software: http://www.crawdadtech.com/ – $95. Works as a “generator, visualizer, browser, finder, comparator, classifier”.
- Eaagle Full Text Mapper: http://wp.eaagle.com/?page_id=16 – “Plans and Pricing” lead to “404 Error : File not found”. Does “Relevant words and topics identification through mapping, Weak or emerging signals identification, 3D Mapping, Reporting”.
- MorphAdorner: http://morphadorner.northwestern.edu/ – could it be? Free. Works as an “annotation” and “tagging” tool.
- NVivo Research Software: http://www.qsrinternational.com/products_nvivo.aspx – full license $670.00; student license $215.00. Works as a “coding” tool.
- Predictive Data and Text Mining: http://www.data-miner.com/ – Must purchase book. Amazon listing $57.81. Does “(a) data preparation including tokenization, stemming, vectorization, and dictionary compilation (b) prediction by methods such as naive Bayes and advanced linear models (c) information retrieval by k-nearest neighbors and document matching (d) document clustering and (e) information extraction of named entities.”
- SAS Data- and Text-Mining Software: http://www.sas.com/ – Call for price. Performs “predictive and descriptive modeling, data mining, text analytics, forecasting, optimization, simulation, experimental design and more”
- SPSS Data Miner: http://www-01.ibm.com/software/analytics/spss/products/data-collection/ – Page requested cannot be displayed.
- Tableau Visualization Software: http://www.tableausoftware.com/ – Personal edition at $999.00. Creates “data visualizations” that work from spreadsheet like forms. “As powerful as a freight train. As user-friendly as a kitten.” “Create maps, bar and line charts, heatmaps, dashboards and more”
- VantagePoint: http://www.thevantagepoint.com/ – “VantagePoint helps you rapidly understand and navigate through large search results, giving you a better perspective—a better vantage point—on your information. The perspective provided by VantagePoint enables you to quickly find WHO, WHAT, WHEN and WHERE, enabling you to clarify relationships and find critical patterns—turning your information into knowledge.” Differentiates between literature and scientific literature. Need an account.
Using the concept of distant reading from Franco Moretti, looking at the work done by Donna Burns Phillips, Ruth Greenberg, and Sharon Gibson in “College Composition and Communication: Chronicling a Discipline’s Genesis” (1993), and the snapshot Janice Lauer produced in her Rhetoric Review essay “Composition Studies: Dappled Discipline” (1984), Derek works to update and further the ongoing inventorying of rhetoric and composition. From the article’s abstract: “In its focus on graphing, the article demonstrates an application of distant reading methods to present patterns not only reflective of the most commonly cited figures in CCC over the past twenty-five years, but also attendant to a steady increase in the breadth of infrequently cited figures” (195) – he is looking at citation frequencies in CCC to offer perspective on the changing disciplinary density (197). As a graduate student/newcomer to the discipline, I take particular interest in ways of exploring the field in more focused ways than searching databases (if there is one) for keywords (only those familiar to me are available) or author (only those familiar to me are available). There is the process of referral by peers and professors, as well as reading about what others are reading on scholar blogs I follow (only those “of interest” to me – only those familiar to me). This usually amounts to many scraps of paper with scribbled names, open browser tabs that never get their due diligence, or IOUs in my mounting Evernote “things to read” notebook. Spending time with these pieces is difficult, if not unlikely. And these are only the ones I know about. Where are the others? Who are the others? How do I find (see) them?
using graphing/quantitative methods to “read” journals and the surrounding fields alters the scale to allow us to see aggregate patterns that link details and non-obvious phenomena and compiles replicable data that can represent impressions of changing conditions in the discipline of composition (196) – the field is in flux. its history not easily traced. what represents it? is it oil paintings or marble statues of “the greats”? are these greats established by adoption/circulation of ideas? does notice come to the offbeat? the counter? the on the fringe? Becoming an active participant in the field requires me to contribute, but how can I compose if I do not know what has been (de)composed?
reading tends to be local, a direct encounter with a focal text – words, sentences, paragraphs, but reading across, from a distance, allows us to zoom out, to see patterns that are unrecognizable when we’re reading at the level of the article (198) – a means of orientation, a handle. or an oar for orienteering.
as the field continues to develop/mature, there is a growing demand for theoretically sophisticated scholarship. From where (what materials) does this composing draw? The past? Does it trouble? Refute? Reimagine? Does it stray into other disciplines to borrow? Keeping track of these sources becomes critical. While I don’t fully understand the “complicated politics of citation”, I imagine it has to do with who is/isn’t given credit and for what. Creating something “new” only to realize it’s been composed/proposed before. Or, not realizing. And the process of composing, assembling materials, draws from the materials of composition – its texts. Graphs allow us to grasp non-obvious trends (200) – and I think non-obvious is flexible, perhaps, to the individual. Some theories will be more easily aligned/identifiable with established ideas in the field than others. And within/without those, juxtapositions, borrowings, and revisioning from the well worn and off the beaten paths.
heuretic discipliniography (Derek Mueller) “writing and rewriting the field by exploring the intersections across different scholars’ bodies of work as well as the associated pedagogical, theoretical, and methodological approaches they mobilize” (201). – This is a rich term, borrowing from Gregory Ulmer’s heuretic – the use of theory for the invention of new texts, -ography – a field of study that brings to mind geography, cartography, tomography, and so on, and discipline – as activity. It is action (re)search.
In graphing the works cited entries from the data set, the articles/works cited/name references published in CCC, Derek raises poignant questions: “What is at stake in knowing or not knowing any of the figures shown here? What presences and absences are most striking? To what degree are new scholars…overshadowed by well-established ones?” (201) – It seems odd to me, at this point, to realize that heavy hitters shape the field, but do not necessarily “make it” entirely. My own interests should make me aware of this, but I liken it to the field of “scientific knowledge” that most common people know of. There are only a handful of scientists/theories that I can explain/know of, but these are not the entirety of science. There are many in betweens, smaller composites that make a larger composition/contribution, there is borrowing, reshaping/recomposing, and the potential for something like dormancy in an idea’s circulation until a “eureka!” – “So while quantitative studies of authors cited in a well-known journal may offer a reasonable indication of the “common knowledge” of the field, this approach must not appear to produce a definitive roster of influences on the discipline.” (206)
Derek illuminates the (changing) practice of citations. He explains a conventional list creates a reduced record that affirms the presence of a source but makes no distinction/explanation of how a source was used and from what parts it was used (location in a text). This downplays the scope of the reference/author – production, reception, and circulation (205). This is flat. Nothing to see here. Until we gain a little distance (perspective).
Looking at the top hits of composition studies through the graph, these figures do not tell us what was happening across the entire sample of the names in the citations. The names are indicative of currents in conversation in the field (206) – but who was left out of the Burkean Parlor? (214) The room is full, loud, standing room only. And some invitations got lost in the mail.
Quoting Chris Anderson in The Long Tail “We have been trained, in other words, to see the world through a hit-colored lens” (207)
Anderson’s long tail is an inquiry into niche music interests in online markets (vs. store-shelf retail giants) – the long tail is thin and demonstrates the deviation from the head of the graph, or the high ranking hits commonly available on shelves in stores, to less popular albums/artists selling online (208). Applied to composition scholarship, the long tail provides an “abstract visual model” to potentially illuminate new insights, raise new questions, and explore the continuing maturation of the field. It allows us to engage with large scale data (209).
Derek looks at unique names of well known figures like Maya Angelou, Bill Gates, etc. invoked just once in the data set of name references. The figures at the top tell us something about citation practices and the scholarly conversation, but the long tail of unduplicated references allows us to begin to assess how broad based the conversations in the field have grown (211) and raises the question “How flat can the citation distribution become before it is no longer plausible to speak of a discipline?” (215). Being too fixed in the head of the graph, the frequently cited/evoked, has implications just as an absent-minded tail of strings of specialized references has – we have to mind where we stand, head and tail.
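A small sketch of how the head and the long tail might be pulled from a pile of name references (the names and counts here are invented for illustration, not Derek’s data): the frequently cited figures cluster at the top, while the unduplicated references stretch out behind them.

```python
# Counting citation frequencies to separate the head (frequently cited
# figures) from the long tail (names invoked exactly once).
from collections import Counter

# Hypothetical name references harvested from works-cited lists.
references = (
    ["Burke"] * 5 + ["Bitzer"] * 4 + ["Cooper"] * 2 +
    ["Angelou", "Gates", "Moretti", "Ulmer"]  # each invoked exactly once
)

counts = Counter(references)
head = counts.most_common(2)                           # the top of the graph
tail = sorted(n for n, c in counts.items() if c == 1)  # unduplicated references

print(head)  # [('Burke', 5), ('Bitzer', 4)]
print(f"{len(tail)} of {len(counts)} names appear only once")
```

Even in this toy version, the question Derek raises is visible: the head names a recognizable conversation, but most of the distinct names live in the tail, and how flat that distribution can get before “a discipline” stops being a plausible description is not something the count alone can answer.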
“Graphing provides a partial readout of the field’s pulse with respect to compactness and diffuseness, which complicates speculation about where the field stands at any given moment and where it is headed” (217) – what has changed over time in the relation between the head and the long tail? (215)
vantage point – “disciplinarity in general”: depending on one’s vantage point, the head or the tail, the field can appear highly focused or as a loose amalgamation. Recognized and shared principles vs. pocketed enclaves of unique interest that ignore disciplinarity. Both vantage points, generalist and specialist, are involved in seeing/shaping. Niche enclaves (specialized) negotiate a shared disciplinary frame, they contribute to the field’s shape. Graphing can help us better understand the ways specializations negotiate and cohabit (inter)(intra)disciplinary scenes (218-219). Bolded words added. Now what do we see?