page tectonics

small thing frequencies



metadata mediation

February 16, 2014 / June 26, 2015 / Jana Rosinski / 1 Comment

In RCDH this week we’re discussing metadata by way of a collection of interesting readings. Jessica Reyman’s “User Data on the Social Web: Authorship, Agency, and Appropriation” (and texts she has shared via Twitter @jessreyman) primarily occupied my interest. I feel like I know enough about metadata to nod along in conversations where it surfaces: to understand that what it can index about data created on the web is tremendous (see the cross-site ads that track patterns of search and purchase, or these Twitter metadata visualizations), that it is aggregated by large entities (unseen or hiding in plain sight), and that the line between it and what I “create” on the web is grey. Reyman gets at this grey area by exploring the complexities of using social web technologies, recognizing all of the productive activities that occur—not only the creation of content, but the overlooked generation of data (515). Reyman poses the following:

  • How do we write on the social Web?
  • How do we write ourselves through the social Web?
  • And how do we write the social Web? (516)

It is Reyman’s argument that data is not merely the by-product of algorithms and aggregation formulas, but “a dynamic, discursive narrative about the paths we have taken as users, the technologies we have used, how we have composed in such spaces, and with whom we have participated” (516). Reyman provides examples of different social web platforms’ policy language on the (not so clearly delineated) distinction between the data the sites collect (which is framed through a rhetoric of enhancing user experience) and the content users produce. Reyman proceeded to blow my mind with a definition of data I didn’t know: data and content/information are not viewed as the same thing (519). The data generated through content production is viewed as an “authorless object” (524), or one authored by technologies (nonhumans) alone. User consent requires only an agreement to the terms of service when signing up, not an active role in understanding or mediating responsibilities of use (521). What Reyman argues is that data should be considered a text coauthored with technologies, texts, and other users, which would reconceptualize it as living instead of as a sort of waste by-product of content (523). Citing Krista Kennedy’s “Textual Machinery: Authorial Agency and Bot-Written Texts in Wikipedia,” Reyman makes visible the interaction between humans and nonhumans in data production: “user data generation depends on users, on their interactions, participation, and production. It does not exist without them” (527).
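Reyman’s content/data distinction can be sketched concretely. The record below is a hypothetical sketch of my own (the field names are invented, not any platform’s actual schema): “content” is what a user deliberately composes and would call authored, while “data” is the trail generated around that act of composing.

```python
# A hypothetical record illustrating Reyman's content/data distinction.
# Field names are invented for illustration, not a real platform schema.

post = {
    # what the user deliberately composes -- the "authored" text
    "content": {"text": "Reading about metadata this week."},
    # the trail generated around that act of composing -- the
    # "authorless object" that platform policies claim for themselves
    "data": {
        "timestamp": "2014-02-16T09:30:00Z",
        "client": "web",
        "geolocation": None,
        "links_clicked": 3,
        "followers_at_post_time": 128,
    },
}

# Terms of service typically treat only post["content"] as user-authored;
# everything under post["data"] is framed as a by-product of the service.
print(sorted(post["data"].keys()))
```

Even in this toy record, the “data” half outweighs the “content” half, which is part of Reyman’s point about what gets overlooked.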
Reyman ultimately draws our attention to the need to balance technology companies’ rights over user data (which they claim in exchange for the free services provided) with the rights of users, explaining that “With a balancing of users’ and technology companies’ rights over user data, the social and participatory Web could be nourished as a space that provides access to tools for participation and production, and also recognizes the value of human agency required for rich, meaningful social networks” (529).

In thinking about how we write ourselves and the social web through the social web, I’m wondering what means of balancing data are available beyond the privacy fences (DoNotTrackMe and Ghostery) that can be built around browsing. I find myself thinking about the work the Creative Commons is doing to help users understand creating and interacting with web content; does data fall within the Creative Commons domain? Is it an issue of outdated copyright law and policies (seeing data as by-product rather than content) in need of growing (more dynamically) with the times? I’m also curious to learn about more projects/texts created from data that could make visible what is too often dismissed as content debris—developing a meta-awareness of metadata in interacting with the web.

archives: designing by/for the curious eye

February 9, 2014 / June 26, 2015 / Jana Rosinski / Leave a comment

I was drawn to dwelling in Michael Neal, Katherine Bridgman, and Stephen J. McElroy’s “Making Meaning at the Intersections: Developing a Digital Archive for Multimodal Research” because one year ago I attended their panel “Developing a Digital Archive for Research in and beyond the University” at the Networked Humanities conference at the University of Kentucky. My initial preoccupation was turning to my memory of the conference (which is laden with blackout moments due to being nervous in such awesome company of people and ideas), then to my memory archived in my notebook from that time (this one suffered heavy water damage due to a poorly designed water bottle flooding my backpack), and then to pieces of the conference in web archives (the hashtag was #nhuk) of tweets (on Storify) (TAGSExplorer) (a Google spreadsheet), a handful of pictures, and blog posts that recapped and reflected on presentations. I mention this process of scouring the web for traces of the conference because my experience (my attention, my presence, my capacity) is limited. My memory of attending this conference, as deeply interested as I was, is already mostly fuzzy, though only a year later. See my (bad) notes below on the panel:

[image: notes from the #nhuk panel]

What I can recall is the panelists making a distinction between the good eye and the curious eye in looking, but the rest provided me little to reconstruct the presentation from until I could brush my own recollection (ideas, experience) against that of others. While I hesitate to call the stored Twitter exchanges an archive, their function during events/exchanges (like conferences) brought to mind questions of non-static archives, what it might mean for an archive to be fluid and responsive to user inquiry—How can archives be dynamic in design, almost heuristic, based on the user’s interests? How can such an archive function and be maintained?

In their Kairos piece, the authors detail the development of their postcard archive at Florida State University both digitally and physically with the specific goal of making their work visible—that is, how the archive is constructed (and in the process of construction) is not hidden behind the scenes of the collection. As archivist-researchers, they ask of their work:

  • How will researchers visiting this archive make meaning from within the infrastructure that we have constructed?
  • How can we continue to curate a digital space designed to enable growth over time at the same time that it hopes to reflect the voices and perspectives that are engaging the archive from both within and outside of the academy?

They explain that their approach to the archive’s infrastructure and metadata is resultant from visibility, or being able to see—the postcards, their shaping of the postcards, and the interactions that users have with the postcards:

“Our approach to archiving these material artifacts of everyday writing is very much a process of seeing, of learning to see, the multiple hands and voices that shaped and continue to shape these artifacts, even as they exist digitally in this open-source, expandable archive”.

This notion of seeing, of visibility, is of particular significance to the use of the archive as responsive to user contributions (through permitted registration with Omeka, where the archive is stored) and to projects that users bring to the archive. The archivist-researchers explain at length how the archive is organized: a classification scheme that permits searching by date, place, subject matter, genre, publisher, etc. to account for the ecological nature of each postcard text. This design emphasizes the potential of serendipity, or the unexpected, through carefully coded metadata that permits such exploration and functions heuristically for potential use (metadata categories that respond to the interaction between the artifact and the archivist). Designed this way, the archive works to be flexible enough for research projects that are more than one and undetermined from the outset:

“Each of the metadata fields we use to describe the artifacts (or a combination thereof) is a standard for arrangement, a potential filter. Insofar as a given digital representation of an artifact is composed by an arrangement of metadata relationships, the archive and its interface contribute to the modality and materiality of that artifact.”
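The idea of metadata fields as “potential filters” can be made concrete with a minimal sketch of my own (the records and field values below are invented for illustration, not drawn from the actual postcard archive):

```python
# A toy archive: each record is a dict of metadata fields.
# Records and values are invented for illustration.
postcards = [
    {"date": 1907, "place": "Tampa", "genre": "greeting", "subject": "beach"},
    {"date": 1912, "place": "Miami", "genre": "advertising", "subject": "hotel"},
    {"date": 1907, "place": "Miami", "genre": "greeting", "subject": "harbor"},
]

def arrange(records, **criteria):
    """Filter records by any combination of metadata fields."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

# Each combination of fields yields a different "arrangement" of the archive.
print(len(arrange(postcards, date=1907)))                 # two cards from 1907
print(len(arrange(postcards, date=1907, place="Miami")))  # narrowed to one
```

Every field (or combination of fields) is a different standard for arrangement, which is the sense in which the interface itself contributes to the modality of the artifact.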

This is where I wonder about their distinction between a good eye and a curious eye. The artifacts must be encoded with metadata to be searchable/accessible at all, and the authors explain this process, which evolved over time through interacting with the artifacts. These negotiations were not focused only on interpreting the postcards as static artifacts to be classified within a static taxonomy, but on interpretations:

“As we negotiate the infinite levels of address possible for each postcard in our archive, we can attempt to preserve current and future readings of the card so that those readings will not be lost in history. In doing so, our job as archivists becomes one not only of cataloging, preserving, and making accessible artifacts, but also of cataloging, preserving, and making accessible interpretations of those artifacts, the multiplicity of meanings gleaned from and attributable to our archive.”

In returning to my exploration of the tweet “archive” of #nhuk and the design of this postcard archive, I wonder if the curious eye is a new standard in archive infrastructure: that an item is not just to be found because of the careful design of the good eye, but is to be found +. As if determining classification schemes based on categories and keywords didn’t seem difficult enough, how might they be designed for interpretation?

Databases

February 2, 2014 / June 26, 2015 / Jana Rosinski / Leave a comment

This week, we are exploring databases in RCDH. This is an area of great interest for me to wonder/wander about, but to situate myself, I focused on reading “The Notion of the Textbase: Design and Use of Textbases in the Humanities” by Charles Cooney, Glenn Roe, and Mark Olsen. I thought this reading would help me understand more about the role of metadata in textbase design and its effect on searchability. When I say I’m interested in databases, my knowledge of them is limited. I understand, at a very surface level, the word-searching feature of databases as working somehow with an algorithm that calls upon available text metadata through bibliographic and keyword information. I don’t understand the complexity of the algorithm at work, nor can I see “behind the scenes” of the database to know what text collection it is calling from and based on what criteria. My interest in databases is in better understanding these invisible operations and features, so my searching isn’t limited to

[slideshow]

and wondering how the tags are created, and what information from the text is being utilized in searching. Additionally, I wondered whether there exist algorithms or schemes that are more exploratory in research; that is, ones that don’t necessitate having search results or knowledge available before research begins.

To begin, textbases are coherent collections of partly structured or unstructured digital documents assembled around unifying principles such as thematic or generic similarity. Cooney et al. define their goal as examining a selection of humanities databases built for textual analysis, with particular attention to the design principles that underlie their structure and inform their use. They argue that as textbases continue to develop, tools can offer alternatives beyond word searching, which might allow the discovery of unseen connections across/between texts, the capability to trace ideas over a large collection or period, and the identification of contextual and intertextual relationships of an individual text to any number of other texts. They also describe textbases, utilized by digital humanities and electronic publishing, as hybrids of publishing and scholarly models that take up the concerns of presenting text and the capability to support computer search through encoding. This design comes in three varieties: heavily encoded internal notation (which works to preserve texts as historical documents), reliance on relational databases to manage metadata (which works to understand texts and their creation and dissemination), and minimal markup (which examines word use over large collections). The markup, or metadata, provides leverage or a handle for refining searches and permitting more complex retrieval tasks beyond a word search—that of author, title, date, genre, geographic area, historical context, etc. The amount of markup, in my understanding, can vary on a spectrum from retrieval to representation; this variance is due to the use of the texts, whether electronic publication or digital humanities research projects. Textbases created for textual analysis are constructed to be used with computer assistance—the needs of “mundane” humanities research that looks across information at a massive scale quickly have less metadata encoded.
Mixed-mode textbases focus on word-based data retrieval; indexing metadata is handled externally and associated with complex retrieval tasks to locate texts. Both operate on the automatic discovery of textual patterns through word-centered search, which is becoming increasingly limiting as text digitization and textbases grow—it reaches a threshold in navigability. Cooney et al. describe developing heuristic tools—machine learning, text mining, similarity algorithms, and clustering—that take the form of one of three machine learning techniques: predictive, comparative, and clustering or similarity. Predictive and comparative techniques are systems designed to identify patterns or features associated with predetermined classes in order to distinguish categories in texts. They provide the example, though limited in its binary, of the algorithm that sorts spam email from non-spam email as distinguishing categories in a collection of texts. Clustering and similarity techniques work to identify documents or parts of documents that are most similar to one another rather than beginning from predetermined classes; they provide the example of algorithms that identify broad topics, like Amazon recommendations. These are sensitive to smaller classes and function with less orderly schemes, which are described as more “suited to the human” at the scale of a more defined collection. Cooney et al. go into specific examples of intertextuality and similarity at work, which remain over my head for the time being (they assume an understanding of vector space models and n-gram word sequences), but they end with the burdens textbases will require attention to as they continue to grow—specialized language, specific encoding, and dedicated technical support for their creation and maintenance.
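The vector space model underlying the clustering/similarity techniques can be glimpsed in a toy sketch (this is my own simplification, not the authors’ actual algorithms): each document becomes a vector of word counts, and two documents are “similar” when the angle between their vectors is small, measured as cosine similarity.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts as bag-of-words count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

doc1 = "the rhetoric of the archive"
doc2 = "the archive and its rhetoric"
doc3 = "machine learning for spam filtering"

# Documents sharing vocabulary score high; ones with none in common score 0.
print(cosine_similarity(doc1, doc2) > cosine_similarity(doc1, doc3))  # True
```

Clustering then groups documents whose pairwise similarities are highest, without any predetermined classes—unlike the spam example, where the two categories are fixed in advance.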
They state that textbases will continue to improve to become better suited for humanities research in terms of sufficient tools, more narrow collections, and more focused research capabilities within text collections across entire networks.

While I understand better some of the language needed to talk about textbases and more intricate details about their scales of function, I’m still left wondering:

  • whether there are more textbase designs than those emphasized
  • how the collections are formed and what access to networked collections they have
  • how the exploratory and heuristic searching functions
  • the relationship between metadata and patterns in textbase research
  • whether there are available (open-source) textbases to explore
  • whether these are at work in our scholarly publication and library databases

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.