Data Management for the Humanities

What Is Data Representation?

In general, data representation means any convention for arranging things in the physical world so that information can be encoded and later decoded by suitable automatic systems.

We specify conventions in this definition because information can be conveyed by other means as well. A dog may know by sniffing the air who has passed a given way in the preceding hour or so, but in doing so it does not rely on any agreed conventions for information transfer. This and similar exchanges of information do not involve data representation in the sense meant here.

We specify automatic systems in order to distinguish data representation from the broader topic of the representation or encoding of information, which includes conventional writing systems, paper drawings, and other representations of information used by human beings.

We do not specify what physical objects are to be arranged, or how, or what kind of information they are to be used to encode, because data representation might in theory involve any kind of physical object and any kind of information. In practice both the physical objects involved and the conventions for their arrangement have varied a good deal over the short history of automatic information processing by means of machines. Holes punched in stiff cards, magnetic charges on a thin coating applied to plastic tape or flat metal disks, holes in paper tape, variations in the optical properties of the surface of a thin plastic disk, dials controlling electrical circuits, the positions and lengths of cables, and the positions of spools and sticks in a Tinkertoy construction have all been used successfully to represent data.

A particular convention for data representation is often referred to as a data format.
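To make the role of convention concrete, here is a minimal sketch in Python (an illustration, not taken from the source) in which one integer is arranged as four bytes under one convention and then decoded under two. The helper names encode_uint32 and decode_uint32 are hypothetical; the byte-order prefixes ">" (big-endian) and "<" (little-endian) are those of Python's standard struct module. Decoding with the agreed convention recovers the original value, while decoding the same bytes with the other convention yields a different number.

```python
import struct


def encode_uint32(value: int, byte_order: str) -> bytes:
    """Arrange an unsigned integer as four bytes under a given convention.

    byte_order is a struct prefix: ">" for big-endian, "<" for little-endian.
    """
    return struct.pack(byte_order + "I", value)


def decode_uint32(raw: bytes, byte_order: str) -> int:
    """Recover an unsigned integer from four bytes, assuming a convention."""
    (value,) = struct.unpack(byte_order + "I", raw)
    return value


# Encode 258 using the big-endian convention.
stored = encode_uint32(258, ">")
print(stored.hex())                # 00000102

# Decoding with the same convention recovers the information...
print(decode_uint32(stored, ">"))  # 258

# ...but the same bytes read under a different convention mean something else.
print(decode_uint32(stored, "<"))  # 33619968
```

The stored bytes do not themselves record which byte order was used; a decoder has to know the convention in advance.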

An understanding of principles and issues of data representation is essential for data curators because only curators who understand how information is represented by the digital objects in their care can take effective steps to ensure that the information represented is not lost. The long-term sustainability of digital objects is materially affected by the methods of data representation relied on by those objects; tradeoffs between different courses of curatorial action can be correctly assessed only with an understanding of how the information to be preserved is represented by physical objects.

Adapted from:

“Data Representation” 
C.M. Sperberg-McQueen, Black Mesa Technology
David Dubin, University of Illinois, Urbana-Champaign