One way to represent information is to create and sustain a direct analogy between salient properties of the system being modeled (the information) and the physical representation of the information. When the physical representation is astutely chosen, operations on the physical representation can correspond to operations on the objects being represented, and the representation can then be used, for example, for calculation.

Real numbers, for example, can be represented in an intuitive way using lengths of string or wood. Numbers with similar values will have physically similar representations, and the addition of a set of numbers can be represented by placing the representations of the numbers end to end; the sum is represented by a string or piece of wood whose length just equals the distance from one end of the sequence of addends to the other. The addition of numbers using a slide rule is based on precisely such a representation. (A more common use of a slide rule, of course, is to multiply numbers; in this case, each number is represented by a length of rule proportional to the logarithm of the number, and multiplication is represented as the addition, using the convention just described, of the logs of the multiplicands.)
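The two conventions just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names add_by_lengths and slide_rule_multiply are hypothetical, floating-point numbers stand in for physical lengths, and base-10 logarithms are an arbitrary choice of scale.

```python
import math

def add_by_lengths(*lengths):
    """Lay the lengths end to end; the total span represents the sum."""
    position = 0.0
    for length in lengths:
        position += length  # each addend starts where the previous one ends
    return position

def slide_rule_multiply(a, b):
    """Multiply two positive numbers by adding lengths proportional to their logs."""
    length_a = math.log10(a)  # length of rule that represents a
    length_b = math.log10(b)  # length of rule that represents b
    total_length = add_by_lengths(length_a, length_b)
    return 10 ** total_length  # read the product back off the logarithmic scale

print(add_by_lengths(1.5, 2.5))       # the sum of the two lengths
print(slide_rule_multiply(2.0, 3.0))  # approximately 6.0, up to rounding
```

Here the only operation ever performed on the representations is placing lengths end to end; multiplication arises purely from the choice of a logarithmic scale.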

Because it is based on an analogy of properties between the representation and the represented, this form of information representation is called analog. A fundamental property of an analog representation of information is that representations with similar physical properties represent similar things: infinitesimally small differences in the state of a representation correspond to infinitesimally small differences in the information represented, and there are uncountably many different meaningful states. It is a consequence of this property that small errors in the representation will result in small (and therefore often tolerable) errors in the results of a calculation.

Among the best-known uses of analog representations for complex information are the tide-predicting machines developed in the 19th century and in use in some locations until the 1970s, which predicted tides using ingenious systems of cables, wheels, and pulleys.

A different family of representations uses a purely arbitrary or symbolic relation between an object and its representation; the physical representation serves to record a symbolic expression or notation of some sort, and the expression has an arbitrary relation, defined by convention, to the information it represents. The arbitrariness of the relation will be familiar to some readers from many other discussions of signs and signifiers.

The data representations used in modern computer systems all fall into this family. A fundamental property of these representations is that they represent information indirectly: physical phenomena are used to represent sequences of binary digits (zero or one), and sequences of binary digits are then interpreted as integers, real numbers, characters, or other “primitive” data types. From the use of binary digits as a fundamental building block (and more generally from the similarity of these representations to the use of fingers as symbolic units in counting), these representations are termed digital.

The fundamental property of digital representations is that they are based on the use of a finite number of discrete symbols to represent information. Because finite systems can represent only a finite number of symbols, in any such system there is only a finite number of possible meaningful states; this is a fundamental difference between digital and analog representations of information. (In practice, the number of states distinguishable in a digital system is large enough that it often simplifies reasoning to pretend that it is infinite.) In digital systems, the physical similarity of two representations of information is no guide to the similarity of the information they represent. (For example, the bit sequences 0000 0000 and 1000 0000 differ only in the value of a single bit, but if they are taken as unsigned integers, they denote 0 and 128; many numbers much closer to zero than 128 have representations very different from either.) Small errors occurring in the physical representation of information (e.g. the accidental flipping of a single bit) can and often do lead to wildly erratic results.
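The bit-sequence example can be checked directly. The following Python sketch compares physical similarity (the number of differing bits) with similarity of meaning (the numeric difference when the bits are read as unsigned integers). The helper names hamming_distance and flip_bit are hypothetical, introduced only for illustration.

```python
def hamming_distance(x, y):
    """Count the bit positions in which two non-negative integers differ."""
    return bin(x ^ y).count("1")

def flip_bit(value, position):
    """Return value with the bit at the given position (0 = lowest) inverted."""
    return value ^ (1 << position)

zero = 0b00000000
high_bit = 0b10000000      # 128 as an unsigned integer
all_low_bits = 0b01111111  # 127 as an unsigned integer

# One differing bit, yet the values are 128 apart.
print(hamming_distance(zero, high_bit), abs(zero - high_bit))

# Eight differing bits, yet the values are only 1 apart.
print(hamming_distance(all_low_bits, high_bit), abs(all_low_bits - high_bit))

# The same kind of physical error (one flipped bit) changes 100 by 1 or by 128,
# depending only on which bit happens to be hit.
print(flip_bit(100, 0), flip_bit(100, 7))
```

The size of the damage caused by a single-bit error depends on the position of the bit, not on how small the physical disturbance was, which is the contrast with the analog case drawn above.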

The early years of electronic computing machinery were marked by competition between digital and analog (or, more frequently, hybrid digital/analog) computers. Eventually digital devices swept the analog and hybrid devices from the marketplace, so thoroughly that early descriptions of electronic binary digital stored-program computing machines now seem quaintly dated, and the representation of pre-existing materials in machine-processable form is referred to as digitization, as if no other form of machine processing were conceivable. Those early descriptions now serve as reminders that not all computational devices need be digital, or binary, or electronic. A key element in the commercial victory of digital devices was the development of methods for simulating the behavior of analog devices using digital representations. So even in environments which are, strictly speaking, digital, it is sometimes useful to distinguish methods of representation which are more purely digital from those which have, or seek to have, analog properties. In the context of contemporary digital machines, therefore, the term analog may be applied to representations of a thing which model selected physical properties of that thing as closely as possible, typically using (digital representations of) real numbers; in contrast, a digital representation represents the thing in symbolic form, typically using symbols drawn from a (relatively) small set of discrete, enumerable atomic symbols.

A text, for example, can be represented by a scanned image of a page on which the text has been written, in which the data format records information about the hue and brightness of the light reflected from different points on the paper; this is an analog representation of the text (or more precisely, of the page), in the sense just described. The text (and the page) can also, however, be represented by a sequence of characters, each character represented internally by a distinct pattern of bits, with no attempt to record the physical appearance of the paper or the writing on it, only to record the identities of the symbols used to encode the text. In the sense just given, this is a digital representation of the text.
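A minimal Python sketch of the two kinds of representation follows. The sample word, the use of Unicode code points written out as 8-bit patterns, and the tiny grid of brightness values are all assumptions made for illustration; a real scan would contain millions of samples and usually several color channels.

```python
# Digital (symbolic) representation: only the identity of each character is kept.
text = "Data"
code_points = [ord(ch) for ch in text]                   # one number per symbol
bit_patterns = [format(cp, "08b") for cp in code_points]  # one bit pattern per symbol

for ch, bits in zip(text, bit_patterns):
    print(ch, bits)

# Analog-style representation: brightness samples (0 = black, 255 = white)
# from a hypothetical scan of a tiny patch of the page. The values record
# what the paper looks like, not which symbols are written on it.
scan_patch = [
    [250, 248, 251, 249],
    [247,  30,  28, 250],
    [249,  25, 246, 248],
    [251, 249, 250, 247],
]
sample_count = sum(len(row) for row in scan_patch)
print(sample_count, "brightness samples, no character identities")
```

The two occurrences of "a" in the sample word receive identical bit patterns, whereas two handwritten or printed instances of the same letter would almost never produce identical brightness samples in a scan.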

As illustrated by this example, analog representations often mimic physical attributes without any distinction between those which carry meaning and those which do not, while digital representations require an understanding of the properties of the thing being represented. For this reason, analog representations are sometimes associated with the act of perception and digital representations with the act of cognition (e.g. by Devlin 1991). Typically, digital representations provide better access to information of interest than do analog representations. Full-text search of digital representations is a straightforward operation, while full-text search of images representing pages of text is possible only via a detour through a textual representation (often created automatically by optical character recognition, with the high error rates entailed by that operation). Because purely digital representations can omit extraneous information, they tend to be more compact than analog representations. An image of a page typically requires many times the storage space needed for a character-based transcription of the page. Conversely, because analog representations do not discriminate between relevant and extraneous information, they will typically convey information omitted from a purely digital representation of the same thing. Sometimes this extra information will prove useful or important.
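The difference in storage requirements can be estimated with simple arithmetic. Every figure in the following Python sketch is an assumption chosen for illustration, not a measurement: a page of 8.5 by 11 inches, scanned at 300 samples per inch with 3 bytes per sample and no compression, carrying 3,000 characters stored at one byte each. The function name raw_image_bytes is hypothetical.

```python
def raw_image_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Size in bytes of an uncompressed scan with the given dimensions."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

# Assumed scan parameters (illustrative only).
image_size = raw_image_bytes(8.5, 11, 300, 3)

# Assumed transcription: 3,000 single-byte characters.
transcription = "x" * 3000
text_size = len(transcription.encode("utf-8"))

ratio = image_size / text_size
print(image_size, text_size, ratio)
```

Image compression would narrow the gap considerably, but under assumptions like these the image remains far larger than the transcription, because it also records everything about the page that the transcription leaves out.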

Adapted from:

“Data Representation”

C.M. Sperberg-McQueen, Black Mesa Technology

David Dubin, University of Illinois, Urbana-Champaign