Some of the earliest known examples of recorded information come from Mesopotamia, which roughly corresponds to modern-day Iraq, and date from around the middle of the fourth millennium BC. The writing is called cuneiform, which refers to the fact that marks were made in wet clay with a wedge-shaped stylus.
A particularly famous mathematical example of cuneiform is the clay tablet known as YBC 7289. This tablet is inscribed with a set of numbers using the Babylonian sexagesimal (base-60) system. In this system, an angled symbol, <, represents the value 10 and a vertical symbol, |, represents the value 1. For example, the value 30 is written (roughly) like this: <<<. This value can be seen along the top-left edge of YBC 7289 (see Figure 5.1).
The markings across the center of YBC 7289 consist of four digits: |, <<||||, <<<<<|, and <. Historians have suggested that these markings represent an estimate of the length of the diagonal of a unit square, which has a true value of √2 = 1.41421356 (to eight decimal places). The decimal interpretation of the sexagesimal digits is 1 + 24/60 + 51/3600 + 10/216000 = 1.41421296, which is amazingly close to the true value, considering that YBC 7289 has been dated to around 1600 BC.
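The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration: the function name sexagesimal_to_decimal is our own invention, and the code assumes that the sexagesimal point falls after the first digit, so that the digits are weighted 1, 1/60, 1/3600, and 1/216000.

```python
import math


def sexagesimal_to_decimal(digits):
    """Convert a list of sexagesimal digits to a decimal value.

    Assumption: the sexagesimal point falls after the first digit,
    so digit i is weighted by 1 / 60**i.
    """
    return sum(d / 60**i for i, d in enumerate(digits))


# The four digits across the center of the tablet.
estimate = sexagesimal_to_decimal([1, 24, 51, 10])

print(f"{estimate:.8f}")       # 1.41421296, the tablet's estimate
print(f"{math.sqrt(2):.8f}")   # 1.41421356, the true diagonal of a unit square
```

The two values agree to five decimal places and differ by less than one part in a million.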
What we are going to do with this ancient clay tablet is to treat it as information that needs to be stored electronically.
The choice of a clay tablet for recording the information on YBC 7289 was obviously a good one in terms of the durability of the storage medium. Very few electronic media today have an expected lifetime of several thousand years. However, electronic media do have many other advantages.
The most obvious advantage of an electronic medium is that it is very easy to make copies. The curators in charge of YBC 7289 would no doubt love to be able to make identical copies of such a precious artifact, but truly identical copies are only really possible for electronic information.
This leads us to the problem of how we produce an electronic record of the tablet YBC 7289. We will consider a number of possibilities in order to introduce some of the issues that will be important when discussing various data storage alternatives throughout this chapter.
A straightforward approach to storing the information on this tablet would be to write a simple textual description of the tablet.
YBC 7289 is a clay tablet with various cuneiform marks on it that describe the relationship between the length of the diagonal and the length of the sides of a square.
This approach has the advantages that it is easy to create and it is easy for a human to access the information. However, when we store electronic information, we should also be concerned about whether the information is easily accessible for computer software. This essentially means that we should supply clear labels so that individual pieces of information can be retrieved easily. For example, the label of the tablet is something that might be used to identify this tablet from all other cuneiform artifacts, so the label information should be clearly identified.
label: YBC 7289
description: A clay tablet with various cuneiform marks on it that describe the relationship between the length of the diagonal and the length of the sides of a square.
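To see why labels help software, here is a minimal sketch of how a program might retrieve individual pieces of information from a labeled record. It assumes that each piece of information sits on its own line and that the first colon on a line separates the label from its value; the function name parse_record is hypothetical, not part of any standard tool.

```python
record_text = """\
label: YBC 7289
description: A clay tablet with various cuneiform marks on it that describe the relationship between the length of the diagonal and the length of the sides of a square.
"""


def parse_record(text):
    """Split 'label: value' lines into a dictionary keyed by label.

    Assumption: one field per line, with the first colon separating
    the label from its value.
    """
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return record


record = parse_record(record_text)
print(record["label"])   # YBC 7289
```

Without the labels, the software would have to guess which words in the description identify the tablet.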
Thinking about what sorts of questions will be asked of the data is a good way to guide the design of data storage. Another sort of information that people might go looking for is the set of cuneiform markings that occur on the tablet.
The markings on the tablet are numbers, but they are also symbols, so it would probably be best to record both numeric and textual representations. There are three sets of markings and three values to record for each set; a common way to record this sort of information is with a row of information per set of markings, with three columns of values on each row.
<<<                         30            30
| <<|||| <<<<<| <           1 24 51 10    1.41421296
<<<<|| <<||||| <<<|||||     42 25 35      42.4263889
When storing the lines of symbols and numbers, we have spaced out the information so that it is easy, for a human, to see where one sort of value ends and another begins. Again, this information is even more important for the computer. Another option is to use a special character, such as a comma, to indicate the start/end of separate values.
values:
    cuneiform,sexagesimal,decimal
    <<<,30,30
    | <<|||| <<<<<| <,1 24 51 10,1.41421296
    <<<<|| <<||||| <<<|||||,42 25 35,42.4263889
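One benefit of a comma-delimited layout is that standard software can already read it. As a rough sketch, the rows above (without the leading "values:" label, and with one row per line) could be read using the csv module from Python's standard library; the variable names here are our own.

```python
import csv
import io

# The comma-delimited rows, one set of markings per line.
values_text = """\
cuneiform,sexagesimal,decimal
<<<,30,30
| <<|||| <<<<<| <,1 24 51 10,1.41421296
<<<<|| <<||||| <<<|||||,42 25 35,42.4263889
"""

# DictReader uses the first line as column names.
rows = list(csv.DictReader(io.StringIO(values_text)))

for row in rows:
    # Spaces inside a field are preserved; only commas split values.
    print(row["sexagesimal"], "->", float(row["decimal"]))
```

Because only the comma separates values, the spaces between sexagesimal digits survive intact within a single field, which would not be the case if spaces alone were used to separate the columns.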
Something else we should add is information about how the values relate to each other. Someone who is unfamiliar with Babylonian history may have difficulty realizing how the three values on each line actually correspond to each other. This sort of encoding information is essential metadata--information about the data values.
encoding: In cuneiform, a '<' stands for 10 and a '|' stands for 1. Sexagesimal values are base 60, with a sexagesimal point after the first digit; the first digit represents ones, the second digit is sixtieths, the third is three-thousand six-hundredths, and the fourth is two hundred and sixteen thousandths.
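The first part of this encoding rule is simple enough to express as code. The following sketch, with a hypothetical function name decode_cuneiform, assumes that the markings are written with '<' and '|' characters and that a space separates one sexagesimal digit from the next, as in the examples above.

```python
# Value of each symbol, as stated in the encoding description.
SYMBOL_VALUES = {"<": 10, "|": 1}


def decode_cuneiform(markings):
    """Turn space-separated groups of '<' and '|' into sexagesimal digits."""
    return [
        sum(SYMBOL_VALUES[symbol] for symbol in group)
        for group in markings.split()
    ]


print(decode_cuneiform("<<<"))                        # [30]
print(decode_cuneiform("| <<|||| <<<<<| <"))          # [1, 24, 51, 10]
print(decode_cuneiform("<<<<|| <<||||| <<<|||||"))    # [42, 25, 35]
```

A symbol that is not in the table raises a KeyError, which is a reasonable way for a sketch like this to signal markings that it does not understand.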
The position of the markings on the tablet and the fact that there is also a square, with its diagonals inscribed, are all important information that contribute to a full understanding of the tablet. The best way to capture this information is with a photograph.
In many fields, data consist not just of numbers, but also pictures, sounds, and video. This sort of information creates additional files that are not easily incorporated together with textual or numerical data. The problem becomes not only how to store each individual representation of the information, but also how to organize the information in a sensible way. Something that we could do in this case is include a reference to a file containing a photograph of the tablet.
photo: ybc7289.png
Information about the source of the data may also be of interest. For example, the tablet has been dated to sometime between 1800 BC and 1600 BC. Little is known of its rediscovery, except that it was acquired in 1912 AD by an agent of J. P. Morgan, who subsequently bequeathed it to Yale University. This sort of metadata is easy to record as a textual description.
medium: clay tablet
history: Created between 1800 BC and 1600 BC, purchased by J.P. Morgan 1912, bequeathed to Yale University.
The YBC in the tablet's label stands for the Yale Babylonian Collection. This tablet is just one item within one of the largest collections of cuneiforms in the world. In other words, there are a lot of other sources of data very similar to this one.
This has several implications for how we should store information about YBC 7289. First of all, we should store the information about this tablet in the same way that information is stored for other tablets in the collection so that, for example, a researcher can search for all tablets created in a certain time period. We should also think about the fact that some of the information that we have stored for YBC 7289 is very likely to be common to all items in the collection. For example, the explanation of the sexagesimal system will be the same for other tablets from the same era. With this in mind, it does not make sense to record the encoding information for every single tablet. It would make sense to record the encoding information once, perhaps in a separate file, and just refer to the appropriate encoding information within the record for an individual tablet.
A complete version of the information that we have recorded so far might look like this:
label: YBC 7289
description: A clay tablet with various cuneiform marks on it that describe the relationship between the length of the diagonal and the length of the sides of a square.
photo: ybc7289.png
medium: clay tablet
history: Created between 1800 BC and 1600 BC, purchased by J.P. Morgan 1912, bequeathed to Yale University.
encoding: sexagesimal.txt
values:
    cuneiform,sexagesimal,decimal
    <<<,30,30
    | <<|||| <<<<<| <,1 24 51 10,1.41421296
    <<<<|| <<||||| <<<|||||,42 25 35,42.4263889
Is this the best possible way to store information about YBC 7289? Almost certainly not. Some problems with this approach include the fact that storing information as text is often not the most efficient approach and the fact that it would be difficult and slow for a computer to extract individual pieces of information from a free-form text format like this. However, the choice of an appropriate format also depends on how the data will be used.
The options discussed so far have only considered a couple of the possible text representations of the data. Another whole set of options to consider is binary formats. For example, the photograph and the text and numeric information could all be included in a single file. The most likely solution in practice is that this information resides in a relational database of information that describes the entire Yale Babylonian Collection.
In this chapter, we will look at the decisions involved in choosing a format for storing information, we will discuss a number of standard data storage formats, and we will acquire the technical knowledge to be able to work with the different formats.
We start in Section 5.2 with plain text formats. This is followed by a discussion of binary formats in Section 5.3, and in Section 5.4, we look at the special case of spreadsheets. In Section 5.5, we look at XML, a computer language for storing data, and in Section 5.6, we discuss relational databases.
Paul Murrell
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.