Having stored information in a particular data format, how do we get it back out again? How easy is it to access the data? The answer naturally depends on which data format we are dealing with.
For data stored in plain text files, it is very easy to find software that can read the files. However, the software may have to be told about the structure of the files (where the data values reside within the file) and about what sorts of values are stored in the file (for example, whether the data are numbers or text).
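As an illustration of that extra information, the following Python sketch reads a small comma-separated table with the standard csv module. The data, the column names, and the COLUMN_TYPES mapping are hypothetical. The delimiter tells the reader where the values are, and the type mapping tells it what sort of value each column holds.

```python
import csv
import io

# Hypothetical plain text data: a header line followed by comma-separated rows.
# Nothing in the text itself says which columns are numbers and which are text.
RAW = """name,height,weight
Alice,1.62,55.0
Bob,1.80,72.5
"""

# Information we must supply ourselves (assumed for this example):
# "name" holds text, "height" and "weight" hold numbers.
COLUMN_TYPES = {"name": str, "height": float, "weight": float}

def read_table(text, delimiter=",", types=COLUMN_TYPES):
    """Split each line on the delimiter, then convert every field
    using the type supplied for its column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{col: types[col](value) for col, value in row.items()}
            for row in reader]

rows = read_table(RAW)
print(rows[1])  # {'name': 'Bob', 'height': 1.8, 'weight': 72.5}
```

Without the type mapping, csv.DictReader would return every value as a string, so the reader would still need to be told that height and weight are numbers.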
For data stored in binary files, the main problem is finding software that is designed to read the specific binary format. Once we have such software, it does all of the work of extracting the appropriate data values. This is an all-or-nothing scenario: either we have software to read the file, in which case data extraction is trivial, or we do not, in which case we can do nothing. The same scenario applies to most data stored in spreadsheets, although in that case we are much more likely to have appropriate software.
Another factor that determines the level of difficulty involved in retrieving data from storage is the structure of the data within the data format.
Data that are stored in plain text files, spreadsheets, or binary formats typically have a straightforward structure. For example, all of the values of a single variable within a data set are typically stored in a single column of a text file or spreadsheet, or within a single contiguous block of bytes in a binary file.
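The following Python sketch illustrates the binary case. The file layout is purely an assumption for this example: a 4-byte little-endian count followed by that many 8-byte floating-point values for one variable. Because the variable occupies one contiguous block, a single struct.unpack_from call retrieves all of it. The same bytes would be meaningless to a program that did not know this layout, which is the all-or-nothing situation described for binary formats.

```python
import struct

# Hypothetical layout (an assumption for illustration):
#   bytes 0-3 : number of values n, as a little-endian unsigned 32-bit integer
#   bytes 4.. : n little-endian 64-bit doubles, one variable stored contiguously
def write_variable(values):
    header = struct.pack("<I", len(values))
    body = struct.pack(f"<{len(values)}d", *values)
    return header + body

def read_variable(blob):
    # Read the count from the header, then the whole block in one call.
    (n,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{n}d", blob, 4))

blob = write_variable([1.5, 2.25, -3.0])
print(read_variable(blob))  # [1.5, 2.25, -3.0]
```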
By contrast, data that have been stored in an XML document or in a relational database can have a much more complex structure. Within XML files, the data from a single variable may be represented as attributes spread across several different elements, and data that are stored in a database may be spread across several tables.
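To make the XML case concrete, here is a small Python sketch using the standard xml.etree.ElementTree module. The document and its element and attribute names are hypothetical. A single variable, temperature, is stored as an attribute on several obs elements that sit under different site elements, and a path expression collects the values wherever they occur. ElementTree supports only a limited subset of XPath; fuller XPath support is available in third-party libraries such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: the "temperature" variable is spread across
# several <obs> elements nested inside different <site> elements.
DOC = """<data>
  <site name="A">
    <obs date="2024-01-01" temperature="12.5"/>
    <obs date="2024-01-02" temperature="13.0"/>
  </site>
  <site name="B">
    <obs date="2024-01-01" temperature="9.5"/>
  </site>
</data>"""

root = ET.fromstring(DOC)

# ".//obs[@temperature]" selects every <obs> element, at any depth,
# that carries a temperature attribute (results come in document order).
temps = [float(obs.get("temperature"))
         for obs in root.findall(".//obs[@temperature]")]
print(temps)  # [12.5, 13.0, 9.5]
```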
This means that it is not necessarily straightforward to extract data from an XML document or a relational database. Fortunately, this is offset by the fact that sophisticated query technologies exist for both: SQL for relational databases and XPath for XML documents.
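The following Python sketch hints at what such a query looks like for a relational database. It uses the standard sqlite3 module with an in-memory database. The schema, table names, and values are all hypothetical. Site names and measurements live in separate tables, and one declarative SQL query joins them back together. The query is only a preview; SQL itself is covered properly in Section 7.2.

```python
import sqlite3

# Hypothetical schema: site names and measurements are stored in separate
# tables, linked by site_id, so no single table holds "temperature at site A".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE measurement (site_id INTEGER REFERENCES site(site_id),
                          date TEXT, temperature REAL);
INSERT INTO site VALUES (1, 'A'), (2, 'B');
INSERT INTO measurement VALUES
  (1, '2024-01-01', 12.5),
  (1, '2024-01-02', 13.0),
  (2, '2024-01-01', 9.5);
""")

# One query reassembles values that are spread across both tables.
rows = conn.execute("""
SELECT s.name, m.date, m.temperature
FROM measurement AS m
JOIN site AS s ON s.site_id = m.site_id
ORDER BY s.name, m.date
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

The query states what result is wanted rather than how to walk through the tables, and the database works out how to retrieve it.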
How this chapter is organized
To begin with, we will look at a simple example of data retrieval from a database. As with previous introductory examples, the focus at this point is not so much on the computer code itself as it is on the concepts involved and what sorts of tasks we are able to perform.
The main focus of this chapter is Section 7.2 on the Structured Query Language (SQL), the language for extracting information from relational databases. We will also touch briefly on XPath for extracting information from XML documents in Section 7.3.
Paul Murrell
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.