Media Studies 110 Electronic Coursebook for Students
Media Studies Program, University of Virginia
12. Databases and Information Storage & Retrieval
Databases are the chief contemporary means for storing, managing and retrieving information.
Databases determine the forms in which information is stored as well as the
methods by which it can be accessed. In order to be stored in a database, information
must be formalized to a greater or lesser degree; the particular ways in which
a given database is constructed often determine much about the picture of the
world the database can ultimately present. Databases are important to
almost every software application, and in many of the most critical applications—those
that are concerned with the administration and structure of society—databases
sometimes play unexpectedly large parts.
Conceptually, the systematic organization of information extends
back at least to the French Encyclopedists of the 18th century, and is also a
central concern of Bush's Memex. What these systems all share is a concern with the way in which information
can be organized, searched, and displayed. The evolution of digital databases
is one where ever-more complex relationships can be established between data,
but also one in which the end results presented to a user are thoroughly mediated.
Contemporary databases in some ways fulfill the dream of the Memex, but
at the same time raise the question of whether we have gained a better understanding
of the world, or one which builds in tendentious claims about the informational
structure of human society.
Abstractly, a database is a collection of data organized in some way so that
desired items or information can be quickly retrieved according to a set of criteria.
In practice, databases refer to a specific set of software models that have been
developed and deployed by a small number of corporations and that today fuel
almost all data-oriented applications (especially websites).
Broadly speaking, a database is a representation of a given range of knowledge
that is arranged in some kind of order. According to this definition, examples
of databases include:
- a dictionary
- an encyclopedia
- a library card file
- a map (topographical, statistical, etc.)
- a collection of punch cards on which data has been encoded (e.g. the 1890 U.S. Census)
- a contemporary set of computer records and the associated parts by which they can be organized, managed, and searched
All databases have the following features to some extent,
but they are especially
apparent in software databases.
- Raw data is organized into fields
- A record is made up of one or more fields
- A database includes an (inherent or explicit) index which organizes
the data based on key fields
- The data need not have inherent organization, or the organization can
be ad-hoc (such as the alphabetical order of the encyclopedia) or
contentful (such as the Library of Congress indexing system, where
the call number has a meaning relating to the subject of the entry)
- For example, a simple Virgo record:
This is not a comprehensive listing of the Virgo record for this
book, but instead a selection of fields chosen for display.
- Virgo contains several indexed fields, including the Call Number
and the Subject and Related Name fields
- Most databases use a unique key structure, in which
one field is guaranteed to be unique for each record; this field
is often not shown in typical end-user applications such as the Virgo display.
- Database records typically serve as surrogates for the actual things they relate to:
- The relationship between the Virgo entry and the physical book
on the shelf
- The relationship between your records in UVa administrative
databases (registrar, housing, financial) and you
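The features listed above (fields, records, a unique key, and an index built on a key field) can be sketched in a few lines of Python. This is only an illustration: the catalog entries, the field names, and the helper functions `build_unique_key` and `build_index` are invented for this sketch and do not reflect how Virgo or any real catalog is implemented.

```python
from collections import defaultdict

# Hypothetical catalog data: each dict is a record, each dict key is a field.
records = [
    {"id": 1, "title": "Example Book One", "author": "Author A", "subject": "Databases"},
    {"id": 2, "title": "Example Book Two", "author": "Author B", "subject": "Cartography"},
    {"id": 3, "title": "Example Book Three", "author": "Author A", "subject": "Databases"},
]

def build_unique_key(records, key_field="id"):
    """Map each unique key value to its record, rejecting duplicates."""
    by_key = {}
    for record in records:
        key = record[key_field]
        if key in by_key:
            raise ValueError(f"duplicate key: {key}")
        by_key[key] = record
    return by_key

def build_index(records, field):
    """Map each value of an indexed field to the keys of the matching records."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record["id"])
    return dict(index)

by_id = build_unique_key(records)
subject_index = build_index(records, "subject")
```

Looking up `subject_index["Databases"]` gives the keys of the matching records, and `by_id` turns each key back into the full record. The record itself remains a surrogate: nothing in the data structure is the book on the shelf.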
Indexing, categorization and searching
- Indexes and tables of contents (two different approaches to presenting
guides to the database) are created by compiling some of the important (indexed)
fields, often ones which
- Some of these choices will be very clear from the underlying data. In Virgo,
some of the fields that will definitely be indexed include not just Call
Number and the Subject and Related Name fields, but also Author, Keywords,
- In software development, an indexed field refers to a field that is important
enough that a guide to the data is created at the application level to speed up retrieval
- When displaying the data, there are two important methods
by which users look for the records they want
- Searching, in which the indexed fields are typically searched by entering
a key term and matching against a field in the database (for example,
searching Virgo with TITLE="information
- Indexing, in which the structure and some contents of key fields are
displayed in a tree-like or hierarchical fashion to allow browsing (for
example, the Yahoo! and Google web
directories, each of which categorizes websites as if they were records
in a database)
- The record(s) can then be retrieved and displayed, formatted, or processed
by other applications for still further use in software. Result and display
choices are purposely very flexible, so that only a piece or pieces of various
records need be returned at any given time.
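The two ways of finding records described above, searching by key term and browsing a hierarchical index, can be contrasted in a short Python sketch. The site records, the `category` field, and the functions `search`, `build_directory`, and `project` are all hypothetical names chosen for illustration; real catalogs and web directories are far more elaborate.

```python
# Hypothetical records, as if websites were entries in a directory database.
sites = [
    {"title": "Intro to SQL", "category": "Computers/Databases"},
    {"title": "Information Theory Notes", "category": "Science/Mathematics"},
    {"title": "Flat File Formats", "category": "Computers/Databases"},
]

def search(records, field, term):
    """Searching: return records whose field contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r[field].lower()]

def build_directory(records, path_field="category"):
    """Browsing: nest record titles under a tree built from a path-like field."""
    tree = {}
    for r in records:
        node = tree
        for part in r[path_field].split("/"):
            node = node.setdefault(part, {})
        node.setdefault("_records", []).append(r["title"])
    return tree

def project(records, fields):
    """Return only the requested fields of each record, not the whole record."""
    return [{f: r[f] for f in fields} for r in records]

hits = search(sites, "title", "information")
directory = build_directory(sites)
titles_only = project(hits, ["title"])
```

Searching requires the user to already know a term to match, while the directory tree shows the categories the database builder chose in advance. In both cases what the user sees is a selection of fields, not the full record.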
- A database is fundamentally just a kind of file. Most of the time, this file
is occupying a critical place in a software application—it will be the
data on which the program will perform operations. Any file that has a series
of labels (fields) and data (records) can be used as a database.
- The simplest type of database is a flat file: all database
information is contained in one file or table
- Common formats include "comma-separated values" (CSV) and "tab-separated values" (TSV)
- Often stored as text files (simple files with .txt extension); can be
created/edited in spreadsheets
- people.txt file, header row and two data rows, in CSV format
LastName, FirstName, Address, PhoneNo
Smith, Bill, 1445 Oak Street, 800-555-1212
Wallace, William, 1448 Thompson Street, 888-555-1414
- There is no relationship
assumed between fields
- Data is simply "extracted"
- The earliest form of database,
also inherent in the spreadsheet model (VisiCalc, Excel)
- Can be very fast, still used in many applications
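As a minimal sketch of how data is "extracted" from a flat file, the people.txt example above can be read with Python's standard `csv` module. The file contents are embedded as a string so the example is self-contained; the function name `load_flat_file` is an assumption made for this sketch.

```python
import csv
import io

# The people.txt contents from the example above, embedded as a string.
PEOPLE_TXT = """LastName, FirstName, Address, PhoneNo
Smith, Bill, 1445 Oak Street, 800-555-1212
Wallace, William, 1448 Thompson Street, 888-555-1414
"""

def load_flat_file(text):
    """Parse CSV text into a list of records keyed by the header row's field names."""
    # skipinitialspace drops the blank that follows each comma in this file.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)

people = load_flat_file(PEOPLE_TXT)

# "Extraction" is just reading a field out of each row; the file itself
# records no relationship between one field or row and another.
phone_numbers = [row["PhoneNo"] for row in people]
```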
- A more sophisticated DBMS model is the hierarchical database,
in which there is a predetermined, hierarchical (tree-like) relationship between
all the elements of the database.
- For example, there could
be a master customer database, and a file of orders that are attached
directly to the customer records in a many-to-one relationship (there can
be many orders for one customer).
- An early attempt to create more complex data relationships. Developed
in the 1960s, it was not widely adopted.
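The customer and order example can be sketched as a nested Python structure in which each order is stored directly under the customer it belongs to. The data and the function names `orders_for` and `customer_of` are hypothetical; this illustrates the tree-like shape only, not any particular hierarchical DBMS.

```python
# Hypothetical data: each customer record "owns" its orders directly.
customers = [
    {"customer_id": "C1", "name": "Bill Smith", "orders": [
        {"order_id": "O100", "item": "notebook"},
        {"order_id": "O101", "item": "pens"},
    ]},
    {"customer_id": "C2", "name": "William Wallace", "orders": [
        {"order_id": "O200", "item": "atlas"},
    ]},
]

def orders_for(customer_id):
    """Follow the predetermined parent-to-child path: customer, then orders."""
    for customer in customers:
        if customer["customer_id"] == customer_id:
            return [order["order_id"] for order in customer["orders"]]
    return []

def customer_of(order_id):
    """Going the other way means walking down from every parent in turn."""
    for customer in customers:
        for order in customer["orders"]:
            if order["order_id"] == order_id:
                return customer["name"]
    return None
```

Questions that follow the hierarchy (which orders does this customer have?) are direct, while questions that cut across it (who placed this order?) require traversing the whole tree. That asymmetry is one reason the model is described as predetermined.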
- The most prominent model took the idea of relationships between data much
more seriously, and is named after this primary concern: this is the relational database or RDBMS.
- Work done at IBM in the late 1960s by Edgar F. Codd, published as "A
Relational Model of Data for Large Shared Data Banks" (Communications of the ACM, 1970)
- Codd is primarily concerned to abstract the data storage and manipulation
models away from the machine hardware and operating software, but the
import of his work is not immediately recognized
- Two primary systems were eventually built from Codd's work: Ingres at UC-Berkeley
and System R at IBM. Each of these aimed not just at a software system
but at a standard for storing data and querying databases.
- The work done under these two umbrella projects has led directly to current
products and platforms, including Sybase, Oracle, Microsoft Access, and the
current standard for querying known as Structured Query Language (SQL).
- A relational database can be thought of effectively as a collection of
many simple DBMS tables, each with its own set of field labels and fields;
one or more of these tables is used to collect information about the other
tables and to establish relationships between them.
- These relationships can be many-to-one as in the hierarchical model,
but they can also be one-to-one, one-to-many, and many-to-many. This
allows many different kinds of data to be stored.
- ISIS is a good example. ISIS maintains at least two tables:
a table of students (which includes fields like names, class year,
ID, and many entries for the courses a student is enrolled in); and a table
of courses, which includes course name, course number, and many entries
for students in the course. These two tables exist in a many-to-many
relationship with each other.
- Another example is a bookstore application RDBMS that includes
many tables that bear different types of relationship to each other.
- RDBMS models enable sophisticated searching, retrieval and display
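A many-to-many relationship between students and courses can be sketched with Python's built-in `sqlite3` module. The schema below is an assumption made for illustration and is not the actual design of ISIS: it uses a third linking table (here called `enrollments`), which is a common relational way to hold the "many entries" on each side. All table names, column names, and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, class_year INTEGER);
CREATE TABLE courses  (id INTEGER PRIMARY KEY, number TEXT NOT NULL, name TEXT NOT NULL);
-- Linking table: one row per (student, course) pair.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(id),
    course_id  INTEGER NOT NULL REFERENCES courses(id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO students VALUES (1, 'Ada', 2026), (2, 'Ben', 2027);
INSERT INTO courses  VALUES (1, 'MDST 110', 'Course A'), (2, 'CS 101', 'Course B');
INSERT INTO enrollments VALUES (1, 1), (1, 2), (2, 1);
""")

def courses_for(student_name):
    """Course numbers a student is enrolled in, via the linking table."""
    rows = conn.execute(
        """SELECT c.number FROM courses c
           JOIN enrollments e ON e.course_id = c.id
           JOIN students s ON s.id = e.student_id
           WHERE s.name = ? ORDER BY c.number""",
        (student_name,),
    )
    return [row[0] for row in rows]

def students_in(course_number):
    """Names of students enrolled in a course, via the same linking table."""
    rows = conn.execute(
        """SELECT s.name FROM students s
           JOIN enrollments e ON e.student_id = s.id
           JOIN courses c ON c.id = e.course_id
           WHERE c.number = ? ORDER BY s.name""",
        (course_number,),
    )
    return [row[0] for row in rows]
```

Unlike the hierarchical model, neither table is the parent of the other: the same linking table answers both "which courses does this student take?" and "which students are in this course?" with equal ease.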
- More contemporary DBMS models are emerging, especially ones that take advantage
of semantic programming developments such as XML, and of object-oriented technologies.
Many contemporary databases are described as being OODBMSs because of their
deep integration of data objects into the programmatic infrastructure.
Databases and their uses
- Today, nearly all software applications rely on databases. This is especially
true of websites that use ecommerce. Amazon.com, for example, relies on databases
in many ways, not least for databases of books, but also for databases of customers.
- Most dominant software companies today have a DBMS as part of their main
packaged offering; Oracle, Sybase, SAP, Baan, IBM, Computer Associates, Siebel,
and many other companies portray themselves as database vendors despite selling
products that go far beyond what is typically thought of as a database
- All of these software packages do not simply represent data but provide
explicit means for the administration of social structure with regard to
the relevant domain in which the software is installed
- ISIS, for example, is used to administer the entire range of course
offerings at UVa, and there is a constant negotiation between the data
of ISIS and real-world course requirements
- Like other such applications, ISIS provides many benefits, but is on
its face less flexible than the non-computerized systems it replaces
- Of particular concern are not so much databases with explicit formalizations
accessible to the user like ISIS, but instead databases whose formalizations
are kept proprietary from end-users for systematic or commercial reasons
- Contemporary software models such as Enterprise
Resource Planning (ERP)
and Customer Relationship Management (CRM)
are the main products sold by companies like Oracle, Sybase,
- These software models are explicitly dedicated to formalizing social structure
for the benefit of the purchasing customer.
- In the case of ERP, the formalized
objects are generally things like product inventories, allowing the
company managers to create sophisticated economic models like just-in-time
- In the case of CRM and other similar models, the formalized objects
are people, viewed as customers of the company, and the data products
are often specific behavioral scripts generated to increase profitability
by the statistical manipulation of behavior.
- For example, most customer service interaction with companies is
today dictated by data modeling, complete with performance scripts
for the machines and human operators involved, and very often the
statistical motivation for certain behavior choices is deliberately
obscured from the customer
- All such systems interact with corporate financial databases, encouraging
not merely the formalization but the quantification of social "objects"
- The knowledge obtained by using this software is often kept proprietary
(as in the case of the elaborate credit scoring systems used for
all aspects of business and developed by Fair
Isaac Corporation) and obtained
or created by means of computing-intensive "data mining" techniques
- These systems raise questions of privacy and security
and especially of accuracy, since the information contained
in such databases is often not made available for checking, and in fact
can't be coherently checked outside the context of the software applications
for which it has been generated; they also raise profound questions (the
same ones preoccupying Wiener, Papert, and Winner)
- What are the basic elements of a database?
- What is the difference between types of databases (flat, hierarchical, relational)?
- How do differences in types of databases allow information to be generated in different ways?
- What are different methods for searching databases?
- What are the consequences of these different methods for the user's understanding of the search results?
- How would you relate databases to such concepts as entropy, information,
and order from information theory?
- How do present-day databases and information retrieval systems relate to
Bush's concept of the Memex?
- In mathesis universalis, symbolic logic was the core concept, and the goal
was to create computing machines; with information theory, the core concept
is the selection of the right message, the separation of signal from noise,
with the goal being machines that can more effectively retrieve information.
What significance, if any, would you ascribe to this shift from mathesis
universalis to information theory, and the accompanying shifts in the desired
function of computers?
- In many ways, schemes for information storage like the one described by
Ted Nelson in today's reading ("A File Structure for the Complex...") describe
not the storage of information so much as the various ways in which a researcher
might make connections between it. In what ways does contemporary information
storage fulfill or not fulfill Nelson's goals, for example in a database-centered
application like Amazon or Google?
- Norbert Wiener's writings on cybernetics are especially concerned with
the potential misuses of control technologies, especially when these are used
by government or industry. Can you think of a way that technologies like
ERP and CRM could be used according to the social principles Wiener espoused?
READING FOR WEDNESDAY
- Shneiderman, "Direct Manipulation: A Step Beyond Programming Languages"