A relational database is one which appears to be nothing more than a collection of tables. The model it is based on has three major aspects: the structure, the manipulation, and the integrity of data.

In the relational model, data are logically presented in two- dimensional tables made up of columns and rows. The rows of a table consist of a collection of values that describe an entity; for example, an employee. The columns of a table consist of a collection of similar data among rows; e.g., employee surnames. Operations on the data are simplified by the fact that a table's rows are unordered. The intersection of a row and a column contains individual data items called values. Values are always atomic; that is, each position in a table may contain only one datum.

Data manipulation is provided by a set of algebraic or calculus operators.

Data integrity is provided by two rules. Entity integrity requires that every value in the primary key of a table must be a unique, non-null data value. Referential integrity requires that every value in a foreign key must either equal some value of its related primary key, or it must be NULL.

Design of a Relational Database

One of the main advantages of the relational model of database design is that it is based on a foundation of formal mathematical theory that allows its concepts to be defined and examined with great precision.

Remember that a relation is (essentially) a two-dimensional table consisting of horizontal rows and vertical columns. The advantage of this form is that almost everyone is familiar with data presented as a simple table. The relational model stipulates that no two rows in the table (relation) may be identical; there must be some combination of columns, called a key, whose values will uniquely identify each row.

As an example of the model's capabilities, we will design a structure for a personnel database which contains the following information: the name of each employee, the programming languages the employee is familiar with, the number of years the employee has used each language, the employee's title, the employee's length of service with the company, the employee's hourly rate of pay, the current projects to which the employee is assigned, and the manager of each of these projects.

The following PERSONNEL_1 table shown below shows a sample of the data that might be stored in such a database, along with one possible structure.

PERSONNEL_1

NAMELANGYRS_USETITLEYRS_EXPPAYPROJECTMGR
MarvinCobol3Sr. Prog.425.00PayrollSmith
Fortran2A/RJones
BrownCobol2Sr. Prog.324.00InventoryNorman
Basic1
Ada3
NormanCobol4Prj. Mgr.235.00InventoryNorman
SQL2
JamesSQL1Sys. Ana.229.00A/RJones
Pascal3DatcommHarvey
JonesCobol1Prj. Mgr.842.00A/RJones
Pascal5
SQL2
Basic9

Each employee in this table has a unique name, so NAME may be used as the table's key. In practice, of course, there may be more than one way to construct a key - social security numbers or employee numbers are other values that might be used to uniquely identify an employee. Using this structure, any request of the form, "Tell me something about employee E", is easily answered. However, it isn't as simple to respond to such requests as "Which employees can use language L?"; "Who is the manager of project P?"; "Display all employees assigned to project P"; or "Change the manager of project P to employee E". But through a process known as normalization, the organization of the data in PERSONNEL_1 can be changed so that it can be used more flexibly.

The first step in normalizing the design is based on the relational rule that each column of a table may take on only a single, non-repeating (atomic) value for each row of the table. Looking at PERSONNEL_1, it's easy to see that the columns LANG, YRS_USE, PROJECT and MGR violate this rule, because an employee may know more than one programming language and may be assigned to more than one project at a time. By duplicating the non-repeating values of NAME, TITLE, YRS_EXP, and PAY for each combination of values for the repeating groups, the entire table can be represented in first normal form. The PERSONNEL_2 table shown below is in first normal form.

PERSONNEL_2

NAMELANGYRS_USETITLEYRS_EXPPAYPROJECTMGR
MarvinCobol3Sr. Prog.425.00PayrollSmith
MarvinFortran2Sr. Prog.425.00A/RJones
BrownCobol2Sr. Prog.324.00InventoryNorman
BrownBasic1Sr. Prog.324.00InventoryNorman
BrownAda3Sr. Prog.324.00InventoryNorman
NormanCobol4Prj. Mgr.235.00InventoryNorman
NormanSQL2Prj. Mgr.235.00InventoryNorman
JamesSQL1Sys. Ani.229.00A/RJones
JamesPascal3Sys. Ani.229.00DatcommHarvey
JonesCobol1Prj. Mgr.842.00A/RJones
JonesPascal5Prj. Mgr.842.00A/RJones
JonesSQL2Prj. Mgr.842.00A/RJones
JonesBasic9Prj. Mgr.842.00A/RJones

We can now see that NAME is no longer sufficient to uniquely identify a row of PERSONNEL_2, because multiple rows may be present for an employee who knows more than one language or is assigned to more than one project. One solution is to create a new key from a combination of columns. NAME, LANG, and PROJECT combined would be such a key, as those three values together uniquely identify a single row.

PERSONNEL_2 appears to be a step backward in our design. Not only does it require more space to present the data, but responding to requests such as, "Change employee E's title to T"; "Add the assignment of employee E to project P"; or "Make employee E the manager of project P" is now more difficult. This problem is addressed by the remaining normalization steps, which are based on the concept of dependence and the relational rule that in every row of a table, each column must be dependent on every part of the key.

If, for each row, the value of a column C1 uniquely determines the value of a column C2, then C2 is functionally dependent on C1. If the value in C1 limits the possible values in C2 to a specific set, then C2 is set dependent on C1. For example, because each employee has only one title, we may say that NAME determines TITLE and that TITLE is functionally dependent on NAME. PROJECT is set dependent on NAME, since each employee is assigned to a specific set of projects.

The columns TITLE, YRS_EXP and PAY are not dependent on the entire key (NAME, LANG, PROJECT) of PERSONNEL_2 -- they are dependent on NAME alone. To solve this, we must create a new table containing only NAME, TITLE, YRS_EXP, and PAY. The key for this table, called EMPLOYEES, will be NAME. Of the remaining columns, YRS_USE is determined by both NAME and LANG and therefore cannot be part of EMPLOYEES. Another table, called LANGUAGES, must be formed by these three columns. LANGUAGES will have a key formed by a combination of the columns NAME and LANG. Because the table contains the NAME column as well, it is still possible to associate an employee's language experience with his employee data.

Splitting a table in this way prevents the experience part of the database from having columns that are dependent on only part of the table's key. A first normal form relation (atomic values in each portion of the table) that also has no partial key dependence is said to be in second normal form. The following tables, EMPLOYEES and LANGUAGES, are in second normal form.

EMPLOYEES

NAMETITLEYRS_EXPPAY
MarvinSr. Prog.425.00
BrownSr. Prog.324.00
NormanPrj. Mgr.235.00
JamesSys. Ani.229.00
JonesPrj. Mgr.842.00

LANGUAGES

NAMELANGYRS_USE
MarvinCobol3
MarvinFortran2
BrownCobol2
BrownBasic1
BrownAda3
NormanCobol4
NormanSQL2
JamesSQL1
JamesPascal3
JonesCobol1
JonesPascal5
JonesSQL2
JonesBasic9

The situation with project assignments is slightly different. We have already noted that an employee name determines the set of projects on which that employee works. This is independent of the languages used by the employee. This means that a table containing PROJECT should not have LANG in its key. However, the project name uniquely determines the project manager.

MGR}}} is transitively dependent on {{{NAME}}}, because {{{NAME

determines a set of values for PROJECT, and PROJECT functionally determines MGR. To complete our design, we should remove any transitive dependencies, according to the relational rule that in every row of a table, all columns must depend directly on the key, without any transitive dependencies through other columns. A second normal form relation that has no transitive dependence is said to be in third normal form.

Because each project has only one manager, we can form a PROJECTS table with the columns PROJECT and MGR. PROJECTS' key will be PROJECT. Note that MGR could also be a key, if each employee managed only one project. Finally, since each employee works on one or more projects, we will create a fourth table, called ASSIGNMENTS, using the columns NAME and PROJECT. This table forms the association between the EMPLOYEES and PROJECTS tables and is "all key" - i.e., it has no additional dependent columns, because the only thing dependent on both NAME and PROJECT is the fact that they are associated. Here are the third normal form tables.

PROJECTS

PROJECTMGR
PayrollSmith
A/RJones
InventoryNorman
DatcommHarvey

ASSIGNMENTS

NAMEPROJECT
MarvinPayroll
MarvinA/R
BrownInventory
NormanInventory
JamesA/R
JamesDatcomm
JonesA/R

At this point, our design is complete. All tables are in third normal form, and requests such as those listed earlier can easily be dealt with.

Here are some tips for good database design:

  • Don't use an existing database as the basis for a new database structure -- you don't want to inadvertently duplicate awkward or inconsistent table definitions.
  • Make sure that each table represents just one subject -- that is, either one object or one event. This avoids unnecessary duplication of data.
  • Define a primary key for every table -- not only will it uniquely identify a row value, you'll use it to join tables. A primary key should have these characteristics: its value must be unique and "known not nullable", and its value should consist of the minimum number of columns to guarantee uniqueness.
  • Don't define any multi-value columns -- that is, don't use the ROW or ARRAY <data type>s.
  • Implement data integrity; define unique keys and foreign keys for your tables.

Note:

Portions of the text in this entry are Copyright © 1999 by Ocelot Computer Services Incorporated. Used by permission.

Comments

Comments loading...