normal forns: 2007

FIRST NORMAL FORM

FIRST NORMAL FORM (1NF or Minimal Form) is a normal form used in database normalization. A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria. These criteria are basically concerned with ensuring that the table is a faithful representation of a relation[1] and that it is free of repeating groups[2].

The concept of a "repeating group" is, however, understood in different ways by different theorists. As a consequence, there is not universal agreement as to which features would disqualify a table from being in 1NF. Most notably, 1NF as defined by some authors (for example, Ramez Elmasri and Shamkant B. Navathe[3], following the precedent established by E.F. Codd) excludes relation-valued attributes (tables within tables); whereas 1NF as defined by other authors (for example, Chris Date) permits them.

According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:1. There's no top-to-bottom ordering to the rows.

2. There's no left-to-right ordering to the columns.

3. There are no duplicate rows.

4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).

5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

—Chris Date, "What First Normal Form Really Means", pp. 127-8[4]

Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in 1NF.

Examples of tables (or views) that would not meet this definition of 1NF are:

A table that lacks a unique key. Such a table would be able to accommodate duplicate rows, in violation of condition 3.

A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view.[5] This violates condition 1. The tuples in true relations are not ordered with respect to each other.

A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every field to contain exactly one value from its column's domain. It should be noted, however, that this aspect of condition 4 is controversial. It marks an important departure from Codd's original vision of the relational model, which made explicit provision for nulls.[6]

Repeating groups

Date's fourth condition, which expresses "what most people think of as the defining feature of 1NF"[7], is concerned with repeating groups. The following example illustrates how a database design might incorporate repeating groups, in violation of 1NF.

Example 1: Domains and values

Suppose a novice designer wishes to record the names and telephone numbers of customers. He defines a customer table which looks like this:

CustomerCustomer ID First Name Surname Telephone Number

123 Rachel Ingram 555-861-2025

456 James Wright 555-403-1659

789 Maria Fernandez 555-808-9633

The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He reasons that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain more than one value:

CustomerCustomer ID First Name Surname Telephone Number

123 Rachel Ingram 555-861-2025

456 James Wright 555-403-1659

555-776-4100

789 Maria Fernandez 555-808-9633

Assuming, however, that the Telephone Number column is defined on some Telephone Number-like domain (e.g. the domain of strings 12 characters in length), the representation above is not in 1NF. 1NF (and, for that matter, the RDBMS) prohibits a field from containing more than one value from its column's domain.

Example 2: Repeating groups across columns

The designer might attempt to get around this restriction by defining multiple Telephone Number columns:

CustomerCustomer ID First Name Surname Tel. No. 1 Tel. No. 2 Tel. No. 3

123 Rachel Ingram 555-861-2025

456 James Wright 555-403-1659 555-776-4100

789 Maria Fernandez 555-808-9633

This representation, however, makes use of nullable columns, and therefore does not conform to Date's definition of 1NF. Even if the view is taken that nullable columns are allowed, the design is not in keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2., and Tel. No. 3. share exactly the same domain and exactly the same meaning; the splitting of Telephone Number into three headings is artificial and causes logical problems. These problems include:

Difficulty in querying the table. Answering such questions as "Which customers have telephone number X?" and "Which pairs of customers share a telephone number?" is awkward.

Inability to enforce uniqueness of Customer-to-Telephone Number links through the RDBMS. Customer 789 might mistakenly be given a Tel. No. 2 value that is exactly the same as her Tel. No. 1 value.

Restriction of the number of telephone numbers per customer to three. If a customer with four telephone numbers comes along, we are constrained to record only three and leave the fourth unrecorded. This means that the database design is imposing constraints on the business process, rather than (as should ideally be the case) vice-versa.

Example 3: Repeating groups within columns

The designer might, alternatively, retain the single Telephone Number column but alter its domain, making it a string of sufficient length to accommodate multiple telephone numbers:

CustomerCustomer ID First Name Surname Telephone Number

123 Rachel Ingram 555-861-2025

456 James Wright 555-403-1659, 555-776-4100

789 Maria Fernandez 555-808-9633

This is arguably the worst design of all, and again not in keeping with the spirit of 1NF. The Telephone Number heading becomes semantically woolly, as it can now represent either a telephone number, a list of telephone numbers, or indeed anything at all. A query such as "Which pairs of customers share a telephone number?" is virtually impossible to formulate, given the necessity to cater for lists of telephone numbers as well as individual telephone numbers. Meaningful constraints on telephone numbers are also impossible to define in the RDBMS with this design.

A design that complies with 1NF

A design that is unambiguously in 1NF makes use of two tables: a Customer table and a Customer Telephone Number table.

CustomerCustomer ID First Name Surname

123 Rachel Ingram

456 James Wright

789 Maria Fernandez

Customer Telephone NumberCustomer ID Telephone Number

123 555-861-2025

456 555-403-1659

456 555-776-4100

789 555-808-9633

Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record.

Atomicity

Some definitions of 1NF, most notably that of E.F. Codd, make reference to the concept of atomicity. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."[8] Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)."

Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood. In particular, the notion of a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:

A character string would seem not be atomic, as the RDBMS typically provides operators to decompose it into substrings.

A date would seem not to be atomic, as the RDBMS typically provides operators to decompose it into day, month, and year components.

A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it into integer and fractional components.

Date suggests that "the notion of atomicity has no absolute meaning": a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string types and numeric types to array types and table types) are then acceptable in a 1NF table—although perhaps not always desirable. Date argues that relation-valued attributes, by means of which a field within a table can contain a table, are useful in rare cases.

SECOND NORMAL FORM (2NF)

Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E.F. Codd[1] in 1971. A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form. Specifically: a 1NF table is in 2NF if and only if, given any candidate key and any attribute that is not a constituent of a candidate key, the non-key attribute depends upon the whole of the candidate key rather than just a part of it.

In slightly more formal terms: a 1NF table is in 2NF if and only if none of its non-prime attributes are functionally dependent on a part (proper subset) of a candidate key. (A non-prime attribute is one that does not belong to any candidate key.)

Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.

Example

Consider a table describing employees' skills:

Employees' SkillsEmployee Skill Current Work Location

Jones Typing 114 Main Street

Jones Shorthand 114 Main Street

Jones Whittling 114 Main Street

Roberts Light Cleaning 73 Industrial Way

Ellis Alchemy 73 Industrial Way

Ellis Juggling 73 Industrial Way

Harrison Light Cleaning 73 Industrial Way

The table's only candidate key is {Employee, Skill}.

The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told three times that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Typing" and "Shorthand" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones' current work location?"

A 2NF alternative to this design would represent the same information in two tables:

EmployeesEmployee Current Work Location

Jones 114 Main Street

Roberts 73 Industrial Way

Ellis 73 Industrial Way

Harrison 73 Industrial Way

Employees' SkillsEmployee Skill

Jones Typing

Jones Shorthand

Jones Whittling

Roberts Light Cleaning

Ellis Alchemy

Ellis Juggling

Harrison Light Cleaning

Update anomalies cannot occur in these tables, which are both in 2NF.

Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers from update anomalies is:

Tournament WinnersTournament Year Winner Winner Date of Birth

Des Moines Masters 1998 Chip Masterson 14 March 1977

Indiana Invitational 1998 Al Fredrickson 21 July 1975

Cleveland Open 1999 Bob Albertson 28 September 1968

Des Moines Masters 1999 Al Fredrickson 21 July 1975

Indiana Invitational 1999 Chip Masterson 14 March 1977

Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This problem is addressed by third normal form (3NF).

2NF and candidate keys

A table for which there are no partial functional dependencies on the primary key is typically, but not always, in 2NF. In addition to the primary key, the table may contain other candidate keys; it is necessary to establish that no non-prime attributes have part-key dependencies on any of these candidate keys.

Multiple candidate keys occur in the following table:

Electric Toothbrush ModelsManufacturer Model Model Full Name Manufacturer Country

Forte X-Prime Forte X-Prime Italy

Forte Ultraclean Forte Ultraclean Italy

Dent-o-Fresh EZBrush Dent-o-Fresh EZBrush USA

Kobayashi ST-60 Kobayashi ST-60 Japan

Hoch Toothmaster Hoch Toothmaster Germany

Hoch Contender Hoch Contender Germany

Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF. {Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset of it: Manufacturer.

THE THIRD NORMAL FORM (3NF)

The third normal form (3NF) is a normal form used in database normalization. 3NF was originally defined by E.F. Codd[1] in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:

The table is in second normal form (2NF)

No non-prime attribute of the table is transitively dependent on a candidate key

A non-prime attribute is an attribute that does not belong to any candidate key. A transitive dependency is a functional dependency X → Z in which Z is not immediately dependent on X, but rather on a third set of attributes Y which depends on X. That is, X → Z by virtue of X → Y and Y → Z.

An alternative formulation of Codd's definition, given by Carlo Zaniolo[2] in 1982, is this: a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:

X contains A, or

X is a superkey, or

A is a prime attribute (i.e., A is contained within a candidate key)

Zaniolo's definition has the advantage of giving a clear sense of the difference between 3NF and the more stringent Boyce-Codd normal form (BCNF). BCNF simply eliminates the third alternative ("A is a prime attribute").

Example

An example of a 2NF table that fails to meet the requirements of 3NF is:

Tournament WinnersTournament Year Winner Winner Date of Birth

Indiana Invitational 1998 Al Fredrickson 21 July 1975

Cleveland Open 1999 Bob Albertson 28 September 1968

Des Moines Masters 1999 Al Fredrickson 21 July 1975

Indiana Invitational 1999 Chip Masterson 14 March 1977

The only candidate key is {Tournament, Year}.

The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.

In order to express the same facts without violating 3NF, it is necessary to split the table into two:

Tournament WinnersTournament Year Winner

Indiana Invitational 1998 Al Fredrickson

Cleveland Open 1999 Bob Albertson

Des Moines Masters 1999 Al Fredrickson

Indiana Invitational 1999 Chip Masterson

Player Dates of BirthPlayer Date of Birth

Chip Masterson 14 March 1977

Al Fredrickson 21 July 1975

Bob Albertson 28 September 1968

Update anomalies cannot occur in these tables, which are both in 3NF

BOYCE-CODD NORMAL FORM (OR BCNF)

Boyce-Codd normal form (or BCNF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). A table is in Boyce-Codd normal form if and only if, for every one of its non-trivial functional dependencies X → Y, X is a superkey—that is, X is either a candidate key or a superset thereof.

BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd[1] to address certain types of anomaly not dealt with by 3NF as originally defined. Chris Date has pointed out that a definition of what we now know as BCNF appeared in a paper[2] by Ian Heath in 1971. "Since that definition predated Boyce and Codd's own definition by some three years," writes Date, "it seems to me that BCNF ought by rights to be called Heath normal form. But it isn't."[3]

Example

Only in rare cases does a 3NF table not meet the requirements of BCNF. An example of such a table is:

Tutor/Student Cross-ReferenceTutor ID Tutor Soc. Security Num. Student ID

1078 088-51-0074 31850

1078 088-51-0074 37921

1293 096-77-4146 46224

1480 072-21-2223 31850

The purpose of the table is to show which tutors are assigned to which students. The table's candidate keys are:

{Tutor ID, Student ID}

{Tutor Social Security Number, Student ID}

Therefore all three attributes of the table are prime attributes: that is, all three attributes belong to candidate keys.

Recall that 2NF prohibits partial functional dependencies of non-prime attributes on candidate keys, and that 3NF prohibits transitive functional dependencies of non-prime attributes on candidate keys. Since the table above lacks any non-prime attributes, it adheres to both 2NF and 3NF.

BCNF is more stringent than 3NF in that it does not permit any functional dependency in which the determining set of attributes is not a candidate key (or superset thereof). The dependency of Tutor ID on Tutor Social Security Number is such a dependency. Accordingly, the table above is not in BCNF.

Any table that falls short of BCNF will be vulnerable to logical inconsistencies. In the table above, an inconsistent combination of Tutor ID and Tutor Social Security Number could be represented.

Correcting the problem in this case would be a simple matter of using only one scheme of identifiers for Tutors: either IDs or Social Security Numbers, but not both.

normal forns

Tuesday, November 6, 2007

Blog Archive

About Me