The process of normalization is a progressive procedure used in constructing an efficient relational database. When you are building your database, the primary goals are to minimize redundancy, save disk space and ensure data integrity. Understanding how the normalization process works will be beneficial to your final product when building your relational database.
When you create your database, you'll find different types of information will need to be tracked and to do this, you'll have to develop a table for each object. As you design your tables, each one will represent objects and the purpose is to try and minimize storing the same data in more than one place. If you have duplicate data, it will impact the optimization and performance of your database.
As you design your database, creating proper tables and correctly identifying the relationships between them is crucial. If this is phase not done correctly, it will likely cause problems with your database in the future. You'll want to ask many questions at this stage and sketch out a structured plan to determine what classes of information belong in each table and how they connect to other classes of information. For instance one object may be "customers" which will be placed in a table and "orders" (another object) will have its own separate table. When you diagram your database, you'll want to explore how each object shares a relationship with the others.
When going through the normalization process, there are several levels of normalization that are referred to as various "forms", and they are numbered 1-5.
• Normalization in the first form
This form requires any duplication of columns in the same table to be removed. Then separate tables will be developed for each entity of related information and each identified with a primary key. A primary key is a unique identifier for each record, creating a column for your primary key helps eliminate redundancy of entities within a table (i.e. "customer number" is a good choice for a primary key). If a table has "no repeating groups, it is said to be in first normal form (1NF)" (Post, pg 89).
• Normalization in the second form
The second form takes normalization to a level higher and has its' own set of requirements. These rules declare that the table must meet the requirements of the first normalization form and then any subsets of data that are in multiple rows are to be removed. "A table is in second normal form (2NF) if every non-key column depends on the entire key and not part of it" (Post, pg 92). It's important to determine how information is related. Are the relationships one-to-one, one-to-many or many-to-many? (i.e. one supplier to multiple restaurants or one school to many students or many cars to many dealerships). Once this is figured out, your data is ready to placed in separate tables and you can establish relationship between them. You'll then apply what is known as a foreign key. This foreign key distinguishes a column that exists as a primary key in another table. It also establishes the relationship between the two tables and helps to enforce referential integrity of your data.
• Normalization in the third form
The third form takes it one more level and its requirements are that it meets the second form and then subsequently removes all columns that are not dependent upon the primary key, further eliminating the chance of any redundancy within tables. Each level of normalization gets more restrictive. There are also a fourth and fifth level, but the most common form used is 3NF.
If you can grasp a good understanding of the normalization process when building your database, you'll find that it contains better data integrity, little redundancy and search indexes are faster. Your relationships between objects will be defined your database will have improved efficiency.
• Hillyer, Mike, "An Introduction to Database Normalization"
• Post, Gerald, "Database Management Systems", 3rd Edition, 2005