The goal of normalization is to minimize data redundancy

Normalization organizes columns (attributes) and tables (relations) of a database according to a set of normal form rules. These normal forms are what guide the normalization process, and can be viewed as a sort of check and balance system that maintains the integrity of dependencies between the attributes and relations. The normalization process aims to ensure, through a set of rules (normal forms), that if any data is updated, inserted, or deleted, the integrity of the database stays intact.

Most common types of keys

There are four most common types of keys:

  • Primary key is a single column that is used to recognize the table.
  • Composite key is several columns used to recognize the rows in the table.
  • Foreign key links the primary key that is in another table.
  • Candidate key is a particular field in a relational database.

So what exactly are normal forms?

Normal forms were first introduced in the 70s by Edgar F. Codd, as a part of a larger organizational model for the standardization of relational database structures. As previously mentioned, normal forms, at their core, reduce data redundancy and aim to create a database free from insertion, update, and deletion anomalies. Normal forms do this by singling out anomalies that undermine the dependencies between attributes and relations and editing them to fit a standardized format that satisfies sequential normal forms.

After years of advancement and refinement, data normalization has six normal forms, known as 6NF; however, most databases are considered normalized after the third stage of normalization, known as 3NF. Going further, we will focus on normal forms 1NF through 3NF, as they are the primary stages of normalization. It’s also important to note that normalization is a cumulative process. For instance, in order to move onto the second normal form (2NF), the first normal form (1NF) must be satisfied. With that said, let’s get started with normal forms.

First normal form (1NF)

The first normal form is the foundation of the rest of the normalization process. It is referred to as the primary key and involves minimizing attributes and relations, columns and tables respectively. To do this, one must first start by removing any duplicate data throughout the database. Removing duplicate data and satisfying the 1NF includes:

  • There is a primary key - no duplicate n values within a list or sequence.
  • No repeating groups.
  • Atomic columns - cells have a single value and each record is unique.

Second normal form (2NF)

Once 1NF is satisfied, one can move on to 2NF. The second normal form requires that subgroups of data that exist in multiple rows of tables are removed and represented in a new table with connections made between them. Essentially, all subsets of data that can exist in multiple rows should be put into separate tables. Once this is done relationships between the new tables (the subgroups of data that were rearranged) and new key labels can be created.

  • 1NF is satisfied.
  • Removes partial dependencies - relations (tables) with a primary key containing two or more attributes are relocated to a new table with new key labels created that correspond to a primary key.

Third normal form (3NF)

Following the logic of 2NF, the third normal form also requires that 1NF and 2NF are satisfied. 3NF states that no non-primary key attribute (column) should have transitive functional dependencies on the primary key. Therefore if the primary key is substituted, inserted, or deleted then all the data (that is transitively dependent upon that primary key), must be put into a new table.

  • 1NF and 2NF are satisfied.
  • There is no transitive dependency for non-primary attributes.

Fourth normal form (Boyce Codd Normal Form) and beyond

While normalizing your database in accordance with 4NF, 5NF, and 6NF, is recommended, most relational databases do not require more than 3NF to be satisfied to be considered normalized. The benefits of data normalization beyond 3NF don’t always cause significant errors when there are updates, deletions, or insertions of data. However, if your company utilizes complex datasets that get changed frequently, it is recommended that you also satisfy the remaining normal forms.

Normalization is the process of structuring and handling the relationship between data to minimize redundancy in the relational table and avoid the unnecessary anomalies properties from the database like insertion, update and delete. It helps to divide large database tables into smaller tables and make a relationship between them. It can remove the redundant data and ease to add, manipulate or delete table fields.

A normalization defines rules for the relational table as to whether it satisfies the normal form. A normal form is a process that evaluates each relation against defined criteria and removes the multivalued, joins, functional and trivial dependency from a relation. If any data is updated, deleted or inserted, it does not cause any problem for database tables and help to improve the relational table' integrity and efficiency.

Objective of Normalization

  1. It is used to remove the duplicate data and database anomalies from the relational table.
  2. Normalization helps to reduce redundancy and complexity by examining new data types used in the table.
  3. It is helpful to divide the large database table into smaller tables and link them using relationship.
  4. It avoids duplicate data or no repeating groups into a table.
  5. It reduces the chances for anomalies to occur in a database.

Types of Anomalies

Following are the types of anomalies that make the table inconsistency, loss of integrity, and redundant data.

1. Data redundancy occurs in a relational database when two or more rows or columns have the same value or repetitive value leading to unnecessary utilization of the memory.

Student Table:

StudRegistrationCourseIDStudNameAddressCourse2056204JamesLos AngelesEconomics2056247JamesLos AngelesEconomics2246247Trent BoltNew YorkMathematics2306204Ritchie RichEgyptComputer2306208Ritchie RichEgyptAccounts

There are two students in the above table, 'James' and 'Ritchie Rich', whose records are repetitive when we enter a new CourseID. Hence it repeats the studRegistration, StudName and address attributes.

2. Insert Anomaly: An insert anomaly occurs in the relational database when some attributes or data items are to be inserted into the database without existence of other attributes. For example, In the Student table, if we want to insert a new courseID, we need to wait until the student enrolled in a course. In this way, it is difficult to insert new record in the table. Hence, it is called insertion anomalies.

3. Update Anomalies: The anomaly occurs when duplicate data is updated only in one place and not in all instances. Hence, it makes our data or table inconsistent state. For example, suppose there is a student 'James' who belongs to Student table. If we want to update the course in the Student, we need to update the same in the course table; otherwise, the data can be inconsistent. And it reflects the changes in a table with updated values where some of them will not.

4. Delete Anomalies: An anomaly occurs in a database table when some records are lost or deleted from the database table due to the deletion of other records. For example, if we want to remove Trent Bolt from the Student table, it also removes his address, course and other details from the Student table. Therefore, we can say that deleting some attributes can remove other attributes of the database table.

So, we need to avoid these types of anomalies from the tables and maintain the integrity, accuracy of the database table. Therefore, we use the normalization concept in the database management system.

Types of Normalization

  1. First Normal Form (1NF)
  2. Second Normal Form (2NF)
  3. Third Normal Form (3NF)
  4. Boyce and Codd Normal Form (BCNF)
  5. Fourth Normal Form (4NF)
  6. Fifth Normal Form (5NF)

First Normal Form (1NF): The table will be in First Normal Form (1NF) if all the attributes of the table contain only atomic values. We can also say that if a table holds the multivalued data items in attributes or composite values, the relation cannot be in the first normal form. So, we need to make it first normal form by making the entries of the table atomic.

Second Normal Form (2NF): A Relation will be in 2NF if it follows the following condition:

  1. The table or relation should be in 1NF or First Normal Form.
  2. All the non-prime attributes should be fully functionally dependent on the candidate key.
  3. The table should not contain any partial dependency.

Third Normal Form (3NF): The table will be in Third Normal Form (3NF) if it follows the given conditions:

  1. The table or relation should be in 2NF.
  2. It should not contain any transitive dependency. A Transitive Dependency is that any non-prime attribute determines or depends on the other non-prime attribute.
    A relation is in 3NF if FD X determines Y ('X' -> 'Y') satisfies one of the following condition:
    1. If X -> Y is a trivial FD, i.e., Y is a subset of X.
    2. If X -> Y, where X is a Super key.
    3. If X -> Y, (Y - X) is a prime attribute.

Note: A table should be in a 3 NF, if the Left-Hand Side (LHS) of all Functional dependency (FD) must be a Candidate Key (CK) / Super Key, Or the Right-hand side should be Prime attribute.

BCNF: It stands for Boyce Codd Normal form, which is the next version of 3NF. Sometimes, it is also pronounced as 3.5 NF. A normal form is said to be in BCNF if it follows the given conditions:

  1. A table or relation must be in 3NF.
  2. If a relation R has functional dependencies (FD) and if A determines B, where A is a super Key, the relation is in BCNF.

Note: There should be one candidate/Super key on the left-hand side for each functional dependency for BCNF.

Fourth Normal Form (4 NF): A relation is said to be Fourth Normal Form (4NF) if it follows the given conditions:

  1. A table must be in BCNF.
  2. There should be no multivalued dependency in the table.

For example, if the dependency A -> B, for a single value of A, more than one value of B exists. Then the relation is said to be a multivalued dependency.

Following is the condition for being multivalued dependency:

  1. For a multivalued dependency A -> B, for a single value of A, there are multiple values of B.
  2. If a table has at least three columns, they have a multivalued dependency.
  3. For a relation with A, B and C columns, where B and C should be independent.

Fifth Normal Form (5 NF): A relation is said to be 5NF if it follows the given conditions:

  1. The table should be in 4NF.
  2. There should not be Join Dependency or further non-loss decomposed.

It is also known as Project Join Normal Form (PJNF).

Join dependency: A relation (R) is said to be a Join dependency if the relation (R) schema can be divided into smaller sets of tables R1, R2 … Rn that can be redesigned by joining multiple tables to the original table (R).

Does normalization reduce data redundancy?

Normalization is the process of reducing data redundancy in a table and improving data integrity.

What is data redundancy in normalization?

Data redundancy occurs when the same piece of data exists in multiple places, whereas data inconsistency is when the same data exists in different formats in multiple tables.

What is the goal of data normalization quizlet?

The objective of normalization is to isolate data so that additions, deletions and modifications of a field can be made in just on table and then retrieved through the rest of the database via defined relationships.

How can you reduce redundancy from the data?

How to Reduce Data Redundancy.
Applying master data. Master data is the sole source of common business data that a data administrator shares across different systems or applications. ... .
Normalizing databases. ... .
Remove unused data. ... .
Data integration..

Chủ đề