DBMS Concepts

DBMS Tutorial Components of DBMS. Applications of DBMS The difference between file system and DBMS. Types of DBMS DBMS Architecture DBMS Schema Three Schema Architecture. DBMS Languages.

DBMS ER Model

ER model: Entity Relationship Diagram (ERD) Components of ER Model. DBMS Generalization, Specialization and Aggregation.

DBMS Relational Model

Codd’s rule of DBMS Relational DBMS concepts Relational Integrity Constraints DBMS keys Convert ER model into Relational model Difference between DBMS and RDBMS Relational Algebra DBMS Joins

DBMS Normalization

Functional Dependency Inference Rules Multivalued Dependency Normalization in DBMS: 1NF, 2NF, 3NF, BCNF and 4NF

DBMS Transaction

What is Transaction? States of transaction ACID Properties in DBMS Concurrent execution and its problems DBMS schedule DBMS Serializability Conflict Serializability View Serializability Deadlock in DBMS Concurrency control Protocols

Difference

Difference between DFD and ERD

Misc

Advantages of DBMS Disadvantages of DBMS Data Models in DBMS Relational Algebra in DBMS Cardinality in DBMS Entity in DBMS Attributes in DBMS Data Independence in DBMS Primary Key in DBMS Foreign Key in DBMS Candidate Key in DBMS Super Key in DBMS Aggregation in DBMS Hashing in DBMS Generalization in DBMS Specialization in DBMS View in DBMS File Organization in DBMS What Is A Cloud Database What Is A Database Levels Of Locking In DBMS What is RDBMS Fragmentation in Distributed DBMS What is Advanced Database Management System Data Abstraction in DBMS Checkpoint In DBMS B Tree in DBMS BCNF in DBMS Advantages of Threaded Binary Tree in DBMS Advantages of Database Management System in DBMS Enforcing Integrity Constraints in DBMS B-Tree Insertion in DBMS B+ Tree in DBMS Advantages of B-Tree in DBMS Types of Data Abstraction in DBMS Levels of Abstraction in DBMS 3- Tier Architecture in DBMS Anomalies in Database Management System Atomicity in Database Management System Characteristics of DBMS DBMS Examples Difference between Relational and Non-Relational Databases Domain Constraints in DBMS Entity and Entity set in DBMS ER Diagram for Banking System in DBMS ER Diagram for Company Database in DBMS ER Diagram for School Management System in DBMS ER Diagram for Student Management System in DBMS ER Diagram for University Database in DBMS ER Diagram of Company Database in DBMS Er Diagram Symbols and Notations in DBMS How to draw ER-Diagram in DBMS Integrity Constraints in DBMS Red-Black Tree Deletion in DBMS Red-Black Tree Properties in DBMS Red-Black Tree Visualization in DBMS Redundancy in Database Management System Secondary Key in DBMS Structure of DBMS 2-Tier Architecture in DBMS Advantages and Disadvantages of Binary Search Tree Closure of Functional Dependency in DBMS Consistency in Database Management System Durability in Database Management System ER Diagram for Bank Management System in DBMS ER Diagram for College Management System in DBMS ER Diagram for Hotel Management System in DBMS ER Diagram for Online Shopping ER Diagram for Railway Reservation System ER Diagram for Student Management System in DBMS Isolation in DBMS Lossless Join and Dependency Preserving Decomposition in DBMS Non-Key Attributes in DBMS Data Security Requirements in DBMS DBMS functions and Components What is Homogeneous Database? DBMS Functions and Components Advantages and Disadvantages of Distributed Database

Normalization in DBMS: 1NF, 2NF, 3NF, BCNF & 4NF with Examples

Normalization is a technique of organizing the data in the database. It is a systematic approach which is used to remove or reduce data redundancy in the tables and remove the insert, update, and delete anomalies. It mainly divides the larger table into smaller tables and links them using a relationship to increase the clarity of data. Normalization was introduced by IBM researcher E.F. Codd in 1970s.

Anomalies in DBMS

Following are the three types of anomalies that occur when the database is not normalized:

  1. Insertion Anomaly
  2. Updation Anomaly
  3. Deletion Anomaly

To understand these anomalies, let’s take an example:

Below table University consists of seven attributes: Sid, Sname, Cid, Cname, Fid, Fname, and Salary. And the Sid acts as a key attribute or a primary key in the relation.

Table: University

Sid Sname Cid Cname Fid Fname Salary
1 Ram C1 DBMS F1 Sachin 30000
2 Shyam C2 Java F2 Boby 28000
3 Ankit C1 DBMS F1 Sachin 30000
4 saurabh C1 DBMS F1 Sachin 30000

1. Insertion Anomaly

Suppose a new faculty joins the University, and the Database Administrator inserts the faculty data into the above table. But he is not able to insert because Sid is a primary key, and can’t be NULL. So this type of anomaly is known as an insertion anomaly.

Table: University

What is Normalization

2. Delete Anomaly

When the Database Administrator wants to delete the student details of Sid=2 from the above table, then it will delete the faculty and course information too which cannot be recovered further.

What is Normalization 1

3. Update Anomaly

When the Database Administrator wants to change the salary of faculty F1 from 30000 to 40000 in above table University, then the database will update salary in more than one row due to data redundancy. So, this is an update anomaly in a table.

What is Normalization 2

To remove all these anomalies, we need to normalize the data in the database.

Normal forms

Database Normalization is divided into the following Normal forms:

  1. First Normal Form (1NF)
  2. Second  Normal Form (2NF)
  3. Third  Normal Form (3NF)
  4. Boyce-Codd Normal Form (3.5NF/BCNF)
  5. Fourth Normal Form (4NF)

First Normal Form (1NF)

According to the E.F. Codd, a relation will be in 1NF, if each cell of a relation contains only an atomic value. This normal form states that an attribute of a relation cannot hold multiple values.  It should hold only single-valued attributes.  Values stored in an attribute should be of the same domain.

Example:

The following student relation is not in 1NF because the Subject attribute contains multiple values.

Student_id Name Subject
101 Akash Computer Network, JAVA
102 Vikrant Database Management System
103 Amrita Software Engineering, Compiler Design

The below relation student is in 1NF:

Student_id Name Subjects
101 Akshay Computer Network
101 Akshay JAVA
102 Aman Database Management System
103 Anjali Software Engineering
103 Anjali Compiler Design

                                             Relation is in 1NF

Second Normal Form (2NF)

According to the E.F. Codd, a relation is in 2NF, if it satisfies the following conditions:

  • A relation must be in 1NF.
  • And the candidate key in a relation should determine all non-prime attributes or no partial dependency should exist in the relation. 

Prime attributes: The attributes which are used to form a candidate key are called prime attributes.

Non-Prime attributes: The attributes which do not form a candidate key are called non-prime attributes.

Partial Dependency: If a non-prime attribute can be determined by the part of the candidate key in a relation, it is known as a partial dependency. Or we can say that, if L.H.S is the proper subset of a candidate key and R.H.S is the non-prime attribute, then it shows a partial dependency.

Example of partial Dependency: Suppose there is a relation R with attributes A, B, and C.

What is Normalization 3

Example of Second normal form:

Example: Suppose a training institute wants to store the data of student and the programming_languages they learn. Since a student can learn more than one programming_language, the relation can have multiple rows for a same student. Following relation shows the data of the students:

student_id programming_langauges student_age
101 Computer Network 20
101 JAVA 20
102 Database Management System 20
103 Software Engineering 21
103 Compiler Design 21

Candidate Keys: {student_id, programming_language}
Non-prime attribute: student_age

The above relation is in 1 NF because each attribute contains atomic values. However, it is not in 2NF because a non-prime attribute student_age is dependent on student_id, which is a proper subset of a candidate key.
This violates the rule for second normal form as a rule says “no non-prime attribute should be dependent on the part of a candidate key of the relation”.
To make the relation in 2NF, we can break it in two tables like:
Student_details table:

student_id student_age
101 20
101 20
102 20
103 21
103 21

student_programminglangauge table:

student_id programming_langauge
101 Computer Network
101 JAVA
102 Database Management System
103 Software Engineering
103 Compiler Design

Now, both the tables follow 2NF.

Third Normal Form (3NF)

According to the E.F. Codd, a relation is in third normal form (3NF) if it satisfies the following conditions:

  • A relation must be in second normal form (2NF).
  • And there should be no transitive functional dependency exists for non-prime attributes in a relation.

Third Normal Form is used to achieve data integrity and reduce the duplication of data.

A relation is in 3NF if and only if any one of the following condition will satisfy for each non-trivial functional dependency X? Y:

  1. X is a super key or candidate key
  2. And, Y is a prime attribute, i.e., Y is a part of candidate key.

Transitive Dependency: If X ? Y and Y? Z are two functional dependencies, X ? Z is called as a transitive functional dependency.

Example of 3NF:
Suppose a school wants to store the address of each student, they create a table named student_details that looks like:

Rollno State City
1 Punjab Chandigarh
2 Haryana Ambala
3 Punjab Chandigarh
4 Haryana Ambala
5 Uttar Pradesh Ghaziabad

Candidate Key: {Rollno}
Prime attribute: Rollno
Non-prime attribute: {State, City}
The above relation is not in third normal form, because as a rule says, there should be no transitive functional dependency in the relation.
Here, City (a non-prime attribute) depends on State (a non-prime attribute), and State depends on Rollno. The non-prime attributes (State, City) are transitively dependent on the candidate key(Rollno). Thus, it violates the rule of third normal form.
To covert the relation in 3NF, you have to decompose the relation as:

Table: Student_state

Rollno State
1 Punjab
2 Haryana
3 Punjab
4 Haryana
5 Uttar Pradesh

Table:Student_city

State City
Punjab Chandigarh
Haryana Ambala
Uttar Pradesh Ghaziabad

Now, both the tables follow the third normal form (3NF).

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form (BCNF) is the advance version of the third normal form (3NF) that’s why it is also known as a 3.5NF.
According to the E.F. Codd, a relation is in Boyce-Codd normal form (3NF) if it satisfies the following conditions:

  • A relation is in 3NF.
  • And, for every functional dependency, X ? Y, L.H.S of the functional dependency (X) be the super key of the table. 

Example of BCNF:
Suppose there is a college where one faculty teach in more than one department. They create a table like:

F_id F_address Course_id Course_name
101 Delhi  C1 MCA
101 Delhi C2 MBA
102 Noida C1 MCA
102 Noida C2 MBA

In the above relation functional dependencies are as follows:

F_id ? F_address

Course_id ? Course_name

Candidate key: {Fid, Course_id}

Above relation is not in BCNF as neither F_id nor Course_id alone are keys.
To make the relation in BCNF, we can break the table into three parts like this:

Facult_address

F_id F_address
101 Delhi
102 Noida

Course

Course_id Course_name
 C1 MCA
C2 MBA

Faculty_Course

F_id Course_id
101  C1
101 C2
102 C1
102 C2

Fourth Normal Form

According to the E.F. Codd, a relation is in fourth normal form (4NF) if it satisfies the following conditions:

  • A relation is in BCNF.
  • And, there is no multivalued dependency exists in the relation.

Multivalued dependency: For a dependency X ? Y, if for a single value of X, multiple values of Y exists, then the relation may have a multi-valued dependency. It is represented by the double arrow sign (??).

A relation with multivalued dependencies violates the fourth normal form (4NF), because it creates unnecessary redundancy of data.

Example:

The relation student consists of three attributes: student_id, Name, and Course.

student_id Name Course
101 Ankit Python
102 Kartikey Java
103 Krishna R programming
101 Ankit JAVA
105 Akash PHP

In the above relation, Name and Course are two independent attributes and both are dependent on student_id.
In this case, these two attributes can be called as multivalued dependent on student_id. Following are the representation of these dependencies:

Student_id ?? Name
Student_id ?? Course

So, to make the above relation into the fourth normal form (4NF), decompose it into two tables:

Student_name

student_id Name
101 Ankit
102 Kartikey
103 Krishna
105 Akash

Student_course

student_id Course
101 Python
102 Java
103 R programming
101 JAVA
105 PHP

Related Post: