SQL Tutorial

SQL Tutorial SQL Introduction SQL Syntax SQL Data Types SQL OPERATORS SQL COMMANDS SQL Queries

SQL Database

SQL Create Database SQL DROP Database SQL SELECT Database

SQL Table

SQL TABLE SQL CREATE TABLE SQL COPY TABLE SQL ALTER TABLE SQL DELETE SQL TRUNCATE TABLE SQL DROP TABLE SQL UPDATE TABLE SQL INSERT TABLE

SQL SELECT

SQL SELECT Statement SQL SELECT WHERE Clause SQL SELECT IN Operator SQL BETWEEN Operator SQL SELECT BETWEEN Operator SQL SELECT AND Operator SQL SELECT OR Operator SQL SELECT LIKE Operator SQL SELECT DISTINCT SQL SELECT SUM SQL SELECT MAX SQL SELECT MIN SQL SELECT AVG

SQL Clause

SQL WHERE Clause SQL GROUP BY CLAUSE SQL ORDER BY Clause SQL HAVING Clause

SQL INSERT

SQL INSERT Statement SQL INSERT INTO Statement SQL INSERT INTO Values SQL INSERT INTO SELECT SQL Insert multiple rows

SQL JOIN

SQL JOIN SQL Inner Join SQL Left Join SQL Right Join SQL Full Join SQL CROSS Join

SQL OPERATOR

SQL Comparison SQL LOGICAL Operator SQL Cast Operator SQL Arithmetic

Difference

SQL vs NOSQL WHERE vs HAVING DELETE vs DROP GROUP BY vs ORDER BY DROP vs TRUNCATE SQL IN vs SQL EXISTS Difference between Delete, Drop and Truncate in SQL

MISC

SQL SubQuery SQL CASE Commit and Rollback in SQL Pattern Matching in SQL DDL Commands in SQL DML Commands in SQL Types of SQL Commands SQL COUNT SQL Primary Key SQL FOREIGN KEY SET Operators in SQL Check Constraint in SQL SQL EXCEPT SQL VIEW SQL WHERE Statement SQL CRUD Operation Where Condition in SQL TCL Commands in SQL Types of SQL JOINS SQL Nth Highest Salary SQL NOT OPERATOR SQL UNION ALL SQL INTERSECT SQL Data Definition Language SQL Data Manipulation Language SQL Data Control Language SQL CONSTRAINTS SQL Aggregate Operators SQL KEYS Codd’s Rules in SQL What is SQL Injection? Trigger In SQL SQL WHERE Multiple Conditions Truncate function in SQL SQL Formatter WEB SQL SQL Auto Increment Save Point in SQL space() function in SQL SQL Aggregate Functions SQL Topological Sorting SQL Injection SQL Cloning Tables SQL Aliases SQL Handling Duplicate Update Query in SQL Grant Command in SQL SQL SET Keyword SQL Order BY LIMIT SQL Order BY RANDOM

How To

How to use the BETWEEN operator in SQL How To Use INNER JOIN In SQL How to use LIKE in SQL How to use HAVING Clause in SQL How to use GROUP BY Clause in SQL How To Remove Duplicates In SQL How To Delete A Row In SQL How to add column in table in SQL ? How to drop a column in SQL? How to create a database in SQL? How to use COUNT in SQL? How to Create Temporary Table in SQL? How to Add Foreign Key in SQL? How to Add Comments in SQL? How To Use Group By Clause In SQL How To Use Having Clause In SQL How To Delete Column In Table How To Compare Date In SQL How index works in SQL How to calculate age from Date of Birth in SQL How to Rename Column name in SQL what-are-single-row-and-multiple-row-subqueries">What are single row and multiple row subqueries?

How to remove duplicates in SQL

Introduction

  • There are some specific rules that needs to be followed while creating the database objects. To improve the performance of a database, a primary key, clustered and non-clustered indexes, and constraints should be assigned to a table. Though we follow all these rules, duplicate rows may still occur in a table.
  • It is always a good practice to make use of the database keys. Using the database keys will reduce the chances of getting duplicate records in a table. But if duplicate records are already present in a table, there are specific ways that are used to remove these duplicate records.

Ways to remove Duplicate Rows

  • Use of DELETE JOIN statement to remove duplicate rows

DELETE JOIN statement is provided in MySQL that helps to remove duplicate rows from a table.

Consider a database with the name "studentdb”. We will create a table student into it.

mysql> USE studentdb;
Database changed
mysql> CREATE TABLE student (Stud_ID INT, Stud_Name VARCHAR(20), Stud_City VARCHAR(20), Stud_email VARCHAR(255), Stud_Age INT);
Query OK, 0 rows affected (0.15 sec)
How to remove duplicates in SQL

We have successfully created a 'student' table in the 'studentdb' database.

Now, we will write the following queries to insert data in the student table.

mysql> INSERT INTO student VALUES (1, "Ankit", "Nagpur", "[email protected]", 32);
Query OK, 1 row affected (0.08 sec)


mysql> INSERT INTO student VALUES (2, "Soham", "Nanded", "[email protected]", 35);
Query OK, 1 row affected (0.08 sec)


mysql> INSERT INTO student VALUES (3, "Soham", "Nanded", "[email protected]", 26);
Query OK, 1 row affected (0.04 sec)


mysql> INSERT INTO student VALUES (4, "Ravi", "Chandigarh", "[email protected]", 19);
Query OK, 1 row affected (0.09 sec)


mysql> INSERT INTO student VALUES (5, "Ravi", "Chandigarh", "[email protected]", 19);
Query OK, 1 row affected (0.09 sec)


mysql> INSERT INTO student VALUES (6, "Shyam", "Dehradun", "[email protected]", 22);
Query OK, 1 row affected (0.09 sec)


mysql> INSERT INTO student VALUES (7, "Manthan", "Ambala", "[email protected]", 24);
Query OK, 1 row affected (0.08 sec)


mysql> INSERT INTO student VALUES (8, "Neeraj", "Noida", "[email protected]", 25);
Query OK, 1 row affected (0.04 sec)


mysql> INSERT INTO student VALUES (9, "Anand", "Kashmir", "[email protected]", 20);
Query OK, 1 row affected (0.07 sec)


mysql> INSERT INTO student VALUES (10, "Raju", "Shimla", "[email protected]", 29);
Query OK, 1 row affected (0.13 sec)


mysql> INSERT INTO student VALUES (11, "Raju", "Shimla", "[email protected]", 29);
Query OK, 1 row affected (0.08 sec)
How to remove duplicates in SQL

Now, we will retrieve all the records from the student table. We will consider this table and database for all the following examples.

mysql> SELECT *FROM student;
+---------+-----------+------------+-------------------------+----------+
| Stud_ID | Stud_Name | Stud_City  | Stud_email              | Stud_Age |
+---------+-----------+------------+-------------------------+----------+
|       1 | Ankit     | Nagpur     | [email protected]       |       32 |
|       2 | Soham     | Nanded     | [email protected]       |       35 |
|       3 | Soham     | Nanded     | [email protected]       |       26 |
|       4 | Ravi      | Chandigarh | [email protected]         |       19 |
|       5 | Ravi      | Chandigarh | [email protected]         |       19 |
|       6 | Shyam     | Dehradun   | [email protected]     |       22 |
|       7 | Manthan   | Ambala     | [email protected] |       24 |
|       8 | Neeraj    | Noida      | [email protected]   |       25 |
|       9 | Anand     | Kashmir    | [email protected]        |       20 |
|      10 | Raju      | Shimla     | [email protected]         |       29 |
|      11 | Raju      | Shimla     | [email protected]         |       29 |
+---------+-----------+------------+-------------------------+----------+
11 rows in set (0.00 sec)
How to remove duplicates in SQL

Example 1:

Write a query to delete duplicate rows from the student table using the DELETE JOIN statement.

mysql> DELETE s1 FROM student s1 INNER JOIN student s2 WHERE s1.Stud_ID < s2.Stud_ID AND s1.Stud_email = s2.Stud_email;

We have used the DELETE query with INNER JOIN. To implement the INNER JOIN on a single table, we have created two instances s1 and s2. Then, with the help of WHERE clause, we have checked two conditions to find out the duplicate rows in the student table. If the email id in two different records is the same and the student id is different, it will be treated as a duplicate record according to the WHERE clause condition.

Output:

Query OK, 3 rows affected (0.20 sec)

The results of the above query show that there are three duplicate records present in the student table.

How to remove duplicates in SQL

We will use the SELECT query to find the duplicate records which were deleted.

mysql> SELECT *FROM student;
+---------+-----------+------------+-------------------------+----------+
| Stud_ID | Stud_Name | Stud_City  | Stud_email              | Stud_Age |
+---------+-----------+------------+-------------------------+----------+
|       1 | Ankit     | Nagpur     | [email protected]       |       32 |
|       3 | Soham     | Nanded     | [email protected]       |       26 |
|       5 | Ravi      | Chandigarh | [email protected]         |       19 |
|       6 | Shyam     | Dehradun   | [email protected]     |       22 |
|       7 | Manthan   | Ambala     | [email protected] |       24 |
|       8 | Neeraj    | Noida      | [email protected]   |       25 |
|       9 | Anand     | Kashmir    | [email protected]        |       20 |
|      11 | Raju      | Shimla     | [email protected]         |       29 |
+---------+-----------+------------+-------------------------+----------+
8 rows in set (0.00 sec)
How to remove duplicates in SQL

Now, there are only 8 records that are present in the student table as the three duplicate records are deleted from the currently selected table. According to the following condition:

s1.Stud_ID < s2.Stud_ID AND s1.Stud_email = s2.Stud_email;

If the email ids of any two records are the same, then since the less than sign is used between the student id, only the record with greater employee IDs will be kept, and the other duplicate record will be deleted between the two records.

Example 2:

Write a query to delete duplicate rows from the student table using the delete join statement while keeping the duplicate record with a lesser employee id and deleting the other one.

mysql> DELETE s1 FROM student s1 INNER JOIN student s2 WHERE s1.Stud_ID > s2.Stud_ID AND s1.Stud_email = s2.Stud_email;

We have used the DELETE query with INNER JOIN. To implement the INNER JOIN on a single table, we have created two instances s1 and s2. Then, with the help of the WHERE clause, we have checked two conditions to find out the duplicates rows in the student table. If the email id present in two different records is the same and the student id is different, it will be treated as a duplicate record according to the WHERE clause condition.

Output:

Query OK, 3 rows affected (0.09 sec)

The results of the above query show that there are three duplicate records present in the student table.

How to remove duplicates in SQL

We will use the SELECT query to find the duplicate records which were deleted.

mysql> SELECT *FROM student;
+---------+-----------+------------+-------------------------+----------+
| Stud_ID | Stud_Name | Stud_City  | Stud_email              | Stud_Age |
+---------+-----------+------------+-------------------------+----------+
|       1 | Ankit     | Nagpur     | [email protected]       |       32 |
|       2 | Soham     | Nanded     | [email protected]       |       35 |
|       4 | Ravi      | Chandigarh | [email protected]         |       19 |
|       6 | Shyam     | Dehradun   | [email protected]     |       22 |
|       7 | Manthan   | Ambala     | [email protected] |       24 |
|       8 | Neeraj    | Noida      | [email protected]   |       25 |
|       9 | Anand     | Kashmir    | [email protected]        |       20 |
|      10 | Raju      | Shimla     | [email protected]         |       29 |
+---------+-----------+------------+-------------------------+----------+
8 rows in set (0.00 sec)
How to remove duplicates in SQL

Now, there are only 8 records that are present in the student table as the three duplicate records are deleted from the currently selected table. According to the following condition:

s1.Stud_ID > s2.Stud_ID AND s1.Stud_email = s2.Stud_email;

If the email ids of any two records are the same since the greater than sign is used between the student id, only the record with the lesser employee id will be kept, and the other duplicate record will be deleted among the two records.

  •  Use of an intermediate table to remove duplicate rows

The following steps should be followed while removing the duplicate rows with the help of an intermediate table.

  1. A new table should be created, which will be the same as the actual table.
  2. Add distinct rows from the actual table to the newly created table.
  3. Drop the actual table and rename the new table with the same name as an actual table.

Example:

Write a query to delete the duplicate records from the student table by using an intermediate table.

Step 1:

Firstly, we will create an intermediate table that will be the same as the employee table.

mysql> CREATE TABLE temp_student LIKE student;
Query OK, 0 rows affected (0.14 sec)
How to remove duplicates in SQL

Here, ‘employee’ is the original table and ‘temp_student’ is the intermediate table.

Step 2:

Now, we will fetch only the unique records from the student table and insert all the fetched records into the temp_student table.

mysql> INSERT INTO temp_student SELECT *FROM student GROUP BY Stud_email;
Query OK, 8 rows affected (0.12 sec)
Records: 8  Duplicates: 0  Warnings: 0
How to remove duplicates in SQL

Here, before inserting the distinct records from the student table into temp_student, all the duplicate records are filtered by Stud_email. Then, only the records with unique email id have inserted into temp_student.

Step 3:

Then, we will remove the student table and rename the table  temp_student to the student table.

mysql> DROP TABLE student;
Query OK, 0 rows affected (0.08 sec)
mysql> ALTER TABLE temp_student RENAME TO student;
Query OK, 0 rows affected (0.08 sec)
How to remove duplicates in SQL

The student table is removed successfully, and temp_student is renamed to the student table, which contains only the unique records.

Then, we need to verify that the student table now contains only the unique records. To verify this, we have used the SELECT query to see the data contained in the student table.

mysql> SELECT *FROM student;

Output:

+---------+-----------+------------+-------------------------+----------+
| Stud_ID | Stud_Name | Stud_City  | Stud_email              | Stud_Age |
+---------+-----------+------------+-------------------------+----------+
|       9 | Anand     | Kashmir    | [email protected]        |       20 |
|       1 | Ankit     | Nagpur     | [email protected]       |       32 |
|       7 | Manthan   | Ambala     | [email protected] |       24 |
|       8 | Neeraj    | Noida      | [email protected]   |       25 |
|      10 | Raju      | Shimla     | [email protected]         |       29 |
|       4 | Ravi      | Chandigarh | [email protected]         |       19 |
|       6 | Shyam     | Dehradun   | [email protected]     |       22 |
|       2 | Soham     | Nanded     | [email protected]       |       35 |
+---------+-----------+------------+-------------------------+----------+
8 rows in set (0.00 sec)
How to remove duplicates in SQL

Now, there are only 8 records that are present in the student table as the three duplicate records are deleted from the currently selected table. In step 2, while fetching the distinct records from the original table and inserting them into an intermediate table, a GROUP BY clause was used on Stud_email, so all the records were inserted based on the email ids of students. Here, only the record with a lower employee id is kept among the duplicate records by default, and the other one is deleted.