HomeLeetcode196. Delete Duplicate Emails - Leetcode Solutions

196. Delete Duplicate Emails – Leetcode Solutions

Description

Table: Person

+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| email | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.

Write a solution to delete all duplicate emails, keeping only one unique email with the smallest id.

For SQL users, please note that you are supposed to write a DELETE statement and not a SELECT one.

For Pandas users, please note that you are supposed to modify Person in place.

After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter.

The result format is in the following example.

Examples:

Example 1:

Input: 
Person table:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Output: 
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+

Solution in Pandas

Python
import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Step 1: Sort the DataFrame by 'id' in ascending order
    person.sort_values(by='id', inplace=True)
    
    # Step 2: Drop duplicates while keeping the first occurrence (smallest id) of each email
    person.drop_duplicates(subset='email', keep='first', inplace=True)

Explanation of the Code

  1. sort_values:
    • Before dropping duplicates, the person DataFrame is sorted by the id column in ascending order. This ensures that the row with the smallest id for each email comes first.
    • The inplace=True parameter ensures that the sort is applied directly to the original DataFrame.
  2. drop_duplicates:
    • After sorting, the drop_duplicates method is used to remove duplicate emails. Since the DataFrame is sorted, the first occurrence corresponds to the row with the smallest id.

Solution in MySQL

SQL
-- Delete duplicate emails while keeping the row with the smallest id
DELETE p1
FROM Person p1
JOIN (
    -- Subquery to find the smallest id for each email
    SELECT MIN(Id) AS min_id, Email
    FROM Person
    GROUP BY Email
) p2 
ON p1.Email = p2.Email AND p1.Id > p2.min_id;

Solution in PostgreSQL

SQL
-- Delete duplicate emails while keeping the row with the smallest id
DELETE FROM Person
WHERE Id NOT IN (
    -- Subquery to find the smallest id for each email
    SELECT MIN(Id)
    FROM Person
    GROUP BY Email
);

Subscribe
Notify of

0 Comments
Inline Feedbacks
View all comments

Popular