196. Delete Duplicate Emails – Leetcode Solutions

August 12, 2024

Table of Contents

Description

Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.

Write a solution to delete all duplicate emails, keeping only one unique email with the smallest id.

For SQL users, please note that you are supposed to write a DELETE statement and not a SELECT one.

For Pandas users, please note that you are supposed to modify Person in place.

After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter.

The result format is in the following example.

Examples:

Example 1:

Input: 
Person table:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Output: 
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+

Solution in Pandas

Python

import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Step 1: Sort the DataFrame by 'id' in ascending order
    person.sort_values(by='id', inplace=True)
    
    # Step 2: Drop duplicates while keeping the first occurrence (smallest id) of each email
    person.drop_duplicates(subset='email', keep='first', inplace=True)

Explanation of the Code

sort_values:
- Before dropping duplicates, the person DataFrame is sorted by the id column in ascending order. This ensures that the row with the smallest id for each email comes first.
- The inplace=True parameter ensures that the sort is applied directly to the original DataFrame.
drop_duplicates:
- After sorting, the drop_duplicates method is used to remove duplicate emails. Since the DataFrame is sorted, the first occurrence corresponds to the row with the smallest id.

Solution in MySQL

SQL

-- Delete duplicate emails while keeping the row with the smallest id
DELETE p1
FROM Person p1
JOIN (
    -- Subquery to find the smallest id for each email
    SELECT MIN(Id) AS min_id, Email
    FROM Person
    GROUP BY Email
) p2 
ON p1.Email = p2.Email AND p1.Id > p2.min_id;

Solution in PostgreSQL

SQL

-- Delete duplicate emails while keeping the row with the smallest id
DELETE FROM Person
WHERE Id NOT IN (
    -- Subquery to find the smallest id for each email
    SELECT MIN(Id)
    FROM Person
    GROUP BY Email
);

196. Delete Duplicate Emails – Leetcode Solutions

Description

Solution in Pandas

Explanation of the Code

Solution in MySQL

Solution in PostgreSQL

Popular

123. Best Time to Buy and Sell Stock III – Leetcode Solutions

Search Insert Position (Array) – Solutions

150. Evaluate Reverse Polish Notation – Leetcode Solutions

110. Balanced Binary Tree – Leetcode Solutions

20. Valid Parentheses (String) – Leetcode Solutions

155. Min Stack – Leetcode Solutions

How to Activate Virtualenv in Python

196. Delete Duplicate Emails – Leetcode Solutions

Description

Solution in Pandas

Explanation of the Code

Solution in MySQL

Solution in PostgreSQL

Related Post

Popular