Description
Table: Person
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| email | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to report all the duplicate emails. Note that it’s guaranteed that the email field is not NULL.
Return the result table in any order.
The result format is in the following example.
Examples:
Example 1:
Input:
Person table:
+----+---------+
| id | email |
+----+---------+
| 1 | a@b.com |
| 2 | c@d.com |
| 3 | a@b.com |
+----+---------+
Output:
+---------+
| Email |
+---------+
| a@b.com |
+---------+
Explanation: a@b.com is repeated two times.
Solution in Pandas
Python
import pandas as pd
def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
# Group by the 'email' column and count the occurrences of each email
email_counts = person['email'].value_counts()
# Filter the emails that appear more than once
duplicate_emails = email_counts[email_counts > 1].index.tolist()
# Convert the result to a DataFrame with the required format
result_df = pd.DataFrame(duplicate_emails, columns=['Email'])
return result_df
Explanation
- Counting occurrences:
person['email'].value_counts()
: This counts the occurrences of each email and returns a Series where the index is the email and the value is the count of that email.
- Filtering duplicates:
email_counts[email_counts > 1].index.tolist()
: This filters the emails that appear more than once and gets the list of these duplicate emails.
- Formatting the result:
pd.DataFrame(duplicate_emails, columns=['Email'])
: Converts the list of duplicate emails into a DataFrame with a single column namedEmail
.
This function will return a DataFrame containing the duplicate emails as specified in the problem statement.
Solution in MySQL
SQL
SELECT email AS Email
FROM Person
GROUP BY email
HAVING COUNT(email) > 1;
Solution in PostgreSQL
SQL
SELECT email AS Email
FROM Person
GROUP BY email
HAVING COUNT(email) > 1;