Description
Table: Employee
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| id | int |
| name | varchar |
| salary | int |
| departmentId | int |
+--------------+---------+
id is the primary key (column with unique values) for this table.
departmentId is a foreign key (reference column) of the ID from theDepartment
table.
Each row of this table indicates the ID, name, and salary of an employee. It also contains the ID of their department.
Table: Department
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | name | varchar | +-------------+---------+ id is the primary key (column with unique values) for this table. Each row of this table indicates the ID of a department and its name.
A company’s executives are interested in seeing who earns the most money in each of the company’s departments. A high earner in a department is an employee who has a salary in the top three unique salaries for that department.
Write a solution to find the employees who are high earners in each of the departments.
Return the result table in any order.
The result format is in the following example.
Examples:
Example 1:
Input:
Employee table:
+----+-------+--------+--------------+
| id | name | salary | departmentId |
+----+-------+--------+--------------+
| 1 | Joe | 85000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
| 5 | Janet | 69000 | 1 |
| 6 | Randy | 85000 | 1 |
| 7 | Will | 70000 | 1 |
+----+-------+--------+--------------+
Department table:
+----+-------+
| id | name |
+----+-------+
| 1 | IT |
| 2 | Sales |
+----+-------+
Output:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Joe | 85000 |
| IT | Randy | 85000 |
| IT | Will | 70000 |
| Sales | Henry | 80000 |
| Sales | Sam | 60000 |
+------------+----------+--------+
Solution in Pandas
Python
import pandas as pd
def top_three_salaries(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
# Step 1: Merge Employee and Department tables on departmentId and id
merged_df = pd.merge(employee, department, left_on='departmentId', right_on='id')
# Step 2: Sort by department and then by salary in descending order
merged_df = merged_df.sort_values(by=['departmentId', 'salary'], ascending=[True, False])
# Step 3: Identify top three unique salaries for each department
top_salaries = (
merged_df.groupby('departmentId')['salary']
.apply(lambda x: sorted(set(x), reverse=True)[:3])
.explode()
.reset_index()
)
# Step 4: Filter employees who have a salary in the top 3 unique salaries for their department
result_df = pd.merge(merged_df, top_salaries, on=['departmentId', 'salary'])
# Step 5: Select and rename the necessary columns
result_df = result_df[['name_y', 'name_x', 'salary']]
result_df.columns = ['Department', 'Employee', 'Salary']
return result_df
Explanation of the Code
- Merging Tables:
- The
employee
anddepartment
DataFrames are merged using a join ondepartmentId
andid
respectively, which allows us to access department names alongside employee details.
- The
- Sorting Data:
- The merged DataFrame is sorted by
departmentId
andsalary
in descending order to make it easier to extract the top salaries.
- The merged DataFrame is sorted by
- Identifying Top Salaries:
- We group by
departmentId
and use a lambda function to select the top 3 unique salaries. Theexplode
function is used to flatten the list of top salaries into rows.
- We group by
- Filtering:
- We use another
merge
operation to filter the employees who have salaries that match the top salaries in their respective departments.
- We use another
- Formatting Output:
- The final DataFrame is prepared by selecting and renaming the relevant columns.
Notes
- No MultiIndex: Instead of setting a
MultiIndex
, we directly merge the DataFrame containing the top salaries with the original merged DataFrame. This avoids issues with mismatched data types. - Exploding Lists: The
explode()
function is used to transform the grouped top salaries into a regular DataFrame format.
This approach should avoid the ValueError
and properly handle the data types, resulting in the correct output for your problem.
Solution in MySQL
SQL
WITH RankedSalaries AS (
SELECT
E.name AS Employee,
E.salary AS Salary,
D.name AS Department,
DENSE_RANK() OVER (PARTITION BY E.departmentId ORDER BY E.salary DESC) AS salary_rank
FROM
Employee E
JOIN
Department D
ON
E.departmentId = D.id
)
SELECT
Department,
Employee,
Salary
FROM
RankedSalaries
WHERE
salary_rank <= 3;
Solution in PostgreSQL
SQL
WITH RankedSalaries AS (
SELECT
E.name AS Employee,
E.salary AS Salary,
D.name AS Department,
DENSE_RANK() OVER (PARTITION BY E.departmentId ORDER BY E.salary DESC) AS salary_rank
FROM
Employee E
JOIN
Department D
ON
E.departmentId = D.id
)
SELECT
Department,
Employee,
Salary
FROM
RankedSalaries
WHERE
salary_rank <= 3;