Description
The cancellation rate is computed by dividing the number of canceled (by client or driver) requests with unbanned users by the total number of requests with unbanned users on that day.
Write a solution to find the cancellation rate of requests with unbanned users (both client and driver must not be banned) each day between "2013-10-01"
and "2013-10-03"
. Round Cancellation Rate
to two decimal points.
Return the result table in any order.
The result format is in the following example.
Examples:
Example 1:
Input:
Trips table:
+----+-----------+-----------+---------+---------------------+------------+
| id | client_id | driver_id | city_id | status | request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1 | 1 | 10 | 1 | completed | 2013-10-01 |
| 2 | 2 | 11 | 1 | cancelled_by_driver | 2013-10-01 |
| 3 | 3 | 12 | 6 | completed | 2013-10-01 |
| 4 | 4 | 13 | 6 | cancelled_by_client | 2013-10-01 |
| 5 | 1 | 10 | 1 | completed | 2013-10-02 |
| 6 | 2 | 11 | 6 | completed | 2013-10-02 |
| 7 | 3 | 12 | 6 | completed | 2013-10-02 |
| 8 | 2 | 12 | 12 | completed | 2013-10-03 |
| 9 | 3 | 10 | 12 | completed | 2013-10-03 |
| 10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+
Users table:
+----------+--------+--------+
| users_id | banned | role |
+----------+--------+--------+
| 1 | No | client |
| 2 | Yes | client |
| 3 | No | client |
| 4 | No | client |
| 10 | No | driver |
| 11 | No | driver |
| 12 | No | driver |
| 13 | No | driver |
+----------+--------+--------+
Output:
+------------+-------------------+
| Day | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33 |
| 2013-10-02 | 0.00 |
| 2013-10-03 | 0.50 |
+------------+-------------------+
Solution in Pandas
Python
import pandas as pd
def trips_and_users(trips: pd.DataFrame, users: pd.DataFrame) -> pd.DataFrame:
# Step 1: Merge Trips with Users to filter out trips with banned users.
# First, merge on client_id (clients)
trips_with_clients = trips.merge(users, left_on='client_id', right_on='users_id', suffixes=('', '_client'))
# Filter out trips where clients are banned
trips_with_clients = trips_with_clients[trips_with_clients['banned'] == 'No']
# Then merge the result with the Users table again, this time on driver_id (drivers)
trips_with_clients_and_drivers = trips_with_clients.merge(users, left_on='driver_id', right_on='users_id', suffixes=('', '_driver'))
# Filter out trips where drivers are banned
trips_with_clients_and_drivers = trips_with_clients_and_drivers[trips_with_clients_and_drivers['banned_driver'] == 'No']
# Step 2: Filter for trips between '2013-10-01' and '2013-10-03'
filtered_trips = trips_with_clients_and_drivers[
trips_with_clients_and_drivers['request_at'].between('2013-10-01', '2013-10-03')
]
# Step 3: Group by 'request_at' (day) and calculate the cancellation rate.
# A trip is considered canceled if the status is 'cancelled_by_driver' or 'cancelled_by_client'
# Calculate total trips per day and canceled trips per day
daily_stats = filtered_trips.groupby('request_at').agg(
total_trips=('id', 'count'),
canceled_trips=('status', lambda x: x.isin(['cancelled_by_driver', 'cancelled_by_client']).sum())
).reset_index()
# Step 4: Calculate the cancellation rate and round to 2 decimal places
daily_stats['Cancellation Rate'] = (daily_stats['canceled_trips'] / daily_stats['total_trips']).round(2)
# Step 5: Select relevant columns for the final output
result = daily_stats[['request_at', 'Cancellation Rate']].rename(columns={'request_at': 'Day'})
return result
Explanation:
- Merging with
Users
table:- The
Trips
table is first merged with theUsers
table based on theclient_id
, and trips where the client is banned are filtered out. - The resulting table is then merged again, but this time on
driver_id
to include driver information and filter out trips where the driver is banned.
- The
- Filtering the relevant date range:
- After obtaining the trips with unbanned users, the trips are filtered to only include those between the specified dates (“2013-10-01” to “2013-10-03”).
- Calculating daily statistics:
- The code groups the trips by
request_at
to calculate the total number of trips and the number of canceled trips for each day. - A custom lambda function checks if a trip’s status is either
'cancelled_by_driver'
or'cancelled_by_client'
.
- The code groups the trips by
- Computing and rounding the cancellation rate:
- The cancellation rate is computed by dividing the number of canceled trips by the total number of trips for each day, and the result is rounded to two decimal places.
- Formatting the output:
- The final result contains only two columns:
Day
andCancellation Rate
.
- The final result contains only two columns:
Solution in MySQL
SQL
WITH Unbanned_Trips AS (
-- Join the Trips table with the Users table twice to ensure both client and driver are unbanned
SELECT t.id, t.request_at, t.status
FROM Trips t
JOIN Users u_client ON t.client_id = u_client.users_id
JOIN Users u_driver ON t.driver_id = u_driver.users_id
-- Filter out trips where either the client or the driver is banned
WHERE u_client.banned = 'No' AND u_driver.banned = 'No'
)
-- Now calculate the cancellation rate by day
SELECT
request_at AS Day,
ROUND(SUM(CASE WHEN status IN ('cancelled_by_driver', 'cancelled_by_client') THEN 1 ELSE 0 END) / COUNT(*), 2) AS `Cancellation Rate`
FROM Unbanned_Trips
-- Filter by the given date range
WHERE request_at BETWEEN '2013-10-01' AND '2013-10-03'
-- Group by day
GROUP BY request_at
-- Order the result by date in ascending order
ORDER BY request_at ASC;
Solution in PostgreSQL
SQL
-- First, we select only the trips where both the client and the driver are not banned.
WITH Unbanned_Trips AS (
-- Join the Trips table with the Users table twice:
-- Once to check the client (client_id) and once for the driver (driver_id)
SELECT t.id, t.request_at, t.status
FROM Trips t
JOIN Users u_client ON t.client_id = u_client.users_id
JOIN Users u_driver ON t.driver_id = u_driver.users_id
-- Filter: both client and driver must not be banned
WHERE u_client.banned = 'No' AND u_driver.banned = 'No'
)
-- Now, we calculate the cancellation rate for unbanned users for each day
SELECT
-- The day of the request
request_at AS Day,
-- Calculate the cancellation rate:
-- Count the number of cancelled trips (either by client or driver) and divide it by the total trips on that day.
-- Round the result to two decimal places for proper formatting.
ROUND(
-- Sum the canceled trips (status is either 'cancelled_by_driver' or 'cancelled_by_client')
SUM(CASE
WHEN status IN ('cancelled_by_driver', 'cancelled_by_client') THEN 1
ELSE 0
END)::numeric / COUNT(*), -- Divide by total trips
2
) AS "Cancellation Rate" -- Alias the column as "Cancellation Rate"
-- Specify the table from which to get the data (our CTE of unbanned trips)
FROM Unbanned_Trips
-- Filter the dates to the range "2013-10-01" to "2013-10-03"
WHERE request_at BETWEEN '2013-10-01' AND '2013-10-03'
-- Group the results by the day (request_at) to calculate for each day separately
GROUP BY request_at
-- Order the results by date in ascending order to match the expected output
ORDER BY request_at ASC;