Analyzing Data with Pandas: Reading Data from a CSV File

June 6, 2024

Reading a CSV File

To read a CSV file in Pandas, use the read_csv function, which loads the file's contents into a DataFrame. Here’s a step-by-step guide:

Import the Pandas library:

import pandas as pd

Read the CSV file:

Assuming you have a CSV file named data.csv, you can read it into a DataFrame as follows:

df = pd.read_csv('data.csv')

The read_csv function has many optional parameters that you can use to customize how the data is read. Here are some common parameters:

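sep: Delimiter that separates the fields. Default is a comma (,).
header: Row number to use as the column names. Default is 'infer', which uses the first row (row 0) unless names is given.
names: List of column names to use, typically when the file has no header row.
index_col: Column(s) to use as the row index. Passing a list of columns creates a MultiIndex.
usecols: Subset of columns to read, given as column names or positions.
dtype: Data type for the whole dataset, or a dict mapping column names to types.
parse_dates: Columns to parse as dates, given as a list of column names or positions.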

Example with Parameters

df = pd.read_csv('data.csv', sep=',', header=0, index_col=0, parse_dates=['date_column'])
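The other parameters from the list (names, usecols, dtype) combine in the same way. The sketch below reads from an in-memory string rather than a file so that it runs on its own; the column names and values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content: no header row, ';' as the delimiter.
raw = io.StringIO(
    "1;2024-01-05;9.5;north\n"
    "2;2024-01-06;7.25;south\n"
)

df = pd.read_csv(
    raw,
    sep=';',
    header=None,                               # the data has no header row
    names=['id', 'date', 'price', 'region'],   # so supply the column names
    usecols=['id', 'date', 'price'],           # skip the 'region' column
    dtype={'price': 'float64'},                # force a type for one column
    index_col='id',                            # use 'id' as the row index
    parse_dates=['date'],                      # convert 'date' to datetime
)

print(df.dtypes)
```

Restricting usecols and setting dtype up front can reduce memory use on large files, since unwanted columns are never loaded.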

Basic Data Analysis with Pandas

Once the data is loaded into a DataFrame, you can start analyzing it. Here are some common operations:

Viewing the Data

Display the first few rows:

print(df.head())

Display the last few rows:

print(df.tail())

Get the shape of the DataFrame:

print(df.shape)

Get a summary of the DataFrame:

df.info()  # prints the summary itself and returns None, so no print() is needed

Statistical Summary

Get descriptive statistics:

print(df.describe())
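By default, describe() summarizes only the numeric columns. As a small sketch with made-up data, passing include='all' adds the non-numeric columns as well:

```python
import pandas as pd

# Hypothetical data with one numeric and one text column.
df = pd.DataFrame({
    'qty': [1, 2, 3, 4],
    'kind': ['a', 'b', 'a', 'a'],
})

numeric = df.describe()                  # numeric columns only
everything = df.describe(include='all')  # adds unique/top/freq for text columns

print(numeric)
print(everything)
```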

Selecting Data

Select a single column:

column = df['column_name']

Select multiple columns:

subset = df[['column1', 'column2']]

Select a row by its index label:

row = df.loc[index]

Select a row by its integer position:

row = df.iloc[position]
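The difference between loc and iloc is easiest to see when the index labels are not integers. The following is a minimal sketch with invented data:

```python
import pandas as pd

# Hypothetical data with string labels as the index.
df = pd.DataFrame(
    {'score': [10, 20, 30]},
    index=['a', 'b', 'c'],
)

by_label = df.loc['b']      # the row whose index label is 'b'
by_position = df.iloc[0]    # the first row, whatever its label

print(by_label['score'])     # 20
print(by_position['score'])  # 10

# Slices differ too: loc includes the end label, iloc excludes the end position.
label_slice = df.loc['a':'b']    # rows 'a' and 'b'
position_slice = df.iloc[0:1]    # row 'a' only
```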

Filtering Data

Filter rows based on a condition:

filtered_df = df[df['column_name'] > value]
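Conditions can also be combined. The sketch below, using made-up data, shows the & and | operators and the isin method; each condition needs its own parentheses because & and | bind more tightly than comparisons.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    'city': ['Oslo', 'Lima', 'Pune', 'Oslo'],
    'temp': [3, 19, 31, -2],
})

# | means "or": rows that are hot or located in Oslo.
hot_or_oslo = df[(df['temp'] > 25) | (df['city'] == 'Oslo')]

# & means "and": rows in Oslo with a temperature below zero.
cold_oslo = df[(df['city'] == 'Oslo') & (df['temp'] < 0)]

# isin() keeps rows whose value appears in the given list.
selected = df[df['city'].isin(['Lima', 'Pune'])]

print(hot_or_oslo)
```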

Handling Missing Data

Check for missing values:

print(df.isnull().sum())

Drop rows with missing values:

df.dropna(inplace=True)

Fill missing values:

df.fillna(value, inplace=True)
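As an alternative to inplace=True, both methods return a new DataFrame that you can assign to a variable, which leaves the original untouched. The sketch below uses invented data and also shows filling each column with its own value and dropping rows based on a single column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    'name': ['Ann', None, 'Cy'],
    'age': [34.0, np.nan, 41.0],
})

print(df.isnull().sum())

# Fill each column with its own replacement value.
filled = df.fillna({'name': 'unknown', 'age': df['age'].mean()})

# Drop a row only when 'age' is missing.
dropped = df.dropna(subset=['age'])

print(filled)
```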

Saving Data

After manipulating the DataFrame, you might want to save it back to a CSV file:

df.to_csv('processed_data.csv', index=False)
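The index=False argument leaves the row index out of the file. As a small sketch with made-up data, calling to_csv without a path returns the CSV text as a string, which makes the effect easy to compare:

```python
import io
import pandas as pd

# Hypothetical data with a meaningful index.
df = pd.DataFrame({'x': [1, 2]}, index=['r1', 'r2'])

with_index = df.to_csv()                # index written as the first column
without_index = df.to_csv(index=False)  # index omitted

print(with_index)
print(without_index)

# If the index was written, read it back with index_col=0.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
```

Keep the index when it carries information, such as dates or IDs, and drop it when it is only the default row numbering.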

Example Workflow

Here’s an example workflow that demonstrates reading a CSV file, performing some basic data analysis, and saving the results:

import pandas as pd

# Step 1: Read the CSV file
df = pd.read_csv('data.csv')

# Step 2: Display the first few rows
print(df.head())

# Step 3: Get a summary of the DataFrame (info() prints directly and returns None)
df.info()

# Step 4: Get descriptive statistics
print(df.describe())

# Step 5: Select specific columns
subset = df[['column1', 'column2']]

# Step 6: Filter rows based on a condition
value = 0  # placeholder threshold; replace with one that suits your data
filtered_df = df[df['column_name'] > value]

# Step 7: Handle missing data
df.fillna(0, inplace=True)

# Step 8: Save the processed data
df.to_csv('processed_data.csv', index=False)
