Reading a CSV File
To read a CSV file in Pandas, you use the read_csv function. This function reads a CSV file into a DataFrame. Here’s a step-by-step guide:
- Import the Pandas library:
import pandas as pd
- Read the CSV file:
Assuming you have a CSV file named data.csv, you can read it into a DataFrame as follows:
df = pd.read_csv('data.csv')
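To try this step without a file on disk, the sketch below reads comparable data from an in-memory buffer, since read_csv accepts a file-like object as well as a path. The CSV content and its column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = "name,age,city\nAlice,30,Paris\nBob,25,Berlin\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # ['name', 'age', 'city']
print(len(df))              # 2
```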
The read_csv function has many optional parameters that you can use to customize how the data is read. Here are some common parameters:
- sep: Specifies the delimiter to use. Default is a comma (,).
- header: Row number to use as the column names. Default is 'infer', which behaves like 0 (first row) when names is not passed.
- names: List of column names to use.
- index_col: Column(s) to set as the index; passing several columns creates a MultiIndex.
- usecols: Return a subset of the columns.
- dtype: Data type for data or columns.
- parse_dates: List of columns to parse as dates.
Example with Parameters
df = pd.read_csv('data.csv', sep=',', header=0, index_col=0, parse_dates=['date_column'])
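As a rough, self-contained illustration of how several of these parameters combine, the sketch below parses semicolon-delimited text from memory. The data and the column names (id, date_column, price, note) are assumptions chosen for the example, not part of any real file.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited data; all column names are illustrative
csv_text = (
    "id;date_column;price;note\n"
    "1;2024-01-05;9.5;a\n"
    "2;2024-02-10;12.0;b\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                                 # non-default delimiter
    usecols=["id", "date_column", "price"],  # skip the 'note' column
    index_col="id",                          # use 'id' as the row index
    dtype={"price": "float64"},              # force the column type
    parse_dates=["date_column"],             # convert strings to datetimes
)

print(df.dtypes)
```

Printing df.dtypes is a quick way to confirm that the date column was parsed as a datetime type and not left as text.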
Basic Data Analysis with Pandas
Once the data is loaded into a DataFrame, you can start analyzing it. Here are some common operations:
Viewing the Data
- Display the first few rows:
print(df.head())
- Display the last few rows:
print(df.tail())
- Get the shape of the DataFrame:
print(df.shape)
- Get a summary of the DataFrame:
df.info()  # info() prints its report itself and returns None
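The viewing calls above can be tried on a small throwaway DataFrame, as in this sketch. The data is invented for illustration; head and tail return five rows by default and accept a row count.

```python
import pandas as pd

# Hypothetical data: 10 rows, 2 columns
df = pd.DataFrame({"x": range(10), "y": [v * 2 for v in range(10)]})

print(df.head(3))  # first 3 rows
print(df.tail(2))  # last 2 rows
print(df.shape)    # (10, 2)
df.info()          # prints column names, dtypes, and non-null counts
```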
Statistical Summary
- Get descriptive statistics:
print(df.describe())
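By default, describe summarizes only the numeric columns of a mixed DataFrame. The sketch below, using invented data, shows that behavior and the include="all" option that adds the non-numeric columns.

```python
import pandas as pd

# Hypothetical data mixing a numeric and a text column
df = pd.DataFrame({"score": [1.0, 2.0, 3.0, 4.0], "label": ["a", "b", "a", "b"]})

stats = df.describe()                # numeric columns only by default
print(stats.loc["mean", "score"])    # 2.5

all_stats = df.describe(include="all")  # also summarizes non-numeric columns
print(all_stats.columns.tolist())       # ['score', 'label']
```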
Selecting Data
- Select a single column:
column = df['column_name']
- Select multiple columns:
subset = df[['column1', 'column2']]
- Select rows by index label:
row = df.loc[index]
- Select rows by position:
row = df.iloc[position]
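The difference between loc (label-based) and iloc (position-based) is easiest to see when the index is not a plain 0, 1, 2 range. A minimal sketch, with made-up labels and values:

```python
import pandas as pd

# Hypothetical data with string labels as the index
df = pd.DataFrame({"qty": [10, 20, 30]}, index=["a", "b", "c"])

by_label = df.loc["b"]     # row whose index label is 'b'
by_position = df.iloc[1]   # second row (positions start at 0)
print(by_label["qty"], by_position["qty"])  # 20 20

# Slices differ: loc includes the end label, iloc excludes the end position
print(len(df.loc["a":"b"]))  # 2
print(len(df.iloc[0:1]))     # 1
```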
Filtering Data
- Filter rows based on a condition:
filtered_df = df[df['column_name'] > value]
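Filtering works by building a boolean Series (a mask) and using it to index the DataFrame. The sketch below makes the mask explicit and combines two conditions; the column names price and region are hypothetical.

```python
import pandas as pd

# Hypothetical data; 'price' and 'region' are illustrative column names
df = pd.DataFrame({"price": [5, 15, 25, 35], "region": ["N", "S", "N", "S"]})

mask = df["price"] > 10   # boolean Series, one entry per row
print(mask.tolist())      # [False, True, True, True]

filtered_df = df[mask]    # keeps rows where the mask is True

# Combine conditions with & (and) or | (or); the parentheses are required
north_expensive = df[(df["price"] > 10) & (df["region"] == "N")]
print(north_expensive["price"].tolist())  # [25]
```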
Handling Missing Data
- Check for missing values:
print(df.isnull().sum())
- Drop rows with missing values:
df.dropna(inplace=True)
- Fill missing values:
df.fillna(value, inplace=True)
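The three operations above can be compared side by side on a small frame with gaps. This sketch uses invented data and returns new DataFrames instead of modifying in place, so the original stays available; the fill values are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

print(df.isnull().sum())  # count of missing values per column

dropped = df.dropna()     # rows containing any missing value are removed

# A dict gives each column its own fill value
filled = df.fillna({"a": 0.0, "b": "unknown"})
print(filled)
```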
Saving Data
After manipulating the DataFrame, you might want to save it back to a CSV file:
df.to_csv('processed_data.csv', index=False)
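To see what index=False does without writing a file, the sketch below saves to an in-memory buffer and reads the result back. The data is made up; to_csv accepts a file-like object as well as a path.

```python
import io
import pandas as pd

# Hypothetical data to write out
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False omits the row labels
print(buffer.getvalue())

# Read the text back to check that the columns round-trip
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.columns.tolist())  # ['id', 'name']
```

Without index=False, the row labels are written as an extra unnamed first column, which then shows up as a new column when the file is read back.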
Example Workflow
Here’s an example workflow that demonstrates reading a CSV file, performing some basic data analysis, and saving the results:
import pandas as pd
# Step 1: Read the CSV file
df = pd.read_csv('data.csv')
# Step 2: Display the first few rows
print(df.head())
# Step 3: Get a summary of the DataFrame (info() prints its report itself)
df.info()
# Step 4: Get descriptive statistics
print(df.describe())
# Step 5: Select specific columns
subset = df[['column1', 'column2']]
# Step 6: Filter rows based on a condition (value is a placeholder for your own threshold)
filtered_df = df[df['column_name'] > value]
# Step 7: Handle missing data
df.fillna(0, inplace=True)
# Step 8: Save the processed data
df.to_csv('processed_data.csv', index=False)