Analyzing Data with Pandas: Reading Data from a CSV File

Reading a CSV File

To read a CSV file in Pandas, you use the read_csv function. This function reads a CSV file into a DataFrame. Here’s a step-by-step guide:

  1. Import the Pandas library:
import pandas as pd
  2. Read the CSV file:

Assuming you have a CSV file named data.csv, you can read it into a DataFrame as follows:

df = pd.read_csv('data.csv')
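`read_csv` accepts any file-like object as well as a path, so you can try it without a file on disk. The minimal sketch below passes it a small CSV held in an `io.StringIO` buffer. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for data.csv
csv_text = """name,age,city
Alice,30,Paris
Bob,25,Berlin
Carol,35,Madrid
"""

# read_csv accepts a file path or any file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # pandas infers a type for each column
```

The first row becomes the column names, and the `age` column is inferred as an integer type.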

The read_csv function has many optional parameters that you can use to customize how the data is read. Here are some common parameters:

  • sep: Delimiter that separates fields. Default is ','.
  • header: Row number to use as the column names. Default is 'infer', which uses the first row (0) unless names is given.
  • names: List of column names to use, for example when the file has no header row.
  • index_col: Column(s) to use as the row index; passing several columns creates a MultiIndex.
  • usecols: Subset of columns to read.
  • dtype: Data type to apply, either to all columns or per column via a dict.
  • parse_dates: List of columns to parse as dates.
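Several of these parameters often appear together. The following sketch reads a hypothetical semicolon-delimited file that has no header row. It supplies column names, keeps only some of the columns, and overrides the type of one of them. The data and column names are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data with no header row
raw = """1;widget;4.50;2024-01-05
2;gadget;12.00;2024-02-10
3;gizmo;7.25;2024-03-15
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                                   # fields are separated by semicolons
    header=None,                               # the file has no header row...
    names=["id", "product", "price", "date"],  # ...so supply column names
    usecols=["id", "product", "price"],        # keep only these columns
    dtype={"id": "int32"},                     # per-column type override
)

print(df)
print(df.dtypes)
```

When `names` is given, string entries in `usecols` and the keys of the `dtype` dict refer to those names.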

Example with Parameters

df = pd.read_csv('data.csv', sep=',', header=0, index_col=0, parse_dates=['date_column'])
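You can check the effect of `index_col` and `parse_dates` by running the same call on a small in-memory sample. In this sketch, `date_column` mirrors the placeholder name used above, and the other columns and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data; 'date_column' matches the placeholder name in the example
csv_text = """order_id,date_column,amount
101,2024-01-05,19.99
102,2024-01-06,5.00
103,2024-01-07,42.50
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    header=0,
    index_col=0,                  # first column (order_id) becomes the index
    parse_dates=["date_column"],  # convert the date strings to datetime values
)

print(df.index.name)            # order_id
print(df["date_column"].dtype)  # a datetime64 dtype instead of plain strings
print(df.loc[102])              # rows can now be looked up by order_id
```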

Basic Data Analysis with Pandas

Once the data is loaded into a DataFrame, you can start analyzing it. Here are some common operations:

Viewing the Data

  • Display the first few rows:
print(df.head())
  • Display the last few rows:
print(df.tail())
  • Get the shape of the DataFrame:
print(df.shape)
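  • Get a concise summary (column types, non-null counts, memory usage); info() prints its report directly and returns None, so it is not wrapped in print():
df.info()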
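Here is a quick sketch of these methods on a small, made-up DataFrame. `head()` and `tail()` default to five rows and accept a row count.

```python
import pandas as pd

# Small hypothetical DataFrame (10 rows) to illustrate the viewing methods
df = pd.DataFrame({
    "x": range(10),
    "y": [v * 2 for v in range(10)],
})

print(df.head())   # first 5 rows by default
print(df.head(3))  # first 3 rows
print(df.tail(2))  # last 2 rows
print(df.shape)    # (10, 2): (rows, columns)
df.info()          # prints dtypes and non-null counts; returns None
```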

Statistical Summary

  • Get descriptive statistics:
print(df.describe())
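By default, `describe()` summarizes only numeric columns. It returns a DataFrame indexed by statistic name (`count`, `mean`, `std`, `min`, the quartiles, and `max`), so you can look up individual values with `.loc`. The data in this sketch is hypothetical.

```python
import pandas as pd

# Hypothetical data with one numeric and one text column
df = pd.DataFrame({"score": [70, 80, 90, 100], "name": ["a", "b", "c", "d"]})

stats = df.describe()  # numeric columns only by default
print(stats)

print(stats.loc["mean", "score"])   # 85.0
print(df.describe(include="all"))   # also summarizes the text column 'name'
```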

Selecting Data

  • Select a single column:
column = df['column_name']
  • Select multiple columns:
subset = df[['column1', 'column2']]
  • Select a row by its index label:
row = df.loc[label]
  • Select a row by its integer position:
row = df.iloc[position]
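The difference between `loc` (labels) and `iloc` (integer positions) shows up most clearly when the index is not the default 0, 1, 2, .... The sketch below uses a hypothetical string index.

```python
import pandas as pd

# Hypothetical data with a non-default (string) index
df = pd.DataFrame(
    {"x": [10, 20, 30], "y": ["p", "q", "r"]},
    index=["row_a", "row_b", "row_c"],
)

col = df["x"]                    # a single column -> Series
subset = df[["x", "y"]]          # a list of columns -> DataFrame
by_label = df.loc["row_b"]       # row selected by index label
by_position = df.iloc[0]         # row selected by integer position (first row)
cell = df.loc["row_c", "y"]      # a single value by row and column label

print(type(col).__name__, type(subset).__name__)  # Series DataFrame
print(by_label["x"], by_position.name, cell)
```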

Filtering Data

  • Filter rows based on a condition:
filtered_df = df[df['column_name'] > value]
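A comparison such as `df['column_name'] > value` produces a boolean Series, and indexing the DataFrame with it keeps only the rows marked `True`. This sketch uses hypothetical data and also shows how to combine conditions with `&` (and) or `|` (or). Each comparison needs its own parentheses because of operator precedence.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"price": [5, 15, 25, 35], "in_stock": [True, False, True, True]})

# The comparison yields a boolean Series; indexing with it keeps the True rows
mask = df["price"] > 10
filtered_df = df[mask]

# Combine conditions with & (and) or | (or); parenthesize each comparison
cheap_in_stock = df[(df["price"] < 30) & df["in_stock"]]

print(filtered_df)
print(cheap_in_stock)
```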

Handling Missing Data

  • Check for missing values:
print(df.isnull().sum())
  • Drop rows with missing values:
df.dropna(inplace=True)
  • Fill missing values:
df.fillna(value, inplace=True)
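Without `inplace=True`, `dropna()` and `fillna()` return a new DataFrame and leave the original unchanged. Many pandas users prefer this assignment style. `fillna()` also accepts a dict, which lets you use a different fill value for each column. The data in this sketch is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, "y", "z"]})

print(df.isnull().sum())  # missing values per column: a -> 1, b -> 1

dropped = df.dropna()  # new DataFrame; only row 2 has no missing values
filled = df.fillna({"a": 0, "b": "unknown"})  # per-column fill values

print(dropped)
print(filled)
```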

Saving Data

After manipulating the DataFrame, you might want to save it back to a CSV file:

df.to_csv('processed_data.csv', index=False)
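To see why `index=False` is commonly used, you can write a small, made-up DataFrame to an in-memory buffer and read it back. If you omit `index=False`, the index is written as an extra, unnamed first column. Calling `to_csv()` with no path returns the CSV text as a string.

```python
import io

import pandas as pd

# Hypothetical data with a non-default index
df = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]}, index=[10, 20])

# With index=False the index is not written, so no extra column appears on reload
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())

buffer.seek(0)  # rewind before reading the buffer back
reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))  # ['x', 'y']

# With the default index=True, the index becomes an unnamed first column
with_index = df.to_csv()  # no path given -> returns the CSV as a string
print(with_index)
```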

Example Workflow

Here’s an example workflow that demonstrates reading a CSV file, performing some basic data analysis, and saving the results:

import pandas as pd

# Step 1: Read the CSV file
df = pd.read_csv('data.csv')

# Step 2: Display the first few rows
print(df.head())

# Step 3: Get a summary of the DataFrame (info() prints directly and returns None)
df.info()

# Step 4: Get descriptive statistics
print(df.describe())

# Step 5: Select specific columns
subset = df[['column1', 'column2']]

# Step 6: Filter rows based on a condition
value = 100  # hypothetical threshold; choose one that fits your data
filtered_df = df[df['column_name'] > value]

# Step 7: Handle missing data
df.fillna(0, inplace=True)

# Step 8: Save the processed data
df.to_csv('processed_data.csv', index=False)