Working with Excel Sheets and CSV Files Using Pandas for Data Processing (89/100 Days of Python)

Martin Mirakyan
4 min readMar 31, 2023

--

Day 89 of the “100 Days of Python” blog post series covering working with Excel and CSV files in pandas

Excel and CSV files are two of the most commonly used file formats for storing and analyzing data. Pandas is a popular data manipulation library in Python that provides a simple and efficient way to work with these file formats. In this tutorial, we will explore how to read and write Excel and CSV files in Pandas with real-world examples.

Reading Excel Files in Pandas

Pandas provides a function called read_excel() to read Excel files. The function takes the file path as the argument and returns a DataFrame object:

import pandas as pd

# Reading the Excel file
df = pd.read_excel('sales_data.xlsx')

# Displaying the first five rows of the DataFrame
print(df.head())

In this example, we are reading the sales_data.xlsx file and storing the data in a DataFrame object named df. We are then printing the first five rows of the DataFrame using the head() function.

Pandas also allows you to read specific sheets from an Excel file by specifying the sheet name or sheet index:

# Reading a specific sheet from the Excel file
df = pd.read_excel('sales_data.xlsx', sheet_name='Sheet1')

# Displaying the first five rows of the DataFrame
print(df.head())

In this example, we are reading the Sheet1 sheet from the sales_data.xlsx file and storing the data in a DataFrame object named df.

Writing Excel Files in Pandas

Pandas provides a function called to_excel() to write data to Excel files. The function takes the file path and sheet name as arguments:

# Writing the DataFrame to an Excel file
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)

# Displaying a message after the file has been written
print('Excel file written successfully!')

In this example, we are writing the df DataFrame to an Excel file named output.xlsx and specifying the sheet name as Sheet1. We are also setting the index argument to False to exclude the index column from the output.

Reading CSV Files in Pandas

Pandas provides a function called read_csv() to read CSV files. The function takes the file path as the argument and returns a DataFrame object:

# Reading the CSV file
df = pd.read_csv('sales_data.csv')

# Displaying the first five rows of the DataFrame
print(df.head())

In this example, we are reading the sales_data.csv file and storing the data in a DataFrame object named df. We are then printing the first five rows of the DataFrame using the head() function.

Pandas also allows you to read CSV files with different delimiters, such as tabs or semicolons. You can specify the delimiter using the delimiter or sep argument:

# Reading a CSV file with a semicolon delimiter
df = pd.read_csv('sales_data.csv', delimiter=';')

# Displaying the first five rows of the DataFrame
print(df.head())

Here we are reading the sales_data.csv file with a semicolon delimiter and storing the data in a DataFrame object named df.

Writing CSV Files in Pandas

Pandas provides a function called to_csv() to write data to CSV files. The function takes the file path as the argument:

# Writing the DataFrame to a CSV file
df.to_csv('output.csv', index=False)

# Displaying a message after the file has been written
print('CSV file written successfully!')

In this example, we are writing the df DataFrame to a CSV file named output.csv and setting the index argument to False to exclude the index column from the output.

Real-World Example: Analyzing Sales Data

Let’s work through a real-world example to demonstrate how to work with Excel and CSV files in Pandas. Suppose you have a sales data file in Excel format that contains information about sales made by a company. The file has the following columns:

  • Date: the date the sale was made
  • Product: the name of the product sold
  • Quantity: the quantity of the product sold
  • Price: the price of each unit sold
  • Total: the total amount of the sale

You want to analyze this data to answer the following questions:

  1. What was the total revenue generated by the company?
  2. What was the most popular product sold?
  3. What was the average price per unit sold?

First, let’s read the Excel file into a Pandas DataFrame:

sales_df = pd.read_excel('sales_data.xlsx')

To answer the first question, we need to calculate the total revenue generated by the company. We can do this by multiplying the quantity and price columns and then summing up the values in the Total column:

# Calculating the total revenue
total_revenue = sales_df['Total'].sum()
print(f'Total revenue generated by the company: ${total_revenue:.2f}')

To answer the second question, we need to find the most popular product sold. We can do this by grouping the DataFrame by the Product column and then summing up the values in the Quantity column:

# Finding the most popular product sold
popular_product = sales_df.groupby('Product')['Quantity'].sum().idxmax()
print(f'Most popular product sold: {popular_product}')

To answer the third question, we need to calculate the average price per unit sold. We can do this by dividing the sum of the values in the Total column by the sum of the values in the Quantity column:

# Calculating the average price per unit sold
average_price = sales_df['Total'].sum() / sales_df['Quantity'].sum()
print(f'Average price per unit sold: ${average_price:.2f}')

Finally, let’s save the results to a new Excel file:

# Creating a new DataFrame with the results
results_df = pd.DataFrame({
'Metric': ['Total Revenue', 'Most Popular Product', 'Average Price per Unit'],
'Value': [total_revenue, popular_product, average_price]
})

# Writing the results to a new Excel file
results_df.to_excel('sales_data_results.xlsx', index=False)

So, we created a new DataFrame with the results and then wrote the data to a new Excel file named sales_data_results.xlsx.

What’s next?

--

--

Martin Mirakyan
Martin Mirakyan

Written by Martin Mirakyan

Software Engineer | Machine Learning | Founder of Profound Academy (https://profound.academy)

No responses yet