CSV Module In Python
Introduction
CSV stands for "comma separated values". It is the most common and simple file structure used for storing and arranging tabular data. for example; a spreadsheet or database. It stores tabular data in the form of plain text. Here the commas are used to separate each column within the row and every part of the data is separated with a comma.
CSV is a simple format for data exchange as it is general and, simpler. A CSV file is represented when it is saved with .CSV file extension.
Working with Python CSV files:
Before using the functions to CSV module we have to import it using the below code:
import csv
The CSV library provides various functions to perform read and write operations in the CSV files.
Reading the CSV file:
To perform read operation in CSV file. first, we have to generate the reader object using the reader function.
The Python's built-in function open() returns a file object by which the csv file as a text file is opened then it is passed to the reader.
Here’s the employee_joining.txt file:
name, department,joining month Raam,Accounting, November Aakash,IT, March
Below is the code to read the above file:
import csv
with open('employee_joining.txt') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
line_count = 0
for row in csv_reader:
if line_count == 0:
print(f'Column names are {", ".join(row)}')
line_count += 1
else:
print(f'\t{row[0]} works in the {row[1]} department, and joined in {row[2]}.')
line_count += 1
print(f'Processed {line_count} lines.')
Output:
Column names are name, department, joining month
Raam works in the Accounting department, and joined in November.
Aakash works in the IT department, and joined in March.
Here, the reader returned each row is a list of string elements and it contains the data obtained by eliminating the delimiters.
Reading a CSV file as a dictionary:
We can perform the reading operation of CSV data directly with the dictionary. Here we can read the csv files using dictreader.
The results are represented as a dictionary in which the header row is placed as the key and other rows are placed as values.
Again, the input file is employee_joining.txt file:
name, department,joining month Raam,Accounting, November Aakash,IT, March
Code to read as a dictionary:
import csv
with open('employee_joining.txt', mode='r') as csv_file:
csv_reader = csv.DictReader(csv_file)
line_count = 0
for row in csv_reader:
if line_count == 0:
print(f'Column names are {", ".join(row)}')
line_count += 1
print(f'\t{row["name"]} works in the {row["department"]} department, and joined in {row["birthday month"]}.')
line_count += 1
print(f'Processed {line_count} lines.')
Output is the same as before:
Column names are name, department, birthday month
Raam works in the Accounting department, and joined in November.
Aakash works in the IT department, and joined in March.
Additional Python CSV reader parameters:
Different styles of CSV files are handled by the reader object via defining optional parameters.
Some parameters are as shown below:
- quotechar defines the character that is used to enclose fields that includes the delimiter character. quote (' " ') is the default.
- delimiter definesthe character that is used to divide each field. comma (',') is default.
- escapechar defines the character that is used to skip the delimiter character. no escape character is the default.
We can explore these more with the example below.
we have employee_addresses.txt file:
name,address,date joined Raam,Lajpat nagar New Delhi, 11024, Jan 4 Satyam, Rohini New Delhi, 110085, March 2
This file is consisting of three fields which are name, address, date joined which are separated by commas.
The problem is here is that the address field data is also containing the comma to specify the zip code.
Here, three different ways are possible to overcome this problem:
by using different delimiter:
We can safely use the comma in the data itself by using the different delimiter. we can use the delimiter additional parameter to define the new delimiter.
wrapping the data in quotes:
Here, the quotechar optional parameter can be used to define that the character used for quoting. whereas that character also does not show in the data, everything is fine.
by escaping the delimiter character:
Escape characters work as same as do in format strings and it invalidates the definition of the character being escaped and this is for delimiter character.
escapechar must specify the escape character when it is used.
Writing the CSV file:
The writing of csv file can be done by using .write_row() method and a writer object.
import csv
with open('employee_file.csv', mode='w') as employee_file:
employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
employee_writer.writerow(['Raam', 'Accounting', 'November'])
employee_writer.writerow(['Aakash', 'IT', 'March'])
Here, while writing which character is to be used is specified by the quotechar optional character.
It returns the plaintext which shows that the file is created for reading.
Output:
Raam, Accounting, November
Aakash,IT, March
Writing of CSV file from the dictionary:
As we can read the data from the dictionary. also, we can write the data from the dictionary as well.
import csv
with open('employee_file2.csv', mode='w') as csv_file:
fieldnames = ['emp_name', 'dept', 'birth_joining']
writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({'emp_name': 'Raam', 'dept': 'Accounting', 'joining_month': 'November'})
writer.writerow({'emp_name': 'Aakash', 'dept': 'IT', 'joining_month': 'March'})
While writing a dictionary the fieldnames parameter is required.
Output:
emp_name,dept,joining_month
Raam,Accounting,November
Aakash,IT, March
Handling CSV files by using Pandas library:
Pandas is the most popular Data science library in Python. It is used for data analysis and data manipulation.
When the amount of data is large it is better to use the Pandas library to handle the CSV files.
we have to install the Pandas before using them and to
know about installation visit: How to install Pandas?
Reading CSV File by using Pandas:
The read_csv() function is employed for reading the CSV files using Pandas.
Example:
import pandas as pd
pd.read_csv("employee.csv")
This code reads the employee.csv from the current directory.
Writing CSV files by using Pandas:
The to_csv() function is employed for writing to the CSV files using Pandas. The to_csv() is DatafFrame's function.
Example:
import pandas as pd
# creating a data frame
df = pd.DataFrame([['Raam', 24], ['Aakash', 22]], columns = ['Name', 'Age'])
# writing data frame to a CSV file
df.to_csv('Age.csv')
Here the DataFrame is created by using pd.Dataframe and then to_csv() function is called for this object.
Output:
Name, Age
Raam, 24
Aakash, 22
Conclusion:
In this article, we have discussed about the Python CSV module and understood how to perform different operations for reading and write data in CSV Files.