Python Read CSV file
CSV stands for “comma-separated values.” It is a simple file format used to store tabular data, such as the contents of a spreadsheet or a database table.
A CSV file is a plain text file, which means it contains only text data (printable ASCII or Unicode characters). This makes it a common format for exchanging data between programs.
A CSV file can be opened in a spreadsheet program such as Excel. Each line of the file is one row, and a delimiter (usually a comma) separates the columns within a row.
Usage of CSV file
CSV files are generally used by programs that handle large amounts of data. The format is well suited to exporting data from spreadsheets and databases and to importing that data into other programs.
For example, you might export the results of a data-mining program to a CSV file and then import that file into a database to analyze the data, generate graphs for a presentation, or prepare a report for publication.
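The export-then-import workflow described above can be sketched with the csv module alone. This is a minimal illustration, not a full pipeline: the rows, the column names, and the file name results.csv are made up for the example, and a temporary directory stands in for a real export location.

```python
import csv
import os
import tempfile

# Hypothetical results to export; names and values are illustrative only.
rows = [
    ["name", "score"],
    ["alpha", "0.91"],
    ["beta", "0.87"],
]

# Use a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.csv")

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as out_file:
        csv.writer(out_file).writerows(rows)

    # Read the file back, as an importing program would.
    with open(path, newline="") as in_file:
        round_trip = list(csv.reader(in_file))

print(round_trip)
```

Note that every value comes back as a string; the csv module does not convert numbers on its own.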
Python CSV Module Functions
- csv.field_size_limit - Returns the current maximum field size permitted by the parser. If a new limit is passed, it becomes the new maximum.
- csv.get_dialect - Returns the dialect associated with a name.
- csv.list_dialects - Returns the names of all registered dialects.
- csv.reader - Returns a reader object that iterates over the rows of a CSV file.
- csv.register_dialect - Associates a dialect with a name. The name must be a string.
- csv.writer - Returns a writer object that writes data to a CSV file.
- csv.unregister_dialect - Deletes the dialect associated with the name from the dialect registry. If the name is not a registered dialect name, an error is raised.
- csv.QUOTE_ALL - A constant that instructs writer objects to quote all fields.
- csv.QUOTE_MINIMAL - A constant that instructs writer objects to quote only those fields that contain special characters such as the quotechar or the delimiter.
- csv.QUOTE_NONNUMERIC - A constant that instructs writer objects to quote all non-numeric fields.
- csv.QUOTE_NONE - A constant that instructs writer objects to never quote fields.
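The following sketch exercises several of the functions and constants listed above. The dialect name "pipes", the helper function render, and the sample row are hypothetical and exist only for this illustration; the output is written to an in-memory buffer so no file is needed.

```python
import csv
import io

# "pipes" is a made-up dialect name used only in this sketch.
csv.register_dialect("pipes", delimiter="|", quoting=csv.QUOTE_MINIMAL)
was_registered = "pipes" in csv.list_dialects()
pipe_delimiter = csv.get_dialect("pipes").delimiter


def render(row, quoting):
    """Write one row to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting).writerow(row)
    return buffer.getvalue()


row = ["Smith, J.", "IT", 42]
minimal = render(row, csv.QUOTE_MINIMAL)         # quotes only the field containing a comma
all_quoted = render(row, csv.QUOTE_ALL)          # quotes every field
non_numeric = render(row, csv.QUOTE_NONNUMERIC)  # quotes the strings, not the integer

for text in (minimal, all_quoted, non_numeric):
    print(text, end="")

# Called without an argument, field_size_limit only reports the current limit.
limit = csv.field_size_limit()

# Clean up the registry; unregistering an unknown name would raise csv.Error.
csv.unregister_dialect("pipes")
```

The three printed lines differ only in which fields are wrapped in double quotes, which is the easiest way to see what each constant does.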
Read CSV file
The csv module provides functionality to both read from and write to CSV files. First, we import the csv module and open the file using Python’s built-in open() function. The csv.reader() function then returns a reader object that yields each row of the file as a list of column values.
We use a text file named myfile.txt, which contains data separated by commas (,). Consider the following example:
csv file
name,department,birthday month
Parker,Accounting,November
Smith,IT,March
Example-1:
import csv  # import the csv library

with open('myfile.txt') as csv_file:  # open the text file named myfile.txt
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # The first row holds the column names
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')
Output:
Column names are name, department, birthday month
    Parker works in the Accounting department, and was born in November.
    Smith works in the IT department, and was born in March.
Processed 3 lines.
In the above program, the first row contains the column names, so it is handled separately from the data rows.
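As an alternative to counting lines, the header row can be pulled off the reader with next() before looping over the data. The sketch below is one possible variation, not the only way to do it; it uses an in-memory io.StringIO object as a stand-in for myfile.txt so that it runs without a file on disk.

```python
import csv
import io

# In-memory stand-in for myfile.txt (an assumption made for this sketch).
sample = io.StringIO(
    "name,department,birthday month\n"
    "Parker,Accounting,November\n"
    "Smith,IT,March\n"
)

reader = csv.reader(sample)
header = next(reader)    # the first row holds the column names
records = list(reader)   # the remaining rows are data

print(f'Column names are {", ".join(header)}')
for name, department, month in records:
    print(f"{name} works in the {department} department, and was born in {month}.")
print(f"Processed {len(records)} data rows.")
```

Separating the header this way keeps the loop body free of special cases.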
Read a CSV into a Dictionary
In the above code, the reader object returns each row as a list of strings. Instead of working with lists, we can read the CSV data directly into dictionaries, where each row maps column names to values.
Again, we use myfile.txt, this time with the following contents:
csv file
name,department,birthday month
Anubhav,CS,November
Himanshu,IT,March
Example-2
import csv  # import the csv library

with open('myfile.txt', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # Joining a row dictionary joins its keys, which are the column names
            print(f'The Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} belongs to the {row["department"]} department, and was born in {row["birthday month"]}.')
        line_count += 1
    print(f'Processed {line_count} lines.')
Output:
The Column names are name, department, birthday month
    Anubhav belongs to the CS department, and was born in November.
    Himanshu belongs to the IT department, and was born in March.
Processed 3 lines.
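Because each row from DictReader is a dictionary, rows can be filtered or looked up by column name rather than by position. The following sketch assumes the same sample data, held in an in-memory io.StringIO object instead of myfile.txt; the variable names are illustrative.

```python
import csv
import io

# In-memory stand-in for myfile.txt (an assumption made for this sketch).
sample = io.StringIO(
    "name,department,birthday month\n"
    "Anubhav,CS,November\n"
    "Himanshu,IT,March\n"
)

reader = csv.DictReader(sample)
people = list(reader)          # each element maps column names to values
columns = reader.fieldnames    # column names taken from the header row

print(columns)

# Select rows by column name instead of by index.
march_names = [person["name"] for person in people if person["birthday month"] == "March"]
print(march_names)
```

Accessing values by name keeps the code readable and makes it less sensitive to changes in column order.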
Reading CSV files with Pandas
Pandas is a Python library used for data analysis. It is highly optimized and built on top of the NumPy library. Its read_csv() function reads a CSV file and returns the data as a DataFrame.
We don’t need to write many lines of code to read and analyze a CSV file. Here we use a file called studata.txt, which contains student data.
Name,Class,Admission Date
Saurabh,8th,12/07/2013
Suman,9th,10/07/2013
Ankita,8th,20/08/2013
Samrat,10th,01/08/2013
Anubhav,6th,01/08/2014
Apeksha,9th,02/08/2014
Example-3
import pandas
df = pandas.read_csv('studata.txt')
print(df)
Output:
      Name  Class  Admission Date
0  Saurabh    8th      12/07/2013
1    Suman    9th      10/07/2013
2   Ankita    8th      20/08/2013
3   Samrat   10th      01/08/2013
4  Anubhav    6th      01/08/2014
5  Apeksha    9th      02/08/2014
There are only three lines in the above code, and these are enough to read the file.
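Once the data is in a DataFrame, a few more lines are enough to inspect and filter it. The sketch below assumes that pandas is installed, that the student data is comma-separated, and that the dates are written day first; it reads from an in-memory io.StringIO object rather than studata.txt.

```python
import io

import pandas as pd

# In-memory stand-in for studata.txt (an assumption made for this sketch).
sample = io.StringIO(
    "Name,Class,Admission Date\n"
    "Saurabh,8th,12/07/2013\n"
    "Suman,9th,10/07/2013\n"
    "Ankita,8th,20/08/2013\n"
    "Samrat,10th,01/08/2013\n"
    "Anubhav,6th,01/08/2014\n"
    "Apeksha,9th,02/08/2014\n"
)

# parse_dates converts the named column to datetimes;
# dayfirst=True reads 12/07/2013 as 12 July rather than 7 December.
df = pd.read_csv(sample, parse_dates=["Admission Date"], dayfirst=True)
print(df.dtypes)

# Boolean indexing keeps only the rows whose Class column equals "8th".
eighth = df[df["Class"] == "8th"]
print(eighth["Name"].tolist())

# Datetime columns expose components such as the year through the .dt accessor.
admitted_2014 = df[df["Admission Date"].dt.year == 2014]["Name"].tolist()
print(admitted_2014)
```

Parsing the dates at read time is optional; without parse_dates, the column is simply left as strings.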