How to Create a DataFrame in Python
A DataFrame is a data structure that stores data in tabular form. It can also be described as a two-dimensional, labeled collection of data.
Various operations can be performed on a data frame, such as arithmetic on its values and the addition or deletion of rows and columns.
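As a minimal sketch of the operations just mentioned, the snippet below adds a column through column arithmetic and then deletes a column and a row. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({"price": [10, 20, 30], "qty": [1, 2, 3]})

# Arithmetic on columns: element-wise multiplication creates a new column
df["total"] = df["price"] * df["qty"]

# Delete a column and a row (drop returns a new DataFrame by default)
df = df.drop(columns=["qty"])
df = df.drop(index=0)

print(df)
```

Because drop() returns a new object unless told otherwise, the result is assigned back to df here.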
Data frames can also be loaded from external sources in different formats, such as SQL databases, CSV files, and Excel files.
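For instance, loading CSV data might look like the sketch below. To keep it self-contained, an in-memory string stands in for a real file; the contents are hypothetical. With a real file, a path would be passed to read_csv instead.

```python
import io
import pandas as pd

# In-memory text standing in for a CSV file (hypothetical contents)
csv_text = "City Name,Rank\nIndore,1\nBhopal,2\n"

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```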
Below are various ways to create a DataFrame in Python:
- Empty Data Frame
- Data Frame using Lists
- Data Frame using Dictionary
- Data Frame with Index
- Data Frame using List of Dictionaries
- Data Frame using zip() function
- Data Frame using Dicts of Series
The first requirement for creating a DataFrame in Python is the pandas library.
- Creating an Empty Data Frame
A basic empty data frame can be created by calling the DataFrame constructor with no arguments. Consider the following example.
Example
# importing pandas library as pd
import pandas as pd

# Calling the DataFrame constructor
Data_frame = pd.DataFrame()
print(Data_frame)
Output
Empty DataFrame
Columns: []
Index: []
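A quick way to confirm that a frame really is empty is the empty attribute or the shape. The sketch below also shows that column names can be declared up front while leaving the rows empty; the column names are only illustrative.

```python
import pandas as pd

empty_df = pd.DataFrame()
print(empty_df.empty)   # True
print(empty_df.shape)   # (0, 0)

# An empty frame can still declare its columns up front
# (column names here are hypothetical)
typed_empty = pd.DataFrame(columns=["City Name", "Rank"])
print(typed_empty.empty)          # True: there are no rows
print(list(typed_empty.columns))
```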
- Creating DataFrame using Lists
A DataFrame can also be created from a list.
In the example below, we define a list called fruits_list that contains the names of fruits.
To make a dataframe from this list, we call the DataFrame constructor and pass fruits_list as an argument. Each list element becomes one row, and the single column gets the default label 0.
Example –
# importing pandas as pd
import pandas as pd

# string values in the list
fruits_list = ['Apple', 'Banana', 'Strawberry', 'Grapes', 'Mango', 'Watermelon', 'Pine Apple']

# Calling DataFrame constructor on list
dataframe = pd.DataFrame(fruits_list)
print(dataframe)
Output
            0
0       Apple
1      Banana
2  Strawberry
3      Grapes
4       Mango
5  Watermelon
6  Pine Apple
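The default column label 0 is rarely what we want. As a small sketch, the columns argument can name the column, and a list of lists can be used when each row has more than one value. The quantities below are made up for illustration.

```python
import pandas as pd

fruits_list = ['Apple', 'Banana', 'Strawberry']

# Name the single column instead of accepting the default label 0
named = pd.DataFrame(fruits_list, columns=['Fruit'])

# A list of lists: each inner list is one row (quantities are hypothetical)
rows = [['Apple', 3], ['Banana', 5]]
table = pd.DataFrame(rows, columns=['Fruit', 'Quantity'])

print(named)
print(table)
```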
- Creating DataFrame using Dictionary
A DataFrame can be created from a dictionary in much the same way as from a list.
The first step is to create a dictionary. In the example below we create a dictionary named ‘dictionary’ that contains city names and their ranks. Each key becomes a column name and each list becomes that column's values.
To make a dataframe from this dictionary, we call the DataFrame constructor and pass ‘dictionary’ as an argument.
Example –
import pandas as pd

# Dictionary of cities and their ranking
dictionary = {
    'City Name': ['Indore', 'Bhopal', 'Visakhapatnam', 'Surat', 'Mysore', 'Tiruchirappalli', 'New Delhi Municipal Council'],
    'Rank': [1, 2, 3, 4, 5, 6, 7],
}

# Creating a dataframe using DataFrame()
dataframe = pd.DataFrame(dictionary)

# Print the output.
print(dataframe)
OUTPUT
                     City Name  Rank
0                       Indore     1
1                       Bhopal     2
2                Visakhapatnam     3
3                        Surat     4
4                       Mysore     5
5              Tiruchirappalli     6
6  New Delhi Municipal Council     7
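One thing to watch for with this approach is that every list in the dictionary must have the same length. The sketch below deliberately uses mismatched lists (hypothetical data) to show that pandas raises a ValueError rather than padding the shorter column.

```python
import pandas as pd

# 'Rank' is one value short on purpose
mismatched = {'City Name': ['Indore', 'Bhopal'], 'Rank': [1]}

try:
    pd.DataFrame(mismatched)
    raised = False
except ValueError as err:
    raised = True
    print("Could not build DataFrame:", err)
```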
- Creating DataFrame with Index
When a dataframe is created, its index starts from 0 by default. If we don’t want numeric indexing, we can change the index to suit our needs.
To change the index labels, we pass a list of index names through the index argument, along with the dictionary, to the DataFrame() constructor.
Let’s understand this with the following example:
Example –
import pandas as pd

# Dictionary of cities and their ranking
dictionary = {
    'City Name': ['Indore', 'Bhopal', 'Visakhapatnam', 'Surat', 'Mysore', 'Tiruchirappalli', 'New Delhi Municipal Council'],
    'Rank': [1, 2, 3, 4, 5, 6, 7],
}

# Creating dataframe with indexing
dataframe = pd.DataFrame(dictionary, index=['city1', 'city2', 'city3', 'city4', 'city5', 'city6', 'city7'])

# print the data
print(dataframe)
Output
                         City Name  Rank
city1                       Indore     1
city2                       Bhopal     2
city3                Visakhapatnam     3
city4                        Surat     4
city5                       Mysore     5
city6              Tiruchirappalli     6
city7  New Delhi Municipal Council     7
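A custom index is useful mainly because rows can then be looked up by label. The following sketch, using a shortened version of the data above, shows a label lookup with .loc and, as an alternative, promoting an existing column to the index with set_index().

```python
import pandas as pd

dataframe = pd.DataFrame(
    {'City Name': ['Indore', 'Bhopal'], 'Rank': [1, 2]},
    index=['city1', 'city2'],
)

# Label-based lookup using the custom index
second_city = dataframe.loc['city2', 'City Name']
print(second_city)

# An existing column can also become the index
by_rank = dataframe.set_index('Rank')
top_city = by_rank.loc[1, 'City Name']
print(top_city)
```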
- Creating DataFrame using List of Dictionaries
A DataFrame can also be created from a list of dictionaries. Each dictionary becomes one row, and its keys become the column names.
Let’s look at the following examples:
Example 1
import pandas as pd

# created list of two dictionaries
list_of_dictionaries = [{'A': 10, 'B': 20, 'C': 30}, {'A': 100, 'B': 200, 'C': 300}]

# Creates DataFrame.
dataframe = pd.DataFrame(list_of_dictionaries)

# Print the data
print(dataframe)
OUTPUT
     A    B    C
0   10   20   30
1  100  200  300
Example 2
# In this example a DataFrame is created by passing a list of dictionaries as well as index names
import pandas as pd

# list of dictionaries is created
list_of_dictionaries = [{'A': 2, 'B': 3}, {'A': 10, 'B': 20, 'C': 30}]

# list of dictionaries is passed as an argument with index names and converted into DataFrame
dataframe = pd.DataFrame(list_of_dictionaries, index=['first', 'second'])

# Print the dataframe
print(dataframe)
Output
         A   B     C
first    2   3   NaN
second  10  20  30.0
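Note that column C prints as 30.0 rather than 30. The first dictionary has no 'C' key, so pandas fills the gap with NaN, and NaN forces the column to a floating-point type. The sketch below inspects this and shows one possible way to get integers back, assuming 0 is an acceptable default for the missing value.

```python
import pandas as pd

list_of_dictionaries = [{'A': 2, 'B': 3}, {'A': 10, 'B': 20, 'C': 30}]
dataframe = pd.DataFrame(list_of_dictionaries, index=['first', 'second'])

# The missing 'C' value is NaN, which makes the column float64
print(dataframe['C'].isna().tolist())
print(dataframe.dtypes)

# One option (an assumption, not the only choice): fill gaps with 0
# and convert back to integers
filled = dataframe.fillna(0).astype(int)
print(filled)
```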
Example 3
# In this example a DataFrame is created from a list of dictionaries, with index names and column names
import pandas as pd

# Create the list of dictionaries
List_of_dictionaries = [{'X': 1, 'Y': 2}, {'X': 5, 'Y': 10, 'Z': 20}]

# Convert the list of dictionaries into a dataframe, keeping only columns X and Y
dataframe1 = pd.DataFrame(List_of_dictionaries, index=['first', 'second'], columns=['X', 'Y'])

# Same conversion, but requesting a column name (Y1) that is not a key in any dictionary
dataframe2 = pd.DataFrame(List_of_dictionaries, index=['first', 'second'], columns=['X', 'Y1'])

# printing the first dataframe
print(dataframe1, "\n")

# printing the second dataframe
print(dataframe2)
OUTPUT
        X   Y
first   1   2
second  5  10

        X  Y1
first   1 NaN
second  5 NaN
- Creating DataFrame using zip() Function
A DataFrame can also be created using the zip() function, which pairs up the elements of two lists position by position.
In this example, the two lists are first combined with zip() into a list of tuples. That list of tuples is then converted into a dataframe.
import pandas as pd

# List1
City_Name = ['Indore', 'Bhopal', 'Visakhapatnam', 'Surat', 'Mysore', 'Tiruchirappalli', 'New Delhi Municipal Council']

# List2
Rank = [1, 2, 3, 4, 5, 6, 7]

# The two lists are paired element by element using zip().
# Despite its name, new_tuple is a list of (city, rank) tuples.
new_tuple = list(zip(City_Name, Rank))

# Printing new_tuple
print(new_tuple)

# converting new_tuple into a dataframe using DataFrame()
dataframe = pd.DataFrame(new_tuple, columns=['City Name', 'Rank'])

# Print data.
print(dataframe)
OUTPUT
[('Indore', 1), ('Bhopal', 2), ('Visakhapatnam', 3), ('Surat', 4), ('Mysore', 5), ('Tiruchirappalli', 6), ('New Delhi Municipal Council', 7)]
                     City Name  Rank
0                       Indore     1
1                       Bhopal     2
2                Visakhapatnam     3
3                        Surat     4
4                       Mysore     5
5              Tiruchirappalli     6
6  New Delhi Municipal Council     7
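A caveat worth knowing with this approach: zip() stops at the shorter of its inputs, so a length mismatch silently drops data instead of raising an error. The sketch below uses deliberately mismatched lists (illustrative data) to show the effect.

```python
import pandas as pd

cities = ['Indore', 'Bhopal', 'Surat']
ranks = [1, 2]  # one value short, on purpose

# zip() stops at the shorter list, so 'Surat' has no partner and is dropped
pairs = list(zip(cities, ranks))
print(pairs)

dataframe = pd.DataFrame(pairs, columns=['City Name', 'Rank'])
print(len(dataframe))
```

Comparing len() of the input lists before zipping is a simple guard against this.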
- Creating DataFrame using Dicts of Series
We can also pass a dict of Series, in which case the resulting index is the union of the indexes of all the Series passed. Let’s understand this with the example below.
import pandas as pd

# creating the dicts of series
data = {
    'Data Structure': pd.Series([60, 70, 86, 79], index=['Pragya', 'Rashi', 'Anupriya', 'Prince']),
    'DBMS': pd.Series([67, 89, 76, 89], index=['Pragya', 'Rashi', 'Anupriya', 'Prince']),
}

# converting data into dataframe
dataframe = pd.DataFrame(data)

# print the dataframe
print(dataframe)
OUTPUT
          Data Structure  DBMS
Pragya                60    67
Rashi                 70    89
Anupriya              86    76
Prince                79    89
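In the example above both Series share the same index, so the union is not visible. The sketch below uses Series with only partly overlapping indexes (a made-up variation on the same data) to show that every label from either Series appears in the result, with NaN wherever a Series has no value for that label.

```python
import pandas as pd

# Only 'Rashi' appears in both Series
data = {
    'Data Structure': pd.Series([60, 70], index=['Pragya', 'Rashi']),
    'DBMS': pd.Series([89, 76], index=['Rashi', 'Anupriya']),
}

dataframe = pd.DataFrame(data)
print(dataframe)

# Labels present in only one Series get NaN in the other column
print(dataframe.isna())
```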