Python Pandas Tutorial for Beginners

Introduction to Python Pandas

“According to Wikipedia, Pandas’ name is derived from the econometrics term Panel Data for multidimensional data sets that include observations over multiple time periods for the same individuals”. 

Pandas is also commonly expanded as Python Data Analysis Library. It is an open-source, BSD-licensed Python library whose development Wes McKinney began in 2008. It gives developers high-performance tools for data analysis, cleaning, and manipulation, built around powerful, expressive, and flexible data structures such as DataFrame and Series.

Pandas is built on top of NumPy: it depends on and interoperates with NumPy for fast numeric array computations. NumPy is a Python library for computing with matrices and with single- and multi-dimensional arrays, and it includes an extensive collection of high-level numerical tools for array operations.

Pandas enables developers to carry out their entire data analysis workflow in Python without having to switch to a more domain-specific language like R. Pandas is well suited to various kinds of data, such as:

  • Ordered and unordered time series data
  • Arbitrary matrix data
  • Tabular data with heterogeneously-typed columns
  • Unlabeled data
  • Observational or statistical data sets

Before Pandas, Python was mainly used for data munging and preparation and contributed little to data analysis itself. Pandas solved this problem by supporting the five typical steps of data processing and analysis, regardless of where the data comes from: load, prepare, manipulate, model, and analyze.

Python with Pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics.

Pandas Installation

To install Pandas, open the command line/terminal and type

pip install pandas

Alternatively, install the Anaconda Python distribution (https://www.anaconda.com/), which is a convenient way to manage Pandas, and then type

conda install pandas

Once the installation is complete, open your IDE (Jupyter, PyCharm, etc.) and import Pandas by typing

import pandas as pd

Features of Pandas

  1. Pandas provides an elegant and simple API.
  2. Pandas performs well when merging and joining high-volume datasets.
  3. Pandas is easy to learn, use, and maintain, so you can focus more on research and less on programming.
  4. Pandas bridges the gap between rapid iterations of ad-hoc analysis and production-quality code.
  5. Pandas provides the DataFrame object for fast and efficient data manipulation with integrated indexing.
  6. Pandas offers time-series functionality that includes
  • date range generation and frequency conversion
  • moving window statistics and window linear regressions
  • date shifting and lagging
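The time-series features listed above can be sketched roughly as follows. The dates and values are made up for illustration, and window linear regressions are left out because they are not part of this sketch.

```python
import pandas as pd

# Date range generation: six consecutive calendar days (illustrative dates).
days = pd.date_range(start="2024-01-01", periods=6, freq="D")
prices = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0], index=days)

# Frequency conversion: downsample the daily values to 2-day means.
two_day_mean = prices.resample("2D").mean()

# Moving window statistics: 3-day rolling mean (the first two entries are NaN).
rolling_mean = prices.rolling(window=3).mean()

# Date shifting and lagging: move every value one day later.
lagged = prices.shift(1)

print(two_day_mean)
print(rolling_mean)
print(lagged)
```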

Data analysis in Pandas is done with two fast data structures, Series and DataFrame, both built on top of NumPy arrays.

A Series is a one-dimensional, homogeneous, labeled array of immutable size that can store any data type (integer, string, float, Python objects, etc.).

10 32 11 54 27 83 91 33 67 77 22
  • A Pandas Series can be created by using the constructor pandas.Series(data, index=index, dtype=dtype, copy=copy)

where

data can take various forms, such as an ndarray, a list, a constant or scalar value (for example an integer or a string), or a Python dictionary (key-value pairs).

index is the collection of axis labels; its values must be hashable and it must have the same length as data. Labels are usually unique, although Pandas also accepts duplicate labels.

dtype is the data type of the values; if omitted, it is inferred from data.

copy controls whether the input data is copied. The default is False.

NOTE: If no index is given explicitly, Pandas constructs a RangeIndex running from 0 to N-1, where N is the total number of elements in the Series.
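As a rough illustration of these parameters, the sketch below builds one Series with the default index and one with explicit labels. The values and the labels a, b, c are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

values = np.array([10, 32, 11])

# No index given: pandas builds a RangeIndex running from 0 to N-1.
default_index = pd.Series(values)

# Explicit labels; copy=True gives the Series its own copy of the data.
labeled = pd.Series(values, index=["a", "b", "c"], copy=True)

# dtype forces the element type instead of letting pandas infer it.
as_float = pd.Series([1, 2, 3], dtype="float64")

values[0] = 99  # changing the source array does not affect the copied Series

print(default_index.index)
print(labeled)
print(as_float.dtype)
```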

  1. Program to create an empty Series.
  • An empty Series is the most basic Series that can be created.
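A minimal sketch of an empty Series. The explicit dtype is an assumption added here because some pandas versions warn when an empty Series is created without one.

```python
import pandas as pd

# An empty Series; the dtype is stated explicitly to avoid a version-dependent warning.
empty = pd.Series(dtype="float64")
print(empty)
```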

  1. Program to create a Series from an ndarray (N-dimensional array)
  • If no index is passed, the default index is range(n), where n is the length of the array.
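For instance, with an illustrative four-element array of letters (not the tutorial's original data), the default index can be seen like this:

```python
import numpy as np
import pandas as pd

data = np.array(["a", "b", "c", "d"])

# No index is passed, so the labels are 0, 1, 2, 3.
s = pd.Series(data)
print(s)
```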

OUTPUT:

  • For ndarray data, the index passed must have the same length as the array.
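The sketch below passes a custom index of matching length and then shows that a shorter index is rejected. The labels 100 to 103 are arbitrary values picked for the example.

```python
import numpy as np
import pandas as pd

data = np.array(["a", "b", "c", "d"])

# The index has four labels, one per array element.
s = pd.Series(data, index=[100, 101, 102, 103])
print(s)

# An index with the wrong length raises ValueError.
try:
    pd.Series(data, index=[100, 101])
    length_error = None
except ValueError as exc:
    length_error = exc
    print("ValueError:", exc)
```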

OUTPUT:

  1. Program to create a Series from a dictionary

A dictionary is a Python data structure that stores data as key-value pairs, where the key is used to access its value (a piece of data).

  • If no index is passed, the dictionary keys are used to construct the index. Current Pandas versions keep the keys in insertion order; older versions sorted them.
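A small sketch with a made-up dictionary. In current pandas versions the keys become the index in insertion order, and sort_index() can be called when sorted labels are wanted.

```python
import pandas as pd

data = {"b": 1.0, "a": 0.0, "c": 2.0}

# The dictionary keys become the index labels.
s = pd.Series(data)
print(s)

# Sort the labels explicitly if a sorted index is needed.
s_sorted = s.sort_index()
print(s_sorted)
```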

OUTPUT:

  • If an index is passed, the values in data that correspond to the labels in the index are pulled out; labels with no matching key get NaN.
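To illustrate, the example below (with placeholder keys and values) asks for the label d, which is not in the dictionary, so that entry becomes NaN.

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}

# Values are looked up by label; "d" has no matching key and becomes NaN.
s = pd.Series(data, index=["b", "c", "d", "a"])
print(s)
```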

OUTPUT:

  1. Program to create a Series from a scalar value.
  • If data is a scalar value, an index should be provided; the value is repeated to match the length of the index.
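A brief sketch using the scalar 5 and a four-label index, both chosen only for illustration:

```python
import pandas as pd

# The scalar 5 is repeated once for every label in the index.
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
```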

OUTPUT:

Pandas Operations with Series

Create a Series.
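One way to set up a Series for the operations in this section. The values 1 to 5 and the labels a to e are placeholders, not the tutorial's original data.

```python
import pandas as pd

# Five integer values with string labels.
s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
print(s)
```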

OUTPUT:

Example 1: To retrieve the third element from the Series by position.
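A possible way to do this uses iloc, which always selects by position and counts from zero. The Series itself is an illustrative assumption.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Position 2 is the third element because positions start at 0.
third = s.iloc[2]
print(third)  # 3
```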

OUTPUT:

Example 2: To retrieve the first three elements from the Series by position.
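Assuming the same kind of illustrative Series, a positional slice returns a new Series with the first three entries:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# The slice :3 covers positions 0, 1 and 2.
first_three = s.iloc[:3]
print(first_three)
```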

OUTPUT:

Example 3: To retrieve the last two elements from the Series by position.
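Negative positions count from the end, so the last two entries can be sliced as sketched here (the Series is again a placeholder):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# The slice -2: starts two positions before the end.
last_two = s.iloc[-2:]
print(last_two)
```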

OUTPUT:

Example 4: To retrieve multiple elements from the Series, use a list of index label values.
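A short sketch of label-based selection. The labels a, c and d are arbitrary picks from an illustrative Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# A list of labels returns the matching entries in the order requested.
selected = s[["a", "c", "d"]]
print(selected)
```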

OUTPUT:

DataFrame

A DataFrame is a size-mutable, tabular data structure with rows and columns. It behaves like a dictionary of Series instances: each column is a Series object, and each row is made up of the elements at the same position in those Series.

Ac.No.        Name              Amount
15330000110   Niti Ahuja        5,45,000
22451200001   Om Kashyap        33000
54322344002   Nikhil Marathee   1,27,800
32894352323   Ranya Khan        3,95,300
17344000658   Sanjana Gupta     77000
  • A Pandas DataFrame can be created by using the constructor pandas.DataFrame(data, index, columns, dtype, copy)

where the parameters are:

data can take various forms, such as lists, constants, a 2D NumPy ndarray, one or more Series, one or more dictionaries, or another DataFrame.

index holds the row labels.

columns holds the column labels.

dtype is the data type of each column.

copy controls whether the input data is copied. The default is False.
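The parameters above might be combined as in this sketch. The row labels r1 to r3, the column names x and y, and the numbers are invented for the example.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2], [3, 4], [5, 6]])

# Row labels, column labels, a forced dtype, and an explicit copy of the input.
df = pd.DataFrame(
    data,
    index=["r1", "r2", "r3"],
    columns=["x", "y"],
    dtype="float64",
    copy=True,
)
print(df)
```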

  1. Program to create an empty DataFrame
  • An empty DataFrame is the most basic DataFrame that can be created.
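A minimal sketch: calling the constructor with no arguments yields a DataFrame with no rows and no columns.

```python
import pandas as pd

# No data, index, or columns are supplied.
df = pd.DataFrame()
print(df)
```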

OUTPUT:

  1. Program to create a DataFrame from a list

Example 1:
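A flat list could be turned into a single-column DataFrame as sketched here. The numbers are illustrative.

```python
import pandas as pd

data = [1, 2, 3, 4, 5]

# Each list element becomes one row; the single column is labelled 0.
df = pd.DataFrame(data)
print(df)
```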

OUTPUT:

Example 2:
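A list of lists gives one row per inner list. This sketch reuses names from the account table above with the amounts written as plain integers; the choice of rows and columns is an assumption.

```python
import pandas as pd

data = [
    ["Niti Ahuja", 545000],
    ["Om Kashyap", 33000],
    ["Ranya Khan", 395300],
]

# columns names the two fields of every inner list.
df = pd.DataFrame(data, columns=["Name", "Amount"])
print(df)
```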

OUTPUT:

  1. Program to create DataFrame from Dictionaries
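One common pattern is a dictionary of equal-length lists, where each key becomes a column. The row labels acc1 to acc3 are hypothetical, and the names and amounts are taken from the account table above.

```python
import pandas as pd

data = {
    "Name": ["Niti Ahuja", "Om Kashyap", "Nikhil Marathee"],
    "Amount": [545000, 33000, 127800],
}

# Keys become column labels; index supplies the row labels.
df = pd.DataFrame(data, index=["acc1", "acc2", "acc3"])
print(df)
```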

OUTPUT:

  1. Program to create a DataFrame from Series
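A sketch with a dictionary of two Series whose indexes differ. The resulting row index is the union of both, and missing entries become NaN. Column names and values are placeholders.

```python
import pandas as pd

data = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
}

# Row "d" exists only in the second Series, so column "one" is NaN there.
df = pd.DataFrame(data)
print(df)
```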

OUTPUT:

Pandas Operations with DataFrame

Create a DataFrame.
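A small DataFrame for the operations in this section might look like this. The column names one and two, the labels a to d, and the numbers are assumptions made for illustration.

```python
import pandas as pd

# Two integer columns with four labelled rows.
df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)
print(df)
```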

OUTPUT:  

  1. Slicing of rows
  • Select the first two rows by slicing.
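One possible form of this slice uses iloc on an illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

# Positions 0 and 1 are the first two rows.
first_two = df.iloc[:2]
print(first_two)
```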

OUTPUT:     

  • Select the last two rows by slicing.
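A negative start position counts from the end, as in this sketch on the same kind of placeholder DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

# The slice -2: keeps the final two rows.
last_two = df.iloc[-2:]
print(last_two)
```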

OUTPUT:    

  1. Column Selection
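Selecting a single column by its label returns a Series. The DataFrame below is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

# Square brackets with a column label return that column as a Series.
col = df["one"]
print(col)
```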

OUTPUT:

  1. Column Addition
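A new column can be added by assigning to a label that does not exist yet. In this sketch the hypothetical column three is computed from two existing placeholder columns.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

# Assigning to a new label appends the column on the right.
df["three"] = df["one"] + df["two"]
print(df)
```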

OUTPUT:

  1. Column Deletion

A column can be deleted either with the del statement
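A sketch of deletion with del on an illustrative DataFrame. The column is removed in place.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

# del removes the column from the DataFrame itself.
del df["two"]
print(df)
```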

OUTPUT:   

or with the pop method
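Unlike del, pop removes the column and also returns it as a Series, which this sketch shows on a placeholder DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

# pop drops the column in place and hands it back as a Series.
popped = df.pop("two")
print(popped)
print(df)
```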

OUTPUT:    

Reference:
https://www.guru99.com/python-pandas-tutorial.html