
Decision Tree in Python

The decision tree is one of the most widely used machine learning algorithms for classification and regression.

Before getting to the algorithm itself, consider the life cycle of a machine learning model: a first model is built from scratch, the same model is then improved with hyperparameter tuning, a deployment approach is chosen, and once the model is deployed, logging and monitoring frameworks are set up.

The decision tree is one of the most flexible machine learning algorithms and can handle both regression and classification. It performs well even on complex datasets, and it is easy to understand. It works by splitting the whole dataset into a tree-like structure based on a series of rules and conditions.

Take a simple example: it is Friday evening and you can't decide whether to go home or stay where you are. Let a decision tree decide for you.

[Figure: example decision tree for the Friday evening decision]
  • The root node is chosen based on some condition, e.g. in our example the root node tests whether the time is later than 10 pm.
  • The root node is then split into child nodes according to that condition. In the illustration above, the right child node satisfied the condition, so no further questions were asked.
  • The left child node did not satisfy the condition, so it was split again on another condition.
  • This process continues until all conditions are met or until a pre-defined tree depth is reached, e.g. a depth of 3 in our example.

Decision Tree for Regression

When a decision tree is used for regression, we try to divide the space of predictor values into distinct, non-overlapping regions. For a set of possible values X1, X2, ..., Xp, we try to divide them into J distinct, non-overlapping regions R1, R2, ..., RJ. For an observation that falls into region Rj, the prediction is the mean of the response values y of the training observations in Rj. The regions R1, R2, ..., RJ are chosen to minimise the following residual sum of squares (RSS):

$$\mathrm{RSS} = \sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2$$

where $\hat{y}_{R_j}$ (the second term) is the mean of all the response values in region j.
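As a rough illustration of the RSS criterion above, the following sketch computes it for responses that have already been grouped into regions. The function names and the sample values are made up for the example and do not come from the dataset used later.

```python
from statistics import mean


def region_rss(y_values):
    """RSS of one region: squared distance of each response from the region mean."""
    y_hat = mean(y_values)  # the prediction for every observation in this region
    return sum((y - y_hat) ** 2 for y in y_values)


def total_rss(regions):
    """Sum the per-region RSS over all J regions."""
    return sum(region_rss(ys) for ys in regions)


# Hypothetical responses, already grouped into J = 2 regions.
regions = [[1.0, 2.0, 3.0], [10.0, 12.0]]
print(total_rss(regions))
```

A partition that groups similar responses together gives a small total, which is what the tree is trying to achieve.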

As stated above, we try to divide the X values into J regions, but trying every possible partition into J regions is computationally very expensive. A decision tree therefore uses recursive binary splitting, a top-down greedy approach in which a node is split into two regions according to a given condition. Not every node is split; only those that satisfy the condition are divided into two branches. The approach is called greedy because at each step it makes the best split available at that moment, rather than choosing a split that might lead to a better tree in later steps. It chooses a predictor Xj and a threshold s, splitting the observations into the regions Xj < s and Xj >= s, so that the following RSS is minimal:

$$\sum_{i:\, x_i \in R_1(j,s)} \left( y_i - \hat{y}_{R_1} \right)^2 + \sum_{i:\, x_i \in R_2(j,s)} \left( y_i - \hat{y}_{R_2} \right)^2$$

Here $R_1(j,s) = \{X \mid X_j < s\}$ and $R_2(j,s) = \{X \mid X_j \ge s\}$. The values of j and s are chosen so that this expression is as small as possible, and the regions R1 and R2 follow from them. Each of these regions is then split further by the same logic, and the process continues until a pre-defined stopping criterion is met. Once all regions are formed, the prediction for a region is the mean of the training responses in it.
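One greedy step of the search described above can be sketched for a single predictor as follows. This is a minimal illustration, not how any particular library implements it: it only scans thresholds s for one feature, whereas a full tree would repeat the scan for every predictor j and then recurse into each region. The data values are hypothetical.

```python
from statistics import mean


def rss(ys):
    """Residual sum of squares around the mean; 0.0 for an empty region."""
    if not ys:
        return 0.0
    y_hat = mean(ys)
    return sum((y - y_hat) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (threshold s, RSS) minimising the RSS of the regions x < s and x >= s."""
    best_s, best_rss = None, float("inf")
    # Skip the smallest value: with s equal to it, the region x < s would be empty.
    for s in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < s]
        right = [y for x, y in zip(xs, ys) if x >= s]
        candidate = rss(left) + rss(right)
        if candidate < best_rss:
            best_s, best_rss = s, candidate
    return best_s, best_rss


# Hypothetical data with an obvious jump between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
print(best_split(xs, ys))
```

The search picks the threshold that separates the two clusters of responses, because any other threshold leaves very different values in the same region and inflates the RSS.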

Because the procedure above can grow a very complex tree, it is highly likely to overfit the training data.

Shape of the Tree

Tree pruning is a way to reduce the complexity and variance of the model by cutting the full tree (obtained in the previous procedure) down to a smaller subtree. Just as we regularised linear regression, we can regularise the decision tree model by adding a penalty term:

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|$$

where T is a subtree of the full tree T0, |T| is its number of terminal nodes, and α is a non-negative tuning parameter that penalises the MSE as the size of the tree grows. The values of α and T are determined by cross-validation, choosing those for which the model gives the lowest test error rate.

This is how the decision tree model works for regression. Let's now look at how a decision tree works for classification.

Greedy Algorithm

According to the book Hands-On Machine Learning, the greedy algorithm searches for an optimum split at the top level, then repeats the process at each level. It does not check whether the split will lead to the lowest possible impurity several levels down.

The most important parameters of DecisionTreeClassifier, as described in the sklearn documentation, are:

  • criterion: string, optional (default="gini"). The function used to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain.
  • max_depth: int or None, optional (default=None). The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
  • min_samples_split: int, float, optional (default=2). The minimum number of samples required to split an internal node:
   - If int, `min_samples_split` is taken as the minimum number.
   - If float, `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` is the minimum number of samples for each split. Changed in version 0.18: float values for fractions were added.
  • min_samples_leaf: int, float, optional (default=1). The minimum number of samples required at a leaf node. A split point at any depth is only considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This can have the effect of smoothing the model, especially in regression:
   - If int, `min_samples_leaf` is taken as the minimum number.
   - If float, `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` is the minimum number of samples for each node.
  • max_features: int, float, string or None, optional (default=None). The number of features to consider when looking for the best split:
   - If int, then consider `max_features` features at each split.
   - If float, then `max_features` is a fraction and `int(max_features * n_features)` features are considered at each split.
   - If "auto", then `max_features=sqrt(n_features)`.
   - If "sqrt", then `max_features=sqrt(n_features)`.
   - If "log2", then `max_features=log2(n_features)`.
   - If None, then `max_features=n_features`.

Note: The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features.

  • random_state: int, RandomState instance or None, optional (default=None). If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
  • min_impurity_split: float (default=1e-7). Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold; otherwise it is a leaf. This parameter was deprecated and later removed in newer sklearn releases.
  • class_weight: dict, list of dicts, "balanced" or None (default=None). Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.
  • presort: bool, optional (default=False). Whether to presort the data to speed up the search for the best splits. With the default settings of a decision tree on large datasets, setting this to True may slow down training. With a smaller dataset or a restricted depth, it may speed up training. This parameter was also removed in newer sklearn releases.

When tuning hyperparameters, we aim to find the set of hyperparameter values that gives us a model with the best possible accuracy. Let us continue improving our model.
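One way to search over these parameters is sklearn's GridSearchCV, which tries every combination in a grid and scores each with cross-validation. The sketch below is only an illustration: it uses sklearn's bundled load_wine dataset as a stand-in (an assumption, it is not the winequality_red.csv file used in this article), and the grid values are arbitrary choices rather than recommended settings.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with sklearn (assumption: not the article's CSV file).
X, y = load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=355
)

# Illustrative grid; the specific values are arbitrary.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4, 6, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}

# Every combination is scored with 5-fold cross-validation on the training data.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(x_train, y_train)

print(search.best_params_)            # best combination found in the grid
print(search.best_score_)             # its mean cross-validated accuracy
print(search.score(x_test, y_test))   # accuracy of the refitted model on held-out data
```

The test split is kept out of the search so that the final score is measured on data that played no part in choosing the hyperparameters.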
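In []:

# standardise every feature to zero mean and unit variance
scaler = StandardScaler()
x_transform = scaler.fit_transform(X)

In []:

# split the scaled data, keeping 30% for testing
x_train, x_test, y_train, y_test = train_test_split(x_transform, y, test_size=0.30, random_state=355)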

Although our dataset is very small, let's use PCA to reduce the number of features and see whether it improves our accuracy.

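In []:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# fit PCA with all components to see how much variance each one explains
pca = PCA()
principalComponents = pca.fit_transform(x_transform)

# plot the cumulative explained variance against the number of components
plt.figure()
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of Components')
plt.ylabel('Cumulative explained variance')
plt.title('Explained Variance')
plt.show()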

We can see that 8 components explain around 95 percent of the variance. So instead of feeding all 11 columns into our algorithm, let's use these 8 principal components.
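Reading the number of components off the plot can also be done programmatically. The sketch below finds the smallest number of leading components whose cumulative explained variance reaches a target. The helper name and the ratios are hypothetical; in practice the list would come from pca.explained_variance_ratio_.

```python
from itertools import accumulate


def components_for_variance(ratios, target=0.95):
    """Smallest number of leading components whose cumulative ratio reaches target."""
    for count, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= target:
            return count
    return len(ratios)  # target never reached: keep every component


# Hypothetical ratios, ordered as pca.explained_variance_ratio_ reports them.
ratios = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
print(components_for_variance(ratios))
```

sklearn's PCA can also make this choice itself: passing a float between 0 and 1 as n_components, for example PCA(n_components=0.95), keeps just enough components to explain that share of the variance.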

In []:

pca = PCA(n_components=8)
new_data = pca.fit_transform(x_transform)
principal_x = pd.DataFrame(new_data, columns=['PC-1', 'PC-2', 'PC-3', 'PC-4', 'PC-5', 'PC-6', 'PC-7', 'PC-8'])

In []:

principal_x

Post-Pruning

Post-pruning is the process of first growing the decision tree and then removing the non-significant branches. Cross-validation data is used to check the effect of pruning and to test whether expanding a node brings an improvement. If it does, we continue expanding that node; if the accuracy decreases, the node should not be expanded and is converted to a leaf node.
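One common form of post-pruning is cost-complexity pruning, which uses a non-negative penalty α on tree size. sklearn exposes it through the ccp_alpha parameter and the cost_complexity_pruning_path method (available from version 0.22). The sketch below is a minimal illustration that picks α by cross-validation; it uses the bundled load_wine dataset as a stand-in, which is an assumption and not the dataset used in this article.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with sklearn (assumption: not the article's CSV file).
X, y = load_wine(return_X_y=True)

# Candidate alphas: the values at which the fully grown tree loses another subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
# Clip at zero in case floating-point error produces a tiny negative value.
alphas = [max(float(alpha), 0.0) for alpha in path.ccp_alphas]

# Score one pruned tree per alpha with 5-fold cross-validation.
scores = {}
for alpha in alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    scores[alpha] = cross_val_score(pruned, X, y, cv=5).mean()

best_alpha = max(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

Larger values of α remove more of the tree, so the candidates range from the full tree at one end to a heavily pruned tree at the other.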

Pre-pruning

Pre-pruning, also known as forward pruning, stops non-significant branches from being generated in the first place. It uses a condition to decide, while the tree is being grown, when the splitting of a branch should be terminated early.
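In sklearn, pre-pruning is usually expressed through stopping parameters such as max_depth and min_samples_leaf. The sketch below compares an unrestricted tree with a pre-pruned one. The dataset (sklearn's bundled load_wine) and the limits chosen are assumptions made for illustration only.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with sklearn (assumption: not the article's CSV file).
X, y = load_wine(return_X_y=True)

# No stopping condition: the tree grows until every leaf is pure.
unrestricted = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned: stop at depth 2 and never create a leaf with fewer than 5 samples.
pre_pruned = DecisionTreeClassifier(
    max_depth=2, min_samples_leaf=5, random_state=0
).fit(X, y)

print(unrestricted.get_depth(), unrestricted.get_n_leaves())
print(pre_pruned.get_depth(), pre_pruned.get_n_leaves())
```

The pre-pruned tree is smaller by construction; whether it also generalises better has to be checked on held-out data.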

Classification Trees

Regression trees are used for quantitative data. For qualitative or categorical data, we use classification trees. In regression trees, nodes are split based on the RSS criterion, whereas in classification trees they are split based on the classification error rate, Gini impurity and entropy. Let's look at these terms in detail.

Entropy

Entropy is a measure of the randomness in the data. In other words, it gives the impurity present in the dataset. When we split a node into two regions and place different observations in each, the main goal is to reduce entropy, i.e. to reduce the randomness within each region. If splitting a node does not reduce entropy, we either try splitting on a different condition or stop. A region is pure (low entropy) when it contains data with the same label, and impure (high entropy) when it contains a mix of labels. Suppose there are 'm' observations and we need to classify them into categories 1 and 2.

Let's assume there are 'n' observations in category 1 and 'm-n' observations in category 2.

p = n/m and q = (m-n)/m = 1-p

Then the entropy of the given set is:

E = -p*log2(p) - q*log2(q)

When all observations belong to category 1, p = 1; when all belong to category 2, p = 0. In both cases E = 0, as there is no randomness in the categories. When half of the observations are in category 1 and the other half in category 2, p = 1/2 and q = 1/2, and the entropy reaches its maximum, E = 1.
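The formula and its boundary cases can be checked with a few lines of Python. This is a small sketch for the two-category case only; the function name is chosen for the example.

```python
from math import log2


def entropy(p):
    """Binary entropy E = -p*log2(p) - q*log2(q), with q = 1 - p."""
    q = 1 - p
    # By convention 0 * log2(0) is treated as 0, so zero proportions are skipped.
    return sum(-x * log2(x) for x in (p, q) if x > 0)


for p in (0.0, 0.25, 0.5, 1.0):
    print(p, entropy(p))
```

The values at p = 0 and p = 1 are zero, and the value at p = 1/2 is the maximum of 1, matching the cases described above.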


Information Gain

Information gain measures the decrease in entropy when a node is split. It is the difference between the entropy before the split and the entropy after it. The larger the information gain, the larger the decrease in entropy.

$$\text{Information Gain}(T, X) = \text{Entropy}(T) - \text{Entropy}(T, X)$$

where T is the parent node before the split and X is the split made from T.
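A small sketch can make the calculation concrete. It assumes that the entropy after the split, Entropy(T, X), is the average of the child-node entropies weighted by the share of observations in each child; this is the usual definition, but the text above does not spell it out. The labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return sum(-(count / total) * log2(count / total)
               for count in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    after_split = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - after_split


# Hypothetical node with 5 "yes" and 5 "no" labels.
parent = ["yes"] * 5 + ["no"] * 5
perfect = [["yes"] * 5, ["no"] * 5]                  # each child is pure
useless = [["yes", "yes", "no", "no"],
           ["yes", "yes", "yes", "no", "no", "no"]]  # children as mixed as the parent

print(information_gain(parent, perfect))
print(information_gain(parent, useless))
```

A split that produces pure children recovers all of the parent's entropy as gain, while a split that leaves the class mix unchanged gains nothing.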

[Figure: a tree split using entropy and information gain]

Gini Impurity

According to Wikipedia, Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labelled if it were labelled randomly according to the distribution of labels in the subset. It is calculated by multiplying the probability that an observation belongs to a given class by the sum of the probabilities that it is classified into one of the other (wrong) classes. Suppose there are k classes and p_i is the probability that an observation belongs to class i; the Gini impurity is then:

$$G = \sum_{i=1}^{k} p_i (1 - p_i) = 1 - \sum_{i=1}^{k} p_i^2$$

Gini impurity takes values between 0 and 1: 0 means no impurity, and values closer to 1 mean the labels are more randomly mixed (for k equally likely classes the value is 1 - 1/k). The split with the lowest Gini impurity is chosen for the root node.
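The Gini impurity of a node can be computed directly from its class labels. The sketch below uses the form 1 - sum(p_i^2); the function name and the labels are chosen for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i^2) over the class proportions p_i."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


print(gini_impurity(["a"] * 4))             # pure node
print(gini_impurity(["a", "a", "b", "b"]))  # two classes, evenly mixed
print(gini_impurity(["a", "b", "c", "d"]))  # four classes, evenly mixed
```

A pure node has zero impurity, and the value rises as more classes are mixed evenly within the node.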

[Figure: a tree split using Gini impurity]

Different Algorithms for Decision Tree

  • ID3: one of the algorithms used to build decision trees for classification. It uses information gain to find the root node and to split. It accepts only categorical attributes.
  • C4.5: an extension of ID3 that handles both continuous and discrete values, and is generally considered better than ID3. It is also used for classification.
  • Classification and Regression Trees (CART): the most popular algorithm for building decision trees. By default it uses Gini impurity to choose the root node and the splits, though "entropy" can also be used as the criterion. It works on both regression and classification problems. We will use this algorithm in our Python implementation.

Entropy and Gini impurity can be used interchangeably; the choice rarely has much effect on the result. Gini impurity is cheaper to compute than entropy, because entropy involves a logarithm. That is why Gini is the default criterion for the CART algorithm. Plotting Gini impurity against entropy shows that there is not much difference between them:

[Figure: Gini impurity vs entropy]

Decision Tree advantages:

  • It can be used for both regression and classification problems.
  • Decision trees are easy to interpret because the splitting rules are stated explicitly.
  • Even complex decision tree models become easy to follow once visualised; the model can be understood just by looking at the plotted tree.
  • Feature scaling and normalisation are not required.

Disadvantages of Decision Tree:

  • Because of the greedy approach, a small change in the data can produce a very different tree, making the model unstable.
  • Decision trees are highly prone to overfitting.
  • Training a decision tree model can take longer than training other classification algorithms.

Cross-Validation

Suppose you train a model on a given dataset using some algorithm. You then measure the accuracy of the trained model on the same training data and find it to be 95% or maybe even 100%. What does this mean? Is your model ready for prediction? The answer is no. Why? Because the model was trained on that very data: it knows the data well and has fitted itself closely to it. When you ask it to predict on new data, it will most likely give much lower accuracy, because it has never seen that data before and so cannot generalise well to it. This is the problem of overfitting.

Cross-validation comes into the picture to address this problem. It is a resampling technique built on a simple idea: divide the dataset into two parts, training data and test data. You train the model on one part (the training data) and use the other part (the test data), which the model has not seen, to make predictions and check how well the model works. If the model performs well on the test data, it has not overfitted the training data and its predictions can be trusted. If it performs poorly, the model cannot be trusted and we need to adjust our algorithm.

Let's look at the different cross-validation techniques:

Hold-out Method:

This is the most basic of the CV techniques. The dataset is simply divided into two sets, training and test. The model is trained on the training set, and the test set is then fed to the trained model to obtain predictions, which are used to check and evaluate the model. This method is used because it is computationally inexpensive. However, an evaluation based on a hold-out set can have high variance, because it depends heavily on which data points end up in the training set and which in the test set. Every time the split changes, the evaluation will be different.
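The hold-out split itself is simple enough to sketch in plain Python. The helper below is an illustration only (sklearn's train_test_split, used later in this article, does the same job with more options), and the rows are just placeholder integers.

```python
import random


def hold_out_split(rows, test_size=0.3, seed=None):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)  # copy, so the caller's data keeps its order
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # placeholder observations
for seed in (0, 1):
    train, test = hold_out_split(rows, test_size=0.3, seed=seed)
    print(seed, sorted(test))
```

Changing the seed changes which observations land in the test set, which is the source of the variability in the hold-out estimate mentioned above.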

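k-fold Cross-Validation:

The dataset is divided into k folds of roughly equal size. The model is trained k times, each time holding out a different fold for testing and training on the remaining k-1 folds, and the k scores are averaged.

[Figure: k-fold cross-validation]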
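The way k-fold cross-validation rotates the test fold can be sketched with a small index generator. This is a simplified illustration without shuffling; the function name is made up for the example, and sklearn provides ready-made equivalents such as KFold and cross_val_score.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

Every observation is used for testing exactly once and for training k-1 times, which is why the averaged score is less dependent on one particular split than the hold-out estimate.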

Implementation in Python

To implement the decision tree algorithm, we will use the sklearn library. Sklearn uses an optimised version of the CART algorithm and, by default, uses Gini impurity as the criterion for splitting nodes.

Importing required Modules

import pandas as pd
import graphviz
import pydotplus
from io import StringIO  # sklearn.externals.six was removed from newer sklearn releases
from IPython.display import Image
from sklearn import tree
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# load the red wine quality dataset
data = pd.read_csv("winequality_red.csv")

data

# separate the features from the target column
X = data.drop(columns='quality')
y = data['quality']

where X holds the independent variables (features) and y holds the dependent variable (target).

Splitting dataset into Test data and Training data

x_train,x_test,y_train,y_test = train_test_split(X,y,test_size = 0.30, random_state= 355)

In []:

# let's first fit and visualise the tree on the data without any pre-processing
clf = DecisionTreeClassifier()
clf.fit(x_train, y_train)

Out[]:

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')

# column names used to label the split conditions in the plot
feature_name = list(X.columns)

# export the tree structure in DOT format
dot_data = export_graphviz(clf, feature_names=feature_name, rounded=True, filled=True)

# draw the graph and save it as a PNG
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png("myTree.png")

# show the graph
Image(graph.create_png())

Let us understand the tree above:

The first value in a node gives the column and the condition on which the node was selected and split.

The second value gives the Gini impurity of the selected node. samples gives the number of observations present in the node at that point, and value, inside the square brackets, gives the number of observations in each class: in the figure above, 8 observations belong to class 1, 38 to class 2, 468 to class 3, and so on.

In []:

# accuracy on the training data
clf.score(x_train, y_train)

Out[]:

1.0

In []:

y_pred = clf.predict(x_test)

In []:

# accuracy of our classification tree on the test data
clf.score(x_test, y_test)

Out[]:

0.5791666666666667

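So far we have done no pre-processing of our data and no hyperparameter tuning. The gap between the training score (1.0) and the test score (about 0.58) indicates that the tree has overfitted the training data.

Let's do both and see whether our score improves.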

What are hyperparameters?

As the classifier output above shows, the decision tree classifier takes a number of parameters; these are also known as hyperparameters.

The most important of these parameters (as per the sklearn documentation) are the ones described in the parameter list earlier in this article.


