# OPTICS Clustering Explanation

## What is clustering?

**Clustering **is an **unsupervised learning** method which works on unlabeled data. It helps in grouping the data into different clusters. Objects with similar properties are** grouped in one cluster** and thus have no similarity with other clusters. Different clustering methods exist, like **DBSCAN, OPTICS, CURE**, etc.

## What is OPTICS Clustering?

**OPTICS Clustering **is referred to as **Ordering Points to Identify Cluster Structure**. It is a **density-based clustering unsupervised learning** algorithm developed after the **DBSCAN (Density-based Spatial Clustering of Applications with Noise) algorithm**. The OPTICS clustering algorithm expands on the DBSCAN clustering concepts by two more terms.

The **OPTICS clustering **is **similar **to the **DBSCAN clustering **algorithm but can **extract clusters of various densities and forms. **It helps locate clusters with various densities in big, high-dimensional datasets.

As the name suggests, the **OPTICS clustering** algorithm's main objective is **to extract the clustering structure of the dataset by locating the densely related points**. An ordered list of points is created by this algorithm, namely the** reachable plot**, used to construct a **density-based data representation**. The **reachability distance**, a measurement of how simple it is to reach from one point to another, is connected to each entry in the list. The points with similar reachability distances are grouped in the same cluster.

## Approach to the OPTICS Clustering Algorithm

**Step 1:**The first step towards clustering using OPTICS is to create a**density threshold parameter, Eps**, used to regulate the minimum density of the clusters.**Step 2:**The next step is to calculate**the distance**of each point in the dataset to its**k-nearest neighbors**.**Step 3:**The calculation of the**reachability point**is the next step. Determine the**reachability distance**of each location in the dataset based on the**density of its neighbors**, starting at any point.**Step 4:**Create the**reachability plot**after ordering the points according to their reachability points.**Step 5:**Using the reachability plot, identify clusters by combining points that are close to one another and have a similar range of reachability.

The OPTICS clustering algorithm is implemented using the** sklearn **package of Python. The **sklearn **library provides a class **sklearn.cluster.OPTICS**. It requires several parameters, including a **reachability distance cutoff, a minimum density threshold (Eps), and the number of nearest neighbours** to consider.

## Working of OPTICS

The **OPTICS clustering** does not divide the data into clusters physically. It uses the visualisation of reachability distance to cluster the data. These are a few concepts that are an add-on of the **DBSCAN clustering**:

**Core Distance:**A given**point's smallest radius**must be designated as a core point, referred to as core distance. The supplied point's Core Distance is undefined if it is not a Core point.**Reachability Distance:**The Reachability distance can be referred to as the**maximum of the Core distance of a point**, say p and the**Euclidean distance between the point p and some other point**, say q. The point q must be a Core Point to determine the Reachability Distance.

Even though the **MinPts parameter** is used in these computations, the theory is that it would have little effect because all distances would scale nearly at the same pace.

Firstly, the **core distance** is calculated on all data points in the set. Then, the** reachability distance **is updated after **observing the complete data set and processing the data points**. The points we process are set in order, and the reachability distance is updated simultaneously. Further, those points are decided that have **minimum reachability distance**. Thus, in this way, this algorithm forms** clusters** and keeps them to maintain order. Then, the labels of the actual cluster are extracted from the plot. We can do this by searching **“valleys” **in the plot using** maximum and minimum**.

## Detecting Outliers using OPTICS

The OPTICS clustering algorithm is used for **detecting outliers**. An outlier is a **data point that does not fit with the normal distribution **of the dataset. The OPTICS algorithm provides an **extension **for detecting the outliers, namely **OPTICS-OF**. The **OF** in OPTICS OF stands for **Outlier Factor**. It gives an **outlier score** to every point by comparing them with the closest neighbours instead of the entire cluster. Its **“local”** principle makes it a unique outlier detection process.

The Outlier Factor creates a new measure, **“Local Reachability Density”**. It is the **opposite of the average reachability of the MinPts-neighbors** about the point you are calculating.

After calculating the** local reachability point** for each point, now the **Outlier Factor** is calculated. It can be calculated by taking the **average of the ratios of the MinPts-neighbors to a specific point**.

The **“local”** element of the **Outlier factor** of the OPTICS separates it from the other outlier detection methods. In addition to providing a binary value, it can also assign a relative **outlier score**.

## Benefits of OPTICS

- It gives the clustering structure of the data set.
- It forms the reachability plot.
- There is no requirement to fix the number of clusters in advance.
- It can identify hierarchical structures and handle clusters of various densities and forms.

## Conclusion

OPTICS Clustering is referred to as Ordering Points To Identify Cluster Structure. It is a density-based clustering unsupervised learning algorithm developed after the DBSCAN (Density-based Spatial Clustering of Applications with Noise) algorithm. It needs more memory to find the next point closest to the reachability distance. It can identify hierarchical structures and handle clusters of various densities and forms. A reachability distance plot generated by OPTICS can be used to extract clusters at various granularities. In the reachability distance plot, small clusters surrounded by noise points may be blended with those clusters, making it harder for OPTICS to distinguish them. The OPTICS has a higher runtime complexity as it uses a priority queue to maintain reachability distances.