N2 in Python
N2 is an approximate nearest neighbor library; the two N's in its name stand for "Nearest" and "Neighbor". It is written in C++ and can be used from Python. Before N2, people used libraries such as Annoy and NMSLIB, each of which has its own strengths and weaknesses in performance and usability. N2 was developed to combine the strengths of those libraries and to make up for their weaknesses.
Python is one of the most popular programming languages in the world and is widely used for newly evolving technologies. It is a general-purpose language with a wide range of applications, from simple algorithm projects to machine learning systems, and many web developers and programmers use it. N2 is one of the libraries available to Python users who need nearest neighbor search.
Features of N2
- It is a lightweight library that runs fast even on large datasets.
- It performs well in terms of index build time, search speed, and memory usage.
- It supports multi-core CPUs for index building.
- It supports mmap by default, so very large index files can be processed efficiently.
- It also provides Go bindings.
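To give a feel for what mmap means in the feature list above, here is a minimal sketch that uses Python's standard mmap module. It is not N2's own code or file format: the file layout, the names DIM, N_VECTORS, and RECORD, and the helper functions are all hypothetical and exist only for illustration. The idea is that a single vector can be read from a file on disk without loading the whole file into memory.

```python
import mmap
import os
import struct
import tempfile

DIM = 4            # hypothetical vector dimension
N_VECTORS = 1000   # hypothetical number of stored vectors
RECORD = struct.Struct(f"<{DIM}f")  # one vector = DIM little-endian float32 values


def write_vectors(path):
    """Write N_VECTORS vectors to disk; vector i holds the value i in every slot."""
    with open(path, "wb") as f:
        for i in range(N_VECTORS):
            f.write(RECORD.pack(*([float(i)] * DIM)))


def read_vector(path, i):
    """Read vector i through a read-only memory map instead of reading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages that contain this record need to be touched.
            return RECORD.unpack_from(mm, i * RECORD.size)


path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path)
print(read_vector(path, 42))  # (42.0, 42.0, 42.0, 42.0)
```

With a memory map, the operating system pages data in on demand, which is why this approach suits index files that are larger than the available memory.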
Distance Metrics:
Metric | Definition | d(p, q) |
angular | 1 - cos θ | 1 - sum(p_i · q_i) / sqrt(sum(p_i · p_i) · sum(q_i · q_i)) |
L2 | squared L2 | sum((p_i - q_i)^2) |
dot | dot product | sum(p_i · q_i) |
N2 supports three distance metrics. For angular and L2, the distance d is defined so that the closer two vectors are, the smaller d is. For dot, it is the opposite: the closer two vectors are, the larger d is. The dot metric is defined as the plain dot product so that users can interpret the d value returned by the Hnsw search function directly as a dot product value.
For angular, the definition is 1 - cos θ, and d is computed with the equation given in the table above. For L2, the definition is the squared L2 distance, that is, the sum of the squared differences between the components of the two vectors. For dot, the definition is the dot product itself, and d is computed with the last equation in the table.
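The three formulas in the table can be written out as a short sketch in plain Python. The function names below are chosen for illustration and are not part of the N2 API; the code only mirrors the equations in the table.

```python
import math


def angular_distance(p, q):
    """1 - cos(theta) between p and q; a smaller value means closer vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = math.sqrt(sum(pi * pi for pi in p) * sum(qi * qi for qi in q))
    return 1.0 - dot / norm


def squared_l2_distance(p, q):
    """Sum of squared component differences; a smaller value means closer vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def dot_distance(p, q):
    """Plain dot product; a larger value means closer vectors."""
    return sum(pi * qi for pi, qi in zip(p, q))


p = [1.0, 0.0]
q = [0.0, 2.0]
print(angular_distance(p, q))     # 1.0 (the vectors are orthogonal)
print(squared_l2_distance(p, q))  # 5.0
print(dot_distance(p, q))         # 0.0
```

Note the difference in direction: sorting candidates by angular or squared L2 distance means taking the smallest values first, while sorting by dot means taking the largest values first.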
Installation of N2
The master branch always holds the latest version of N2, while the dev branch is used for developing the next version.
We install the N2 library using the following command:
$ pip install n2
Running this command installs the n2 package into the current Python environment.
Note: On macOS, gcc must be installed beforehand using brew.
The library can be installed and used from Python and also from C++.
Now let us look at a piece of code that shows how to use N2:
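import numpy as np
from n2 import HnswIndex
# 10240 sample vectors with 20 dimensions each
N, dim = 10240, 20
samples = np.arange(N * dim).reshape(N, dim)
# Create the index and add every sample vector to it
index = HnswIndex(dim)
for sample in samples:
    index.add_data(sample)
# Build the index, then look up the 10 nearest neighbors of item 0
index.build(m=5, n_threads=4)
print(index.search_by_id(0, 10))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]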
In the above code, we first import NumPy to generate the sample data and HnswIndex from n2 to build the index. We set N and dim to 10240 and 20, so np.arange creates N * dim values and reshape arranges them into N vectors of dim dimensions each. Every vector is added to the index with add_data, and build constructs the index with m=5 using 4 threads. Finally, search_by_id(0, 10) returns the ids of the 10 items nearest to item 0. The same steps can be placed into any program that needs nearest neighbor search.
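As a sanity check on the sample output, the same neighbors can be found with an exact brute-force search. The sketch below does not use N2 at all: it rebuilds the sample data with plain Python lists instead of NumPy and ranks every item by angular distance to item 0, assuming the index uses the angular metric. The helper names are hypothetical.

```python
import math

N, dim = 10240, 20
# Same values as np.arange(N * dim).reshape(N, dim), built with plain lists.
samples = [[i * dim + j for j in range(dim)] for i in range(N)]


def angular_distance(p, q):
    """1 - cos(theta) between p and q; a smaller value means closer vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))
    return 1.0 - dot / norm


def brute_force_search(query_id, k):
    """Exact k nearest neighbors of samples[query_id], compared against every item."""
    query = samples[query_id]
    ranked = sorted(range(N), key=lambda i: (angular_distance(query, samples[i]), i))
    return ranked[:k]


print(brute_force_search(0, 10))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Brute force compares the query against all N items, which is fine for a check like this but becomes slow as the dataset grows. An approximate index trades a little accuracy for much faster queries on large datasets.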
A new version of the N2 library was released on 16 October 2020.
Conclusion
In this article, we learned that N2 is an approximate nearest neighbor library that can be used from Python. We saw how to install it, including the prerequisite on macOS, and walked through a piece of sample code that builds an index and searches it. We also looked at the features of N2 and how it relates to other libraries such as Annoy and NMSLIB. Finally, we looked at the central concept of N2, its distance metrics: angular, L2, and dot.