Project Name: Movie recommendation system

A recommendation system provides customers with information relevant to their searches. Before recommendation systems existed, the most common way to decide what to buy was to rely on the advice of friends. Today, based on your search, viewing, or purchase history, Google can predict what news you'll read, and YouTube can predict what kinds of videos you'll watch.

A recommendation system helps a business win loyal customers and build trust by surfacing the items and services those customers came to the website for. Today's recommendation systems are sophisticated enough to handle even new customers who are visiting the site for the first time. They can also recommend items that are currently trending or highly rated. This project uses content-based filtering.

Content-based filtering

The algorithm suggests products that are similar to those previously viewed. In other words, it looks for items whose content is alike. If a person enjoys watching Sachin Tendulkar's shots, they might also enjoy watching Ricky Ponting's shots because the two videos have comparable tags and categories. The comparison looks only at the content of the items, not at who the viewer is. The items that score highest against the viewer's previous preferences are recommended.
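
The idea can be sketched in a few lines of plain Python. This is only an illustration: the catalogue, its tags, and the helper names tag_overlap and most_similar are hypothetical, and the score used here is a simple tag overlap (Jaccard) rather than the cosine similarity the project itself relies on.

```python
# Hypothetical toy catalogue: each item is described by a set of tags.
catalogue = {
    "Tendulkar cover drives": {"cricket", "batting", "highlights"},
    "Ponting pull shots": {"cricket", "batting", "highlights"},
    "Pasta in 10 minutes": {"cooking", "recipe"},
}


def tag_overlap(a, b):
    """Jaccard score: number of shared tags divided by number of distinct tags."""
    return len(a & b) / len(a | b)


def most_similar(title, items):
    """Return the other item whose tags overlap most with those of `title`."""
    target = items[title]
    others = (name for name in items if name != title)
    return max(others, key=lambda name: tag_overlap(target, items[name]))


print(most_similar("Tendulkar cover drives", catalogue))  # Ponting pull shots
```

Nothing about the viewer enters the calculation; only the item descriptions are compared.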

Dataset

Link: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata?select=tmdb_5000_movies.csv

Tech stack

Python
Pandas
Streamlit
NLTK
Pickle
Requests

Data processing

We need to select the features that play a key role in recommendation, then analyze and preprocess the selected data. We will not use all the feature columns, only those that contribute most to the recommendations: genres, overview, keywords, cast, and crew, along with the movie ID and title.
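
Several of these columns hold a list of records stored as text, which is why the source code parses them with ast.literal_eval before use. The sketch below shows that step on a single value; the sample string and the helper name names_from_cell are made up for illustration and only assume the shape that the source code expects (a list of dictionaries with a 'name' key).

```python
import ast

# Hypothetical sample value, shaped like one cell of the genres column.
raw_genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'


def names_from_cell(cell):
    """Parse the stringified list of dicts and return only the names."""
    return [entry["name"] for entry in ast.literal_eval(cell)]


print(names_from_cell(raw_genres))  # ['Action', 'Adventure']
```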

Stemming

Porter stemming is applied to the tags column using Python's nltk package.
Stemming is the process of reducing a word to its root form, or stem, by stripping affixes such as suffixes and prefixes. For instance, a stemming algorithm reduces the words "chocolates," "chocolatey," and "choco" to the root word "chocolate," and "retrieval," "retrieved," and "retrieves" to the stem "retrieve." In practice, the stems a Porter stemmer produces are not always dictionary words; what matters is that variants of a word map to the same stem.
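
As a quick check of this behavior, the following sketch runs NLTK's PorterStemmer on a few word variants. The word list and the helper name stem_text are chosen here for illustration; the printed stems depend on the stemmer and need not be real words.

```python
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()


def stem_text(text):
    """Stem every whitespace-separated word in `text` and rejoin the result."""
    return " ".join(ps.stem(word) for word in text.split())


words = ["retrieve", "retrieval", "retrieved", "retrieves"]
stems = {word: ps.stem(word) for word in words}

print(stems)                             # each variant with its stem
print(len(set(stems.values())) == 1)     # do all variants share one stem?
print(stem_text("retrieved retrieves"))  # stemming a whole string of tags
```

Because the variants collapse to one stem, they count as the same feature when the tags are later turned into vectors.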

Cosine similarity

Cosine similarity measures how similar two vectors are by taking the cosine of the angle between them.
It is widely used to measure similarity between text documents and to classify text.
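
Written out, the measure is the dot product of the two vectors divided by the product of their lengths. The sketch below is a minimal plain-Python version for two vectors; the function name cosine_similarity_pair is chosen here for illustration, and it assumes neither vector is all zeros.

```python
import math


def cosine_similarity_pair(u, v):
    """cos(theta) = (u . v) / (|u| * |v|) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine_similarity_pair([1, 0], [1, 0]))  # same direction: 1.0
print(cosine_similarity_pair([1, 0], [0, 1]))  # perpendicular: 0.0
```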

Creating vectors

Based on the tags column, create a vector for each movie, and then use cosine similarity to measure how close the vectors are. Cosine similarity assesses how similar documents are regardless of their size. It is the cosine of the angle between two vectors in a multi-dimensional space. Two similar documents that are far apart by Euclidean distance because of their different sizes can still point in nearly the same direction, so their cosine similarity stays high. The smaller the angle, the higher the similarity.
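
The effect of document size can be seen in a small experiment. The sketch below uses plain Python word counts as a stand-in for the counting step; the three sample texts and all helper names are invented for illustration, and a real vectorizer would also handle details such as stop words.

```python
import math
from collections import Counter


def count_vector(text, vocabulary):
    """Bag-of-words vector: how often each vocabulary word occurs in `text`."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


short_doc = "space alien battle"
long_doc = " ".join([short_doc] * 3)  # same content, three times as long
other_doc = "romantic comedy wedding"

vocabulary = sorted(set((short_doc + " " + other_doc).split()))
a = count_vector(short_doc, vocabulary)
b = count_vector(long_doc, vocabulary)
c = count_vector(other_doc, vocabulary)

print("short vs long :", cosine(a, b), euclidean(a, b))
print("short vs other:", cosine(a, c), euclidean(a, c))
```

By Euclidean distance the short text sits closer to the unrelated text than to its own longer copy, while cosine similarity rates the copy as essentially identical and the unrelated text as having nothing in common.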

Source code of the program

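# Load the TMDB movie metadata and credits, then join them on the title.
import numpy as np
import pandas as pd
import ast

movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')
movies = movies.merge(credits, on='title')

# Keep only the columns that feed the recommendation tags.
movies = movies[['movie_id', 'title', 'genres', 'overview', 'keywords', 'cast', 'crew']]
movies.head()

# Drop rows with missing values, then reset the index so that row labels
# match the row positions used later in the similarity matrix.
movies.isnull().sum()
movies = movies.dropna().reset_index(drop=True)
movies.duplicated().sum()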

def convert(obj):
    # Parse the stringified list of dicts and collect each entry's name.
    L = []
    for i in ast.literal_eval(obj):
        L.append(i['name'])
    return L

movies['genres'] = movies['genres'].apply(convert)
movies['keywords'] = movies['keywords'].apply(convert)

def convert3(obj):
    # Keep only the first three cast members.
    iteration = 0
    L = []
    for i in ast.literal_eval(obj):
        if iteration != 3:
            L.append(i['name'])
            iteration = iteration + 1
        else:
            break
    return L

movies['cast'] = movies['cast'].apply(convert3)
movies.head()

def extract_director(obj):
    # Return a one-element list with the director's name (empty if none is listed).
    L = []
    for i in ast.literal_eval(obj):
        if i['job'] == 'Director':
            L.append(i['name'])
            break
    return L

movies['crew'] = movies['crew'].apply(extract_director)
movies.head()

# Split the overview into words so that every feature column is a list of tokens.
movies['overview'] = movies['overview'].apply(lambda x: x.split())
movies.head()

# Remove spaces inside names so that each multi-word name becomes a single token.
movies['genres'] = movies['genres'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['crew'] = movies['crew'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['cast'] = movies['cast'].apply(lambda x: [i.replace(" ", "") for i in x])
movies.head()

# Concatenate all token lists into a single tags column.
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']
movies.head()

# Work on a copy so that the assignments below do not touch a view of movies.
New_Data = movies[['movie_id', 'title', 'tags']].copy()
New_Data.head()
New_Data['tags'] = New_Data['tags'].apply(lambda x: " ".join(x))
New_Data.head()
New_Data['tags'] = New_Data['tags'].apply(lambda x: x.lower())

# Stem the tags before vectorizing so that variants of a word share one feature.
import nltk
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()

def stem(text):
    y = []
    for i in text.split():
        y.append(ps.stem(i))
    return " ".join(y)

New_Data['tags'].iloc[0]
New_Data['tags'] = New_Data['tags'].apply(stem)

# Build bag-of-words vectors from the 5000 most frequent terms, ignoring English stop words.
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000, stop_words='english')
vectors = cv.fit_transform(New_Data['tags']).toarray()
vectors.shape
vectors
cv.get_feature_names_out()  # replaces get_feature_names(), removed in scikit-learn 1.2

# Pairwise cosine similarity between all movie vectors.
from sklearn.metrics.pairwise import cosine_similarity
similarity = cosine_similarity(vectors)
similarity[1]
sorted(list(enumerate(similarity[0])), reverse=True, key=lambda x: x[1])[1:10]
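
def recommend(movie):
    # Row position of the movie, which is also its row in the similarity matrix.
    movie_index = New_Data.index.get_loc(New_Data[New_Data['title'] == movie].index[0])
    distances = similarity[movie_index]
    # Sort by similarity, highest first; the first entry is normally the movie itself, so skip it.
    movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]
    for i in movies_list:
        print(New_Data.iloc[i[0]].title)

recommend('Avatar')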

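Output

Calling recommend('Avatar') prints the titles of the five movies whose tags are most similar to those of Avatar.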
