Project Name: Movie recommendation system
A recommendation system provides customers with information relevant to their searches. Before recommendation systems, the most common way to decide what to buy was to rely on the advice of friends. Now, based on your search, viewing, or purchase history, Google can predict which news you'll read, and YouTube can predict which kinds of videos you'll watch.
A recommendation system helps a firm win loyal customers and build trust by surfacing the items and services they came to the website for. Today's recommendation systems are sophisticated enough to handle even new customers who are visiting the site for the first time. They can also recommend items that are currently trending or highly rated. For this project, you can use content-based filtering.
Content-based filtering suggests products that are similar to those the user previously viewed. In other words, the algorithm tries to find items that look alike. If a person enjoys watching Sachin Tendulkar's shots, they might also enjoy watching Ricky Ponting's shots, because the two videos have comparable tags and categories. Only the content of the items is compared; the approach places no emphasis on who the viewer is. The product with the highest similarity score relative to the user's previous preferences is recommended.
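As a rough illustration of the idea, the sketch below scores videos by how many tags they share with a previously viewed one. The catalog, its tags, and the helper names (jaccard, most_similar) are made up for this example, and Jaccard overlap is used only as a simple stand-in for a similarity measure; the project itself uses cosine similarity.

```python
def jaccard(a, b):
    """Share of tags two items have in common (0.0 = none, 1.0 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical catalog: video title -> descriptive tags.
catalog = {
    "Sachin Tendulkar shots": {"cricket", "batting", "highlights", "india"},
    "Ricky Ponting shots": {"cricket", "batting", "highlights", "australia"},
    "Pasta recipe": {"cooking", "pasta", "recipe"},
}


def most_similar(viewed, catalog):
    """Return the title whose tags overlap most with the viewed item."""
    scores = {
        title: jaccard(catalog[viewed], tags)
        for title, tags in catalog.items()
        if title != viewed  # never recommend the item itself
    }
    return max(scores, key=scores.get)


print(most_similar("Sachin Tendulkar shots", catalog))
```

Note that nothing about the viewer enters the score: only the items' own tags are compared, which is what makes the approach content-based.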
We need to select the features that play a key role in recommendation, and the selected data must be analyzed and preprocessed. We are not going to use all the feature columns, only those that play a major part in recommendations.
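The selection and preprocessing step can be sketched on a single record without any external files. The record below is hypothetical (its values are invented, and it is only shaped like a row of the merged TMDB data), and the helper names are assumptions for this example. It keeps the chosen fields, extracts names from the stringified lists, and builds one lowercase tag string.

```python
import ast

# Fields assumed to matter for recommendations; everything else is dropped.
KEEP = ["movie_id", "title", "genres", "overview", "keywords", "cast", "crew"]

# Hypothetical raw record with invented values.
raw = {
    "movie_id": 1,
    "title": "Example Movie",
    "budget": 1000000,
    "genres": '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
    "overview": "A pilot explores a distant moon.",
    "keywords": '[{"id": 1, "name": "space travel"}]',
    "cast": '[{"name": "Actor One"}, {"name": "Actor Two"}, '
            '{"name": "Actor Three"}, {"name": "Actor Four"}]',
    "crew": '[{"job": "Editor", "name": "Person A"}, '
            '{"job": "Director", "name": "Person B"}]',
    "homepage": None,
}


def names(field, limit=None):
    """Parse a stringified list of dicts and return up to `limit` names."""
    return [item["name"] for item in ast.literal_eval(field)][:limit]


def director(field):
    """Return a one-element list with the first director, or an empty list."""
    return [m["name"] for m in ast.literal_eval(field) if m["job"] == "Director"][:1]


def squash(values):
    """Remove spaces so a multi-word name becomes a single token."""
    return [v.replace(" ", "") for v in values]


selected = {key: raw[key] for key in KEEP}
tags = (
    selected["overview"].split()
    + squash(names(selected["genres"]))
    + squash(names(selected["keywords"]))
    + squash(names(selected["cast"], limit=3))
    + squash(director(selected["crew"]))
)
tag_string = " ".join(tags).lower()
print(tag_string)
```

Removing the spaces inside names keeps two different people who share a first name from contributing the same token to the tags.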
- Porter stemming is used to stem the words in the tags column. Python's nltk package is used for this.
- Stemming is the process of stripping a word down to its root, or stem, the part to which suffixes and prefixes attach. For instance, a stemming algorithm reduces the words "chocolates," "chocolatey," and "choco" to the root word "chocolate," and "retrieval," "retrieved," and "retrieves" to the stem "retrieve." In practice, a stemmer such as Porter often yields a truncated form (for example "retriev") rather than a dictionary word.
- Cosine similarity is a metric, widely used in machine learning, that measures the similarity between two vectors. It is computed as the cosine of the angle between the vectors.
- It is mainly used to measure the similarity between text documents and to classify them.
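To make the stemming idea concrete, here is a deliberately crude suffix-stripping stemmer. It is not the Porter algorithm that nltk implements, and the suffix list and the names crude_stem and stem_text are assumptions chosen for this illustration. It only shows how different word forms can collapse to one shared stem.

```python
# Suffixes are tried in this order; the first match is stripped.
SUFFIXES = ("ing", "al", "ed", "es", "e", "s")


def crude_stem(word):
    """Strip one known suffix, provided at least three characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_text(text):
    """Stem every whitespace-separated word in a string."""
    return " ".join(crude_stem(word) for word in text.split())


for word in ("retrieve", "retrieval", "retrieved", "retrieves"):
    print(word, "->", crude_stem(word))
```

All four forms map to the same stem, so a vectorizer that runs after stemming counts them as one feature instead of four.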
Based on the tags column, create a vector for each movie, and then use cosine similarity to compare the vectors. Cosine similarity is a measure of how similar documents are regardless of their size. It is the cosine of the angle between two vectors projected in a multi-dimensional space. Two similar documents that are far apart by Euclidean distance because of a difference in document size may still be oriented close to one another. The smaller the angle, the higher the similarity.
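For two vectors A and B, cosine similarity is cos(θ) = (A · B) / (‖A‖ ‖B‖). The sketch below is a minimal pure-Python version of that formula on word-count vectors, meant only to show the size-independence described above. The example texts and the function names are invented for this illustration, and the project code relies on scikit-learn rather than on this helper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def euclidean(a, b):
    """Straight-line distance between two word-count vectors."""
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))


short_doc = bag_of_words("space adventure alien")
long_doc = bag_of_words("space adventure alien " * 10)  # same words, 10x longer
other_doc = bag_of_words("romantic comedy wedding")

print(cosine(short_doc, long_doc), euclidean(short_doc, long_doc))
print(cosine(short_doc, other_doc))
```

The short and long documents point in the same direction, so their cosine similarity is essentially 1 even though their Euclidean distance is large, while the unrelated document shares no words and scores 0.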
Source code of the program
import ast

import pandas as pd
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the two TMDB 5000 files and merge them on the movie title.
movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')
movies = movies.merge(credits, on='title')

# Keep only the columns that play a major part in recommendations.
movies = movies[['movie_id', 'title', 'genres', 'overview', 'keywords', 'cast', 'crew']]
movies.head()

# Inspect missing values and duplicates, then drop incomplete rows.
movies.isnull().sum()
movies = movies.dropna()
movies.duplicated().sum()


def convert(obj):
    # Parse a stringified list of dicts and collect every 'name' value.
    L = []
    for i in ast.literal_eval(obj):
        L.append(i['name'])
    return L


movies['genres'] = movies['genres'].apply(convert)
movies['keywords'] = movies['keywords'].apply(convert)


def convert3(obj):
    # Same as convert, but keep only the first three names (top-billed cast).
    L = []
    iteration = 0
    for i in ast.literal_eval(obj):
        if iteration != 3:
            L.append(i['name'])
            iteration = iteration + 1
        else:
            break
    return L


movies['cast'] = movies['cast'].apply(convert3)
movies.head()


def extract_director(obj):
    # Return a list holding the first crew member whose job is 'Director'.
    L = []
    for i in ast.literal_eval(obj):
        if i['job'] == 'Director':
            L.append(i['name'])
            break
    return L


movies['crew'] = movies['crew'].apply(extract_director)
movies.head()

# Split the overview into words so every feature column is a list.
movies['overview'] = movies['overview'].apply(lambda x: x.split())

# Remove spaces inside names so each name becomes a single token.
movies['genres'] = movies['genres'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['crew'] = movies['crew'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['cast'] = movies['cast'].apply(lambda x: [i.replace(" ", "") for i in x])
movies.head()

# Combine all feature lists into one tags column.
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

# Copy the slice and reset the index so that row positions match the rows
# of the similarity matrix after dropna().
New_Data = movies[['movie_id', 'title', 'tags']].copy().reset_index(drop=True)
New_Data['tags'] = New_Data['tags'].apply(lambda x: " ".join(x))
New_Data['tags'] = New_Data['tags'].apply(lambda x: x.lower())
New_Data.head()

ps = PorterStemmer()


def stem(text):
    # Replace every word in the text with its Porter stem.
    y = []
    for i in text.split():
        y.append(ps.stem(i))
    return " ".join(y)


# Stem the tags before vectorizing so that word variants share one feature.
New_Data['tags'] = New_Data['tags'].apply(stem)

# Build bag-of-words vectors from the 5000 most frequent non-stop-words.
cv = CountVectorizer(max_features=5000, stop_words='english')
vectors = cv.fit_transform(New_Data['tags']).toarray()
vectors.shape
cv.get_feature_names_out()  # get_feature_names() on scikit-learn older than 1.0

# Pairwise cosine similarity between all movies.
similarity = cosine_similarity(vectors)

# The movies most similar to the first movie; position 0 is the movie itself.
sorted(list(enumerate(similarity[0])), reverse=True, key=lambda x: x[1])[1:10]


def recommend(movie):
    # Print the five movies whose tags are most similar to the given title.
    movie_index = New_Data[New_Data['title'] == movie].index[0]
    distances = similarity[movie_index]
    movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]
    for i in movies_list:
        print(New_Data.iloc[i[0]].title)


recommend('Avatar')