Movie Recommendation System

by Keanu Sida

Context

Online streaming platforms like Netflix hold vast movie catalogues. A recommendation system that surfaces relevant movies to users, based on their historical interactions, improves customer satisfaction and, in turn, the platform's revenue. The techniques used here apply to any item for which a recommendation system is appropriate.


Objective

This project features three kinds of recommendation systems:

  1. Knowledge/Rank based recommendation system
  2. Similarity-Based Collaborative filtering
  3. Matrix Factorization Based Collaborative Filtering

I used the ratings dataset, which can be downloaded as a .csv file here.


Dataset

The ratings dataset contains the following attributes:

  1. userId - unique identifier of the user
  2. movieId - unique identifier of the movie
  3. rating - the rating the user gave the movie
  4. timestamp - the time at which the rating was recorded (dropped later in the analysis)

Installing surprise library

# Install surprise library
!pip install surprise
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Collecting scikit-surprise
  Downloading scikit-surprise-1.1.1.tar.gz (11.8 MB)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-surprise->surprise) (1.2.0)
Requirement already satisfied: numpy>=1.11.2 in /usr/local/lib/python3.7/dist-packages (from scikit-surprise->surprise) (1.21.6)
Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-surprise->surprise) (1.7.3)
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.7/dist-packages (from scikit-surprise->surprise) (1.15.0)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp37-cp37m-linux_x86_64.whl size=1633970 sha256=46c3bd64da03464dab9be98362f87962a4c28fa86ad1bfe137814015d8759182
  Stored in directory: /root/.cache/pip/wheels/76/44/74/b498c42be47b2406bd27994e16c5188e337c657025ab400c1c
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.1 surprise-0.1

Importing the necessary libraries and overview of the dataset

# Used to ignore the warning given as output of the code
import warnings                                 
warnings.filterwarnings('ignore')

# Basic libraries of python for numeric and dataframe computations
import numpy as np                              
import pandas as pd

# Basic library for data visualization
import matplotlib.pyplot as plt     

# Slightly advanced library for data visualization            
import seaborn as sns                           

# A dictionary output that does not raise a key error
from collections import defaultdict             

# Performance metrics in surprise
from surprise import accuracy

# Class is used to parse a file containing ratings, data should be in structure - user ; item ; rating
from surprise.reader import Reader

# Class for loading datasets
from surprise.dataset import Dataset

# For tuning model hyper-parameters
from surprise.model_selection import GridSearchCV

# For splitting the rating data in train and test dataset
from surprise.model_selection import train_test_split

# For implementing similarity based recommendation system
from surprise.prediction_algorithms.knns import KNNBasic

# For implementing matrix factorization based recommendation system
from surprise.prediction_algorithms.matrix_factorization import SVD

# For implementing cross validation
from surprise.model_selection import KFold

Loading the data

# Import the dataset
rating = pd.read_csv('./ratings.csv') 

Let's check the info of the data

rating.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100004 entries, 0 to 100003
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   userId     100004 non-null  int64  
 1   movieId    100004 non-null  int64  
 2   rating     100004 non-null  float64
 3   timestamp  100004 non-null  int64  
dtypes: float64(1), int64(3)
memory usage: 3.1 MB
# Dropping timestamp column
rating = rating.drop(['timestamp'], axis=1)

Exploring the Dataset

Let's explore the dataset and answer some basic data-related questions:

What do the top 5 rows of the data set look like?

rating.head()
userId movieId rating
0 1 31 2.5
1 1 1029 3.0
2 1 1061 3.0
3 1 1129 2.0
4 1 1172 4.0

Describing the distribution of ratings:

plt.figure(figsize = (12, 4))

sns.countplot(x = "rating", data = rating)

plt.tick_params(labelsize = 10)
plt.title("Distribution of Ratings ", fontsize = 10)
plt.xlabel("Ratings", fontsize = 10)
plt.ylabel("Number of Ratings", fontsize = 10)
plt.show()

Rating '4.0' has the highest count of ratings (>30K). Rating '3.0' is second with 20K+, and rating '5.0' is third with around 15K.

The ratings are biased towards these three values significantly more than the others.
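A quick way to quantify this bias (a minimal check, not part of the original analysis) is to look at the share each rating value holds in the dataset:

# Share of each rating value in the dataset, sorted by rating value
rating['rating'].value_counts(normalize=True).sort_index()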

Total number of unique users and unique movies:

# Finding number of unique users

rating['userId'].nunique()
671

There are 671 unique users in the data set.

# Finding number of unique movies

rating['movieId'].nunique()
9066

There are 9066 unique movies in the data set.

Movies with which the same user interacted more than once:

rating.groupby(['userId', 'movieId']).count()
rating
userId movieId
1 31 1
1029 1
1061 1
1129 1
1172 1
... ... ...
671 6268 1
6269 1
6365 1
6385 1
6565 1

100004 rows × 1 columns

rating.groupby(['userId', 'movieId']).count()['rating'].sum()
100004

The sum of these counts equals the total number of ratings, which implies that each user-movie pair appears at most once, i.e. no user rated the same movie more than once.
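The same fact can be verified directly with a one-line sanity check:

# Number of duplicated (userId, movieId) pairs - 0 confirms one rating per pair
rating.duplicated(subset=['userId', 'movieId']).sum()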

The most interacted-with movie in the dataset:


rating['movieId'].value_counts()
356       341
296       324
318       311
593       304
260       291
         ... 
98604       1
103659      1
104419      1
115927      1
6425        1
Name: movieId, Length: 9066, dtype: int64

The movie with the ID 356 is the most interacted-with movie in the dataset.

# Plotting the distribution of the 341 ratings for movieId 356
plt.figure(figsize=(7,7))

rating[rating['movieId'] == 356]['rating'].value_counts().plot(kind='bar')

plt.xlabel('Rating')

plt.ylabel('Count')

plt.show()

This movie appears to be popular in a positive sense: a higher proportion of its ratings are 4.0 or 5.0 than in the data set as a whole.

Which users have the highest interactivity:


rating['userId'].value_counts()
547    2391
564    1868
624    1735
15     1700
73     1610
       ... 
296      20
289      20
249      20
221      20
1        20
Name: userId, Length: 671, dtype: int64

The user with the ID 547 interacted the most with movies in the dataset.

Distribution of the user-movie interactions:

# Finding user-movie interactions distribution
count_interactions = rating.groupby('userId').count()['movieId']
count_interactions
userId
1       20
2       76
3       51
4      204
5      100
      ... 
667     68
668     20
669     37
670     31
671    115
Name: movieId, Length: 671, dtype: int64
# Plotting user-movie interactions distribution

plt.figure(figsize=(15,7))

sns.histplot(count_interactions)

plt.xlabel('Number of Interactions by Users')

plt.show()

The distribution is highly right-skewed: most users have rated relatively few movies, while a small number of heavy users have rated far more (the most active user, 547, has 2,391 ratings).
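Summary statistics make this skew concrete; a quick check (not part of the original analysis):

# Summary of interactions per user - a mean well above the median indicates right skew
count_interactions.describe()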

Now that we've explored the data, let's start building some recommendation systems!

Creating A Rank-Based Recommendation System

Context

Rank-based recommendation systems provide recommendations based on the most popular items. This kind of recommendation system is useful for the cold start problem: when a new user enters the system, the machine cannot recommend movies based on historical interactions, because there are none. In those cases, we can use a rank-based recommendation system to recommend movies to the new user.

To build the rank-based recommendation system, we take the average of all ratings provided to each movie and then rank movies by their average rating.



# Calculating average ratings
average_rating = rating.groupby('movieId').mean()['rating']

# Calculating the count of ratings
count_rating = rating.groupby('movieId').count()['rating']

# Making a dataframe with the count and average of ratings
final_rating = pd.DataFrame({'avg_rating':average_rating, 'rating_count':count_rating})
final_rating.head()
avg_rating rating_count
movieId
1 3.872470 247
2 3.401869 107
3 3.161017 59
4 2.384615 13
5 3.267857 56

Now, let's create a function to find the top n movies for a recommendation based on the average ratings of movies. We can also add a threshold for a minimum number of interactions for a movie to be considered for recommendation.

def top_n_movies(data, n, min_interaction=100):
    
    # Finding movies with the minimum number of interactions
    recommendations = data[data['rating_count'] >= min_interaction]
    
    # Sorting values by average rating
    recommendations = recommendations.sort_values(by='avg_rating', ascending=False)
    
    return recommendations.index[:n]

We can call this function with different values of n and different minimum interaction thresholds to get movies to recommend.

Recommending top 5 movies with 50 minimum interactions based on popularity


list(top_n_movies(final_rating, 5, 50))
[858, 318, 969, 913, 1221]

Recommending top 5 movies with 100 minimum interactions based on popularity


list(top_n_movies(final_rating, 5, 100))
[858, 318, 1221, 50, 527]

Recommending top 5 movies with 200 minimum interactions based on popularity


list(top_n_movies(final_rating, 5, 200))
[858, 318, 50, 527, 608]

Rank-based recommendations are the same for every user. Using the same interaction data, we can build Collaborative Filtering based recommendation systems to better understand the needs of individual users and thereby improve UX.

User-Based Collaborative Filtering Recommendation System

In this type of recommendation system, we do not need any information about the users or items themselves. We only need user-item interaction data to build a collaborative recommendation system. For example:

  1. Ratings provided by users, e.g. ratings of books on Goodreads or movie ratings on IMDb
  2. Likes of users on different Facebook posts, likes on YouTube videos
  3. Use/purchase of a product by users, e.g. buying different items on e-commerce sites
  4. Reading of articles by readers on various blogs

Types of Collaborative Filtering

  1. User-User: recommend items liked by users whose rating behavior is similar to the target user's, as sketched below
  2. Item-Item: recommend items whose rating pattern is similar to that of items the target user has already rated highly
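To make "similarity" concrete, here is a minimal sketch (illustrative values, not from this dataset) of the cosine similarity between two users' rating vectors over the same five movies:

import numpy as np

# Hypothetical ratings of the same five movies by two users (0 = not rated)
user_a = np.array([5.0, 3.0, 0.0, 4.0, 4.0])
user_b = np.array([4.0, 0.0, 0.0, 5.0, 4.0])

# Cosine similarity: dot product divided by the product of the vector norms
cos_sim = user_a @ user_b / (np.linalg.norm(user_a) * np.linalg.norm(user_b))
print(cos_sim)  # close to 1 => similar rating behavior

KNNBasic computes a matrix of such similarities (cosine or msd) between all pairs of users (or items) and aggregates the ratings of the nearest neighbors to predict an unseen rating.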

Building a baseline user-user similarity based recommendation system

We'll load the rating dataset, which is a pandas DataFrame, into surprise.dataset.DatasetAutoFolds, the format this library requires. To do this, we use the Reader and Dataset classes. Finally, we'll split the data into train and test sets.

Making the dataset into surprise dataset and splitting it into train and test set

# Instantiating Reader scale with expected rating scale
reader = Reader(rating_scale=(0, 5))

# Loading the rating dataset
data = Dataset.load_from_df(rating[['userId', 'movieId', 'rating']], reader)

# Splitting the data into train and test dataset
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

Build the first baseline similarity based recommendation system using cosine similarity and KNN


sim_options = {'name': 'cosine',
               'user_based': True}

# Defining Nearest neighbour algorithm
algo_knn_user = KNNBasic(sim_options=sim_options,verbose=False)

# Train the algorithm on the trainset or fitting the model on train dataset 
algo_knn_user.fit(trainset)

# Predict ratings for the testset
predictions = algo_knn_user.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)
RMSE: 0.9925
0.9924509041520163

RMSE for baseline user based collaborative filtering recommendation system

The RMSE for the baseline system is 0.9925.

Predicted rating for a user and a specific movie (e.g. userId=4 with movieId=10 or movieId=3)

Let us now predict the rating for the user with userId=4 and movieId=10

algo_knn_user.predict(4, 10, r_ui=4, verbose=True)
user: 4          item: 10         r_ui = 4.00   est = 3.62   {'actual_k': 40, 'was_impossible': False}
Prediction(uid=4, iid=10, r_ui=4, est=3.6244912065910952, details={'actual_k': 40, 'was_impossible': False})

Movie 10 has an estimated rating of 3.62 for user 4.

Let's predict the rating for the same userId=4, but for a movie this user has not interacted with before, i.e. movieId=3

algo_knn_user.predict(4, 3, verbose=True)
user: 4          item: 3          r_ui = None   est = 3.20   {'actual_k': 40, 'was_impossible': False}
Prediction(uid=4, iid=3, r_ui=None, est=3.202703552548654, details={'actual_k': 40, 'was_impossible': False})

Movie 3 has an estimated rating of 3.20 for user 4.

Improving user-user similarity based recommendation system by tuning its hyper-parameters:

Below we will be tuning hyper-parameters for the KNNBasic algorithm. Its main hyperparameters are:

  1. k - the (maximum) number of neighbors taken into account for aggregation
  2. min_k - the minimum number of neighbors required; if fewer are available, the prediction falls back to the global mean rating
  3. sim_options - options for the similarity measure, such as its name ('msd', 'cosine', 'pearson') and whether it is computed between users or between items (user_based)

For more details please refer the official documentation https://surprise.readthedocs.io/en/stable/knn_inspired.html

Performing hyperparameter tuning for the baseline user based collaborative filtering recommendation system and finding the RMSE for the tuned user based collaborative filtering recommendation system:


# Setting up parameter grid to tune the hyperparameters
param_grid = {'k': [20, 30, 40], 'min_k': [3, 6, 9],
              'sim_options': {'name': ['msd', 'cosine'],
                              'user_based': [True]}
              }

# Performing 3-fold cross validation to tune the hyperparameters
grid_obj = GridSearchCV(KNNBasic, param_grid, measures=['rmse', 'mae'], cv=3, n_jobs=-1)

# Fitting the data
grid_obj.fit(data)

# Best RMSE score
print(grid_obj.best_score['rmse'])

# Combination of parameters that gave the best RMSE score
print(grid_obj.best_params['rmse'])
0.9652553929644568
{'k': 20, 'min_k': 3, 'sim_options': {'name': 'msd', 'user_based': True}}

Once the grid search is complete, we can get the optimal values for each of those hyperparameters as shown above.

Below we examine the evaluation metrics RMSE and MAE at every split to analyze the impact of each hyperparameter value.

results_df = pd.DataFrame.from_dict(grid_obj.cv_results)
results_df.head()
split0_test_rmse split1_test_rmse split2_test_rmse mean_test_rmse std_test_rmse rank_test_rmse split0_test_mae split1_test_mae split2_test_mae mean_test_mae std_test_mae rank_test_mae mean_fit_time std_fit_time mean_test_time std_test_time params param_k param_min_k param_sim_options
0 0.965568 0.961561 0.968637 0.965255 0.002897 1 0.744661 0.738655 0.743292 0.742203 0.002570 1 0.129260 0.006148 3.364002 0.037595 {'k': 20, 'min_k': 3, 'sim_options': {'name': ... 20 3 {'name': 'msd', 'user_based': True}
1 0.994451 0.992335 0.996891 0.994559 0.001861 14 0.770900 0.766768 0.769493 0.769054 0.001715 12 0.743921 0.021103 3.064117 0.138485 {'k': 20, 'min_k': 3, 'sim_options': {'name': ... 20 3 {'name': 'cosine', 'user_based': True}
2 0.970929 0.965168 0.971683 0.969260 0.002910 4 0.748282 0.741616 0.745940 0.745279 0.002761 3 0.113563 0.005318 3.041815 0.100373 {'k': 20, 'min_k': 6, 'sim_options': {'name': ... 20 6 {'name': 'msd', 'user_based': True}
3 0.998277 0.994462 0.998090 0.996943 0.001756 15 0.773384 0.768445 0.770557 0.770795 0.002023 15 0.665888 0.040078 3.042022 0.025506 {'k': 20, 'min_k': 6, 'sim_options': {'name': ... 20 6 {'name': 'cosine', 'user_based': True}
4 0.975982 0.970484 0.978132 0.974866 0.003220 7 0.752162 0.746114 0.751751 0.750009 0.002759 6 0.123360 0.019757 3.219795 0.086318 {'k': 20, 'min_k': 9, 'sim_options': {'name': ... 20 9 {'name': 'msd', 'user_based': True}

Now, let's build the final model by using tuned values of the hyperparameters, which we received by using grid search cross-validation.

# Creating an instance of KNNBasic with tuned hyperparameter values
similarity_algo_optimized_user = KNNBasic(sim_options=sim_options, k=40, min_k=6, verbose=False)

# Training the algorithm on the trainset
similarity_algo_optimized_user.fit(trainset)

# Predicting ratings for the testset
predictions = similarity_algo_optimized_user.test(testset)

# Computing RMSE on testset
accuracy.rmse(predictions)
RMSE: 0.9908
0.9907613369496804

We can see from above that after tuning hyperparameters, the test-set RMSE has dropped from 0.9925 to 0.9908. Thus, hyperparameter tuning has slightly improved our model.

Predicted rating for userId=4 on movieId=10 and movieId=3 using tuned user based collaborative filtering:

Let's now predict the rating for the user with userId=4 and movieId=10 with the optimized model


similarity_algo_optimized_user.predict(4,10, r_ui=4, verbose=True)
user: 4          item: 10         r_ui = 4.00   est = 3.62   {'actual_k': 40, 'was_impossible': False}
Prediction(uid=4, iid=10, r_ui=4, est=3.6244912065910952, details={'actual_k': 40, 'was_impossible': False})

The predicted rating for the optimized algorithm is still 3.62.

Next we predict the rating for the same userId=4, but for a movie this user has not interacted with before, i.e. movieId=3, using the optimized model:


similarity_algo_optimized_user.predict(4,3, verbose=True)
user: 4          item: 3          r_ui = None   est = 3.20   {'actual_k': 40, 'was_impossible': False}
Prediction(uid=4, iid=3, r_ui=None, est=3.202703552548654, details={'actual_k': 40, 'was_impossible': False})

The predicted rating for the optimized algorithm is still 3.20.

Identifying similar users to a given user (nearest neighbors)

We can also find the users most similar to a given user, i.e. its nearest neighbors, with this KNNBasic algorithm. Below we find the 5 users most similar to userId=4 under the model's similarity measure.

similarity_algo_optimized_user.get_neighbors(4, k=5)
[357, 220, 590, 491, 647]
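One subtlety worth noting: surprise's get_neighbors works with internal (inner) ids rather than raw userIds, so for id-safe lookups the raw id should be mapped first. A minimal sketch, assuming the trainset fitted above:

# get_neighbors expects inner ids, so map raw userId <-> inner id explicitly
inner_uid = trainset.to_inner_uid(4)                         # raw userId -> inner id
neighbors = similarity_algo_optimized_user.get_neighbors(inner_uid, k=5)
raw_neighbors = [trainset.to_raw_uid(i) for i in neighbors]  # inner ids -> raw userIds
print(raw_neighbors)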

Implementing the recommendation algorithm based on optimized KNNBasic model

Below we implement a function whose input parameters are:

  1. data - the rating dataset
  2. user_id - the user for whom we want recommendations
  3. top_n - the number of movies to recommend
  4. algo - the trained algorithm used to predict ratings

def get_recommendations(data, user_id, top_n, algo):

    # Creating an empty list to store the recommended movie ids
    recommendations = []

    # Creating a user-item interactions matrix
    user_item_interactions_matrix = data.pivot(index='userId', columns='movieId', values='rating')

    # Extracting those movie ids which the user_id has not interacted with yet
    non_interacted_movies = user_item_interactions_matrix.loc[user_id][user_item_interactions_matrix.loc[user_id].isnull()].index.tolist()

    # Looping through each movie id which user_id has not interacted with yet
    for item_id in non_interacted_movies:

        # Predicting the rating for this non-interacted movie id for this user
        est = algo.predict(user_id, item_id).est

        # Appending the predicted rating
        recommendations.append((item_id, est))

    # Sorting the predicted ratings in descending order
    recommendations.sort(key=lambda x: x[1], reverse=True)

    return recommendations[:top_n]  # returning the top n highest predicted rating movies for this user

Predicting the top 5 movies for userId=4 with the similarity based recommendation system:

recommendations = get_recommendations(rating, 4, 5, similarity_algo_optimized_user)
recommendations
[(98491, 4.832340578646058),
 (116, 4.753206589295344),
 (6669, 4.748048450384675),
 (1221, 4.662571141751736),
 (1192, 4.65824768595177)]

Item based Collaborative Filtering Recommendation System

# Defining similarity measure
sim_options = {'name': 'cosine',
               'user_based': False}

# Defining Nearest neighbour algorithm
algo_knn_item = KNNBasic(sim_options=sim_options, verbose=False)

# Train the algorithm on the trainset or fitting the model on train dataset 
algo_knn_item.fit(trainset)

# Predict ratings for the testset
predictions = algo_knn_item.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)
RMSE: 1.0032
1.003221450633729

RMSE for baseline item based collaborative filtering recommendation system

The baseline item-based system has an RMSE of 1.0032.

Predicted rating for userId=4 on movieId=10 and movieId=3:

Let's now predict the rating for the user with userId=4 and movieId=10.

algo_knn_item.predict(4,10, r_ui=4, verbose=True)
user: 4          item: 10         r_ui = 4.00   est = 4.37   {'actual_k': 40, 'was_impossible': False}
Prediction(uid=4, iid=10, r_ui=4, est=4.373794871885004, details={'actual_k': 40, 'was_impossible': False})

The system predicts a rating of 4.37 for user 4 for movie 10.

Let's predict the rating for the same userId=4, but for a movie this user has not interacted with before, i.e. movieId=3

algo_knn_item.predict(4,3, verbose=True)
user: 4          item: 3          r_ui = None   est = 4.07   {'actual_k': 40, 'was_impossible': False}
Prediction(uid=4, iid=3, r_ui=None, est=4.071601862880049, details={'actual_k': 40, 'was_impossible': False})

The system predicts a rating of 4.07 for user 4 for movie 3.

Performing hyperparameter tuning for the baseline item based collaborative filtering recommendation system and finding the RMSE for the tuned item based collaborative filtering recommendation system:



# Setting up parameter grid to tune the hyperparameters
param_grid = {'k': [20, 30, 40], 'min_k': [3, 6, 9],
              'sim_options': {'name': ['msd', 'cosine'],
                              'user_based': [False]}
              }

# Performing 3-fold cross validation to tune the hyperparameters
grid_obj = GridSearchCV(KNNBasic, param_grid, measures=['rmse', 'mae'], cv=3, n_jobs=-1)

# Fitting the data
grid_obj.fit(data)

# Best RMSE score
print(grid_obj.best_score['rmse'])

# Combination of parameters that gave the best RMSE score
print(grid_obj.best_params['rmse'])
0.9401320571134547
{'k': 40, 'min_k': 6, 'sim_options': {'name': 'msd', 'user_based': False}}

Once the grid search is complete, we can get the optimal values for each of those hyperparameters, as shown above.

Below we examine the evaluation metrics RMSE and MAE at every split to analyze the impact of each hyperparameter value.

results_df = pd.DataFrame.from_dict(grid_obj.cv_results)
results_df.head()
split0_test_rmse split1_test_rmse split2_test_rmse mean_test_rmse std_test_rmse rank_test_rmse split0_test_mae split1_test_mae split2_test_mae mean_test_mae std_test_mae rank_test_mae mean_fit_time std_fit_time mean_test_time std_test_time params param_k param_min_k param_sim_options
0 0.951277 0.950129 0.950116 0.950508 0.000544 8 0.734416 0.734226 0.733322 0.733988 0.000477 7 7.439870 0.545805 11.326026 0.512808 {'k': 20, 'min_k': 3, 'sim_options': {'name': ... 20 3 {'name': 'msd', 'user_based': False}
1 1.012004 1.016567 1.014096 1.014222 0.001865 17 0.789165 0.793400 0.791117 0.791227 0.001731 16 20.378437 1.080052 10.987197 0.362469 {'k': 20, 'min_k': 3, 'sim_options': {'name': ... 20 3 {'name': 'cosine', 'user_based': False}
2 0.951213 0.950136 0.950120 0.950490 0.000512 7 0.734455 0.734210 0.733590 0.734085 0.000364 8 6.380261 0.199738 11.128263 0.456450 {'k': 20, 'min_k': 6, 'sim_options': {'name': ... 20 6 {'name': 'msd', 'user_based': False}
3 1.011952 1.016582 1.014071 1.014202 0.001892 16 0.789239 0.793386 0.791373 0.791333 0.001693 17 19.012726 0.650604 11.196412 0.564934 {'k': 20, 'min_k': 6, 'sim_options': {'name': ... 20 6 {'name': 'cosine', 'user_based': False}
4 0.951610 0.950883 0.950264 0.950919 0.000550 9 0.734750 0.734794 0.733791 0.734445 0.000463 9 5.797267 0.316200 11.779861 0.483101 {'k': 20, 'min_k': 9, 'sim_options': {'name': ... 20 9 {'name': 'msd', 'user_based': False}

Now let's build the final model using the tuned hyperparameter values obtained from grid search cross-validation.

# Creating an instance of KNNBasic with tuned hyperparameter values
similarity_algo_optimized_item = KNNBasic(sim_options={'name': 'msd', 'user_based': False}, k=30, min_k=6, verbose=False)

# Training the algorithm on the trainset
similarity_algo_optimized_item.fit(trainset)

# Predicting ratings for the testset
predictions = similarity_algo_optimized_item.test(testset)

# Computing RMSE on testset
accuracy.rmse(predictions)
RMSE: 0.9465
0.9465120620317036

The final model has an RMSE of 0.9465.

Predicted rating for userId=4 on movieId=10 and movieId=3 using tuned item based collaborative filtering:

Let's now predict the rating for the user with userId=4 and movieId=10 with the optimized model, as shown below

similarity_algo_optimized_item.predict(4,10, r_ui=4, verbose=True)
user: 4          item: 10         r_ui = 4.00   est = 4.30   {'actual_k': 30, 'was_impossible': False}
Prediction(uid=4, iid=10, r_ui=4, est=4.298279280483517, details={'actual_k': 30, 'was_impossible': False})

The predicted rating for movie 10 is 4.298.

Let's predict the rating for the same userId=4, but for a movie this user has not interacted with before, i.e. movieId=3, using the optimized model:

similarity_algo_optimized_item.predict(4, 3, verbose=True)
user: 4          item: 3          r_ui = None   est = 3.86   {'actual_k': 30, 'was_impossible': False}
Prediction(uid=4, iid=3, r_ui=None, est=3.859023126306401, details={'actual_k': 30, 'was_impossible': False})

The predicted rating for movie 3 is 3.859.

Identifying similar items to a given item (nearest neighbors)

Because this model is item-based (user_based=False), the similarity matrix is computed between items, so get_neighbors returns similar items rather than similar users. Below we find the 5 items most similar to the item with inner id 4.

similarity_algo_optimized_item.get_neighbors(4, k=5)
[77, 85, 115, 119, 127]

Predicting the top 5 movies for userId=4 with the item based similarity recommendation system:

recommendations = get_recommendations(rating, 4, 5, similarity_algo_optimized_item)
recommendations
[(84, 5), (1040, 5), (2481, 5), (3515, 5), (4521, 5)]

Model-Based Collaborative Filtering (Matrix Factorization using SVD)

Model-based collaborative filtering is a personalized recommendation approach: recommendations are based on the past behavior of the user and do not depend on any additional information. We use latent features to find recommendations for each user.

Singular Value Decomposition (SVD)

SVD is used to compute the latent features from the user-item matrix, but classic SVD does not work when values are missing from the user-item matrix.

First we need to convert the movie-rating dataset into a user-item matrix, which we have already done above while computing similarities.

SVD decomposes this matrix into three separate matrices:

U-matrix

An n x k matrix, where n is the number of users and each row represents a user in terms of k latent features.

Sigma-matrix

A k x k diagonal matrix, whose diagonal entries (the singular values) indicate the importance of each latent feature.

V-transpose matrix

A k x m matrix, where m is the number of items and each column represents an item in terms of the k latent features.
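As a minimal illustration (a toy, fully observed matrix, not the actual ratings data), numpy's SVD can decompose a small user-item matrix and rebuild a low-rank approximation from its factors:

import numpy as np

# Toy 3-user x 4-movie rating matrix (fully observed, unlike real rating data)
R = np.array([[5.0, 4.0, 1.0, 1.0],
              [4.0, 5.0, 2.0, 1.0],
              [1.0, 1.0, 5.0, 4.0]])

# Full SVD: R = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep k = 2 latent features and rebuild an approximation of R
k = 2
R_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_approx, 2))

Because real rating matrices are mostly empty, surprise's SVD algorithm does not compute this exact decomposition; instead it learns the user and item factors directly by gradient descent, as described next.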

Build a baseline matrix factorization recommendation system

# Using SVD matrix factorization
algo_svd = SVD()

# Training the algorithm on the trainset
algo_svd.fit(trainset)

# Predicting ratings for the testset
predictions = algo_svd.test(testset)

# Computing RMSE on the testset
accuracy.rmse(predictions)
RMSE: 0.9031
0.9031390885282595

RMSE for baseline SVD based collaborative filtering recommendation system:

The baseline SVD-based system has an RMSE of 0.9031.

Predicted rating for userId=4 on movieId=10 and movieId=3:

Let's now predict the rating for the user with userId=4 and movieId=10

algo_svd.predict(4, 10, r_ui=4, verbose=True)
user: 4          item: 10         r_ui = 4.00   est = 3.92   {'was_impossible': False}
Prediction(uid=4, iid=10, r_ui=4, est=3.9176599589678984, details={'was_impossible': False})

The SVD system predicts a rating of 3.918 for movie 10 for user 4.

Let's predict the rating for the same userId=4, but for a movie this user has not interacted with before, i.e. movieId=3:

algo_svd.predict(4, 3, verbose=True)
user: 4          item: 3          r_ui = None   est = 3.64   {'was_impossible': False}
Prediction(uid=4, iid=3, r_ui=None, est=3.6373797205620138, details={'was_impossible': False})

The SVD system predicts a rating of 3.637 for movie 3 for user 4.

Improving matrix factorization based recommendation system by tuning its hyper-parameters

In SVD, the rating is predicted as:

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u$$

If user $u$ is unknown, then the bias $b_u$ and the factors $p_u$ are assumed to be zero. The same applies for item $i$ with $b_i$ and $q_i$.

To estimate all the unknowns, we minimize the following regularized squared error:

$$\sum_{r_{ui} \in R_{train}} \left(r_{ui} - \hat{r}_{ui}\right)^2 + \lambda\left(b_i^2 + b_u^2 + \|q_i\|^2 + \|p_u\|^2\right)$$
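As a small worked example of the prediction rule (all numbers hypothetical, not learned from this dataset):

import numpy as np

# Hypothetical learned parameters for one user u and one item i
mu, b_u, b_i = 3.5, 0.2, -0.1
q_i = np.array([0.3, -0.5])   # item latent factors
p_u = np.array([0.8,  0.1])   # user latent factors

# Predicted rating: r_hat = mu + b_u + b_i + q_i . p_u
r_hat = mu + b_u + b_i + q_i @ p_u
print(r_hat)  # 3.5 + 0.2 - 0.1 + (0.24 - 0.05) = 3.79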

The minimization is performed by stochastic gradient descent. There are many hyperparameters to tune in this algorithm; you can find a full list of hyperparameters here.

Below we will be tuning only three hyperparameters:

  1. n_epochs - the number of iterations of the SGD procedure
  2. lr_all - the learning rate for all parameters
  3. reg_all - the regularization term for all parameters

Performing hyperparameter tuning for the baseline SVD based collaborative filtering recommendation system and finding the RMSE for tuned SVD based collaborative filtering recommendation system:

# Set the parameter space to tune
param_grid = {'n_epochs': [10, 20, 30], 'lr_all': [0.001, 0.005, 0.01],
              'reg_all': [0.2, 0.4, 0.6]}

# Performing 3-fold gridsearch cross validation
gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3, n_jobs=-1)

# Fitting data
gs.fit(data)

# Best RMSE score
print(gs.best_score['rmse'])

# Combination of parameters that gave the best RMSE score
print(gs.best_params['rmse'])
0.8937463944379922
{'n_epochs': 30, 'lr_all': 0.01, 'reg_all': 0.2}

Once the grid search is complete, we can get the optimal values for each of those hyperparameters, as shown above.

Below we examine the evaluation metrics RMSE and MAE at every split to analyze the impact of each hyperparameter value.

results_df = pd.DataFrame.from_dict(gs.cv_results)
results_df.head()
split0_test_rmse split1_test_rmse split2_test_rmse mean_test_rmse std_test_rmse rank_test_rmse split0_test_mae split1_test_mae split2_test_mae mean_test_mae std_test_mae rank_test_mae mean_fit_time std_fit_time mean_test_time std_test_time params param_n_epochs param_lr_all param_reg_all
0 0.937088 0.941831 0.950693 0.943204 0.005638 25 0.734681 0.736846 0.742574 0.738034 0.003330 25 4.822517 0.412795 0.588699 0.047995 {'n_epochs': 10, 'lr_all': 0.001, 'reg_all': 0.2} 10 0.001 0.2
1 0.941132 0.946314 0.955085 0.947510 0.005759 26 0.739717 0.742279 0.747904 0.743300 0.003419 26 5.001963 0.058948 0.622447 0.061483 {'n_epochs': 10, 'lr_all': 0.001, 'reg_all': 0.4} 10 0.001 0.4
2 0.946071 0.951274 0.960601 0.952649 0.006011 27 0.744921 0.747856 0.753383 0.748720 0.003508 27 5.115259 0.133874 0.694518 0.083622 {'n_epochs': 10, 'lr_all': 0.001, 'reg_all': 0.6} 10 0.001 0.6
3 0.899909 0.907059 0.913570 0.906846 0.005579 10 0.698638 0.702193 0.706866 0.702566 0.003369 9 4.998776 0.122980 0.601300 0.095280 {'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.2} 10 0.005 0.2
4 0.907404 0.913606 0.920989 0.914000 0.005553 15 0.706287 0.709524 0.714526 0.710112 0.003389 15 4.929785 0.117247 0.628400 0.108293 {'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4} 10 0.005 0.4

Now we will build the final model using the tuned hyperparameter values we obtained from grid search cross-validation above.

# Building the optimized SVD model using optimal hyperparameter search
svd_algo_optimized = SVD(n_epochs=20, lr_all=0.01, reg_all=0.2)

# Training the algorithm on the trainset
svd_algo_optimized.fit(trainset)

# Predicting ratings for the testset
predictions = svd_algo_optimized.test(testset)

# Computing RMSE
accuracy.rmse(predictions)
RMSE: 0.8973
0.8972580357427976

Predicted rating for userId=4 on movieId=10 and movieId=3 using SVD based collaborative filtering:

Let us now predict the rating for the user with userId=4 and movieId=10 with the optimized model

svd_algo_optimized.predict(4, 10, r_ui=4, verbose=True)
user: 4          item: 10         r_ui = 4.00   est = 3.97   {'was_impossible': False}
Prediction(uid=4, iid=10, r_ui=4, est=3.9746300660681904, details={'was_impossible': False})

The predicted rating of movie 10 for user 4 using SVD collab filtering is 3.975.

Let's predict the rating for the same userId=4, but for a movie this user has not interacted with before, i.e. movieId=3:

svd_algo_optimized.predict(4, 3, verbose=True)
user: 4          item: 3          r_ui = None   est = 3.65   {'was_impossible': False}
Prediction(uid=4, iid=3, r_ui=None, est=3.6494509461174185, details={'was_impossible': False})

The predicted rating of movie 3 for user 4 using SVD collab filtering is 3.649.

Predicting the top 5 movies for userId=4 with SVD based recommendation system:

get_recommendations(rating, 4, 5, svd_algo_optimized)
[(926, 4.938831611532595),
 (1192, 4.911663964684779),
 (1217, 4.876880839499725),
 (3035, 4.864153522050881),
 (232, 4.8568399745706605)]

Predicting ratings for already interacted movies:

Below we compare the predicted ratings against the actual ratings for movies a user has already watched. This will help us understand how good our predictions are compared to the ratings users actually provided.

def predict_already_interacted_ratings(data, user_id, algo):

    # Creating an empty list to store the (movie, actual, predicted) tuples
    recommendations = []

    # Creating a user-item interactions matrix
    user_item_interactions_matrix = data.pivot(index='userId', columns='movieId', values='rating')

    # Extracting those movie ids which the user_id has already interacted with
    interacted_movies = user_item_interactions_matrix.loc[user_id][user_item_interactions_matrix.loc[user_id].notnull()].index.tolist()

    # Looping through each movie id which user_id has already interacted with
    for item_id in interacted_movies:

        # Extracting the actual rating
        actual_rating = user_item_interactions_matrix.loc[user_id, item_id]

        # Predicting the rating for this movie and user
        predicted_rating = algo.predict(user_id, item_id).est

        # Appending the actual and predicted ratings
        recommendations.append((item_id, actual_rating, predicted_rating))

    # Sorting by actual rating in descending order
    recommendations.sort(key=lambda x: x[1], reverse=True)

    return pd.DataFrame(recommendations, columns=['movieId', 'actual_rating', 'predicted_rating'])

Here we compare the ratings predicted by the similarity based recommendation system against the actual ratings for userId=7.

predicted_ratings_for_interacted_movies = predict_already_interacted_ratings(rating, 7, similarity_algo_optimized_item)
df = predicted_ratings_for_interacted_movies.melt(id_vars='movieId', value_vars=['actual_rating', 'predicted_rating'])
sns.displot(data=df, x='value', hue='variable', kde=True);

Below we compare the ratings predicted by the matrix factorization based recommendation system against the actual ratings for userId=7.

predicted_ratings_for_interacted_movies = predict_already_interacted_ratings(rating, 7, svd_algo_optimized)
df = predicted_ratings_for_interacted_movies.melt(id_vars='movieId', value_vars=['actual_rating', 'predicted_rating'])
sns.displot(data=df, x='value', hue='variable', kde=True);

Precision and Recall @ k

RMSE is not the only metric we can use here. We can also examine two fundamental measures, precision and recall, together with a cutoff parameter k, the number of recommendations shown to the user.

Precision@k - the fraction of recommended items that are relevant within the top k predictions. The value of k is the number of recommendations provided to the user, and it can vary per user.

Recall@k - the fraction of relevant items that are recommended to the user within the top k predictions.

Recall - the fraction of actually relevant items that are recommended to the user, i.e. if 6 out of 10 relevant movies are recommended, recall is 0.60. The higher the recall, the better the model. It is a standard metric for assessing classification models.

Precision - the fraction of recommended items that are actually relevant, i.e. if 6 out of 10 recommended items are found relevant by the user, precision is 0.60. The higher the precision, the better the model. It is likewise a standard metric for assessing classification models.

To know more about precision recall in recommendation systems, you can refer to the documentation or this Medium article.

Computing the precision and recall, for each of the 6 models, at k = 5 and 10:

# Function can be found in the surprise documentation FAQs
def precision_recall_at_k(predictions, k=10, threshold=3.5):
    """Return precision and recall at k metrics for each user"""

    # First map the predictions to each user.
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))

    precisions = dict()
    recalls = dict()
    for uid, user_ratings in user_est_true.items():

        # Sort user ratings by estimated value
        user_ratings.sort(key=lambda x: x[0], reverse=True)

        # Number of relevant items
        n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)

        # Number of recommended items in top k
        n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[:k])

        # Number of relevant and recommended items in top k
        n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold))
                              for (est, true_r) in user_ratings[:k])

        # Precision@K: proportion of recommended items that are relevant.
        # When n_rec_k is 0, precision is undefined; we set it to 0 here.
        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 0

        # Recall@K: proportion of relevant items that are recommended.
        # When n_rel is 0, recall is undefined; we set it to 0 here.
        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 0

    return precisions, recalls
# A basic cross-validation iterator.
kf = KFold(n_splits=5)

# Make list of k values
K = [5, 10]

# Make list of models
models = [algo_knn_user, similarity_algo_optimized_user, algo_knn_item, similarity_algo_optimized_item, algo_svd, svd_algo_optimized]

for k in K:
    for model in models:
        print('> k={}, model={}'.format(k, model.__class__.__name__))
        p = []
        r = []
        for trainset, testset in kf.split(data):
            model.fit(trainset)
            predictions = model.test(testset, verbose=False)
            precisions, recalls = precision_recall_at_k(predictions, k=k, threshold=3.5)

            # Precision and recall can then be averaged over all users
            p.append(sum(prec for prec in precisions.values()) / len(precisions))
            r.append(sum(rec for rec in recalls.values()) / len(recalls))

        print('-----> Precision: ', round(sum(p) / len(p), 3))
        print('-----> Recall: ', round(sum(r) / len(r), 3))
> k=5, model=KNNBasic
-----> Precision:  0.764
-----> Recall:  0.414
> k=5, model=KNNBasic
-----> Precision:  0.77
-----> Recall:  0.42
> k=5, model=KNNBasic
-----> Precision:  0.604
-----> Recall:  0.322
> k=5, model=KNNBasic
-----> Precision:  0.684
-----> Recall:  0.358
> k=5, model=SVD
-----> Precision:  0.756
-----> Recall:  0.386
> k=5, model=SVD
-----> Precision:  0.749
-----> Recall:  0.384
> k=10, model=KNNBasic
-----> Precision:  0.754
-----> Recall:  0.549
> k=10, model=KNNBasic
-----> Precision:  0.75
-----> Recall:  0.561
> k=10, model=KNNBasic
-----> Precision:  0.597
-----> Recall:  0.475
> k=10, model=KNNBasic
-----> Precision:  0.665
-----> Recall:  0.505
> k=10, model=SVD
-----> Precision:  0.738
-----> Recall:  0.522
> k=10, model=SVD
-----> Precision:  0.731
-----> Recall:  0.523

Discussion

The baseline user-based and item-based collaborative models have nearly the same RMSE (0.9925 and 1.0032). The tuned collaborative filtering models performed better than their baselines: the tuned user-user model achieves an RMSE of 0.9908, and the tuned item-item model does better still at 0.9465.

The collaborative models use the user-item-ratings data to find similarities and make predictions, rather than just predicting a rating from the overall distribution of the data. This could be a reason why collaborative filtering performed well.

Collaborative filtering searches for neighbors based on similarity of preferences and recommends items those neighbors interacted with, while matrix factorization works by decomposing the user-item matrix into the product of two lower-dimensional rectangular matrices.

The RMSE for matrix factorization is better than that of the collaborative filtering models, although tuning the SVD model did not improve much on the baseline SVD. Matrix factorization achieves a lower RMSE because it assumes that both items and users live in a low-dimensional latent space describing their properties, and it recommends an item based on its proximity to the user in that space.

Conclusions

In this case study, we saw three different ways of building recommendation systems:

  1. Knowledge/rank based recommendation system
  2. Similarity based collaborative filtering (user-user and item-item)
  3. Matrix factorization based collaborative filtering (SVD)

We also saw the advantages and disadvantages of these recommendation systems and when to use each kind. Once these systems are built, we can use A/B testing to measure their effectiveness. Here is an article explaining how Amazon uses A/B testing to measure the effectiveness of its recommendation systems.