This project implements User-Based Collaborative Filtering (UBCF) to predict movie ratings for users. It measures how similar users are with cosine similarity, falls back to the global average rating when no similarity-based prediction is possible, and evaluates prediction quality with Root Mean Squared Error (RMSE). Given a dataset of user-movie ratings, it predicts each user's rating for movies they have not yet rated.
The project targets both accuracy and execution time. Hash maps provide fast rating lookups and, together with matrix computations, keep predictions quick on large datasets without sacrificing accuracy.
- Loads and processes training data from a text file.
- Computes the global average rating for cases where similarity-based predictions are not applicable.
- Constructs user-movie rating mappings using hash maps for quick lookups.
- Calculates user rating norms, which are essential for cosine similarity calculations.
- Reads the test dataset containing user-movie pairs.
- Computes cosine similarity between users who have rated the same movies.
- Uses a threshold-based filtering approach to ignore weakly correlated users.
- Predicts ratings using a weighted sum of similar users’ ratings.
- Computes RMSE by comparing predicted ratings with actual values in the test dataset.

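
The loading steps above (rating map, global average, per-user norms) can be sketched as follows. This is a minimal illustration, not the project's actual code: the `TrainingData` struct, the `loadTrainingData` function, and the assumed input layout of one whitespace-separated `userID movieID rating` triple per line are all hypothetical.

```cpp
#include <cmath>
#include <istream>
#include <sstream>
#include <unordered_map>

// Hypothetical container names; the project's real identifiers may differ.
using RatingMap = std::unordered_map<int, std::unordered_map<int, double>>;

struct TrainingData {
    RatingMap userRatings;                      // userID -> (movieID -> rating)
    std::unordered_map<int, double> userNorms;  // userID -> Euclidean norm of that user's ratings
    double globalAverage = 0.0;                 // mean of all training ratings
};

// Assumed input layout: "userID movieID rating" triples separated by whitespace.
TrainingData loadTrainingData(std::istream &in) {
    TrainingData data;
    double sum = 0.0;
    long count = 0;
    int user = 0, movie = 0;
    double rating = 0.0;
    while (in >> user >> movie >> rating) {
        data.userRatings[user][movie] = rating;
        sum += rating;
        ++count;
    }
    if (count > 0) data.globalAverage = sum / count;

    // Precompute each user's norm once so similarity queries can reuse it.
    for (const auto &userEntry : data.userRatings) {
        double squares = 0.0;
        for (const auto &movieEntry : userEntry.second) {
            squares += movieEntry.second * movieEntry.second;
        }
        data.userNorms[userEntry.first] = std::sqrt(squares);
    }
    return data;
}
```

Because the function takes a `std::istream`, it can be fed either an `std::ifstream` opened on the training file or an `std::istringstream` for quick experiments.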
- Load Training Data → Reads and processes user-movie ratings.
- Compute Similarities → Uses cosine similarity to measure user similarity.
- Filter Users → Applies a threshold to discard weakly correlated users.
- Make Predictions → Computes weighted averages for rating predictions.
- Evaluate Performance → Computes RMSE to assess prediction accuracy.

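
The filter and prediction steps of this workflow might look like the sketch below. It assumes the similarities to the target user have already been computed, that the threshold is non-negative, and that the weighted sum is normalized by the total weight; the function name `predictRating` and its parameters are illustrative only.

```cpp
#include <unordered_map>

using RatingMap = std::unordered_map<int, std::unordered_map<int, double>>;

// similarities: neighbour userID -> precomputed similarity to the target user.
// Returns globalAverage when no neighbour passes the threshold and rated the movie.
double predictRating(const RatingMap &userRatings,
                     const std::unordered_map<int, double> &similarities,
                     int targetMovieID, double threshold, double globalAverage) {
    double weightedSum = 0.0;
    double weightTotal = 0.0;
    for (const auto &entry : similarities) {
        const double sim = entry.second;
        if (sim < threshold) continue;  // discard weakly correlated users

        auto userIt = userRatings.find(entry.first);
        if (userIt == userRatings.end()) continue;
        auto movieIt = userIt->second.find(targetMovieID);
        if (movieIt == userIt->second.end()) continue;  // neighbour has not rated the movie

        weightedSum += sim * movieIt->second;
        weightTotal += sim;
    }
    if (weightTotal <= 0.0) return globalAverage;  // fallback described in the text
    return weightedSum / weightTotal;
}
```

Dividing by the total weight keeps the prediction on the same scale as the input ratings; an unnormalized weighted sum would grow with the number of neighbours.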
- User-Based Collaborative Filtering: Predicts ratings by identifying similar users.
- Cosine Similarity: Determines user similarity by measuring vector alignment.
- Global Average Fallback: Provides a default rating when similarity-based predictions fail.
- Hash Map Data Structures: Ensures fast lookup for user-movie ratings and related computations.
- Threshold Filtering: Discards weak correlations to improve prediction quality.
- RMSE Calculation: Measures the accuracy of predictions against actual ratings.

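
For reference, the concepts listed here are commonly written as the formulas below. The notation is an assumption rather than something taken from the project: $r_{u,m}$ is user $u$'s rating of movie $m$, $M_u$ is the set of movies rated by $u$, $N_\tau(u,m)$ is the set of users who rated $m$ and whose similarity to $u$ is at least the threshold $\tau$, and $n$ is the number of test pairs. Whether the project normalizes the weighted sum exactly this way is not stated.

```latex
\[
\mathrm{sim}(u, v) =
  \frac{\sum_{m \in M_u \cap M_v} r_{u,m}\, r_{v,m}}
       {\lVert r_u \rVert \, \lVert r_v \rVert},
\qquad
\lVert r_u \rVert = \sqrt{\sum_{m \in M_u} r_{u,m}^{2}}
\]

\[
\hat{r}_{u,m} =
  \frac{\sum_{v \in N_\tau(u,m)} \mathrm{sim}(u, v)\, r_{v,m}}
       {\sum_{v \in N_\tau(u,m)} \mathrm{sim}(u, v)}
\]

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{r}_i - r_i\right)^{2}}
\]
```

When $N_\tau(u,m)$ is empty, the prediction $\hat{r}_{u,m}$ falls back to the global average rating.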
- Execution Speed: Optimized using efficient hash tables and matrix operations.
- Memory Efficiency: Uses unordered maps to reduce redundant data storage.
- Prediction Accuracy: Evaluates the system using RMSE.
- Test Results:
  - RMSE: 2.8970
  - Elapsed Time: 0.0093 seconds

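
As a reminder of how an RMSE figure like the one above is derived, here is a minimal sketch. The function name `computeRmse` and the use of two parallel vectors are assumptions for illustration; the project may accumulate the error while reading the test file instead.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Root Mean Squared Error over paired predictions and actual ratings.
// Returns 0.0 for empty or mismatched inputs instead of dividing by zero.
double computeRmse(const std::vector<double> &predicted,
                   const std::vector<double> &actual) {
    if (predicted.empty() || predicted.size() != actual.size()) return 0.0;
    double squaredError = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double diff = predicted[i] - actual[i];
        squaredError += diff * diff;
    }
    return std::sqrt(squaredError / static_cast<double>(predicted.size()));
}
```

Lower values mean predictions sit closer to the actual ratings, and the result is expressed in the same units as the ratings themselves.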
- Cosine Similarity Calculation: `double computeCosineSimilarity(int userA, int userB);` Computes similarity between two users based on shared movie ratings.
- Rating Prediction (User-Based CF): `double predictRatingUserBasedCF(int targetUserID, int targetMovieID);` Predicts a user's rating for a movie by using a weighted sum of similar users' ratings.
- RMSE Calculation: `void loadAndPredictTestData(const string &testPath);` Reads the test dataset, predicts ratings, and calculates RMSE for performance evaluation.
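
One way the `computeCosineSimilarity` signature could be implemented is sketched below. Because the function takes only two user IDs, it must read shared state; the global maps `userRatings` and `userNorms` used here are hypothetical stand-ins for whatever storage the project actually uses, and the choice to divide by each user's full rating norm is likewise an assumption.

```cpp
#include <unordered_map>
#include <utility>

// Hypothetical shared state: userID -> (movieID -> rating), and userID -> rating norm.
std::unordered_map<int, std::unordered_map<int, double>> userRatings;
std::unordered_map<int, double> userNorms;

double computeCosineSimilarity(int userA, int userB) {
    auto itA = userRatings.find(userA);
    auto itB = userRatings.find(userB);
    if (itA == userRatings.end() || itB == userRatings.end()) return 0.0;

    // Iterate over the shorter rating list and probe the longer one.
    const auto *shorter = &itA->second;
    const auto *longer = &itB->second;
    if (shorter->size() > longer->size()) std::swap(shorter, longer);

    double dot = 0.0;
    for (const auto &entry : *shorter) {
        auto match = longer->find(entry.first);
        if (match != longer->end()) dot += entry.second * match->second;
    }

    auto normA = userNorms.find(userA);
    auto normB = userNorms.find(userB);
    if (normA == userNorms.end() || normB == userNorms.end()) return 0.0;
    const double denom = normA->second * normB->second;
    return denom > 0.0 ? dot / denom : 0.0;  // no shared movies yields 0.0
}
```

Looping over the shorter of the two rating maps keeps each comparison proportional to the smaller user's rating count, since each hash lookup in the longer map is constant time on average.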