Introduction
This page contains additional content for the Tutorial on Recommender System Evaluation, Replication and Replicability at the 2015 ACM Conference on Recommender Systems. The tutorial shows how recommender systems experiments should be presented in order to allow for (easy) reproducibility and replication. Refer to the tutorial abstract for more information.
Instructors
The tutorial is presented by
Prerequisites
The examples used in this additional material require a basic understanding of programming. Having coded in Java before is a definite plus.
Before continuing to the next step, please check out the tutorial code repository on GitHub.
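For example (the repository URL below is a placeholder; use the actual address linked from the tutorial page):
git clone https://github.com/<organization>/<tutorial-repository>.git
cd <tutorial-repository>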
Terminology
Repeating - The act of performing the same experiment again, i.e. using the same code, same data, etc.
Replication - The act of attempting to recreate an experiment from a description.
Reproducibility - The act of attempting to recreate the results of an experiment, not necessarily with the same code or data.
Reuse - The act of using the same techniques without any expectations on the results.
Hands-on examples
In these hands-on examples, we use RiVal to evaluate recommendations generated by popular recommendation frameworks, e.g. LensKit, Mahout, and MyMediaLite.
Thresholds and splits
The code in CrossValidation.java performs training/test splitting on the MovieLens 100k dataset. The dataset is prepared for a 5-fold cross-validation before recommendation is performed using a Mahout recommender, namely a GenericUserBasedRecommender with PearsonCorrelationSimilarity and a fixed neighborhood size of 50.
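For reference, a user-based recommender of this kind is typically assembled with Mahout's Taste API along the following lines. This is a minimal sketch: the method name and training-file path are made up for illustration, and CrossValidation.java itself drives Mahout through RiVal rather than directly.
import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

static Recommender buildRecommender(File trainingFile) throws Exception {
    // Ratings file with comma- or tab-separated "user,item,rating" lines (one training fold).
    DataModel model = new FileDataModel(trainingFile);
    // Pearson correlation between users' rating vectors.
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Neighborhoods of the 50 most similar users, as in the tutorial class.
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(50, similarity, model);
    return new GenericUserBasedRecommender(model, neighborhood, similarity);
}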
The final variables N_FOLDS and AT in the class header represent the type of information usually disclosed in research papers, i.e. the number of folds and the length of the list of recommended items (the cutoff of the ranking metrics). The static variables REL_TH, NEIGH_SIZE, and PER_USER represent information which is not always described in research papers, i.e. the relevance threshold of an item (the minimum rating for an item to count as relevant), the neighborhood size of the recommender, and whether training/test splitting was performed on a per-user basis (users in the test set must appear in the training set).
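In code, the class header therefore contains constants along these lines. The values of N_FOLDS, AT and NEIGH_SIZE follow from the description above; the values of REL_TH and PER_USER are assumptions for illustration, so check the class itself for the actual defaults.
// Commonly reported in papers.
public static final int N_FOLDS = 5;         // number of cross-validation folds
public static final int AT = 10;             // cutoff of the ranking metrics, e.g. NDCG@10, P@10
// Often omitted from papers.
public static final double REL_TH = 3.0;     // minimum rating for an item to count as relevant (assumed value)
public static final int NEIGH_SIZE = 50;     // neighborhood size of the kNN recommender
public static final boolean PER_USER = true; // per-user training/test splitting (assumed default)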
Run the pre-configured task by entering mvn clean install; mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation" in the folder where you have checked out the tutorial code.
The resulting output looks something like this:
NDCG@10: 0.0292752140037415
RMSE: 1.108653420946922
P@10: 0.039915164369035125
Changing the splitting criterion from per-user to a global (not per-user) split by running
mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation" -Dexec.args="-u false"
generates different results, for both the ranking- and the error-based metrics:
NDCG@10: 0.02921891771562769
RMSE: 1.104452226664006
P@10: 0.04091198303287395
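The difference comes from how the test set is sampled. The sketch below illustrates the two strategies in simplified form; it is not RiVal's actual implementation, and the 80/20 ratio is only an example.
import java.util.*;

// Per-user: each user's ratings are split individually, so every test user
// is guaranteed to also appear in the training set.
static void perUserSplit(Map<Long, List<double[]>> ratingsByUser,
                         List<double[]> train, List<double[]> test, Random rnd) {
    for (List<double[]> userRatings : ratingsByUser.values()) {
        Collections.shuffle(userRatings, rnd);
        int cut = (int) (0.8 * userRatings.size());
        train.addAll(userRatings.subList(0, cut));
        test.addAll(userRatings.subList(cut, userRatings.size()));
    }
}

// Global: all ratings are shuffled together, so a test user may end up with
// no training ratings at all, which changes what the recommender can learn.
static void globalSplit(List<double[]> allRatings,
                        List<double[]> train, List<double[]> test, Random rnd) {
    Collections.shuffle(allRatings, rnd);
    int cut = (int) (0.8 * allRatings.size());
    train.addAll(allRatings.subList(0, cut));
    test.addAll(allRatings.subList(cut, allRatings.size()));
}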
If we instead change the threshold by running mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation" -Dexec.args="-t 4.0", the results become
NDCG@10: 0.0292752140037415
RMSE: 1.108653420946922
P@10: 0.033149522799575906
Note that in this case, only the Precision@10 value changes compared to the initial run. This is because the relevance threshold only determines which test items count as relevant: it affects precision, but is not taken into consideration for error-based metrics such as RMSE, nor for NDCG here, which uses the graded ratings directly. We leave the task of changing the neighborhood size to the reader.
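To make the role of the threshold concrete, the following sketch (again simplified, not RiVal's code) shows how precision binarizes ratings against the threshold while RMSE never touches it.
import java.util.List;
import java.util.Map;

// Precision@N: a recommended item is a hit only if its test rating reaches
// the relevance threshold, so the result depends on relTh.
static double precisionAtN(List<Long> recommended, Map<Long, Double> testRatings,
                           int n, double relTh) {
    int hits = 0;
    for (int i = 0; i < Math.min(n, recommended.size()); i++) {
        Double rating = testRatings.get(recommended.get(i));
        if (rating != null && rating >= relTh) {
            hits++;
        }
    }
    return hits / (double) n;
}

// RMSE compares raw predictions with raw ratings; the threshold plays no role.
static double rmse(double[] predictions, double[] ratings) {
    double sum = 0.0;
    for (int i = 0; i < predictions.length; i++) {
        double diff = predictions[i] - ratings[i];
        sum += diff * diff;
    }
    return Math.sqrt(sum / predictions.length);
}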
Further configurations
The class RandomValidation allows similar tweaks to be made to the splitting, recommendation and evaluation pipeline. The reader is encouraged to test the various combinations available in order to get an estimate of how much the evaluation results may change with even minor tweaks.
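Assuming RandomValidation lives in the same package as CrossValidation, it can be run analogously:
mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.RandomValidation"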
Slides