Introduction

This page contains additional content for the Tutorial on Recommender System Evaluation, Replication and Replicability at the 2015 ACM Conference on Recommender Systems. The tutorial shows how recommender systems experiments should be presented in order to allow for (easy) reproducibility and replication. Refer to the tutorial abstract for more information.

Instructors

The tutorial is presented by

Alejandro Bellogín

Alan Said

Prerequisites

The examples used in this additional material require a basic understanding of programming. Having coded in Java before is a definite plus.

Before continuing to the next step, please check out the tutorial code repository on GitHub.

Terminology

Repeating - The act of performing the same experiment again, i.e. using the same code, same data, etc.

Replication - The act of attempting to recreate an experiment from a description.

Reproducibility - The act of attempting to recreate the results of an experiment, not necessarily with the same code or data.

Reuse - The act of using the same techniques without any expectations on the results.

Hands-on examples

In these hands-on examples, we use RiVal to evaluate recommendations generated by popular recommendation frameworks such as LensKit, Mahout, and MyMediaLite.

Thresholds and splits

The code in CrossValidation.java performs training/test splitting on the MovieLens 100k dataset. The dataset is prepared for a 5-fold cross-validation before recommendation is performed using a Mahout recommender, namely a GenericUserBasedRecommender with PearsonCorrelationSimilarity and a fixed neighborhood size of 50. The final variables N_FOLDS and AT in the class header represent the type of information often disclosed in research papers, i.e. the number of folds and the length of the list of recommended items. The static variables REL_TH, NEIGH_SIZE, and PER_USER represent information which is not always described in research papers, i.e. the relevance threshold of an item (the minimum rating for an item to be considered relevant), the neighborhood size, and whether training/test splitting is performed on a per-user basis (so that every user in the test set also appears in the training set).

Run the pre-configured task by entering mvn clean install; mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation" in the folder where you checked out the tutorial code. The resulting output looks something like this:
NDCG@10: 0.0292752140037415
RMSE: 1.108653420946922
P@10: 0.039915164369035125
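
The Mahout part of this pipeline can be sketched in isolation. The following is a minimal, illustrative example, assuming a local ratings file at data/train_0.csv (a hypothetical path); the actual wiring in CrossValidation.java goes through RiVal and the cross-validation folds:

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class MahoutSketch {
    public static void main(String[] args) throws Exception {
        // Assumed path to one training fold produced by the splitter.
        DataModel model = new FileDataModel(new File("data/train_0.csv"));

        // The setup described above: Pearson similarity, neighborhood of 50.
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(50, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Top-10 recommendations for user 1 (AT = 10 in the tutorial class).
        for (RecommendedItem item : recommender.recommend(1L, 10)) {
            System.out.println(item.getItemID() + "\t" + item.getValue());
        }
    }
}

Note that the choices reported above (Pearson similarity, a neighborhood of 50, top-10 lists) all appear explicitly here; these are exactly the details a replicable write-up needs to state.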

Changing the splitting criterion from the default (per-user) to a global split by running mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation" -Dexec.args="-u false" changes the results of both the ranking- and the error-based metrics.

NDCG@10: 0.02921891771562769
RMSE: 1.104452226664006
P@10: 0.04091198303287395
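
The difference between the two criteria is easiest to see in code. Below is a toy sketch of the two strategies (not RiVal's implementation): a per-user split samples test ratings from each user separately, so every test user is known at training time, while a global split samples over all ratings at once and may leave some users entirely in the test set.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SplitSketch {

    static final class Rating {
        final long user;
        final long item;
        final double value;
        Rating(long user, long item, double value) {
            this.user = user; this.item = item; this.value = value;
        }
    }

    // Per-user split: sample a test fraction from each user's ratings
    // individually, so every test user also has training ratings.
    static void perUserSplit(Map<Long, List<Rating>> byUser, double testRatio,
                             List<Rating> train, List<Rating> test, Random rnd) {
        for (List<Rating> userRatings : byUser.values()) {
            List<Rating> shuffled = new ArrayList<>(userRatings);
            Collections.shuffle(shuffled, rnd);
            int cut = (int) Math.round(shuffled.size() * testRatio);
            test.addAll(shuffled.subList(0, cut));
            train.addAll(shuffled.subList(cut, shuffled.size()));
        }
    }

    // Global split: sample the test fraction over all ratings at once;
    // a user may land exclusively in the test set (cold start).
    static void globalSplit(List<Rating> all, double testRatio,
                            List<Rating> train, List<Rating> test, Random rnd) {
        List<Rating> shuffled = new ArrayList<>(all);
        Collections.shuffle(shuffled, rnd);
        int cut = (int) Math.round(shuffled.size() * testRatio);
        test.addAll(shuffled.subList(0, cut));
        train.addAll(shuffled.subList(cut, shuffled.size()));
    }
}

Cold-start users produced by the global split are one reason the two criteria lead to different metric values.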

If we instead change the relevance threshold by running mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation" -Dexec.args="-t 4.0", the results become

NDCG@10: 0.0292752140037415
RMSE: 1.108653420946922
P@10: 0.033149522799575906

Note that in this case, only the P@10 value changes compared to the initial run: error-based metrics such as RMSE do not use the relevance threshold at all, and NDCG uses the raw ratings as graded relevance rather than a binary relevance judgment, so precision is the only metric affected here. We leave the task of changing the neighborhood size to the reader.
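
To see why, consider how the threshold enters a precision computation. In the toy sketch below (again, not RiVal's actual code; the maps hold one user's item ratings), the threshold decides which recommended items count as hits, whereas RMSE compares predicted and observed values directly and never consults it:

import java.util.List;
import java.util.Map;

public class MetricSketch {

    // P@N: fraction of the top-N recommended items whose held-out rating
    // reaches the relevance threshold (REL_TH in the tutorial class).
    static double precisionAtN(List<Long> rankedItems, Map<Long, Double> testRatings,
                               int n, double threshold) {
        int hits = 0;
        for (int i = 0; i < Math.min(n, rankedItems.size()); i++) {
            Double rating = testRatings.get(rankedItems.get(i));
            if (rating != null && rating >= threshold) {
                hits++;
            }
        }
        return hits / (double) n;
    }

    // RMSE: the threshold never appears; only prediction errors matter.
    static double rmse(Map<Long, Double> predicted, Map<Long, Double> observed) {
        double sum = 0.0;
        int count = 0;
        for (Map.Entry<Long, Double> e : observed.entrySet()) {
            Double p = predicted.get(e.getKey());
            if (p != null) {
                double err = p - e.getValue();
                sum += err * err;
                count++;
            }
        }
        return Math.sqrt(sum / count);
    }
}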

Further configurations

The class RandomValidation allows similar tweaks to be made to the splitting, recommendation, and evaluation pipeline. The reader is encouraged to test the various combinations available in order to get an estimate of how much the evaluation results may change with even minor tweaks.
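
One way to explore such combinations systematically is to sweep a parameter and collect the metric lines from each run. The sketch below drives the same mvn task shown earlier via ProcessBuilder, assuming mvn is on the PATH and the program is started from the checked-out tutorial folder; the output filter is approximate:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SweepSketch {
    public static void main(String[] args) throws Exception {
        // Sweep the relevance threshold; the same pattern works for the
        // per-user flag (-u) shown earlier.
        for (String threshold : new String[] {"3.0", "4.0", "5.0"}) {
            ProcessBuilder pb = new ProcessBuilder(
                    "mvn", "exec:java",
                    "-Dexec.mainClass=net.recommenders.tutorial.CrossValidation",
                    "-Dexec.args=-t " + threshold);
            pb.redirectErrorStream(true);
            Process p = pb.start();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Keep only the metric lines from the build output.
                    if (line.contains("NDCG@") || line.contains("RMSE")
                            || line.contains("P@")) {
                        System.out.println("t=" + threshold + "  " + line);
                    }
                }
            }
            p.waitFor();
        }
    }
}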

Slides

Replicable Evaluation of Recommender Systems, by Alejandro Bellogín