Databases and Information Systems

Recommender Systems and Spark SQL

Projects

The projects offered here are aligned with the lecture "Data Analysis and Query Languages" offered in the summer semester, which covers the main Recommender Systems (RS) techniques.

This page will be updated soon with more project descriptions.

1. Dynamic Collaborative-Filtering approach for cross-domain recommendations

Short description

Collaborative Filtering (CF) is one of the most widely used approaches for producing recommendations. The key idea is to recommend items to a user by considering items liked by other users who have a similar taste. Despite its popularity, this paradigm is strongly affected by the sparsity of ratings: the sparser the ratings are, the more difficult it is to find users with a similar taste. Cross-domain Recommender Systems (CDRS) aim to recommend items from a different domain than the one in which the preferences of the user have been collected, e.g. one could recommend movies (target domain) by learning from the user's taste in books (source domain). In CDRS, sparsity is inevitably a more serious problem, because two domains are involved. As a consequence, one should choose wisely which of the domains should be taken into account to compute the similarity between users.
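
To make the neighbourhood idea concrete, here is a minimal, self-contained Java sketch (not part of RecRDF4J) that computes the cosine similarity between two users represented as sparse rating maps; the user and item names are invented for illustration.

import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of user-based CF: cosine similarity over sparse rating maps. */
public class UserSimilarity {

    /** Each map holds the ratings of one user, keyed by item id (only rated items appear). */
    public static double cosine(Map<String, Double> userA, Map<String, Double> userB) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : userA.entrySet()) {
            Double rb = userB.get(e.getKey());
            if (rb != null) {
                dot += e.getValue() * rb;   // only co-rated items contribute to the dot product
            }
            normA += e.getValue() * e.getValue();
        }
        for (double r : userB.values()) {
            normB += r * r;
        }
        if (dot == 0.0) {
            return 0.0; // no overlap: this is where sparsity makes neighbours hard to find
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Double> alice = new HashMap<>();
        alice.put("movie:Matrix", 5.0);
        alice.put("movie:Inception", 4.0);

        Map<String, Double> bob = new HashMap<>();
        bob.put("movie:Matrix", 4.0);
        bob.put("movie:Avatar", 2.0);

        System.out.println("similarity = " + cosine(alice, bob));
    }
}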

The goal of this project is to investigate and design an approach that dynamically chooses the domains used for computing similarity. The project will be integrated into RecRDF4J, a framework developed at our department. Moreover, a web application will be made available to the students to visualize the recommendations.
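
A purely illustrative heuristic for the dynamic domain choice, e.g. picking the domain in which two users share the most items, could look as follows; this is an assumption of ours, not the approach to be designed in the project and not RecRDF4J code.

import java.util.Map;

/**
 * Hypothetical sketch: for a pair of users, pick the source domain
 * (movies, books, music) in which they share the most items, and compute
 * the similarity only in that domain.
 */
public class DomainSelector {

    /** Each profile maps a domain name to (item id -> rating / implicit feedback). */
    public static String densestSharedDomain(Map<String, Map<String, Double>> userA,
                                             Map<String, Map<String, Double>> userB) {
        String best = null;
        int bestOverlap = 0;
        for (Map.Entry<String, Map<String, Double>> e : userA.entrySet()) {
            Map<String, Double> other = userB.get(e.getKey());
            if (other == null) {
                continue; // the second user has no feedback in this domain
            }
            int overlap = 0;
            for (String item : e.getValue().keySet()) {
                if (other.containsKey(item)) {
                    overlap++;
                }
            }
            if (overlap > bestOverlap) {
                bestOverlap = overlap;
                best = e.getKey();
            }
        }
        return best; // null if the two users share no items in any domain
    }
}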

Dataset: Facebook dataset - implicit feedback for movies, books and music.

Pre-requisites: knowledge of RDF, Recommender Systems. Programming in Java; use of Maven.

Recommended number of participants: 2 per group

Status: not assigned

2. Exploiting LoD datasets for recommending visits to POIs

Short description

When a user visits a location which is new to her, she might be interested in visiting attractions or points of interest (POIs). However, designing a visit plan, i.e. deciding which POI should be visited first and which should be visited next according to the user's needs, can be overwhelming. Recommender Systems (RSs) can alleviate this problem and propose to users a plan which fits their needs in the best possible way. Most of these RSs use closed datasets and therefore do not leverage the openness of Semantic Web standards.

The goal of this project is to investigate which datasets available in the Linked Open Data Cloud can be used for POI recommendation. In addition, the students will assess to what extent public transportation data is also available in the same format. Finally, the students will have to implement an RS, based on the work "A Delay-Robust Touristic Plan Recommendation Using Real-World Public Transportation Information", which generates visit plans.
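
Since knowledge of RDF is a prerequisite, the following sketch shows how candidate POIs could be retrieved from DBpedia, one LoD dataset the students might evaluate. It assumes Apache Jena as the SPARQL client (not prescribed by the project), and the chosen class and properties (dbo:Museum, dbo:location) are assumptions that would have to be verified against the dataset actually selected.

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

/** Illustrative sketch: querying DBpedia for candidate POIs in Freiburg. */
public class PoiQuery {

    public static void main(String[] args) {
        String sparql =
            "PREFIX dbo: <http://dbpedia.org/ontology/> " +
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "SELECT ?poi ?label WHERE { " +
            "  ?poi a dbo:Museum ; " +
            "       dbo:location <http://dbpedia.org/resource/Freiburg_im_Breisgau> ; " +
            "       rdfs:label ?label . " +
            "  FILTER (lang(?label) = 'en') " +
            "} LIMIT 20";

        Query query = QueryFactory.create(sparql);
        // Send the query to the public DBpedia endpoint and print the results.
        try (QueryExecution exec =
                 QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", query)) {
            ResultSet results = exec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.getResource("poi") + "  " + row.getLiteral("label"));
            }
        }
    }
}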

Dataset: LoD datasets for POI recommendation.

Pre-requisites: knowledge of RDF, Recommender Systems. Programming in Java; use of Maven.

Recommended number of participants: 2 per group

Status: not assigned

3. Data Analysis and Querying with Spark SQL

Short description

Graphs have always played an important role in computer science, e.g. for modeling relationships, processes, networks, etc. In times of Web 2.0, the Semantic Web, and social networks, new challenges arise from the rapidly growing size of such graph structures, which necessitates distributed storage and processing strategies. Spark has become an important technology for distributed and parallel processing of large-scale data, including RDF graphs, which are a popular way for organizations to store their datasets.

In this project, we will use Apache Hadoop, one of the most popular open-source Big Data frameworks. The participants will develop and implement an application on top of Hadoop technologies (Spark SQL, HDFS, Hive) for large-scale graph processing and analysis. In more detail, the goal will be to extend PRoST (Partitioned RDF on Spark Tables), a new SPARQL processor for large RDF graphs developed by our department. The task of each participant will be to add a feature to the system and to evaluate its impact using the University's Hadoop cluster with a given benchmark framework.
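
As a first impression of the technology stack, here is a minimal sketch (not PRoST itself) that loads a file of simple space-separated triples from HDFS into a Spark SQL table and evaluates a two-pattern, SPARQL-style join as plain SQL; the file path, the predicate URIs and the simplifying assumption that each line is exactly "subject predicate object" (no trailing dot, no literals with spaces) are illustrative only.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

/** Minimal sketch: answering a SPARQL-style triple-pattern join with Spark SQL. */
public class TripleQuery {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("triple-query-sketch")
                .getOrCreate();

        // Assumed input: one triple per line, "subject predicate object", space-separated.
        Dataset<Row> triples = spark.read()
                .option("delimiter", " ")
                .csv("hdfs:///user/demo/triples.txt")
                .toDF("s", "p", "o");
        triples.createOrReplaceTempView("triples");

        // SPARQL pattern { ?x :worksFor ?y . ?y :locatedIn ?z } expressed as a self-join.
        Dataset<Row> result = spark.sql(
            "SELECT t1.s AS person, t2.o AS city " +
            "FROM triples t1 JOIN triples t2 ON t1.o = t2.s " +
            "WHERE t1.p = '<http://example.org/worksFor>' " +
            "  AND t2.p = '<http://example.org/locatedIn>'");

        result.show();
        spark.stop();
    }
}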

Pre-requisites: basic knowledge of RDF and SPARQL. Java Programming.

Recommended number of participants: 2 per group. Max. 1 group

Status: not assigned

More projects will be published soon...