Data Analysis and Querying on Hadoop

Organizers:

Prof. Dr. Georg Lausen
Martin Przyjaciel-Zablocki
Alexander Schätzle

Organizational Matters:

Introductory Meeting: 19.10.2016, 14:00 (c.t.)
Room: Building 51, SR 01-029
Language: German / English
HISinOne: Prospective participants are kindly asked to register for this project via HISinOne

ILIAS:

All course materials are provided on ILIAS:
Data Analysis and Querying on Hadoop

Prerequisites:

Java programming knowledge
Participation in the lab course 'Querying Big Data' in the summer term is highly recommended

Project Content

Graphs have always played an important role in computer science, e.g. for modeling relationships, processes, and networks. In the age of Web 2.0, the Semantic Web, and social networks, new challenges arise from the rapidly growing size of such graph structures, which necessitates distributed storage and processing strategies. In recent years, MapReduce has become the de facto standard for distributed, parallel processing of large-scale data and has paved the way for a rich ecosystem of open-source data processing frameworks. Cloud services like Amazon's Elastic Compute Cloud (EC2) also enable small and medium-sized companies without their own infrastructure to analyze their big data with such frameworks by providing resources dynamically as needed.
In this project, we will use Apache Hadoop (more precisely, the Cloudera distribution of Hadoop), one of the most popular open-source Big Data frameworks. The participants will develop and implement an application on top of Hadoop for large-scale graph processing and analysis. Prior knowledge of Hadoop/MapReduce is desirable but not required. For participants without such knowledge, there will be an initial introduction phase in which they familiarize themselves with the basics of MapReduce by solving a mandatory exercise sheet. However, you should have prior knowledge of Java programming (including the use of an IDE) and be willing to familiarize yourself with the principles of distributed processing of large-scale data.

More information about the topics will be presented in the introductory meeting. In addition, participants are invited to suggest their own ideas.

Curriculum

Master of Science: 3rd Semester (Teamproject / Masterproject)
ECTS: 16