Introductory Meeting: Wednesday, 30th October 2013
Room: Building 51, SR 01-029
Time: Wednesday, 2 pm
Language: German / English
Prerequisites: Basic knowledge of Java programming
Graphs have long played an important role in computer science, for example in modeling relationships, processes, and networks.
In the era of Web 2.0, the Semantic Web, and social networks such as Facebook and Twitter, such graph structures are growing rapidly in size. This raises new challenges and makes distributed storage and processing necessary.
In recent years, MapReduce has become the de facto standard for distributed, parallel processing of large-scale data.
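To make the map, shuffle, and reduce phases concrete, here is a minimal single-JVM sketch in plain Java (no Hadoop involved). The class name `MapReduceSketch`, the `run` helper, and the `"src dst"` edge format are hypothetical choices for illustration only; a real MapReduce framework distributes these phases across many machines. The example job computes the out-degree of each vertex from an edge list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical single-JVM illustration of the MapReduce model (not Hadoop's API).
class MapReduceSketch {

    // Runs map -> shuffle (group values by key) -> reduce sequentially.
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> input,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Map phase + shuffle: collect every emitted value under its key.
        // A TreeMap keeps keys sorted, loosely mirroring a sorted shuffle.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I item : input) {
            for (Map.Entry<K, V> pair : mapper.apply(item)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Reduce phase: one reducer call per distinct key.
        Map<K, R> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            result.put(group.getKey(), reducer.apply(group.getKey(), group.getValue()));
        }
        return result;
    }

    // Example job: out-degree of each vertex, from edges written as "src dst".
    static Map<String, Integer> outDegrees(List<String> edges) {
        return MapReduceSketch.<String, String, Integer, Integer>run(
                edges,
                // Mapper: emit (source vertex, 1) for every edge.
                line -> List.of(Map.entry(line.trim().split("\\s+")[0], 1)),
                // Reducer: count the 1s collected for a vertex.
                (vertex, ones) -> ones.size());
    }

    public static void main(String[] args) {
        System.out.println(outDegrees(List.of("a b", "a c", "b c")));
    }
}
```

Running `main` prints `{a=2, b=1}`. Vertex `c` has no outgoing edges, so the mapper never emits a key for it and it does not appear in the result; a real job would need extra input (for example a vertex list) to report zero degrees.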
Cloud services such as Amazon's Elastic Compute Cloud (EC2) also enable small and medium-sized companies to analyze their data with MapReduce, since resources are provisioned dynamically as needed and no in-house infrastructure has to be maintained.
In this project, we will use Apache Hadoop (more precisely, the Cloudera distribution of Hadoop), one of the most popular open-source Big Data frameworks. Participants will design and implement an application on top of Hadoop for large-scale graph processing. Prior knowledge of Hadoop/MapReduce is desirable but not required: participants without it will go through an initial introductory phase in which they learn the basic MapReduce principles and how to implement a MapReduce application by solving a mandatory exercise sheet. However, you should already be familiar with Java programming and with using an IDE, and be willing to learn what is probably a new way of programming applications for large-scale data processing.
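As a taste of what graph processing on MapReduce can look like, the following sketch runs breadth-first search (hop distances from a source vertex) as a sequence of MapReduce rounds, again simulated in a single JVM with plain Java rather than Hadoop. This iterative formulation is one common approach, not necessarily the application developed in the project, and the names `ParallelBfsSketch`, `round`, and `bfs` are hypothetical. In each round the mapper re-emits every vertex's adjacency list and offers `distance + 1` to its neighbors; the reducer keeps the smallest distance per vertex; the driver stops once a round changes no distance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of BFS expressed as repeated MapReduce rounds.
class ParallelBfsSketch {

    static final int INF = Integer.MAX_VALUE; // "not reached yet"

    // Per-vertex value: distance from the source plus the adjacency list.
    // A Node with adjacency == null is only a candidate-distance message.
    static final class Node {
        final int distance;
        final List<String> adjacency;

        Node(int distance, List<String> adjacency) {
            this.distance = distance;
            this.adjacency = adjacency;
        }
    }

    // One MapReduce round over the whole graph.
    static Map<String, Node> round(Map<String, Node> graph) {
        // Map phase + shuffle: group emitted values by target vertex id.
        Map<String, List<Node>> groups = new HashMap<>();
        for (Map.Entry<String, Node> e : graph.entrySet()) {
            Node node = e.getValue();
            // Re-emit the vertex itself so the reducer can recover its structure.
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(node);
            if (node.distance != INF) {
                // Offer distance + 1 to every neighbor.
                for (String neighbor : node.adjacency) {
                    groups.computeIfAbsent(neighbor, k -> new ArrayList<>())
                          .add(new Node(node.distance + 1, null));
                }
            }
        }
        // Reduce phase: keep the adjacency list and the minimum distance seen.
        Map<String, Node> next = new HashMap<>();
        for (Map.Entry<String, List<Node>> e : groups.entrySet()) {
            int best = INF;
            List<String> adjacency = List.of(); // no structure record: treat as a sink
            for (Node value : e.getValue()) {
                best = Math.min(best, value.distance);
                if (value.adjacency != null) {
                    adjacency = value.adjacency;
                }
            }
            next.put(e.getKey(), new Node(best, adjacency));
        }
        return next;
    }

    // Driver: run rounds until no distance changes any more.
    static Map<String, Integer> bfs(Map<String, List<String>> adjacency, String source) {
        Map<String, Node> graph = new HashMap<>();
        for (Map.Entry<String, List<String>> e : adjacency.entrySet()) {
            graph.put(e.getKey(), new Node(e.getKey().equals(source) ? 0 : INF, e.getValue()));
        }
        boolean changed = true;
        while (changed) {
            Map<String, Node> next = round(graph);
            changed = false;
            for (Map.Entry<String, Node> e : next.entrySet()) {
                Node old = graph.get(e.getKey());
                if (old == null || old.distance != e.getValue().distance) {
                    changed = true;
                }
            }
            graph = next;
        }
        Map<String, Integer> distances = new HashMap<>();
        for (Map.Entry<String, Node> e : graph.entrySet()) {
            distances.put(e.getKey(), e.getValue().distance);
        }
        return distances;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "c", List.of("d"),
                "d", List.of(),
                "e", List.of("a"));
        System.out.println(new TreeMap<>(bfs(edges, "a")));
    }
}
```

Unreachable vertices (here `e`) keep the distance `INF`. On a real cluster each round would presumably be a separate Hadoop job reading the previous round's output, with convergence detected, for example, through a job counter; those details are assumptions and are left out of this sketch.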
Master of Science: 3rd semester (team project / master's project)