Cloud Computing
Organizers:
Prof. Dr. Georg Lausen
Alexander Schätzle
Martin Przyjaciel-Zablocki
Organizational Matters:
Introductory Meeting: Wednesday, 24th October 2012
Room: Building 51, SR 01-029
Time: Wednesday, 2 pm
Language: German / English
Prerequisites:
Basic knowledge of Java programming
Content:
Graphs have always played an important role in computer science, e.g. for modeling relationships, processes, and networks.
In the era of Web 2.0, the Semantic Web, and social networks like Facebook and Twitter, new challenges arise from the rapidly growing size of such graph structures, which necessitates distributed storage and processing.
In recent years, MapReduce has become the de facto standard for distributed, parallel processing of large-scale data.
Cloud services like Amazon's Elastic Compute Cloud (EC2) also enable small and medium-sized companies to evaluate their data with MapReduce by provisioning resources dynamically as needed, without having to maintain their own infrastructure.
In this project, participants will develop and implement MapReduce applications for given graph problems on large (social) graphs.
The data will come from a real-world dataset of the online music platform Last.fm, provided in different sizes.
In an initial introduction phase, participants will familiarize themselves with the basic MapReduce principles and learn how to implement a MapReduce application by solving a mandatory exercise sheet.
We will use Apache Hadoop (more precisely, the Cloudera distribution of Hadoop), one of the most popular open-source MapReduce frameworks.
Prior knowledge of MapReduce is desirable but not required.
Participants should have prior knowledge of Java programming (including the use of an IDE) and be willing to familiarize themselves with a possibly unfamiliar approach to programming applications for large-scale data processing.
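To give a rough idea of the MapReduce principle mentioned above, the following minimal sketch computes the degree of every vertex in a small graph. It is plain Java that only simulates the map, shuffle, and reduce phases in memory. It does not use the Hadoop API, and the class name DegreeCount, the tab-separated edge format, and the user names are assumptions made purely for illustration, not part of the course material or the Last.fm dataset.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: an in-memory simulation of MapReduce, not Hadoop code.
public class DegreeCount {

    // Map phase: for one edge "u<TAB>v", emit the pairs (u, 1) and (v, 1).
    static List<Map.Entry<String, Integer>> map(String edge) {
        String[] parts = edge.split("\t");
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        out.add(new AbstractMap.SimpleEntry<>(parts[0], 1));
        out.add(new AbstractMap.SimpleEntry<>(parts[1], 1));
        return out;
    }

    // Reduce phase: sum all values that were emitted for the same vertex.
    static int reduce(String vertex, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: runs map on every edge, groups the pairs by key (shuffle),
    // then runs reduce once per key.
    static Map<String, Integer> run(List<String> edges) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String edge : edges) {
            for (Map.Entry<String, Integer> kv : map(edge)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up undirected edges between four users.
        List<String> edges = Arrays.asList(
                "alice\tbob", "alice\tcarol", "bob\tcarol", "carol\tdave");
        System.out.println(run(edges)); // prints {alice=2, bob=2, carol=3, dave=1}
    }
}
```

In a real framework such as Hadoop, the map and reduce functions run in parallel on many machines and the grouping step is handled by the framework, so only the two functions need to be written by the programmer.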
Curriculum:
Master of Science: 3rd Semester (Teamproject / Masterproject)
ECTS: 16