Querying Big Data

Lecturer:	Prof. Dr. Georg Lausen
Supervisors:	Alexander Schätzle (Building 051, Room 01-028), Martin Przyjaciel-Zablocki (Building 051, Room 01-028)
Tutors:	Thorsten Berberich (berberit@informatik.uni-freiburg.de), Daniel Brand (brandd@informatik.uni-freiburg.de)
Time & Place:	Wednesday, 14h – 18h (c.t.), Building 051, Room SR 00 031 (MMR)
Language:	Exercise sheets will be written in English. The meetings with the tutor will be held in German or English.
Study:	Master

Registration

Please register for this lab course via the Vorlesungsverzeichnis (Course Catalog), as the number of participants is limited to 15. The introductory meeting takes place on 30 April 2014 at 14h in room 051-00-031 (MMR).

Exercise Sheets

This course is based on practical exercise sheets that have to be solved individually. The submitted solutions will be marked and discussed with the tutor (attendance is compulsory).

Content

In times of Web 2.0, the Semantic Web and social networks like Facebook and Twitter, the rapidly growing volume of data poses new challenges that call for distributed storage and processing. In recent years, MapReduce has become the de facto standard for distributed, parallel processing of large-scale data. Cloud services like Amazon's Elastic Compute Cloud (EC2) also enable small and medium-sized companies to evaluate their data with MapReduce by provisioning resources dynamically as needed, without having to maintain their own infrastructure.
In this course, we will use Apache Hadoop (more precisely, the Cloudera distribution of Hadoop), one of the most popular open-source Big Data frameworks. The participants will design and implement applications on top of Hadoop that use not only MapReduce but also other parts of the rich Hadoop ecosystem, such as Apache Pig or Hive. Prior knowledge of Hadoop/MapReduce is desirable but not required.
However, you should have prior knowledge of Java programming (including the use of an IDE) and be willing to familiarize yourself with a programming model for large-scale data processing that is probably new to you.
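To give a first impression of the MapReduce programming model mentioned above, here is a minimal sketch of the classic word-count example. It is written in plain Java and runs in memory on a single machine; it does not use the Hadoop API, and the class and method names (WordCountSketch, map, reduce, run) are chosen purely for illustration. It is not taken from the exercise sheets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the MapReduce model (assumption: no Hadoop involved).
class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every word in a single input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values that were emitted for one key.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run map on every line, group the pairs by key (the "shuffle"),
    // then call reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {big=2, data=2, lab=1}
        System.out.println(run(List.of("big data big", "data lab")));
    }
}
```

In a real Hadoop job, the map and reduce functions would run in parallel on different machines and the framework would handle the grouping step; the sketch only mirrors the division of work between the two functions.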

Last modified: 30.04.2014