Querying Big Data
|Lecturer:||Prof. Dr. Georg Lausen|
|Supervisor:||Alexander Schätzle, Building 051, Room 01-028|
Martin Przyjaciel-Zablocki, Building 051, Room 01-028
|Tutor:||Florian Muhl (firstname.lastname@example.org)|
|Time & Place:||Wednesday, 14h – 18h (c.t.), Building 051, Room SR 00 031 (MMR)|
|Language:||Exercise sheets will be written in English. |
The meetings with the tutor will be held in German or English.
Please apply for this Lab Course via HISinOne (Course Catalog), as the number of participants is limited to 12. The introductory meeting takes place in room 051-00-031 (MMR).
This course is based on practical exercise sheets that have to be solved individually. The submitted solutions will be marked and discussed with the tutor (compulsory attendance).
In times of Web 2.0, the Semantic Web and social networks like Facebook and Twitter, new challenges arise from the rapidly growing volume of data, which necessitates distributed storage and processing.
In recent years, MapReduce has become the de facto standard for distributed, parallel processing of large-scale data.
Cloud services like Amazon's Elastic Compute Cloud (EC2) also enable small and medium-sized companies to evaluate their data with MapReduce by provisioning resources dynamically as needed, without having to maintain their own infrastructure.
In this course, we will use Apache Hadoop (more precisely the Cloudera distribution of Hadoop), one of the most popular open-source Big Data frameworks. The participants will develop and implement applications on top of Hadoop that use not only MapReduce but also other parts of the rich Hadoop ecosystem like Apache Pig or Hive. Prior knowledge of Hadoop/MapReduce is desirable but not required.
However, you should have prior knowledge of Java programming (including the use of an IDE) and be willing to familiarize yourself with a probably unfamiliar approach to programming applications for large-scale data processing.