Databases and Information Systems

Distributed Processing of Semantic Data

Hadoop and its surrounding ecosystem have become the de facto industry standard for Big Data applications and are used by leading internet companies such as Facebook, Amazon, and Twitter.
Fundamentally, Hadoop is a general-purpose cluster computing platform that is not targeted at any specific application field. We investigate how these technologies can be adapted to the processing of large-scale semantic data, with the goal of making the vision of a Semantic Web a reality. In addition, we investigate the tension between distributed computing and traditional database technologies, with the aim of combining the benefits of both.
More specifically, we are interested in the evaluation of SPARQL queries on large RDF datasets and in complex path queries with navigational capabilities for the analysis of RDF graphs with social network characteristics. We have developed several prototypes based on Hadoop technologies that run on a cluster of commodity hardware. We operate two small Hadoop clusters of our own (each consisting of 10 commodity servers) that can be used to run, test, and evaluate our prototypes in a controlled environment. Students can also access our clusters to gain hands-on experience with a technology that is becoming increasingly important in the era of Big Data. To this end, we offer a lab course, master projects, and theses to interested students of our faculty.

Subprojects

  • TriAL-QL Engine: TriAL-QL query processor for Hadoop based on Impala and Spark
  • RDFPath Engine: RDFPath processor for Hadoop based on Impala and Spark
  • Sempala: SPARQL query engine for Hadoop based on Cloudera Impala
  • S2RDF: SPARQL query engine for Hadoop based on Apache Spark SQL
  • S2X: SPARQL query engine for Hadoop based on Apache Spark GraphX
  • PigSPARQL: SPARQL query engine for Hadoop based on Apache Pig
  • Map-Side Merge Join: Optimized join strategy for the evaluation of RDFPath queries with MapReduce
  • RDFPath MapReduce Processor: Expressive RDF path query language implemented for MapReduce
  • TriAL-QL Engine: Distributed Processing of Navigational Queries on Hive

Project Members:

Related Publications

  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Georg Lausen:
    Querying Semantic Knowledge Bases with SQL-on-Hadoop [ .pdf ]
    Proc. of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR@SIGMOD 2017). Chicago, IL (USA), May 2017.
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, Georg Lausen:
    S2RDF: RDF Querying with SPARQL on Spark [ .pdf ]
    Proceedings of the VLDB Endowment (PVLDB), Volume 9, No. 10, June 2016.
    42nd International Conference on Very Large Data Bases (VLDB 2016). New Delhi (India), September 2016.
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, Georg Lausen:
    S2RDF: RDF Querying with SPARQL on Spark [ Tech. Report ]
    Computing Research Repository (CoRR), December 2015.
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Thorsten Berberich, Georg Lausen:
    S2X: Graph-Parallel Querying of RDF with GraphX [ .pdf ]
    Proc. of 1st International Workshop on Big-Graphs Online Querying (Big-O(Q) 2015)
    at VLDB 2015. Hawaii (USA), August 2015.
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Georg Lausen:
    TriAL-QL: Distributed Processing of Navigational Queries [ .pdf ] (Best Paper Runner-Up Award)
    Proc. of 18th International Workshop on the Web and Databases (WebDB 2015)
    at SIGMOD 2015. Melbourne (Australia), June 2015.
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Adrian Lange:
    TriAL-QL: Distributed Processing of Navigational Queries [ .pdf ]
    Proc. of the 9th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW 2015)
    Lima (Peru), May 2015.
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, Georg Lausen:
    Sempala: Interactive SPARQL Query Processing on Hadoop [ .pdf ]
    Proc. of the 13th International Semantic Web Conference (ISWC 2014). Riva del Garda (Italy).
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung, Georg Lausen:
    Large-Scale RDF Processing with MapReduce [ Book ]
    Large Scale and Big Data - Processing and Management, pp. 151–182. Auerbach Publications, 2014.
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Eduard Skaley, Thomas Hornung, Georg Lausen:
    Map-Side Merge Joins for Scalable SPARQL BGP Processing [ .pdf ]
    Proc. of the 5th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2013). Bristol (UK).
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung, Georg Lausen:
    PigSPARQL: A SPARQL Query Processing Baseline for Big Data [ .pdf ] [ Poster ]
    Proc. of the ISWC 2013 Posters & Demonstrations Track (ISWC 2013). Sydney (Australia).
  • Alexander Schätzle, Antony Neu, Georg Lausen, Martin Przyjaciel-Zablocki:
    Large-Scale Bisimulation of RDF Graphs [ .pdf ] (Best Paper Award)
    Proc. of the Fifth Workshop on Semantic Web Information Management (SWIM 2013),
    in conjunction with the 2013 ACM International Conference on Management of Data (SIGMOD 2013). New York (USA).
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Io Taxidou:
    Towards a SPARQL 1.1 Feature Benchmark on Real-World Social Network Data [ .pdf ]
    Proc. of the First International Workshop on Benchmarking RDF Systems (BeRSys 2013),
    co-located with the 10th Extended Semantic Web Conference (ESWC 2013). Montpellier (France).
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Christopher Dorner, Thomas Hornung, Georg Lausen:
    Cascading Map-Side Joins over HBase for Scalable Join Processing [ .pdf ]
    Proc. of the Joint Workshop on Scalable and High-Performance Semantic Web Systems (SSWS+HPCSW 2012),
    in conjunction with the International Semantic Web Conference (ISWC 2012). Boston (USA).
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Christopher Dorner, Georg Lausen:
    Cascading Map-Side Joins over HBase for Scalable Join Processing [ Tech. Report ]
    Computing Research Repository (CoRR), June 2012.
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Georg Lausen:
    RDFPath: Path Query Processing on Large RDF Graphs with MapReduce (extended revised version) [ .pdf ]
    The Semantic Web: ESWC 2011 Workshops, Revised Selected Papers, LNCS 7117, pp. 50–64, Springer, 2011.
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Georg Lausen:
    PigSPARQL: Mapping SPARQL to Pig Latin [ .pdf ]
    3rd International Workshop on Semantic Web Information Management (SWIM 2011),
    in conjunction with the 2011 ACM International Conference on Management of Data (SIGMOD 2011). Athens (Greece).
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Georg Lausen:
    RDFPath: Path Query Processing on Large RDF Graphs with MapReduce [ .pdf ]
    1st Workshop on High-Performance Computing for the Semantic Web (HPCSW 2011),
    collocated with 8th Extended Semantic Web Conference (ESWC 2011). Heraklion (Greece).
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung, Georg Lausen:
    PigSPARQL: Übersetzung von SPARQL nach Pig Latin [ .pdf ]
    In Proc. 14th conference on Database Systems for Business, Technology and Web (BTW 2011).
    Kaiserslautern (Germany).

Finished Theses and Projects

  • Lavderim Shala: Distributed Processing of RDFPath Queries. Master Thesis (2016)
  • Visar Boshnjaku: A Scalable Engine for TriAL-QL on SQL-on-Hadoop. Master Thesis (2016)
  • Simon Skilevic: S2RDF: Distributed In-Memory Execution of SPARQL Queries Using Apache Spark SQL and Extended Vertical Partitioning. Master Thesis (2015)
  • Thorsten Berberich: Verteilte Auswertung von SPARQL Anfragen mit Apache Spark. Master Thesis (2015)
  • Antony Neu: Distributed Evaluation of SPARQL queries with Impala and MapReduce. Master Thesis (2014)
  • Adrian Lange: Distributed Path Query Processing on RDF. Master Thesis (2014)
  • Eduard Skaley: Implementierung und Optimierung von Merge-Joins mit MapReduce. Master Thesis (2013)
  • Christopher Dorner: Distributed Evaluation of Large RDF Graphs with HBase and MapReduce. Master Thesis (2012)
