Databases and Information Systems

Distributed Processing of Semantic Data

The almost unlimited amount of data on the web requires new technologies for handling and analyzing very large data sets. Distributed computing platforms such as MapReduce (Google), Cassandra (Facebook), and Dryad (Microsoft) have gained traction in various application areas in recent years and have proven well suited to large-scale data management. Our goal is to develop techniques for the Semantic Web that take advantage of such ready-to-use platforms and their scaling behavior to keep pace with the growing volume of semantic data. We are especially interested in evaluating SPARQL queries on large RDF data sets and in path queries with navigational capabilities for analyzing social network graphs. Both approaches are prototyped and executed in parallel on a computer cluster. For this purpose, and for further experiments and evaluations, we set up a cluster of ten machines running Hadoop, an open-source MapReduce framework. In addition, we investigate the tension between distributed computing approaches and traditional database technologies, with the aim of combining the benefits of both.

MapReduce is a distributed programming paradigm for processing large quantities of data. Processing in this framework is split into two phases, map and reduce. In the map phase, a computation is carried out for each input tuple; in the reduce phase, the outputs of several mappers are combined. Within each phase, the individual tasks have no data dependencies on one another and can therefore run in parallel, while the reduce phase as a whole consumes the output of the map phase.
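
The two phases can be illustrated with a minimal single-process sketch in Python that counts how often each predicate occurs in a handful of RDF triples. The function names (`map_triple`, `shuffle`, `reduce_counts`, `run_job`) and the sample data are hypothetical and chosen for illustration only; this is not Hadoop's API. A real framework would distribute the map and reduce calls across the machines of a cluster and perform the grouping step itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def map_triple(triple: Triple) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (predicate, 1) pair for a single input tuple."""
    _subject, predicate, _object = triple
    yield predicate, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper outputs by key (done by the framework between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values that mappers emitted for one key."""
    return key, sum(values)


def run_job(triples: Iterable[Triple]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in a single process."""
    # Each map_triple call depends only on its own input tuple, so a
    # framework could execute these calls in parallel.
    mapped = (pair for triple in triples for pair in map_triple(triple))
    grouped = shuffle(mapped)
    # Each reduce_counts call depends only on its own key group.
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:alice", "foaf:name", '"Alice"'),
    ]
    print(run_job(sample))
```

Because every call to `map_triple` sees only one tuple and every call to `reduce_counts` sees only one key group, neither function needs shared state, which is what allows the tasks within a phase to run independently.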

Downloads

  • PigSPARQL: SPARQL 1.0 query engine for Hadoop based on Apache Pig
  • Sempala: SPARQL 1.0 query engine for Hadoop based on Cloudera Impala
  • S2RDF: SPARQL 1.0 query engine for Hadoop based on Apache Spark SQL
  • S2X: SPARQL 1.0 query engine for Hadoop based on Apache Spark GraphX
  • MapMerge: Map-side merge join implementation for SPARQL Basic Graph Patterns with MapReduce
  • RDFPath: expressive RDF path query language for MapReduce
  • TriAL-QL: Distributed Processing of Navigational Queries on Hadoop

Related Publications

  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, Georg Lausen:
    S2RDF: RDF Querying with SPARQL on Spark [ .pdf ]
    Proceedings of the VLDB Endowment (PVLDB), Volume 9, No. 10, June 2016.
    42nd International Conference on Very Large Data Bases (VLDB 2016). New Delhi (India), September 2016.
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, Georg Lausen:
    S2RDF: RDF Querying with SPARQL on Spark [ Tech. Report ]
    Computing Research Repository (CoRR), December 2015.
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Thorsten Berberich, Georg Lausen:
    S2X: Graph-Parallel Querying of RDF with GraphX [ .pdf ]
    Proc. of 1st International Workshop on Big-Graphs Online Querying (Big-O(Q) 2015)
    at VLDB 2015. Hawaii (USA), August 2015.
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Georg Lausen:
    TriAL-QL: Distributed Processing of Navigational Queries [ .pdf ] (Best Paper Runner-Up Award)
    Proc. of 18th International Workshop on the Web and Databases (WebDB 2015)
    at SIGMOD 2015. Melbourne (Australia), June 2015.
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Adrian Lange:
    TriAL-QL: Distributed Processing of Navigational Queries [ .pdf ]
    Proc. of the 9th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW 2015)
    Lima (Peru), May 2015.
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, Georg Lausen:
    Sempala: Interactive SPARQL Query Processing on Hadoop [ .pdf ]
    Proc. of the 13th International Semantic Web Conference (ISWC 2014). Riva del Garda (Italy).
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung, Georg Lausen:
    Large-Scale RDF Processing with MapReduce [ Book ]
    Large Scale and Big Data - Processing and Management 2014: 151–182. Auerbach Publications 2014
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Eduard Skaley, Thomas Hornung, Georg Lausen:
    Map-Side Merge Joins for Scalable SPARQL BGP Processing [ .pdf ]
    Proc. of the 5th IEEE International Conference on Cloud Computing Technology and Science, (CloudCom 2013). Bristol (UK).
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung, Georg Lausen:
    PigSPARQL: A SPARQL Query Processing Baseline for Big Data [ .pdf ] [ Poster ]
    Proc. of the ISWC 2013 Posters & Demonstrations Track (ISWC 2013). Sydney (Australia).
  • Alexander Schätzle, Antony Neu, Georg Lausen, Martin Przyjaciel-Zablocki:
    Large-Scale Bisimulation of RDF Graphs [ .pdf ] (Best Paper Award)
    Proc. of the Fifth Workshop on Semantic Web Information Management (SWIM 2013),
    in conjunction with the 2013 ACM International Conference on Management of Data (SIGMOD 2013). New York (USA).
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Io Taxidou:
    Towards a SPARQL 1.1 Feature Benchmark on Real-World Social Network Data [ .pdf ]
    Proc. of the First International Workshop on Benchmarking RDF Systems (BeRSys 2013),
    co-located with the 10th Extended Semantic Web Conference (ESWC 2013). Montpellier (France).
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Christopher Dorner, Thomas Hornung, Georg Lausen:
    Cascading Map-Side Joins over HBase for Scalable Join Processing [ .pdf ]
    Proc. of the Joint Workshop on Scalable and High-Performance Semantic Web Systems (SSWS+HPCSW 2012),
    in conjunction with the International Semantic Web Conference (ISWC 2012). Boston (USA).
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Christopher Dorner, Georg Lausen:
    Cascading Map-Side Joins over HBase for Scalable Join Processing [ Tech. Report ]
    Computing Research Repository (CoRR), June 2012.
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Georg Lausen:
    RDFPath: Path Query Processing on Large RDF Graphs with MapReduce (extended revised version) [ .pdf ]
    The Semantic Web: ESWC 2011 Workshops, Revised Selected Papers, LNCS 7117, pp. 50–64, Springer, 2011.
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Georg Lausen:
    PigSPARQL: Mapping SPARQL to Pig Latin [ .pdf ]
    3rd International Workshop on Semantic Web Information Management (SWIM 2011),
    in conjunction with the 2011 ACM International Conference on Management of Data (SIGMOD 2011). Athens (Greece).
  • Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Georg Lausen:
    RDFPath: Path Query Processing on Large RDF Graphs with MapReduce [ .pdf ]
    1st Workshop on High-Performance Computing for the Semantic Web (HPCSW 2011),
    co-located with the 8th Extended Semantic Web Conference (ESWC 2011). Heraklion (Greece).
  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung, Georg Lausen:
    PigSPARQL: Übersetzung von SPARQL nach PigLatin [ .pdf ]
    In Proc. 14th conference on Database Systems for Business, Technology and Web (BTW 2011).
    Kaiserslautern (Germany).
