Databases and Information Systems


S2X (SPARQL on Spark with GraphX) is a SPARQL query processor for Hadoop based on Spark GraphX. It combines the graph-parallel abstraction of GraphX, used to implement the graph pattern matching part of SPARQL, with the data-parallel computation of Spark, used to build the results of the other SPARQL operators.
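The split between pattern matching and the remaining operators can be illustrated with a toy, single-machine sketch in plain Python. It is not S2X's algorithm and uses neither Spark nor GraphX; the function names (`match_bgp`, `filter_and_project`) and the sample data are hypothetical. The first stage matches a basic graph pattern against a list of triples, and the second stage applies a filter and a projection to the resulting bindings, the kind of step that maps naturally onto data-parallel collections.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Variable: bind it, or check it against an earlier binding.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            # Constant term that does not match the triple.
            return None
    return result


def match_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Stage 1: naive basic graph pattern matching by successive joins."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def filter_and_project(
    bindings: List[Binding],
    predicate: Callable[[Binding], bool],
    variables: List[str],
) -> List[Binding]:
    """Stage 2: FILTER and projection as plain operations on a collection."""
    return [{v: b[v] for v in variables} for b in bindings if predicate(b)]


# Hypothetical sample data and query.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
    ("bob", "age", "25"),
]
patterns = [("?a", "knows", "?b"), ("?b", "age", "?age")]

matches = match_bgp(patterns, triples)
results = filter_and_project(matches, lambda b: int(b["?age"]) < 30, ["?a"])
```

In this sketch `results` holds one binding per person who knows someone younger than 30. A distributed engine would replace the nested loops with graph-parallel message passing and the list comprehension with operations on distributed collections.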

Related Publications

  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Thorsten Berberich, Georg Lausen:
    S2X: Graph-Parallel Querying of RDF with GraphX.
    Biomedical Data Management and Graph Online Querying, VLDB 2015 Workshops (Big-O(Q) 2015) and DMAH, Revised Selected Papers, volume 9579 of Lecture Notes in Computer Science (LNCS), pages 155-168.
    Hawaii (USA), August 2015.


Requirements

  • Apache Hadoop. We recommend Cloudera's Distribution of Hadoop (CDH); the implementation was tested with CDH 5.3.
  • Apache Spark with GraphX, included in CDH
  • RDF data in (extended) N-Triples format. Beyond the syntax of N-Triples, S2X also supports the most commonly used prefixes as well as the prefixes used in the SP2Bench, LUBM, BSBM and WatDiv benchmarks.
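To illustrate what prefix support on top of N-Triples can look like, here is a minimal Python sketch that expands prefixed names into full IRIs while parsing a line. It is not the S2X parser: the `PREFIXES` table, `expand_term` and `parse_line` are hypothetical names, the table lists only two well-known namespaces as an example, and the parser assumes well-formed lines that end with a terminating dot.

```python
from typing import Dict, Tuple

# Hypothetical prefix table for illustration; not the set that S2X ships with.
PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_term(term: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as foaf:name into a full IRI."""
    # Full IRIs, literals and blank nodes are passed through unchanged.
    if term.startswith("<") or term.startswith('"') or term.startswith("_:"):
        return term
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return "<" + prefixes[prefix] + local + ">"
    return term  # unknown prefix: leave the term as written


def parse_line(line: str) -> Tuple[str, str, str]:
    """Split one triple line into (subject, predicate, object)."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1].rstrip()  # drop the terminating dot
    # Split on the first two whitespace runs only, so that literals
    # containing spaces stay intact in the object position.
    subject, predicate, obj = body.split(None, 2)
    return expand_term(subject), expand_term(predicate), expand_term(obj)
```

For example, `parse_line('<http://example.org/alice> rdf:type foaf:Person .')` yields three full IRIs, while a line that already uses full IRIs passes through unchanged.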


06/2015: S2X v1.0 available for download (source and binaries)

Version     Description
S2X v1.0    Readme and Sources (S2X_v1.0_src.tar)


You can also check out the source code of S2X from its GitHub repository.
