
S2X

S2X (SPARQL on Spark with GraphX) is a SPARQL query processor for Hadoop based on Spark GraphX. It combines the graph-parallel abstraction of GraphX, which implements the graph pattern matching part of SPARQL, with the data-parallel computation of Spark, which builds the results of the other SPARQL operators.
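To make the two kinds of work more concrete, the following single-machine Python sketch shows what the graph pattern matching step computes (variable bindings for a basic graph pattern) and one operator applied to its results afterwards (projection). It only illustrates SPARQL semantics under simplifying assumptions. It does not use Spark or GraphX, it does not reproduce the distributed algorithm of S2X, and the helper names (match_pattern, join, match_bgp, project) and the ex:/foaf: sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Treat terms starting with '?' as SPARQL variables (the '$' form is ignored here)."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triples: List[Triple]) -> List[Binding]:
    """Return one binding per data triple that matches a single triple pattern."""
    results: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        ok = True
        for p_term, d_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable repeated within one pattern must bind to the same value.
                if binding.get(p_term, d_term) != d_term:
                    ok = False
                    break
                binding[p_term] = d_term
            elif p_term != d_term:
                ok = False
                break
        if ok:
            results.append(binding)
    return results


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Natural join: merge every pair of bindings that agree on all shared variables."""
    out: List[Binding] = []
    for left_b in left:
        for right_b in right:
            shared = left_b.keys() & right_b.keys()
            if all(left_b[v] == right_b[v] for v in shared):
                out.append({**left_b, **right_b})
    return out


def match_bgp(bgp: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining the matches of its triple patterns."""
    solutions: List[Binding] = [{}]  # the single empty binding is the join identity
    for pattern in bgp:
        solutions = join(solutions, match_pattern(pattern, triples))
    return solutions


def project(solutions: List[Binding], variables: List[str]) -> List[Binding]:
    """SELECT-style projection, applied to the solution set after matching."""
    return [{v: s[v] for v in variables} for s in solutions]


if __name__ == "__main__":
    data = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:bob", "foaf:name", '"Bob"'),
    ]
    query = [("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")]
    print(project(match_bgp(query, data), ["?x", "?n"]))
    # prints [{'?x': 'ex:alice', '?n': '"Bob"'}]
```

Only ex:alice knows someone who has a foaf:name, so a single solution survives the join on ?y. In a distributed engine, both the matching and the later operators would run over partitioned data instead of in-memory lists.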

Related Publications

Alexander Schätzle, Martin Przyjaciel-Zablocki, Thorsten Berberich, Georg Lausen:
S2X: Graph-Parallel Querying of RDF with GraphX
Biomedical Data Management and Graph Online Querying, VLDB 2015 Workshops (Big-O(Q) 2015) and DMAH, Revised Selected Papers, volume 9579 of Lecture Notes in Computer Science (LNCS), pages 155-168.
Hawaii (USA), August 2015.

Requirements

Apache Hadoop. We recommend Cloudera's Distribution of Hadoop (CDH); the implementation is tested with CDH 5.3.
Apache Spark with GraphX, included in CDH
RDF data in (extended) N-Triples format. Beyond the N-Triples syntax, S2X also supports the most commonly used prefixes as well as the prefixes used in the SP2Bench, LUBM, BSBM and WatDiv benchmarks.
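As an illustration of what "extended" N-Triples means for input files, here is a minimal Python sketch that parses one line and expands prefixed names such as foaf:knows into full IRIs. The two-entry PREFIXES table, the regular expression, and the function names are assumptions made for this example; this is not the S2X parser, and it does not list the prefixes S2X actually recognizes. It also skips N-Triples details such as validating escapes inside IRIs.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical prefix table for illustration only (not the list used by S2X).
PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# One term: an <IRI>, a quoted literal with optional language tag or datatype,
# a blank node (_:x), or a prefixed name (prefix:local).
TERM = r'<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^\S+)?|_:\S+|[A-Za-z][\w-]*:\S*'
LINE = re.compile(rf"^\s*({TERM})\s+({TERM})\s+({TERM})\s*\.\s*$")


def expand(term: str) -> str:
    """Rewrite a prefixed name to a full <IRI>; leave IRIs, literals and blank nodes unchanged."""
    if term.startswith(("<", '"', "_:")):
        return term
    prefix, _, local = term.partition(":")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix}")
    return f"<{PREFIXES[prefix]}{local}>"


def parse_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Parse one extended N-Triples line; return None for blank lines and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    m = LINE.match(stripped)
    if m is None:
        raise ValueError(f"malformed line: {line!r}")
    s, p, o = (expand(t) for t in m.groups())
    return s, p, o


if __name__ == "__main__":
    print(parse_line("_:b1 rdf:type foaf:Person ."))
    # prints ('_:b1', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://xmlns.com/foaf/0.1/Person>')
```

Expanding prefixes while loading means that every later stage only has to compare full IRIs, which keeps the matching logic independent of how the input file abbreviated them.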

Downloads

06/2015: S2X v1.0 available for download (source and binaries)

Version	Description	Download
S2X v1.0	Readme and sources	S2X_v1.0_src.tar

GitHub

You can also check out the source code of S2X from the following GitHub repository:
https://github.com/aschaetzle/S2X
