S2X
S2X (SPARQL on Spark with GraphX) is a SPARQL query processor for Hadoop based on Spark GraphX. It combines the graph-parallel abstraction of GraphX, which implements the graph pattern matching part of SPARQL, with the data-parallel computation of Spark, which builds the results of the other SPARQL operators.
Related Publications
- Alexander Schätzle, Martin Przyjaciel-Zablocki, Thorsten Berberich, Georg Lausen:
S2X: Graph-Parallel Querying of RDF with GraphX
Biomedical Data Management and Graph Online Querying, VLDB 2015 Workshops (Big-O(Q) 2015) and DMAH, Revised Selected Papers, volume 9579 of Lecture Notes in Computer Science (LNCS), pages 155-168. Hawaii (USA), August 2015.
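As a rough illustration of the two stages named in the overview (graph pattern matching first, other SPARQL operators afterwards), here is a single-machine sketch in plain Python. It does not use Spark or GraphX and is not S2X's algorithm; the toy graph, the `?`-prefixed variable convention and every function name are assumptions made for this example only.

```python
# Toy RDF graph as (subject, predicate, object) edges. Hypothetical data.
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "likes", "carol"),
]


def is_var(term):
    """Variables are written with a leading '?' in this sketch."""
    return term.startswith("?")


def match_pattern(pattern, edges):
    """Return one variable binding (dict) per edge matching the triple pattern."""
    results = []
    for edge in edges:
        binding = {}
        ok = True
        for p_term, e_term in zip(pattern, edge):
            if is_var(p_term):
                # The same variable used twice must bind to the same value.
                if binding.get(p_term, e_term) != e_term:
                    ok = False
                    break
                binding[p_term] = e_term
            elif p_term != e_term:
                ok = False
                break
        if ok:
            results.append(binding)
    return results


def join(left, right):
    """Combine bindings that agree on every variable they share."""
    out = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def match_bgp(patterns, edges):
    """Stage 1: match a basic graph pattern (a list of triple patterns)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = join(solutions, match_pattern(pattern, edges))
    return solutions


def project(solutions, variables):
    """Stage 2 example: a non-matching operator (projection) over the solutions."""
    return [{v: s[v] for v in variables} for s in solutions]


bgp = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
solutions = match_bgp(bgp, EDGES)
projected = project(solutions, ["?x", "?z"])
```

In this sketch the pattern matching works directly on the edges, while projection is an ordinary transformation over the resulting bindings, which mirrors the division of labour described above in a much simplified form.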
Requirements
- Apache Hadoop. We recommend Cloudera's Distribution of Hadoop (CDH); the implementation is tested with CDH 5.3.
- Apache Spark with GraphX, included in CDH
- RDF data in (extended) N-Triples format. Beyond the standard N-Triples syntax, S2X also accepts the most commonly used prefixes as well as the prefixes used in the SP2Bench, LUBM, BSBM and WatDiv benchmarks.
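To make the "extended" N-Triples idea concrete, the following sketch shows how a line that uses prefixed names could be expanded into full IRIs. It is not S2X's loader: the prefix table, the regular expression and the function names are assumptions for illustration, and the set of prefixes S2X actually recognises is not listed here.

```python
import re

# Hypothetical prefix table; S2X's real list of supported prefixes may differ.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# A term is a full IRI in angle brackets, a quoted literal (optionally with a
# language tag or datatype IRI), or a prefixed name such as foaf:Person.
TERM = re.compile(
    r'<[^>]*>'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?'
    r'|[A-Za-z][\w-]*:[\w-]+'
)


def expand(term):
    """Turn a prefixed name into a full IRI; leave IRIs and literals untouched."""
    if term.startswith("<") or term.startswith('"'):
        return term
    prefix, local = term.split(":", 1)
    if prefix not in PREFIXES:
        raise ValueError("unknown prefix: " + prefix)
    return "<" + PREFIXES[prefix] + local + ">"


def parse_line(line):
    """Parse one (extended) N-Triples line into a (subject, predicate, object) tuple.

    Returns None for blank lines and comments.
    """
    body = line.strip()
    if not body or body.startswith("#"):
        return None
    if body.endswith("."):
        body = body[:-1]  # drop the statement terminator
    terms = TERM.findall(body)
    if len(terms) != 3:
        raise ValueError("expected 3 terms, got %d: %r" % (len(terms), line))
    return tuple(expand(t) for t in terms)


triple = parse_line("<http://example.org/alice> rdf:type foaf:Person .")
```

A plain N-Triples line with three full IRIs passes through unchanged, so the same function handles both the standard and the prefixed form.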
Downloads
06/2015: S2X v1.0 available for download (source and binaries)
Version | Description | Download |
---|---|---|
S2X v1.0 | Readme and Sources | S2X_v1.0_src.tar |
GitHub
You can also check out the source code of S2X from the following GitHub repository:
https://github.com/aschaetzle/S2X