S2X (SPARQL on Spark with GraphX) is a SPARQL query processor for Hadoop based on Spark GraphX. It combines the graph-parallel abstraction of GraphX, which implements the graph pattern matching part of SPARQL, with the data-parallel computation of Spark, which builds the results of the other SPARQL operators.
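To make this split more concrete, here is a deliberately simplified, single-process Python sketch of evaluating a SPARQL basic graph pattern in that spirit. Triples are treated as graph edges that are matched locally against each triple pattern, candidate bindings are pruned iteratively, and the surviving candidates are joined. This is not S2X's implementation (S2X is built on Spark GraphX). The function names `match_edge`, `prune`, `join` and `evaluate_bgp`, as well as the global, per-variable pruning rule, are assumptions chosen for brevity.

```python
from itertools import product

# Hypothetical, single-process illustration; S2X itself runs on Spark GraphX.
# RDF triples are treated as labeled edges: subject --predicate--> object.


def is_var(term):
    """SPARQL variables are written with a leading '?'."""
    return term.startswith("?")


def match_edge(triple, pattern):
    """Local check of one edge against one triple pattern.

    Returns the variable bindings on a match, otherwise None.
    """
    binding = {}
    for value, term in zip(triple, pattern):
        if is_var(term):
            # A variable used twice in one pattern must bind the same value.
            if binding.get(term, value) != value:
                return None
            binding[term] = value
        elif term != value:
            return None
    return binding


def prune(patterns, candidates):
    """Repeatedly drop bindings whose value for a variable is not supported
    by every pattern mentioning that variable, until nothing changes."""
    changed = True
    while changed:
        allowed = {}
        for pattern, bindings in zip(patterns, candidates):
            for var in {t for t in pattern if is_var(t)}:
                values = {b[var] for b in bindings}
                allowed[var] = allowed.get(var, values) & values
        pruned = [
            [b for b in bindings if all(b[v] in allowed[v] for v in b)]
            for bindings in candidates
        ]
        changed = pruned != candidates
        candidates = pruned
    return candidates


def join(left, right):
    """Natural join of two lists of bindings on their shared variables."""
    return [
        {**a, **b}
        for a, b in product(left, right)
        if all(a[v] == b[v] for v in a.keys() & b.keys())
    ]


def evaluate_bgp(triples, patterns):
    # Step 1 (graph-parallel in spirit): every edge checks every pattern locally.
    candidates = [
        [b for b in (match_edge(t, p) for t in triples) if b is not None]
        for p in patterns
    ]
    # Step 2: iterative pruning of candidate bindings.
    candidates = prune(patterns, candidates)
    # Step 3 (data-parallel in spirit): join the surviving candidates.
    results = [{}]
    for bindings in candidates:
        results = join(results, bindings)
    return results


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("alice", "worksAt", "acme"),
        ("carol", "worksAt", "acme"),
    ]
    patterns = [("?x", "knows", "?y"), ("?y", "worksAt", "?z")]
    print(evaluate_bgp(triples, patterns))
    # prints [{'?x': 'bob', '?y': 'carol', '?z': 'acme'}]
```

In a distributed setting the first two steps would be expressed as vertex and edge computations over the partitioned graph, while the final join maps naturally onto data-parallel operations. The sketch only mirrors that division of work, not its performance characteristics.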
Alexander Schätzle, Martin Przyjaciel-Zablocki, Thorsten Berberich, Georg Lausen:
S2X: Graph-Parallel Querying of RDF with GraphX
Biomedical Data Management and Graph Online Querying, VLDB 2015 Workshops (Big-O(Q) 2015) and DMAH, Revised Selected Papers, volume 9579 of Lecture Notes in Computer Science (LNCS), pages 155-168. Hawaii (USA), August 2015.
- Apache Hadoop. We recommend Cloudera's Distribution of Hadoop (CDH); the implementation has been tested with CDH 5.3.
- Apache Spark with GraphX (included in CDH)
- RDF data in (extended) N-Triples format. Beyond the standard N-Triples syntax, S2X also supports the most commonly used prefixes as well as the prefixes used in the SP2Bench, LUBM, BSBM and WatDiv benchmarks.
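The exact extended syntax S2X accepts, and its list of supported prefixes, are documented in its Readme and are not reproduced here. As a rough illustration of what prefix support in an N-Triples reader involves, the Python sketch below expands prefixed names such as `rdf:type` into full IRIs. The prefix table, the function names (`parse_line`, `expand`, `expand_object`) and the assumption that a prefixed name may appear wherever N-Triples expects an `<IRI>` are all hypothetical. This is not S2X's parser, and it ignores escape sequences and other details of the N-Triples grammar.

```python
# Illustrative prefix table (assumption: not S2X's actual built-in list).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(term):
    """Expand a prefixed name such as rdf:type into a full <IRI>."""
    if term.startswith("<") and term.endswith(">"):
        return term  # already a full IRI
    if term.startswith("_:"):
        return term  # blank node
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return f"<{PREFIXES[prefix]}{local}>"
    raise ValueError(f"unknown term or prefix: {term!r}")


def expand_object(obj):
    """Objects may also be literals; only a prefixed datatype is expanded."""
    if obj.startswith('"'):
        lexical, sep, datatype = obj.rpartition("^^")
        if sep and lexical.endswith('"'):
            return f"{lexical}^^{expand(datatype)}"
        return obj  # plain or language-tagged literal, kept verbatim
    return expand(obj)


def parse_line(line):
    """Parse one line into a (subject, predicate, object) tuple.

    Returns None for blank lines and comments. Assumes whitespace
    before the terminating '.'.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if not line.endswith("."):
        raise ValueError(f"missing terminating '.': {line!r}")
    # Subject and predicate contain no spaces; the object (a literal) may.
    parts = line[:-1].rstrip().split(None, 2)
    if len(parts) != 3:
        raise ValueError(f"expected three terms: {line!r}")
    subject, predicate, obj = parts
    return expand(subject), expand(predicate), expand_object(obj)
```

For example, `parse_line('_:b1 foaf:age "42"^^xsd:int .')` yields the blank node `_:b1`, the full FOAF `age` IRI and the literal `"42"^^<http://www.w3.org/2001/XMLSchema#int>`.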
Downloads
06/2015: S2X v1.0 available for download (source and binaries)
| S2X v1.0 | Readme and Sources | S2X_v1.0_src.tar |
SVN
You can also check out the source code of S2X from the following Subversion repository.
(username: anonymous, password: anonymous)