
S2RDF

S2RDF (SPARQL on Spark for RDF) is a SPARQL query processor for Hadoop based on Spark SQL. It uses the relational interface of Spark for query execution and comes with a novel partitioning schema for RDF called ExtVP (Extended Vertical Partitioning), an extension of the Vertical Partitioning (VP) schema introduced by Abadi et al. ExtVP makes it possible to exclude unnecessary data from query processing by taking into account the possible join relations between tables in VP.
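The idea behind VP and ExtVP can be illustrated with a minimal Python sketch. VP stores one two-column (subject, object) table per predicate. ExtVP additionally stores semi-join reductions of one VP table by another, for subject-subject (SS), object-subject (OS) and subject-object (SO) correlations. This is not S2RDF's implementation, which builds these tables with Spark SQL on Hadoop. The toy triples and the function names `vertical_partitioning` and `extvp` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object), for illustration only.
triples = [
    ("A", "follows", "B"),
    ("B", "follows", "C"),
    ("B", "follows", "D"),
    ("C", "follows", "D"),
    ("A", "likes", "I1"),
    ("A", "likes", "I2"),
    ("C", "likes", "I2"),
]


def vertical_partitioning(triples):
    """VP: one (subject, object) table per predicate."""
    vp = defaultdict(list)
    for s, p, o in triples:
        vp[p].append((s, o))
    return dict(vp)


def extvp(vp, p1, p2, correlation):
    """Semi-join reduction of VP[p1] by VP[p2] (hypothetical helper).

    "SS": keep rows of VP[p1] whose subject is a subject in VP[p2]
    "OS": keep rows of VP[p1] whose object is a subject in VP[p2]
    "SO": keep rows of VP[p1] whose subject is an object in VP[p2]
    """
    t1, t2 = vp[p1], vp[p2]
    if correlation == "SS":
        keys = {s for s, _ in t2}
        return [(s, o) for s, o in t1 if s in keys]
    if correlation == "OS":
        keys = {s for s, _ in t2}
        return [(s, o) for s, o in t1 if o in keys]
    if correlation == "SO":
        keys = {o for _, o in t2}
        return [(s, o) for s, o in t1 if s in keys]
    raise ValueError(f"unknown correlation: {correlation}")


if __name__ == "__main__":
    vp = vertical_partitioning(triples)
    print(vp["follows"])                          # full VP table, 4 rows
    print(extvp(vp, "follows", "likes", "OS"))    # reduced table, 1 row
```

For a query pattern such as `?x follows ?y . ?y likes ?z`, the `follows` input can be read from the OS reduction of `follows` by `likes`. In this toy data that table holds 1 row instead of the 4 rows in the VP table, because only `B follows C` has an object that also appears as a subject of `likes`.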

Related Publications

Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, Georg Lausen:
S2RDF: RDF Querying with SPARQL on Spark
Proceedings of the VLDB Endowment (PVLDB), Volume 9, No. 10, June 2016.
42nd International Conference on Very Large Data Bases (VLDB 2016). New Delhi (India), September 2016.
Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, Georg Lausen:
S2RDF: RDF Querying with SPARQL on Spark (Technical Report)
Computing Research Repository (CoRR), December 2015.

Requirements

Apache Hadoop (we recommend Cloudera's Distribution of Hadoop, CDH; the implementation has been tested with CDH 5.4)
Apache Spark, included in CDH
RDF data in (extended) N-Triples format. In addition to standard N-Triples syntax, S2RDF supports the most commonly used prefixes as well as the prefixes used in the SP2Bench, LUBM, BSBM and WatDiv benchmarks.

Downloads

12/2015: S2RDF v1.0 available for download (source and binaries)

Version	Description	Download
S2RDF v1.1	Readme, Sources and Binaries (Fixes for YAGO)	S2RDF_v1.1.tar
S2RDF v1.0	Readme, Sources and Binaries	S2RDF_v1.0.tar
YAGO Queries	Set of YAGO Queries used in Evaluation	YAGO_Queries.txt

GitHub

You can also check out the source code of S2RDF from the following GitHub repository:
https://github.com/aschaetzle/S2RDF
