S2RDF
S2RDF (SPARQL on Spark for RDF) is a SPARQL query processor for Hadoop based on Spark SQL. It uses the relational interface of Spark for query execution and comes with a novel partitioning schema for RDF called ExtVP (Extended Vertical Partitioning), an extension of the Vertical Partitioning (VP) schema introduced by Abadi et al. ExtVP makes it possible to exclude unnecessary data from query processing by taking into account the possible relations between tables in VP.
Related Publications
- Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, Georg Lausen: S2RDF: RDF Querying with SPARQL on Spark. Proceedings of the VLDB Endowment (PVLDB), Volume 9, No. 10, June 2016. 42nd International Conference on Very Large Data Bases (VLDB 2016), New Delhi, India, September 2016.
- Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, Georg Lausen: S2RDF: RDF Querying with SPARQL on Spark (technical report). Computing Research Repository (CoRR), December 2015.
Requirements
- Apache Hadoop. We recommend Cloudera's Distribution of Hadoop (CDH); the implementation has been tested with CDH 5.4.
- Apache Spark (included in CDH).
- RDF data in (extended) N-Triples format. Beyond the plain N-Triples syntax, S2RDF also supports the most commonly used prefixes as well as the prefixes used in the SP2Bench, LUBM, BSBM, and WatDiv benchmarks.
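To illustrate how N-Triples input relates to the VP and ExtVP layouts, here is a minimal, in-memory Python sketch. It is not S2RDF's implementation, which runs on Spark SQL. The prefix map, the function names and the `max_sf` threshold are all hypothetical. The sketch assumes, as in the S2RDF paper, that each ExtVP table is a semi-join reduction of a VP table `p1` against another VP table `p2`. The reduction uses one of three correlations: subject-subject (SS), object-subject (OS) or subject-object (SO). A reduced table is kept only if it is smaller than the VP table it came from.

```python
from collections import defaultdict

# Hypothetical prefix map, for illustration only (not S2RDF's real prefix list).
PREFIXES = {"ex:": "http://example.org/"}


def expand(term):
    """Expand a prefixed name such as ex:alice to <http://example.org/alice>."""
    for prefix, iri in PREFIXES.items():
        if term.startswith(prefix):
            return "<" + iri + term[len(prefix):] + ">"
    return term  # full IRIs in <...> and literals are returned unchanged


def parse_line(line):
    """Split one N-Triples statement 's p o .' into an (s, p, o) tuple."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError("not an N-Triples statement: " + line)
    # maxsplit=2 keeps literals containing spaces intact in the object.
    s, p, o = body[:-1].strip().split(None, 2)
    return expand(s), expand(p), expand(o)


def build_vp(lines):
    """Vertical Partitioning: one two-column (s, o) table per predicate."""
    vp = defaultdict(set)
    for line in lines:
        if line.strip() and not line.lstrip().startswith("#"):
            s, p, o = parse_line(line)
            vp[p].add((s, o))
    return dict(vp)


def build_extvp(vp, max_sf=1.0):
    """Semi-join reductions of VP tables for every predicate pair (p1, p2).

    Keys are (correlation, p1, p2). A table is kept only if its selectivity
    |ExtVP| / |VP_p1| is below max_sf (a hypothetical threshold parameter).
    SS with p1 == p2 always equals VP_p1 and is therefore dropped.
    """
    subjects = {p: {s for s, _ in rows} for p, rows in vp.items()}
    objects = {p: {o for _, o in rows} for p, rows in vp.items()}
    ext = {}
    for p1, rows in vp.items():
        for p2 in vp:
            candidates = {
                "SS": {(s, o) for s, o in rows if s in subjects[p2]},  # join on s = s
                "OS": {(s, o) for s, o in rows if o in subjects[p2]},  # join on o = s
                "SO": {(s, o) for s, o in rows if s in objects[p2]},   # join on s = o
            }
            for corr, table in candidates.items():
                if len(table) / len(rows) < max_sf:
                    ext[(corr, p1, p2)] = table
    return ext


data = [
    "ex:alice ex:follows ex:bob .",
    "ex:bob ex:follows ex:carol .",
    "ex:carol ex:likes ex:pizza .",
    '<http://example.org/bob> <http://example.org/name> "Bob B." .',
]
vp = build_vp(data)
ext = build_extvp(vp)
```

In this toy data, the table `ext[("OS", follows, likes)]` keeps only the `follows` edge whose object also has a `likes` edge. A query that joins the two patterns on that variable could therefore scan one row of `follows` instead of two. This is the kind of saving the ExtVP description refers to.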
Downloads
12/2015: S2RDF v1.0 available for download (source and binaries).
Version | Description | Download |
---|---|---|
S2RDF v1.1 | Readme, Sources and Binaries (Fixes for YAGO) | S2RDF_v1.1.tar |
S2RDF v1.0 | Readme, Sources and Binaries | S2RDF_v1.0.tar |
YAGO Queries | Set of YAGO Queries used in Evaluation | YAGO_Queries.txt |
GitHub
You can also check out the source code of S2RDF from the following GitHub repository:
https://github.com/aschaetzle/S2RDF