Sempala
Sempala is a SPARQL-over-SQL approach that provides interactive-time SPARQL query processing on Hadoop. It stores RDF data in a columnar layout (Parquet) on HDFS and uses Impala, a massively parallel processing (MPP) SQL query engine for Hadoop, as the execution layer on top of it. SPARQL queries are translated into Impala SQL for execution.
Related Publications
- Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, Georg Lausen: Sempala: Interactive SPARQL Query Processing on Hadoop. Proc. of the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy.
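The publication above describes Sempala's actual storage layout and query translation. As a much simpler, hypothetical illustration of the general SPARQL-over-SQL idea, the following Python sketch translates a SPARQL basic graph pattern (a list of triple patterns) into one SQL query over an assumed three-column table `triples(s, p, o)`. The table name, column names, and the helper `bgp_to_sql` are assumptions made for this example; they are not Sempala's schema or API.

```python
from typing import Dict, List, Tuple

# A triple pattern: (subject, predicate, object); terms starting with "?" are variables.
TriplePattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """True if the term is a SPARQL variable such as ?x."""
    return term.startswith("?")


def sql_literal(term: str) -> str:
    """Render a constant RDF term as a SQL string literal (single quotes doubled)."""
    return "'" + term.replace("'", "''") + "'"


def bgp_to_sql(patterns: List[TriplePattern], table: str = "triples") -> str:
    """Translate a basic graph pattern into one SQL query over a hypothetical
    triples(s, p, o) table: one table alias per triple pattern, joined on
    shared variables. Not Sempala's actual layout or translation."""
    bound: Dict[str, str] = {}  # variable -> first column that binds it
    conditions: List[str] = []
    sources: List[str] = []
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        sources.append(f"{table} {alias}")
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{column}"
            if not is_var(term):
                conditions.append(f"{ref} = {sql_literal(term)}")  # constant: filter
            elif term in bound:
                conditions.append(f"{ref} = {bound[term]}")  # repeated variable: join
            else:
                bound[term] = ref  # first occurrence: project it
    projection = ", ".join(f"{ref} AS {var[1:]}" for var, ref in bound.items()) or "1"
    sql = f"SELECT {projection} FROM {', '.join(sources)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    print(bgp_to_sql([
        ("?x", "rdf:type", "ex:Student"),
        ("?x", "ex:memberOf", "?dept"),
    ]))
```

Running the script prints `SELECT t0.s AS x, t1.o AS dept FROM triples t0, triples t1 WHERE t0.p = 'rdf:type' AND t0.o = 'ex:Student' AND t1.s = t0.s AND t1.p = 'ex:memberOf'`. Every additional triple pattern adds another self-join on the same table, and reducing such joins is a central concern of SPARQL-over-SQL storage layouts. The sketch deliberately ignores FILTER, OPTIONAL, UNION and DISTINCT.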
Requirements
- Apache Hadoop; we recommend Cloudera's Distribution of Hadoop (CDH). The implementation is tested with CDH 4.5.
- Cloudera Impala, included in CDH
- RDF data in (extended) N-Triples format. Beyond the plain N-Triples syntax, Sempala also supports the most commonly used prefixes, as well as the prefixes used in the SP2Bench, LUBM and BSBM benchmarks.
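To illustrate what "extended" N-Triples with prefixes can look like, the following Python sketch parses one statement and expands prefixed names such as `rdf:type` into full IRIs. The prefix table is a small hypothetical subset (the standard RDF, RDFS and XSD namespaces), not the list Sempala actually supports, and `parse_line` and `expand` are illustrative helpers rather than Sempala functions. The tokenizer is simplified: it assumes each statement ends with `.` and does not validate IRIs or escape sequences.

```python
import re
from typing import Dict, Tuple

# Hypothetical prefix subset; not Sempala's built-in prefix list.
PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

# One RDF term per match (all groups are non-capturing, so findall returns whole terms).
TERM = re.compile(
    r'<[^>]*>'                                                        # <IRI>
    r'|"(?:[^"\\]|\\.)*"'                                             # "literal" ...
    r'(?:@[A-Za-z0-9-]+|\^\^(?:<[^>]*>|[A-Za-z][\w-]*:[^\s<>"]*))?'   # ... @lang or ^^datatype
    r'|_:[^\s<>"]+'                                                   # _:blank node
    r'|[A-Za-z][\w-]*:[^\s<>"]*'                                      # prefix:local
)


def expand(term: str) -> str:
    """Expand a prefixed name (also inside a literal's ^^datatype) to a full <IRI>."""
    if term.startswith(("<", "_:")):
        return term
    if term.startswith('"'):
        close = term.rfind('"')  # closing quote of the lexical form
        suffix = term[close + 1:]
        if suffix.startswith("^^") and not suffix.startswith("^^<"):
            return term[:close + 1] + "^^" + expand(suffix[2:])
        return term  # plain, language-tagged, or already-expanded literal
    prefix, _, local = term.partition(":")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix}")
    return f"<{PREFIXES[prefix]}{local}>"


def parse_line(line: str) -> Tuple[str, str, str]:
    """Parse one statement into (subject, predicate, object) with prefixes expanded."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError("statement must end with '.'")
    terms = TERM.findall(body[:-1])
    if len(terms) != 3:
        raise ValueError(f"expected 3 terms, got {len(terms)}")
    s, p, o = (expand(t) for t in terms)
    return s, p, o
```

For example, `parse_line('_:b1 <http://example.org/age> "42"^^xsd:integer .')` keeps the blank node and the full IRI unchanged and rewrites the datatype to `<http://www.w3.org/2001/XMLSchema#integer>`.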
Downloads
Version | Description | Download |
---|---|---|
Sempala v2.0 | Binaries | Sempala_v2.0_bin.tar |
Sempala v1.0 | Binaries | Sempala_v1.0_bin.tar |
GitHub
You can also check out the source code of Sempala from the following GitHub repository: https://github.com/aschaetzle/Sempala