
Sempala

Sempala is a SPARQL-over-SQL approach that provides interactive-time SPARQL query processing on Hadoop. It stores RDF data in a columnar layout (Parquet) on HDFS and uses Impala, a massively parallel processing (MPP) SQL query engine for Hadoop, as the execution layer on top of it. SPARQL queries are translated into Impala SQL for execution.
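To illustrate the general idea of SPARQL-over-SQL translation, here is a minimal Python sketch that turns a basic graph pattern into a single SQL query. It assumes a hypothetical three-column table named triples with columns s, p and o; this is a simplification for illustration and not necessarily the layout or the translation that Sempala itself uses. The function name bgp_to_sql is likewise made up for this example.

```python
def bgp_to_sql(patterns, table="triples"):
    """Translate (s, p, o) triple patterns into one SQL self-join query.

    Terms starting with '?' are variables; everything else is a constant.
    The table name and the column names s, p, o are assumptions.
    """
    bound = {}        # variable name -> first column reference it was bound to
    conditions = []   # constant filters and join conditions
    from_items = []   # one table alias per triple pattern

    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        from_items.append(f"{table} {alias}")
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                var = term[1:]
                if var in bound:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {bound[var]}")
                else:
                    bound[var] = ref
            else:
                # A constant becomes an equality filter (quotes escaped).
                escaped = term.replace("'", "''")
                conditions.append(f"{ref} = '{escaped}'")

    projection = ", ".join(f"{ref} AS {var}" for var, ref in bound.items()) or "*"
    sql = f"SELECT {projection} FROM {', '.join(from_items)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    # SPARQL: SELECT ?x ?n WHERE { ?x rdf:type ub:Student . ?x ub:name ?n }
    print(bgp_to_sql([("?x", "rdf:type", "ub:Student"), ("?x", "ub:name", "?n")]))
```

Each triple pattern contributes one alias of the table, and a variable shared between patterns turns into a join condition. A real translator would also have to handle OPTIONAL, FILTER, UNION and solution modifiers, which this sketch leaves out.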

Related Publications

Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, Georg Lausen:
Sempala: Interactive SPARQL Query Processing on Hadoop
Proc. of the 13th International Semantic Web Conference (ISWC 2014). Riva del Garda (Italy).

Requirements

Apache Hadoop. We recommend Cloudera's Distribution of Hadoop (CDH); the implementation was tested with CDH 4.5.
Cloudera Impala, which is included in CDH.
RDF data in (extended) N-Triples format. Beyond standard N-Triples syntax, Sempala also accepts the most commonly used prefixes as well as the prefixes used in the SP2Bench, LUBM and BSBM benchmarks.
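As a rough illustration of what prefix support in an extended N-Triples input can mean, the following Python sketch expands a prefixed name such as rdf:type into a full IRI. The prefix map, the regular expression and the function name expand_term are assumptions made for this example; the actual set of prefixes that Sempala recognizes is not listed here.

```python
import re

# Hypothetical prefix map for illustration only.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Simplified shape of a prefixed name: prefix ':' local part.
PREFIXED_NAME = re.compile(r"^([A-Za-z][\w-]*):([\w-]+)$")


def expand_term(term, prefixes=PREFIXES):
    """Return the term with a known prefix expanded to a full IRI.

    Full IRIs (<...>), literals ("...") and blank nodes (_:...) are
    returned unchanged, as are names with an unknown prefix.
    """
    if term.startswith(("<", '"', "_:")):
        return term
    match = PREFIXED_NAME.match(term)
    if match and match.group(1) in prefixes:
        return f"<{prefixes[match.group(1)]}{match.group(2)}>"
    return term


if __name__ == "__main__":
    print(expand_term("rdf:type"))
    print(expand_term("<http://example.org/alice>"))
```

Terms that are already full IRIs, literals or blank nodes pass through untouched, so the same function can be applied to every term of a triple.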

Downloads

Version	Description	File
Sempala v2.0	Binaries	Sempala_v2.0_bin.tar
Sempala v1.0	Binaries	Sempala_v1.0_bin.tar

GitHub

You can also check out the source code of Sempala from the following GitHub repository:
https://github.com/aschaetzle/Sempala
