Databases and Information Systems


Sempala is a SPARQL-over-SQL approach that provides interactive-time SPARQL query processing on Hadoop. It stores RDF data in a columnar layout (Parquet) on HDFS and uses Impala, a massively parallel processing (MPP) SQL query engine for Hadoop, as the execution layer on top of it. SPARQL queries are translated into Impala SQL for execution.
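
To make the SPARQL-over-SQL idea concrete, the following is a minimal sketch of how a basic graph pattern could be rewritten into SQL. It is not Sempala's translator: it assumes a hypothetical three-column table `triples(s, p, o)`, and the function name `bgp_to_sql` is invented for illustration. Sempala's actual data layout and generated Impala SQL may differ.

```python
from typing import Dict, List, Tuple

# A triple pattern is (subject, predicate, object); variables start with "?".
TriplePattern = Tuple[str, str, str]


def bgp_to_sql(patterns: List[TriplePattern], table: str = "triples") -> str:
    """Translate a basic graph pattern into one SQL query.

    Assumption: all triples live in a single table with columns s, p, o
    (hypothetical layout). Each pattern becomes one alias of that table;
    shared variables become join conditions.
    """
    if not patterns:
        raise ValueError("at least one triple pattern is required")

    first_binding: Dict[str, str] = {}  # variable -> column that first binds it
    conditions: List[str] = []

    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                if term in first_binding:
                    # Variable seen before: equate the two columns (join).
                    conditions.append(f"{first_binding[term]} = {ref}")
                else:
                    first_binding[term] = ref
            else:
                # Constant term: filter on it, escaping single quotes.
                escaped = term.replace("'", "''")
                conditions.append(f"{ref} = '{escaped}'")

    columns = ", ".join(
        f"{ref} AS {var[1:]}" for var, ref in first_binding.items()
    ) or "*"
    sources = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {sources}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    # SPARQL: SELECT ?x ?n WHERE { ?x rdf:type foaf:Person . ?x foaf:name ?n }
    print(bgp_to_sql([
        ("?x", "rdf:type", "foaf:Person"),
        ("?x", "foaf:name", "?n"),
    ]))
```

Each triple pattern maps to one scan of the table and each shared variable to a join condition, which is why the choice of storage layout matters for how many joins the SQL engine has to execute.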

Related Publications

  • Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, Georg Lausen:
    Sempala: Interactive SPARQL Query Processing on Hadoop.
    Proc. of the 13th International Semantic Web Conference (ISWC 2014). Riva del Garda (Italy).


Requirements

  • Apache Hadoop. We recommend Cloudera's Distribution of Hadoop (CDH); the implementation has been tested with CDH 4.5.
  • Cloudera Impala, which is included in CDH.
  • RDF data in (extended) N-Triples format. Beyond standard N-Triples syntax, Sempala also supports the most commonly used prefixes as well as the prefixes used in the SP2Bench, LUBM and BSBM benchmarks.
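
As an illustration of what "extended" N-Triples input means, the sketch below expands prefixed names into full IRIs so that a line becomes standard N-Triples. It is an assumption-laden example, not Sempala's loader: the prefix table contains only four well-known vocabularies chosen for illustration, and the function names `expand_term` and `expand_line` are hypothetical. The page does not list the prefixes Sempala actually recognizes.

```python
import re

# Illustrative prefix table (assumption); the real set of supported
# prefixes is not specified here.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# One term is a full IRI, a quoted literal (with optional datatype or
# language tag), or any other run of non-whitespace characters.
_TERM = re.compile(r'<[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^\S+|@[\w-]+)?|\S+')


def expand_term(term: str) -> str:
    """Return the term with a known prefix replaced by its namespace IRI."""
    if term.startswith(("<", '"', "_:")):
        # Full IRIs, literals and blank nodes are passed through unchanged
        # (prefixed datatypes inside literals are not expanded in this sketch).
        return term
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return f"<{PREFIXES[prefix]}{local}>"
    raise ValueError(f"unknown prefix or malformed term: {term!r}")


def expand_line(line: str) -> str:
    """Convert one prefixed triple line into a standard N-Triples line."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1].rstrip()
    terms = _TERM.findall(body)
    if len(terms) != 3:
        raise ValueError(f"expected 3 terms, got {len(terms)}: {line!r}")
    return " ".join(expand_term(t) for t in terms) + " ."


if __name__ == "__main__":
    print(expand_line("foaf:alice rdf:type foaf:Person ."))
```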


Version Description
Sempala v2.0 Binaries Sempala_v2.0_bin.tar
Sempala v1.0 Binaries Sempala_v1.0_bin.tar


You can also check out the source code of Sempala from the project's GitHub repository.
