Sempala
Sempala is a SPARQL-over-SQL approach that provides interactive-time SPARQL query processing on Hadoop. It stores RDF data in a columnar layout (Parquet) on HDFS and uses Impala, a massively parallel processing (MPP) SQL query engine for Hadoop, as the execution layer on top of it. SPARQL queries are translated into Impala SQL for execution.
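To make the SPARQL-to-SQL idea more concrete, here is a minimal Python sketch of how a basic graph pattern could be rewritten as a SQL query. It is an illustration only and not Sempala's actual translator. It assumes a hypothetical three-column table named `triples` (`subject`, `predicate`, `object`), which is not necessarily the layout Sempala uses, and the function name `bgp_to_sql` is made up for this example.

```python
def is_var(term):
    """SPARQL variables start with '?'."""
    return term.startswith("?")


def bgp_to_sql(patterns, table="triples"):
    """Translate a list of (s, p, o) triple patterns into one SQL SELECT.

    Each pattern becomes one alias of the (hypothetical) triples table.
    Constants turn into equality filters; a variable that occurs more
    than once turns into a join condition between the aliases.
    """
    bound = {}        # variable name -> first column reference it was bound to
    conditions = []   # WHERE clauses
    from_items = []   # one table alias per triple pattern

    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        from_items.append(f"{table} {alias}")
        for column, term in zip(("subject", "predicate", "object"), pattern):
            ref = f"{alias}.{column}"
            if is_var(term):
                name = term[1:]
                if name in bound:
                    # Repeated variable: join with its first occurrence.
                    conditions.append(f"{ref} = {bound[name]}")
                else:
                    bound[name] = ref
            else:
                # Constant term: filter on it (single quotes are doubled).
                escaped = term.replace("'", "''")
                conditions.append(f"{ref} = '{escaped}'")

    columns = ", ".join(f"{ref} AS {name}" for name, ref in bound.items()) or "*"
    sql = f"SELECT {columns} FROM {', '.join(from_items)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Example: SELECT ?x ?n WHERE { ?x rdf:type ub:Student . ?x ub:name ?n }
example_sql = bgp_to_sql([
    ("?x", "rdf:type", "ub:Student"),
    ("?x", "ub:name", "?n"),
])
print(example_sql)
```

The shared variable `?x` becomes a self-join on the subject column, which is the general pattern behind SPARQL-over-SQL translation. A real translator would also have to handle operators such as OPTIONAL, FILTER and UNION, and would target the storage layout actually used by the system.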
Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, Georg Lausen:
Sempala: Interactive SPARQL Query Processing on Hadoop
Proc. of the 13th International Semantic Web Conference (ISWC 2014). Riva del Garda (Italy).
- Apache Hadoop. We recommend Cloudera's Distribution of Hadoop (CDH); the implementation was tested with CDH 4.5.
- Cloudera Impala, included in CDH
- RDF data in (extended) N-Triples format. Beyond the standard N-Triples syntax, Sempala also supports the most commonly used prefixes as well as the prefixes used in the SP2Bench, LUBM and BSBM benchmarks.
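As a rough illustration of what prefix support in an extended N-Triples format can mean, the following Python sketch expands prefixed names into full IRIs. The exact grammar Sempala accepts is not specified here, so this assumes that a prefixed name such as `rdf:type` may appear wherever standard N-Triples requires `<...>`, and that terms are separated by whitespace. The prefix table and the names `expand_term` and `expand_line` are hypothetical and chosen for this example only.

```python
import re

# Illustrative prefix table (assumption); a loader would ship its own list.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# One RDF term: full IRI, blank node, literal, or prefixed name.
TERM = re.compile(r'''
    <[^>]*>                                        # full IRI
  | _:\S+                                          # blank node
  | "(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^\S+)?      # literal, optional lang or datatype
  | [A-Za-z][\w-]*:\S+                             # prefixed name
''', re.VERBOSE)


def expand_term(term, prefixes=PREFIXES):
    """Return the term with a prefixed name replaced by its full IRI."""
    if term.startswith(("<", '"', "_:")):
        return term  # IRIs, literals and blank nodes pass through unchanged
    prefix, _, local = term.partition(":")
    if prefix not in prefixes:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return f"<{prefixes[prefix]}{local}>"


def expand_line(line, prefixes=PREFIXES):
    """Rewrite one extended N-Triples statement as standard N-Triples."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError("statement must end with '.'")
    terms = TERM.findall(body[:-1])
    if len(terms) != 3:
        raise ValueError(f"expected 3 terms, found {len(terms)}")
    s, p, o = (expand_term(t, prefixes) for t in terms)
    return f"{s} {p} {o} ."


print(expand_line("<http://example.org/alice> rdf:type foaf:Person ."))
```

Literals, including any datatype suffix, are passed through unchanged in this sketch, so a prefixed datatype would need additional handling.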
Downloads
05/2014: Sempala v1.0 available for download (source and binaries)
| Sempala v1.0 | Readme and Binaries | Sempala_v1.0_bin.tar |
SVN
You can also check out the source code of Sempala from the following Subversion repository.
(username: anonymous, password: anonymous)