The GCX XQuery Engine – Benchmark Results for GCX v1.0β
G(arbage) C(ollected) X(Query) Engine – An open source in-memory XQuery engine
The GCX engine is an in-memory XQuery engine designed for memory-efficient XQuery evaluation against large XML documents. The C++ prototype, released as v1.0β, supports a powerful fragment of the XQuery language. The following experiments are part of our publication at ICDE 2007.
Experiments with the XMark Benchmark
We measured the performance of the GCX engine v1.0β on benchmark data from XMark – An XML Benchmark Project. To this end, we generated XML documents of sizes between 10MB and 200MB with the XMark data generator xmlgen. As the GCX engine does not yet support the full XQuery standard, we modified the XMark benchmarks as follows:
- The GCX engine does not yet support XML attributes. Consequently, all attributes in the XML documents were rewritten to subelements. For instance, an opening tag <book id="1"> is rewritten to <book><book_id>1</book_id> (a 1MB sample XML document is available here).
- The GCX engine v1.0β does not support aggregation (the support for aggregation has been added in v2.0!). Hence, we slightly modified selected XMark queries, as shown in the XMark Queries section. Each query is related to the same-numbered XMark query, but due to the rewriting some of them yield different results than their original counterparts.
In our experiments, all XQuery engines evaluated the same rewritten queries on exactly the same input XML documents.
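The attribute rewriting described above can be illustrated with a minimal Python sketch. It is not the tool that was used to prepare the benchmark documents. The function name `attributes_to_subelements` is hypothetical, the subelement naming (`<tag>_<attribute>`) is inferred from the single `book`/`book_id` example, and the sketch assumes documents without namespaces that fit into memory.

```python
import xml.etree.ElementTree as ET


def attributes_to_subelements(elem):
    """Recursively turn every attribute of `elem` into a leading child element.

    Naming assumption (taken from the book/book_id example only):
    attribute `a` of element `e` becomes the subelement `e_a`.
    """
    # Rewrite the existing children first, so that the freshly created
    # subelements are never visited themselves.
    for child in list(elem):
        attributes_to_subelements(child)
    # Insert in reverse order so the subelements keep the attribute order.
    for name, value in reversed(list(elem.attrib.items())):
        sub = ET.Element(f"{elem.tag}_{name}")
        sub.text = value
        elem.insert(0, sub)
    elem.attrib.clear()
    return elem


doc = ET.fromstring('<book id="1"><title>XQuery</title></book>')
rewritten = ET.tostring(attributes_to_subelements(doc), encoding="unicode")
print(rewritten)  # <book><book_id>1</book_id><title>XQuery</title></book>
```

For documents of the sizes used in the experiments, a streaming rewriter would be the more natural choice; the in-memory version is kept here only because it is short.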
Reference Implementations
The GCX engine has two main characteristics: It is an in-memory XQuery engine and it is geared towards streaming XQuery evaluation. With this in mind, we chose the following reference implementations.
- The FluXQuery engine (programming language: Java) is probably the most natural choice for a reference implementation: it is also a main-memory XQuery engine geared towards XML stream processing, and it implements an XQuery fragment similar to that of the GCX engine v1.0β. Moreover, the FluXQuery engine is able to exploit schema information that comes from a Document Type Definition (DTD). Consequently, the FluXQuery engine was provided with the XMark DTD in our experiments.
- The in-memory query engines Galax v0.6.8 (programming language: Objective CAML), Qizx/open v1.1 (programming language: Java) and Saxon v8.7.1 (programming language: Java) implement the full XQuery standard. While Galax has not been designed with XML stream processing in mind, it is often consulted in XQuery benchmarks and – for this reason – also included here. Note that the static projection of Galax could not be made to work in our experiments.
- Finally, we chose MonetDB v4.12.0/XQuery v0.12.0, a mature XML database system. As a secondary-storage implementation, MonetDB can make use of index structures to speed up query evaluation, which is not done by streaming in-memory XQuery engines. On the other hand, MonetDB XQuery stores the entire data physically before query evaluation. To account for the fact that the GCX engine and the other main memory engines read the complete input XML document for each query evaluation, in each run we forced the MonetDB server to reload the complete XML document (and included this document loading time in our time measurements).
Execution Platform
We ran our experiments on a 3GHz CPU Intel
Pentium IV with 2GB RAM, running SuSe Linux 10.0.
All Java-based systems were executed using J2RE v1.4.2.
The focus of the benchmarks was primarily on main memory consumption, but we also considered query execution time.
Time is given either in seconds (abbreviated with "s")
(e.g. 1.59s means 1 second and 590 milliseconds)
or in minutes (abbreviated with "m")
(e.g. 02:07m means 2 minutes and 7 seconds).
Memory consumption is given in megabytes (abbreviated with "MB") or gigabytes (abbreviated with "GB").
The main memory consumption was measured with the Linux "top" command.
For each system and query we set a timeout of one hour. For each system and size
of the input XML document, we measured the high watermark of non-swapped memory consumption,
and the total query evaluation time.
"Not available" (abbreviated with "n/a") indicates that the
query could not be expressed in the language supported by the specific engine, while a dash ("–")
denotes failure, e.g. caused by segmentation faults. For the Java-based engines, we observed that, due to effects
caused by automatic memory management and the Java Virtual Machine, memory consumption often increased with the XML document
size even though the buffer size remained constant (e.g. in the case of the FluXQuery engine).
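The measurement procedure (one run per system and document, a one-hour timeout, total evaluation time, and a memory high watermark) can be approximated with a small harness. The sketch below is not the setup used for the published numbers: those were read from the Linux "top" command, whereas this sketch substitutes Python's `resource.getrusage`, which reports the peak resident set size of finished child processes. The function name `run_once` and the example command are hypothetical.

```python
import resource
import subprocess
import sys
import time

TIMEOUT_SECONDS = 3600  # one-hour limit per system and query, as in the text


def run_once(command, timeout=TIMEOUT_SECONDS):
    """Run `command`; return (status, wall-clock seconds, peak RSS in MB or None)."""
    start = time.perf_counter()
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the exception.
        return ("timeout", time.perf_counter() - start, None)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the largest peak RSS over all children waited for so far
    # (kilobytes on Linux), so start a fresh harness process per measurement.
    peak_mb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024
    status = "ok" if completed.returncode == 0 else "failure"
    return (status, elapsed, peak_mb)


# Placeholder command; a real run would invoke an XQuery engine instead.
status, seconds, peak_mb = run_once([sys.executable, "-c", "print('hello')"])
```

Because `RUSAGE_CHILDREN` accumulates a maximum over all children, the harness would have to be started once per engine, query, and document size to obtain independent memory figures.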
Benchmark Results
The table below summarizes the runtime and memory consumption of the GCX engine (v1.0β) compared to the other XQuery engines. Note that the benchmark results are not permanently kept up to date. The behavior (needed runtime and memory consumption) of the current version (v2.1) of the GCX engine might differ from the benchmark results given in the following table.
Query | XML document size | GCX v1.0β | FluXQuery | Galax v0.6.8 | MonetDB v4.12.0 / XQuery v0.12.0 | Saxon v8.7.1 | Qizx/open v1.1 |
---|---|---|---|---|---|---|---|
Q1 | 10MB | 0.18s / 1.2MB | 1.59s / 50MB | 5.45s / 186MB | 0.86s / 30MB | 1.48s / 80MB | 1.20s / 38MB |
Q1 | 50MB | 0.92s / 1.2MB | 3.96s / 111MB | 42.33s / 880MB | 3.69s / 98MB | 4.29s / 292MB | 3.74s / 195MB |
Q1 | 100MB | 1.87s / 1.2MB | 6.94s / 111MB | 02:07m / 1.8GB | 7.19s / 225MB | 7.96s / 547MB | 6.56s / 285MB |
Q1 | 200MB | 3.53s / 1.2MB | 12.27s / 111MB | timeout | 13.60s / 244MB | 14.30s / 973MB | 11.82s / 480MB |
Q6 | 10MB | 0.34s / 1.2MB | n/a | 7.66s / 240MB | 0.98s / 29MB | 1.73s / 82MB | 1.56s / 33MB |
Q6 | 50MB | 1.68s / 1.2MB | n/a | 57.98s / 1.2GB | 5.06s / 111MB | 5.78s / 292MB | 6.13s / 169MB |
Q6 | 100MB | 3.33s / 1.2MB | n/a | 05:08m / 2GB | 9.94s / 253MB | 10.85s / 622MB | 11.74s / 484MB |
Q6 | 200MB | 6.42s / 1.2MB | n/a | timeout | 19.95s / 337MB | 20.14s / 1.2GB | 20.33s / 805MB |
Q8 | 10MB | 13.15s / 9.8MB | 18.04s / 128MB | 01:04m / 377MB | 02:56m / 407MB | 6.61s / 145MB | 9.89s / 148MB |
Q8 | 50MB | 05:13m / 43MB | 06:51m / 169MB | 33:08m / 1.8GB | 03:26m / 1.35GB | 02:02m / 352MB | 03:38m / 265MB |
Q8 | 100MB | 22:07m / 86MB | 27:01m / 216MB | timeout | – | 08:39m / 650MB | 14:27m / 397MB |
Q8 | 200MB | timeout | timeout | timeout | – | 32:43m / 1.15GB | 52:05m / 636MB |
Q13 | 10MB | 0.17s / 1.2MB | 1.60s / 52MB | 5.92s / 182MB | 0.80s / 31MB | 1.53s / 48MB | 1.26s / 28MB |
Q13 | 50MB | 0.85s / 1.2MB | 3.98s / 111MB | 43.91s / 899MB | 3.64s / 98MB | 4.45s / 292MB | 3.85s / 195MB |
Q13 | 100MB | 1.69s / 1.2MB | 7.00s / 111MB | 02:04m / 1.8GB | 7.34s / 224MB | 8.35s / 547MB | 6.81s / 285MB |
Q13 | 200MB | 3.24s / 1.2MB | 12.33s / 111MB | timeout | 13.52s / 271MB | 15.02s / 1.05GB | 12.30s / 480MB |
Q20 | 10MB | 0.25s / 1.2MB | 1.65s / 48MB | 6.95s / 215MB | 0.85s / 34MB | 1.65s / 62MB | 1.43s / 39MB |
Q20 | 50MB | 1.24s / 1.2MB | 4.19s / 111MB | 53.08s / 1.5GB | 4.17s / 120MB | 4.90s / 292MB | 4.18s / 195MB |
Q20 | 100MB | 2.48s / 1.2MB | 7.37s / 111MB | 03:14m / 2GB | 8.47s / 247MB | 9.13s / 622MB | 8.71s / 350MB |
Q20 | 200MB | 4.74s / 1.2MB | 13.14s / 111MB | timeout | 16.40s / 296MB | 16.58s / 1.15GB | 15.80s / 628MB |
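Each cell in the table pairs a time with a memory value in the notation described under Execution Platform. For readers who want to post-process the numbers, the following Python sketch converts a cell into seconds and megabytes. The helper names are hypothetical, and the conversion 1GB = 1024MB is an assumption, since the text does not say which convention was used.

```python
import re

# Assumption: 1GB = 1024MB; the source does not state the convention.
_UNIT_TO_MB = {"MB": 1.0, "GB": 1024.0}


def parse_time(text):
    """Convert '1.59s' (seconds) or '02:07m' (minutes:seconds) to seconds."""
    text = text.strip()
    if text.endswith("s"):
        return float(text[:-1])
    if text.endswith("m"):
        minutes, seconds = text[:-1].split(":")
        return float(int(minutes) * 60 + int(seconds))
    raise ValueError(f"unrecognized time value: {text!r}")


def parse_memory(text):
    """Convert '111MB' or '1.8GB' to megabytes (a decimal comma is accepted)."""
    match = re.fullmatch(r"([\d.]+)\s*(MB|GB)", text.strip().replace(",", "."))
    if match is None:
        raise ValueError(f"unrecognized memory value: {text!r}")
    return float(match.group(1)) * _UNIT_TO_MB[match.group(2)]


def parse_cell(cell):
    """Return (seconds, megabytes), or None for 'n/a', 'timeout' and '–'."""
    cell = cell.strip()
    if cell in {"n/a", "timeout", "–"}:
        return None
    time_part, memory_part = cell.split("/")
    return parse_time(time_part), parse_memory(memory_part)


# Q1 on the 200MB document: GCX v1.0β versus FluXQuery.
gcx = parse_cell("3.53s / 1.2MB")
flux = parse_cell("12.27s / 111MB")
memory_ratio = flux[1] / gcx[1]  # how many times more memory FluXQuery used
```

Cells without a measurement are mapped to `None` rather than to a number, so that timeouts and failures cannot be mistaken for fast runs in later calculations.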
XMark Queries
Note: All following (XMark) queries were taken from XMark – An XML Benchmark Project and modified to match the GCX v1.0β supported XQuery fragment.
XMark Q1
<query1> {
for $site in /site return
for $people in $site/people return
for $person in $people/person return
if ($person/person_id="person0")
then <result> {$person/name} </result>
else ()
} </query1>
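To make the semantics of the rewritten Q1 concrete, the following Python sketch computes the same result in a single pass over the input. It only illustrates what the query returns on attribute-free documents. It is not a description of how the GCX engine evaluates the query, and the function name `query1` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def query1(source):
    """Return the Q1 result for an attribute-free XMark document."""
    path, results = [], []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            continue
        # Only elements reached via /site/people/person are candidates.
        if path == ["site", "people", "person"]:
            # "=" in XQuery is existential: one matching person_id suffices.
            if any(e.text == "person0" for e in elem.findall("person_id")):
                parts = []
                for name in elem.findall("name"):
                    name.tail = None  # do not copy whitespace after </name>
                    parts.append(ET.tostring(name, encoding="unicode"))
                results.append("<result>" + "".join(parts) + "</result>")
            # Drop the inspected subtree; the emptied <person> element itself
            # stays attached to its parent, so memory is not strictly constant.
            elem.clear()
        path.pop()
    return "<query1>" + "".join(results) + "</query1>"


sample = (
    "<site><people>"
    "<person><person_id>person0</person_id><name>Ann</name></person>"
    "<person><person_id>person1</person_id><name>Bob</name></person>"
    "</people></site>"
)
answer = query1(io.BytesIO(sample.encode("utf-8")))
print(answer)  # <query1><result><name>Ann</name></result></query1>
```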
XMark Q6
<query6> {
for $site in //site return
for $regions in $site/regions return
$regions//item
} </query6>
XMark Q8
<query8> {
for $site in /site return
for $people in $site/people return
for $person in $people/person return
<item> {
(
<person> {$person/name} </person>,
<items_bought> {
for $site2 in /site return
for $cas in $site2/closed_auctions return
for $ca in $cas/closed_auction return
for $buyer in $ca/buyer return
if ($buyer/buyer_person=$person/person_id)
then <result> {$ca} </result>
else ()
} </items_bought>
)
} </item>
} </query8>
XMark Q13
<query13> {
for $site in /site return
for $regions in $site/regions return
for $australia in $regions/australia return
for $item in $australia/item return
<item> {
(
<name> {$item/name} </name>,
<desc> {$item/description} </desc>
)
} </item>
} </query13>
XMark Q20
<query20> {
for $site in /site return
for $people in $site/people return
for $person in $people/person return
if (fn:not(fn:exists($person/person_income)))
then $person
else ()
} </query20>
Last updated: 2009-11-11