Databases and Information Systems

The GCX XQuery Engine – Benchmark Results for GCX v1.0β

G(arbage) C(ollected) X(Query) Engine – An open source in-memory XQuery engine

The GCX engine is an in-memory XQuery engine designed for memory-efficient XQuery evaluation against large XML documents. The C++ prototype, released as v1.0β, supports a powerful fragment of the XQuery language. The following experiments are part of our publication at ICDE 2007.

Experiments with the XMark Benchmark

We measured the performance of the GCX engine v1.0β on benchmark data from XMark – An XML Benchmark Project. To this end, we generated XML documents of sizes between 10MB and 200MB with the XMark data generator xmlgen. As the GCX engine does not yet support the full XQuery standard, we modified the XMark benchmarks as follows:

  • The GCX engine does not yet support XML attributes. Consequently, all attributes in the XML documents were rewritten to subelements. For instance, an opening tag <book id="1"> is rewritten to <book><book_id>1</book_id> (a 1MB sample XML document is available here).
  • The GCX engine v1.0β does not support aggregation (support for aggregation was added in v2.0). Hence, we slightly modified selected XMark queries, as shown in the XMark Queries section. Each query corresponds to the XMark query with the same number, but because of the rewriting some of them yield results that differ from those of their original counterparts.
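The attribute rewriting described in the first item can be sketched in a few lines of Python. This is only an illustration, not the tool used for the experiments: the function name attributes_to_subelements is hypothetical, and the naming scheme (element name, underscore, attribute name) is inferred from the <book_id> example and from element names such as person_id in the queries.

```python
import xml.etree.ElementTree as ET


def attributes_to_subelements(root):
    """Turn each attribute name="v" on <tag> into a leading child <tag_name>v</tag_name>.

    Assumption: the new element is named "<element>_<attribute>", as in the
    <book id="1"> -> <book><book_id>1</book_id> example.
    """
    # Snapshot the elements first so that newly inserted children are not revisited.
    for elem in list(root.iter()):
        attrs = list(elem.attrib.items())
        elem.attrib.clear()
        for position, (name, value) in enumerate(attrs):
            child = ET.Element(f"{elem.tag}_{name}")
            child.text = value
            # Insert in attribute order, ahead of the existing children.
            elem.insert(position, child)
    return root


doc = ET.fromstring('<book id="1"><title>XQuery</title></book>')
rewritten = ET.tostring(attributes_to_subelements(doc), encoding="unicode")
print(rewritten)  # <book><book_id>1</book_id><title>XQuery</title></book>
```

The sketch loads the whole document into memory, which is acceptable for small samples; a 200MB input would call for a streaming rewrite instead.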

In our experiments, all XQuery engines evaluated the same rewritten queries on exactly the same input XML documents.

Reference Implementations

The GCX engine has two main characteristics: It is an in-memory XQuery engine and it is geared towards streaming XQuery evaluation. With this in mind, we chose the following reference implementations.

  • The FluXQuery engine (programming language: Java) is probably the most natural choice for a reference implementation: it is also a main memory XQuery engine geared towards XML stream processing, and it implements an XQuery fragment similar to that of the GCX engine v1.0β. Moreover, the FluXQuery engine can exploit schema information from a Document Type Definition (DTD). Consequently, we provided the FluXQuery engine with the XMark DTD in our experiments.
  • The in-memory query engines Galax v0.6.8 (programming language: Objective CAML), Qizx/open v1.1 (programming language: Java) and Saxon v8.7.1 (programming language: Java) implement the full XQuery standard. Although Galax was not designed with XML stream processing in mind, it is often used in XQuery benchmarks and is therefore included here. Note that we could not get the static projection feature of Galax to work in our experiments.
  • Finally, we chose MonetDB v4.12.0/XQuery v0.12.0, a mature XML database system. As a secondary-storage implementation, MonetDB can use index structures to speed up query evaluation, something streaming in-memory XQuery engines do not do. On the other hand, MonetDB/XQuery physically stores the entire data before query evaluation. Because the GCX engine and the other main memory engines read the complete input XML document for each query evaluation, we forced the MonetDB server to reload the complete XML document in each run and included this loading time in our time measurements.

Execution Platform

We ran our experiments on a 3GHz Intel Pentium IV CPU with 2GB RAM, running SUSE Linux 10.0. All Java-based systems were executed using J2RE v1.4.2.

The benchmarks focus primarily on main memory consumption, but we also consider query execution time. Time is given either in seconds (abbreviated "s"; e.g. 1.59s means 1 second and 590 milliseconds) or in minutes (abbreviated "m"; e.g. 02:07m means 2 minutes and 7 seconds). Memory consumption is given in megabytes ("MB") or gigabytes ("GB") and was measured with the Linux "top" command. For each system and query we set a timeout of one hour. For each system and size of the input XML document, we measured the high watermark of non-swapped memory consumption and the total query evaluation time. "Not available" (abbreviated "n/a") indicates that the query could not be expressed in the language supported by the specific engine, while a dash ("-") denotes failure, e.g. caused by a segmentation fault. For the Java-based engines we observed that, due to effects of automatic memory management in the Java Virtual Machine, memory consumption often increased with the XML document size even though the buffer size remained constant (e.g. in the case of the FluXQuery engine).
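For readers who want to process the results programmatically, the notation above can be parsed as follows. This is a minimal sketch: the function names are hypothetical, the conversion 1GB = 1024MB is an assumption (the text does not say whether binary or decimal units are meant), and "-" is assumed to be the character used for the failure dash.

```python
import re


def parse_time(cell):
    """Return seconds for the two notations: '1.59s' or 'MM:SSm'."""
    cell = cell.strip()
    if cell.endswith("s"):
        return float(cell[:-1])
    match = re.fullmatch(r"(\d+):(\d{2})m", cell)
    if match:
        minutes, seconds = match.groups()
        return int(minutes) * 60 + int(seconds)
    raise ValueError(f"unrecognized time: {cell!r}")


def parse_memory(cell):
    """Return megabytes for '1.2MB' or '1.35GB' (assumption: 1GB = 1024MB)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(MB|GB)", cell.strip())
    if not match:
        raise ValueError(f"unrecognized memory size: {cell!r}")
    value, unit = match.groups()
    return float(value) * (1024 if unit == "GB" else 1)


def parse_cell(cell):
    """Split 'time / memory' into (seconds, megabytes).

    Cells without a measurement ('timeout', 'n/a', or the failure dash,
    assumed here to be '-') yield None.
    """
    cell = cell.strip()
    if cell in ("timeout", "n/a", "-"):
        return None
    time_part, memory_part = cell.split("/")
    return parse_time(time_part), parse_memory(memory_part)


print(parse_cell("02:07m / 1.8GB"))
print(parse_cell("timeout"))
```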

Benchmark Results

The table below summarizes the runtime and memory consumption of the GCX engine (v1.0β) compared to the other XQuery engines. Note that the benchmark results are not kept up to date: the runtime and memory consumption of the current version (v2.1) of the GCX engine might differ from the results given in the following table.

Query | XML document size | GCX            | FluXQuery      | Galax          |                 |                 |
Q1    | 10MB              | 0.18s / 1.2MB  | 1.59s / 50MB   | 5.45s / 186MB  | 0.86s / 30MB    | 1.48s / 80MB    | 1.20s / 38MB
      | 50MB              | 0.92s / 1.2MB  | 3.96s / 111MB  | 42.33s / 880MB | 3.69s / 98MB    | 4.29s / 292MB   | 3.74s / 195MB
      | 100MB             | 1.87s / 1.2MB  | 6.94s / 111MB  | 02:07m / 1.8GB | 7.19s / 225MB   | 7.96s / 547MB   | 6.56s / 285MB
      | 200MB             | 3.53s / 1.2MB  | 12.27s / 111MB | timeout        | 13.60s / 244MB  | 14.30s / 973MB  | 11.82s / 480MB
Q6    | 10MB              | 0.34s / 1.2MB  | n/a            | 7.66s / 240MB  | 0.98s / 29MB    | 1.73s / 82MB    | 1.56s / 33MB
      | 50MB              | 1.68s / 1.2MB  | n/a            | 57.98s / 1.2GB | 5.06s / 111MB   | 5.78s / 292MB   | 6.13s / 169MB
      | 100MB             | 3.33s / 1.2MB  | n/a            | 05:08m / 2GB   | 9.94s / 253MB   | 10.85s / 622MB  | 11.74s / 484MB
      | 200MB             | 6.42s / 1.2MB  | n/a            | timeout        | 19.95s / 337MB  | 20.14s / 1.2GB  | 20.33s / 805MB
Q8    | 10MB              | 13.15s / 9.8MB | 18.04s / 128MB | 01:04m / 377MB | 02:56m / 407MB  | 6.61s / 145MB   | 9.89s / 148MB
      | 50MB              | 05:13m / 43MB  | 06:51m / 169MB | 33:08m / 1.8GB | 03:26m / 1.35GB | 02:02m / 352MB  | 03:38m / 265MB
      | 100MB             | 22:07m / 86MB  | 27:01m / 216MB | timeout        | -               | 08:39m / 650MB  | 14:27m / 397MB
      | 200MB             | timeout        | timeout        | timeout        | -               | 32:43m / 1.15GB | 52:05m / 636MB
Q13   | 10MB              | 0.17s / 1.2MB  | 1.60s / 52MB   | 5.92s / 182MB  | 0.80s / 31MB    | 1.53s / 48MB    | 1.26s / 28MB
      | 50MB              | 0.85s / 1.2MB  | 3.98s / 111MB  | 43.91s / 899MB | 3.64s / 98MB    | 4.45s / 292MB   | 3.85s / 195MB
      | 100MB             | 1.69s / 1.2MB  | 7.00s / 111MB  | 02:04m / 1.8GB | 7.34s / 224MB   | 8.35s / 547MB   | 6.81s / 285MB
      | 200MB             | 3.24s / 1.2MB  | 12.33s / 111MB | timeout        | 13.52s / 271MB  | 15.02s / 1.05GB | 12.30s / 480MB
Q20   | 10MB              | 0.25s / 1.2MB  | 1.65s / 48MB   | 6.95s / 215MB  | 0.85s / 34MB    | 1.65s / 62MB    | 1.43s / 39MB
      | 50MB              | 1.24s / 1.2MB  | 4.19s / 111MB  | 53.08s / 1.5GB | 4.17s / 120MB   | 4.90s / 292MB   | 4.18s / 195MB
      | 100MB             | 2.48s / 1.2MB  | 7.37s / 111MB  | 03:14m / 2GB   | 8.47s / 247MB   | 9.13s / 622MB   | 8.71s / 350MB
      | 200MB             | 4.74s / 1.2MB  | 13.14s / 111MB | timeout        | 16.40s / 296MB  | 16.58s / 1.15GB | 15.80s / 628MB
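One way to read the table is to compare how the runtime grows with the input size. The sketch below does this for the GCX timings of Q1 and Q8, copied from the table and converted to seconds; the helper name growth_factors is hypothetical, and the reading of the numbers is only an observation on these few data points, not a claim made in the experiments.

```python
# GCX timings from the table, in seconds, keyed by input size in MB.
GCX_SECONDS = {
    "Q1": {10: 0.18, 50: 0.92, 100: 1.87, 200: 3.53},
    # Q8 at 200MB hit the one-hour timeout, so it is left out.
    "Q8": {10: 13.15, 50: 5 * 60 + 13, 100: 22 * 60 + 7},
}


def growth_factors(timings):
    """For consecutive input sizes, return (size ratio, time ratio) pairs."""
    sizes = sorted(timings)
    return [
        (large / small, timings[large] / timings[small])
        for small, large in zip(sizes, sizes[1:])
    ]


for query, timings in GCX_SECONDS.items():
    for size_ratio, time_ratio in growth_factors(timings):
        print(f"{query}: input x{size_ratio:.1f} -> time x{time_ratio:.2f}")
```

For Q1 the time ratios stay close to the size ratios, which suggests roughly linear scaling. For Q8, which joins persons with closed auctions, doubling the input from 50MB to 100MB increases the runtime by a factor of about 4.2.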

XMark Queries

Note: All of the following queries were taken from XMark – An XML Benchmark Project and modified to fit the XQuery fragment supported by GCX v1.0β.

XMark Q1

<query1> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        if ($person/person_id="person0")
          then <result> {$person/name} </result>
          else ()
} </query1>

XMark Q6

<query6> {
  for $site in //site return
    for $regions in $site/regions return
} </query6>

XMark Q8

<query8> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        <item> {
            <person> {$person/name} </person>,
            <items_bought> {
              for $site2 in /site return
                for $cas in $site2/closed_auctions return
                  for $ca in $cas/closed_auction return
                    for $buyer in $ca/buyer return
                      if ($buyer/buyer_person=$person/person_id)
                        then <result> {$ca} </result>
                        else ()
              } </items_bought>
        } </item>
} </query8>

XMark Q13

<query13> {
  for $site in /site return
    for $regions in $site/regions return
      for $australia in $regions/australia return
        for $item in $australia/item return
          <item> {
              <name> {$item/name} </name>,
              <desc> {$item/description} </desc>
          } </item>
} </query13>

XMark Q20

<query20> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        if (fn:not(fn:exists($person/person_income)))
          then $person
          else ()
} </query20>

Last updated: 2009-11-11