
The GCX XQuery Engine – Benchmark Results for GCX v1.0β

G(arbage) C(ollected) X(Query) Engine – An open source in-memory XQuery engine

The GCX engine is an in-memory XQuery engine designed for memory-efficient XQuery evaluation against large XML documents. The C++ prototype, released as v1.0β, supports a powerful fragment of the XQuery language. The following experiments are part of our publication at ICDE 2007.

Experiments with the XMark Benchmark

We measured the performance of the GCX engine v1.0β on benchmark data from XMark – An XML Benchmark Project. To this end, we generated XML documents between 10MB and 200MB in size with the XMark data generator xmlgen. As the GCX engine does not yet support the full XQuery standard, we modified the XMark benchmark as follows:

The GCX engine does not yet support XML attributes. Consequently, all attributes in the XML documents were rewritten to subelements. For instance, an opening tag <book id="1"> is rewritten to <book><book_id>1</book_id> (a 1MB sample XML document is available here).
The GCX engine v1.0β does not support aggregation (support for aggregation was added in v2.0). Hence, we slightly modified selected XMark queries, as shown in the XMark Queries section. Each query corresponds to the XMark query with the same number, but due to the rewriting some of them yield different results than their original counterparts.
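The attribute rewriting can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. This is a minimal illustration under stated assumptions, not the tool used for the experiments: the function name attributes_to_subelements is hypothetical, the element_attribute naming scheme is inferred from the <book id="1"> example (it is consistent with paths such as person_id and buyer_person in the queries), the new subelements are placed before the element's existing content, and namespaced attributes are not handled.

```python
import xml.etree.ElementTree as ET


def attributes_to_subelements(elem: ET.Element) -> None:
    """Rewrite <tag a="v"> as <tag><tag_a>v</tag_a>..., recursively and in place.

    Assumed naming scheme: element name, underscore, attribute name.
    Namespaced attributes are not handled.
    """
    for child in list(elem):
        attributes_to_subelements(child)
    if not elem.attrib:
        return
    new_children = []
    for name, value in elem.attrib.items():
        sub = ET.Element(f"{elem.tag}_{name}")
        sub.text = value
        new_children.append(sub)
    elem.attrib.clear()
    # Text that preceded the first original child must now follow the new subelements.
    new_children[-1].tail = elem.text
    elem.text = None
    for index, sub in enumerate(new_children):
        elem.insert(index, sub)


if __name__ == "__main__":
    root = ET.fromstring('<book id="1"><title>XQuery</title></book>')
    attributes_to_subelements(root)
    print(ET.tostring(root, encoding="unicode"))
```

ElementTree builds the whole document in memory, so for the larger inputs a streaming rewrite (for example based on iterparse) would likely be preferable.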

In our experiments, all XQuery engines evaluated the same rewritten queries on exactly the same input XML documents.

Reference Implementations

The GCX engine has two main characteristics: It is an in-memory XQuery engine and it is geared towards streaming XQuery evaluation. With this in mind, we chose the following reference implementations.

The FluXQuery engine (programming language: Java) is probably the most natural choice for a reference implementation: it is also a main-memory XQuery engine geared towards XML stream processing, and it implements an XQuery fragment similar to that of the GCX engine v1.0β. Moreover, the FluXQuery engine can exploit schema information from a Document Type Definition (DTD), so we provided it with the XMark DTD in our experiments.
The in-memory query engines Galax v0.6.8 (programming language: Objective CAML), Qizx/open v1.1 (programming language: Java) and Saxon v8.7.1 (programming language: Java) implement the full XQuery standard. Although Galax was not designed with XML stream processing in mind, it is frequently used in XQuery benchmarks and is included here for that reason. Note that we could not get the static projection of Galax to work in our experiments.
Finally, we chose MonetDB v4.12.0/XQuery v0.12.0, a mature XML database system. As a secondary-storage implementation, MonetDB can use index structures to speed up query evaluation, which streaming in-memory XQuery engines do not do. On the other hand, MonetDB XQuery physically stores the entire document before query evaluation. Since the GCX engine and the other main-memory engines read the complete input XML document for every query evaluation, we forced the MonetDB server to reload the complete XML document in each run and included this loading time in our time measurements.

Execution Platform

We ran our experiments on a 3GHz Intel Pentium IV CPU with 2GB RAM, running SuSE Linux 10.0. All Java-based systems were executed using J2RE v1.4.2.

The benchmarks focus primarily on main memory consumption, but we also report query execution time. Time is given either in seconds, abbreviated "s" (e.g. 1.59s means 1.59 seconds, i.e. 1 second and 590 milliseconds), or in minutes and seconds, abbreviated "m" (e.g. 02:07m means 2 minutes and 7 seconds). Memory consumption is given in megabytes ("MB") or gigabytes ("GB").

For each system, query and input document size, we measured the high-water mark of non-swapped memory consumption, using the Linux "top" command, as well as the total query evaluation time. For each system and query we set a timeout of one hour; "timeout" marks runs that exceeded it. "n/a" (not available) indicates that the query could not be expressed in the language supported by the specific engine, while a dash ("–") denotes a failure, e.g. one caused by a segmentation fault.

With the Java-based engines, we observed that memory consumption often increased with the size of the XML document even though the buffer size remained constant (e.g. for the FluXQuery engine), due to effects of automatic memory management and the Java Virtual Machine.
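The time and memory notation can be decoded mechanically. The following Python sketch (all names are hypothetical) converts a result cell of the form "time / memory" into seconds and megabytes, and returns None for cells that hold no measurement. It assumes 1GB = 1024MB, which the measurements do not specify.

```python
from typing import Optional, Tuple

# Assumption: binary units, 1GB = 1024MB (not specified by the benchmark).
MB_PER_UNIT = {"MB": 1.0, "GB": 1024.0}
# Cells without a measurement: not expressible, failure, one-hour timeout.
NO_MEASUREMENT = {"n/a", "–", "timeout"}


def parse_time(text: str) -> float:
    """'1.59s' -> 1.59 seconds; '02:07m' (minutes:seconds) -> 127.0 seconds."""
    text = text.strip()
    if text.endswith("m"):
        minutes, seconds = text[:-1].split(":")
        return float(int(minutes) * 60 + int(seconds))
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unrecognized time: {text!r}")


def parse_memory(text: str) -> float:
    """'50MB' -> 50.0 and '1.8GB' -> 1843.2 (both in megabytes)."""
    text = text.strip().replace(",", ".")  # tolerate a decimal comma such as '1,8GB'
    for unit, factor in MB_PER_UNIT.items():
        if text.endswith(unit):
            return float(text[: -len(unit)]) * factor
    raise ValueError(f"unrecognized memory size: {text!r}")


def parse_cell(cell: str) -> Optional[Tuple[float, float]]:
    """Parse 'time / memory' into (seconds, megabytes); None if no measurement."""
    cell = cell.strip()
    if cell in NO_MEASUREMENT:
        return None
    time_text, memory_text = cell.split("/")
    return parse_time(time_text), parse_memory(memory_text)
```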

Benchmark Results

The table below summarizes the runtime and memory consumption of the GCX engine (v1.0β) compared to the other XQuery engines; each cell gives the total evaluation time followed by the peak memory consumption. Note that these results are not kept up to date: the runtime and memory consumption of the current version (v2.1) of the GCX engine may differ from the figures in the following table.

Query/Engine	XML document size	GCX v1.0β	FluXQuery	Galax v0.6.8	MonetDB v4.12.0 (XQuery v0.12.0)	Saxon v8.7.1	Qizx/open v1.1
Q1	10MB	0.18s / 1.2MB	1.59s / 50MB	5.45s / 186MB	0.86s / 30MB	1.48s / 80MB	1.20s / 38MB
	50MB	0.92s / 1.2MB	3.96s / 111MB	42.33s / 880MB	3.69s / 98MB	4.29s / 292MB	3.74s / 195MB
	100MB	1.87s / 1.2MB	6.94s / 111MB	02:07m / 1.8GB	7.19s / 225MB	7.96s / 547MB	6.56s / 285MB
	200MB	3.53s / 1.2MB	12.27s / 111MB	timeout	13.60s / 244MB	14.30s / 973MB	11.82s / 480MB
Q6	10MB	0.34s / 1.2MB	n/a	7.66s / 240MB	0.98s / 29MB	1.73s / 82MB	1.56s / 33MB
	50MB	1.68s / 1.2MB	n/a	57.98s / 1.2GB	5.06s / 111MB	5.78s / 292MB	6.13s / 169MB
	100MB	3.33s / 1.2MB	n/a	05:08m / 2GB	9.94s / 253MB	10.85s / 622MB	11.74s / 484MB
	200MB	6.42s / 1.2MB	n/a	timeout	19.95s / 337MB	20.14s / 1.2GB	20.33s / 805MB
Q8	10MB	13.15s / 9.8MB	18.04s / 128MB	01:04m / 377MB	02:56m / 407MB	6.61s / 145MB	9.89s / 148MB
	50MB	05:13m / 43MB	06:51m / 169MB	33:08m / 1.8GB	03:26m / 1.35GB	02:02m / 352MB	03:38m / 265MB
	100MB	22:07m / 86MB	27:01m / 216MB	timeout	–	08:39m / 650MB	14:27m / 397MB
	200MB	timeout	timeout	timeout	–	32:43m / 1.15GB	52:05m / 636MB
Q13	10MB	0.17s / 1.2MB	1.60s / 52MB	5.92s / 182MB	0.80s / 31MB	1.53s / 48MB	1.26s / 28MB
	50MB	0.85s / 1.2MB	3.98s / 111MB	43.91s / 899MB	3.64s / 98MB	4.45s / 292MB	3.85s / 195MB
	100MB	1.69s / 1.2MB	7.00s / 111MB	02:04m / 1.8GB	7.34s / 224MB	8.35s / 547MB	6.81s / 285MB
	200MB	3.24s / 1.2MB	12.33s / 111MB	timeout	13.52s / 271MB	15.02s / 1.05GB	12.30s / 480MB
Q20	10MB	0.25s / 1.2MB	1.65s / 48MB	6.95s / 215MB	0.85s / 34MB	1.65s / 62MB	1.43s / 39MB
	50MB	1.24s / 1.2MB	4.19s / 111MB	53.08s / 1.5GB	4.17s / 120MB	4.90s / 292MB	4.18s / 195MB
	100MB	2.48s / 1.2MB	7.37s / 111MB	03:14m / 2GB	8.47s / 247MB	9.13s / 622MB	8.71s / 350MB
	200MB	4.74s / 1.2MB	13.14s / 111MB	timeout	16.40s / 296MB	16.58s / 1.15GB	15.80s / 628MB
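To put the table in perspective, the following Python sketch computes, for Q1 on the 200MB document, how many times longer each reference engine ran and how many times more memory it used than GCX v1.0β, as well as GCX's Q1 throughput for each document size. The values are transcribed by hand from the table above, Galax is omitted because it timed out, and all names are illustrative.

```python
# Q1 on the 200MB document, transcribed from the table as (seconds, MB).
# Galax v0.6.8 is left out because it exceeded the one-hour timeout.
Q1_200MB = {
    "GCX v1.0β": (3.53, 1.2),
    "FluXQuery": (12.27, 111.0),
    "MonetDB v4.12.0": (13.60, 244.0),
    "Saxon v8.7.1": (14.30, 973.0),
    "Qizx/open v1.1": (11.82, 480.0),
}

# GCX v1.0β evaluation time for Q1 per input size (MB -> seconds), same table.
GCX_Q1_SECONDS = {10: 0.18, 50: 0.92, 100: 1.87, 200: 3.53}


def ratios_vs_gcx(rows, baseline="GCX v1.0β"):
    """Return {engine: (time ratio, memory ratio)} relative to the baseline engine."""
    base_time, base_mem = rows[baseline]
    return {
        engine: (seconds / base_time, megabytes / base_mem)
        for engine, (seconds, megabytes) in rows.items()
        if engine != baseline
    }


def throughput(seconds_by_size):
    """Return {document size in MB: megabytes processed per second}."""
    return {size: size / seconds for size, seconds in seconds_by_size.items()}


if __name__ == "__main__":
    for engine, (t, m) in ratios_vs_gcx(Q1_200MB).items():
        print(f"{engine:16} {t:4.1f}x the time, {m:5.0f}x the memory")
    for size, rate in throughput(GCX_Q1_SECONDS).items():
        print(f"{size:>3}MB: {rate:4.1f} MB/s")
```

On these figures, the reference engines need roughly 3.3 to 4.1 times GCX's evaluation time for Q1 on 200MB and roughly 90 to 810 times its memory, while GCX's throughput stays at about 53 to 57 MB/s for all four sizes: its Q1 evaluation time grows linearly with the document size while its memory stays at 1.2MB.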

XMark Queries

Note: All of the following queries were taken from XMark – An XML Benchmark Project and modified to fit the XQuery fragment supported by GCX v1.0β.

XMark Q1

<query1> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        if ($person/person_id="person0")
          then <result> {$person/name} </result>
          else ()
} </query1>
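GCX's constant 1.2MB for Q1 in the table is what one would expect if only the current person subtree has to be buffered at any time. The Python sketch below illustrates that idea with ElementTree's iterparse: it evaluates the rewritten Q1 in a single pass and removes every person subtree once it has been tested. It is a hand-written approximation for illustration only, not the garbage-collection mechanism GCX implements; the function name is hypothetical, and for simplicity it returns the text of each matching name instead of constructing <result> elements.

```python
import io
import xml.etree.ElementTree as ET


def query1_streaming(source):
    """Return the text of every <name> of a person whose person_id is "person0".

    Follows the path /site/people/person of the rewritten Q1, but removes each
    <person> subtree from the in-memory tree as soon as it has been tested, so
    at most one person is buffered at a time.
    """
    results = []
    stack = []  # currently open elements, from the document root downwards
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem is complete; the stack now holds only its ancestors
        if elem.tag == "person" and [e.tag for e in stack] == ["site", "people"]:
            # General comparison: true if any person_id child equals "person0".
            if any(pid.text == "person0" for pid in elem.findall("person_id")):
                results.extend(n.text for n in elem.findall("name"))
            stack[-1].remove(elem)  # discard the processed subtree
    return results


if __name__ == "__main__":
    doc = (b"<site><people>"
           b"<person><person_id>person0</person_id><name>Alice</name></person>"
           b"<person><person_id>person1</person_id><name>Bob</name></person>"
           b"</people></site>")
    print(query1_streaming(io.BytesIO(doc)))
```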

XMark Q6

<query6> {
  for $site in //site return
    for $regions in $site/regions return
      $regions//item
} </query6>

XMark Q8

<query8> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        <item> {
          (
            <person> {$person/name} </person>,
            <items_bought> {
              for $site2 in /site return
                for $cas in $site2/closed_auctions return
                  for $ca in $cas/closed_auction return
                    for $buyer in $ca/buyer return
                      if ($buyer/buyer_person=$person/person_id)
                        then <result> {$ca} </result>
                        else ()
              } </items_bought>
          )
        } </item>
} </query8>
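The nested for-loops of Q8 express a join between people and closed auctions. Evaluated literally, every person is compared with every buyer, so the number of comparisons grows with the product of the two counts, roughly quadratically in the document size. That is consistent with the Q8 rows of the table: from 10MB to 50MB (5 times the data) GCX's time grows by a factor of about 24, and from 50MB to 100MB by about 4.2. The Python sketch below evaluates Q8 literally on a parsed document and counts the comparisons; it illustrates the query's semantics only, not how any of the benchmarked engines evaluates it. The function name is hypothetical, and it reports the number of matching auctions per person instead of copying them into <result> elements.

```python
import xml.etree.ElementTree as ET


def query8_nested_loops(site: ET.Element):
    """Evaluate the rewritten XMark Q8 literally over a parsed <site> element.

    Returns ([(person name, number of matching closed auctions), ...], comparisons),
    where comparisons counts the buyer_person = person_id tests performed.
    """
    results = []
    comparisons = 0
    for person in site.findall("people/person"):
        person_ids = {pid.text for pid in person.findall("person_id")}
        matches = 0
        # Inner loop: re-scan all closed auctions for every person, as Q8 does.
        for closed_auction in site.findall("closed_auctions/closed_auction"):
            for buyer in closed_auction.findall("buyer"):
                comparisons += 1
                # General comparison: true if any buyer_person equals any person_id.
                if any(bp.text in person_ids for bp in buyer.findall("buyer_person")):
                    matches += 1
        results.append((person.findtext("name"), matches))
    return results, comparisons
```

With one buyer per closed auction, doubling both the number of people and the number of closed auctions quadruples the number of comparisons.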

XMark Q13

<query13> {
  for $site in /site return
    for $regions in $site/regions return
      for $australia in $regions/australia return
        for $item in $australia/item return
          <item> {
            (
              <name> {$item/name} </name>,
              <desc> {$item/description} </desc>
            )
          } </item>
} </query13>

XMark Q20

<query20> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        if (fn:not(fn:exists($person/person_income)))
          then $person
          else ()
} </query20>

Last updated: 2009-11-11