The GCX XQuery Engine – FAQs
Overview:
What is GCX and what does "GCX" stand for?
What is an in-memory XQuery engine?
What are the requirements for in-memory XQuery engines?
Which fragment of XQuery 1.0 is currently supported by GCX?
Will GCX be extended to support the full XQuery standard?
Will there be new features for GCX?
How do I install the GCX engine?
Are there any example queries?
Is GCX open source?
What is the GCX approach to query evaluation?
Where can I get help?
I found a bug. Where can I submit it?
What is GCX and what does "GCX" stand for?
GCX is an in-memory XQuery engine designed for the memory-efficient evaluation of XQuery expressions against large XML documents. It was originally developed by the Saarland University Database Group as a research project and is currently maintained at Freiburg University. Its main focus is the combination of static and dynamic analysis to reduce the amount of buffering or, more generally, main memory consumption at runtime. To reach this goal, GCX loads only those parts of the XML document into the buffer that are relevant to query evaluation (in the style of [1]) and, as a major improvement over current in-memory XQuery processing strategies, keeps track of the buffered parts of the input XML document. For this second purpose, GCX implements a mechanism similar to garbage collection by reference counting, which ensures that buffered parts of the document that have become irrelevant to query evaluation can be purged early on [3]. Owing to this novel strategy, the name "GCX" stands for "Garbage Collected XQuery".
What is an in-memory XQuery engine?
In contrast to XQuery engines with an underlying physical database, in-memory XQuery engines evaluate queries completely in main memory, i.e. without writing (parts of) the data or intermediate results to disk. In-memory evaluation is often useful in streaming scenarios, where data arrives at very high rates and is not intended to be reused. In such scenarios, writing data to the hard disk is often infeasible, because hard disk access is very slow compared to main memory processing.
What are the requirements for in-memory XQuery engines?
In practice, main memory resources are strictly limited. Keeping the amount of buffered data as small as possible is not only necessary to avoid swapping, but also enables fast search in buffers, since less data has to be scanned when evaluating parts of the query, e.g. path expressions.
Which fragment of XQuery 1.0 is currently supported by GCX?
GCX version 2.1 supports a variant of composition-free XQuery [2]. Syntactically, GCX closely follows the official XQuery W3C Recommendation [4] (even though it does not implement the full standard). The current version (v2.1) supports the following features:
- arbitrary (well-formed) XML elements (with or without PCDATA content)
- string constants in output or conditions
- numeric constants in output or conditions
- aggregate function expressions in output or conditions, supporting
  - standard functions: fn:sum, fn:avg, fn:min, fn:max and fn:count
  - non-standard functions: fn:stddev_samp, fn:stddev_pop, fn:var_samp, fn:var_pop and fn:median
- rounding function expressions in output or conditions, supporting
  - standard functions: fn:ceiling, fn:floor, fn:round and fn:round-half-to-even
  - non-standard functions: fn:abs, fn:cover and fn:truncate
- arbitrarily deeply nested sequences of expressions
- nested FWR (for-where-return) expressions
- if-then-else expressions
- conditions, supporting
  - logical connectives: and, or
  - functions: fn:not, fn:exists, fn:empty, fn:true, fn:false, all aggregate function expressions and all rounding function expressions
  - relational operators: <, ≤, =, ≥, >, ≠
- variables defined by FWR expressions (no let-clause support) in output or conditions (with or without multi-step path expressions)
- multi-step path expressions (of arbitrary length) with (optional) fn:doc function expression for specifying absolute paths, using
  - axes: / (child::) or // (descendant::)
  - node tests: node(), text(), wildcard (*) or a tag name
- comment expressions "above" a query (not supported inside a query)
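The names of the non-standard aggregate functions resemble the SQL aggregates of the same name. As a rough illustration only, the Python sketch below maps each listed aggregate name to the statistic it presumably computes. That GCX uses exactly these definitions (sample versus population variance and standard deviation) is an assumption not confirmed by this FAQ; the `AGGREGATES` table, the `evaluate` helper and the sample values are hypothetical.

```python
import statistics

# Hypothetical mapping from the aggregate names listed above to Python
# equivalents. ASSUMPTION: the "_samp"/"_pop" suffixes denote sample and
# population statistics, as in SQL; GCX's own definitions may differ.
AGGREGATES = {
    "fn:sum": sum,
    "fn:avg": statistics.mean,
    "fn:min": min,
    "fn:max": max,
    "fn:count": len,
    "fn:stddev_samp": statistics.stdev,    # divides by n - 1
    "fn:stddev_pop": statistics.pstdev,    # divides by n
    "fn:var_samp": statistics.variance,    # divides by n - 1
    "fn:var_pop": statistics.pvariance,    # divides by n
    "fn:median": statistics.median,
}

# Made-up sample data standing in for the numeric values selected by a path.
VALUES = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def evaluate(name, values):
    """Apply the (assumed) definition of aggregate `name` to `values`."""
    return AGGREGATES[name](values)


if __name__ == "__main__":
    for name in AGGREGATES:
        print(name, evaluate(name, VALUES))
```

The distinction between the `_samp` and `_pop` variants only matters for small inputs; for authoritative semantics, consult the Fragment XQ overview.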
A complete overview of the supported constructs for the current GCX release (v2.1) can be found at Fragment XQ. A look at our sample queries (distributed with all GCX bundles) from the download page may also be helpful.
Will GCX be extended to support the full XQuery standard?
We are working on extensions of the GCX engine but, as is usual with research projects, this will take time. Please note that there are definitely no plans to extend the GCX engine to support the full XQuery standard.
Will there be new features for GCX?
Possibly. At present, there are some further ideas and features that might be implemented in the future.
How do I install the GCX engine?
There are two ways to get started with the GCX engine. The first is to use one of the binaries, which are available for Linux, Mac OS and Windows. Simply download the binary that corresponds to your operating system and execute it from the command line. Alternatively, e.g. if no binary is available for your operating system or architecture, you may compile the GCX engine from source. To do so, download the source code (bundle) and follow the instructions in the online version of the manual.
Are there any example queries?
Yes, all GCX bundles contain a set of sample queries with corresponding example XML documents. All queries are specified in the standard XQuery syntax [4] (but please note that GCX does not support the full XQuery 1.0 standard). If you want to write queries of your own, you may consult the overview of supported constructs for the current GCX release (v2.1) at Fragment XQ.
Is GCX open source?
Yes, the source code is available from our download page. The GCX engine is licensed under the Berkeley Software Distribution (BSD) license, so you are also allowed to reuse and modify the source code.
What is the GCX approach to query evaluation?
Query evaluation starts with a static analysis phase, which identifies the parts of the input XML document that are relevant to query evaluation.
While reading the XML document, irrelevant parts are projected away and thus never loaded into the buffer.
This technique usually reduces the amount of buffered data significantly (see also [1]).
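As a rough illustration of projection (not GCX's actual implementation, which is not shown in this FAQ), the following Python sketch streams over a document and buffers only the nodes that lie on the way to, or below, a projection path. The names `is_relevant` and `project`, the token tuples and the sample document are assumptions made for this example; real projection paths as in [1] are richer, e.g. they also cover descendant axes.

```python
import io
import xml.etree.ElementTree as ET


def is_relevant(path, projection_paths):
    """Keep a node if it is an ancestor of, equal to, or below a projection path."""
    for proj in projection_paths:
        n = min(len(path), len(proj))
        if path[:n] == proj[:n]:
            return True
    return False


def project(xml_text, projection_paths):
    """Stream over xml_text and buffer only tokens of relevant nodes.

    Returns (buffer, skipped), where skipped counts the elements that were
    never buffered. Simplification: only leaf text is kept, and only
    child-axis paths such as "/bib/book/title" are understood.
    """
    projs = [tuple(p.strip("/").split("/")) for p in projection_paths]
    buffer, path, skipped = [], [], 0
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            if is_relevant(tuple(path), projs):
                buffer.append(("open", elem.tag))
            else:
                skipped += 1
        else:  # "end" event: the element has been read completely
            if is_relevant(tuple(path), projs):
                text = (elem.text or "").strip()
                if text:
                    buffer.append(("text", text))
                buffer.append(("close", elem.tag))
            path.pop()
            elem.clear()  # drop the parsed subtree so it does not pile up
    return buffer, skipped


# Made-up sample document: only book titles are relevant to the query.
DOC = ("<bib><book><title>T1</title><price>10</price></book>"
       "<article><title>A</title></article></bib>")

if __name__ == "__main__":
    kept, skipped = project(DOC, ["/bib/book/title"])
    print(kept)
    print("elements skipped:", skipped)
```

In this toy document the price element and the whole article subtree are never buffered, which is the effect the static analysis aims for.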
Beyond this static part, the GCX engine continuously tries to minimize the amount of buffered data at runtime as well. XML tokens in the buffer are assigned roles, which reflect their future relevance to query evaluation.
For instance, tokens referred to by the condition of an if-expression are assigned a role. As soon as the GCX engine has evaluated the (if-)expression, the token is notified about the loss of the role.
Tokens that have lost all their roles have become irrelevant to query evaluation and can be removed from the buffer early on.
The combination of static and dynamic analysis enables memory-efficient query evaluation.
The benchmark results demonstrate the benefit that can be obtained by this technique.
Detailed background information can also be found in [3].
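The role mechanism described above can be pictured as reference counting on buffered tokens. The Python sketch below is a simplified model, not the engine's actual data structures: the classes `BufferedToken` and `Buffer`, the method `release_role` and the role names `r_cond` and `r_out` are hypothetical, and a role is released on all tokens at once, whereas a real engine would release roles for specific nodes.

```python
from collections import Counter


class BufferedToken:
    """A buffered XML token together with the roles it currently holds."""

    def __init__(self, name, roles):
        self.name = name
        self.roles = Counter(roles)  # role name -> number of times assigned

    def is_relevant(self):
        # A token stays relevant as long as it holds at least one role.
        return sum(self.roles.values()) > 0


class Buffer:
    """Toy buffer that purges tokens once they have lost all their roles."""

    def __init__(self):
        self.tokens = []

    def add(self, name, roles):
        self.tokens.append(BufferedToken(name, roles))

    def release_role(self, role):
        """Remove one instance of `role` from each token, then purge."""
        for tok in self.tokens:
            if tok.roles[role] > 0:
                tok.roles[role] -= 1
        self.tokens = [tok for tok in self.tokens if tok.is_relevant()]

    def names(self):
        return [tok.name for tok in self.tokens]


if __name__ == "__main__":
    buf = Buffer()
    buf.add("price", ["r_cond"])           # only needed by an if-condition
    buf.add("title", ["r_cond", "r_out"])  # needed by condition and output
    buf.release_role("r_cond")             # the condition has been evaluated
    print(buf.names())                     # "price" has been purged
    buf.release_role("r_out")              # the output has been written
    print(buf.names())                     # the buffer is empty again
```

A token needed by several parts of the query holds several roles, so it survives until the last of them has been released.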
Where can I get help?
For feedback such as questions, comments, bug reports and feature requests, please use one of the following GCX mailing lists:
- http://lists.sourceforge.net/mailman/listinfo/gcx-engine-general
  Mailing list for general discussion about GCX (general questions, comments, ...)
- http://lists.sourceforge.net/mailman/listinfo/gcx-engine-support
  Mailing list to ask questions about using and building GCX
- http://lists.sourceforge.net/mailman/listinfo/gcx-engine-bugs
  Mailing list for bug reports and discussion about bugs in GCX
- http://lists.sourceforge.net/mailman/listinfo/gcx-engine-requests
  Mailing list to request new or desired features for future releases
Alternatively, if you want to get in direct communication with us, feel free to contact
I found a bug. Where can I submit it?
If you have found a bug in the GCX engine, please use the http://lists.sourceforge.net/mailman/listinfo/gcx-engine-bugs mailing list or contact Michael Schmidt directly. A short description of the bug and, if possible, a minimal query and input XML document that reproduce the behavior would be helpful.
References
- [1] A. Marian, J. Siméon: Projecting XML Documents, In Proc. VLDB 2003, pages 213-224
- [2] C. Koch: On the Complexity of Nonrecursive XQuery and Functional Query Languages on Complex Values, ACM Transactions on Database Systems, 31(4), 2006
- [3] M. Schmidt, S. Scherzinger, C. Koch: Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation, In Proc. ICDE 2007
- [4] XQuery 1.0: An XML Query Language (W3C)
Last updated: 2009-11-11