Filtering and Transformation for High-Volume 
XML Message Brokering


Research Papers

Related Work





As XML message brokers built on YFilter are to be deployed in distributed wide-area environments, the number of users of such systems can easily grow into hundreds of thousands of them. A key challenge in such an environment is to efficiently search and process the huge set of queries for every incoming message. The techniques that YFilter uses to achieve efficiency and scalability aggressively exploit commonality among queries during message processing.  


Matching of path expressions is a core operation in XML query processing systems. In YFilter, we combine all path expressions into a single Nondeterministic Finite Automaton (NFA) where the common prefixes among path expressions are represented only once. In the runtime, the incremental expansion of the internal presentation of an XML message drives the execution of NFA. Processing of common prefixes among path expressions is completely shared during the execution .When the NFA execution reaches an accepting state, a path match is output for all path expressions represented by that accepting state. This NFA-based approach can be extended to also process predicates attached to path expressions. We have developed two alternatives to combining the NFA execution and predicate evaluation. One approach evaluates predicates as early as their addressed elements are matched, while the other delays predicate evaluation until the corresponding path expression has been entirely matched.

The results of our performance study show that this NFA-based approach is very efficient under various workloads. Using this approach, matching hundreds of thousands of distinct path expressions (without predicates) is sufficiently fast that, in many cases its cost is comparable to or even less than the cost of message parsing. When processing this number of path expressions with predicates, the approach that delays predicate evaluation is shown to be more efficient and can achieve a performance within a small constant of the parsing cost.


The transformation functionality is provided by a post-processing module built on top of the NFA-based path matching engine. To exploit sharing of path matching, we push path expressions in clauses in the FLWOR expressions down to the path matching engine, and build individual query plans to post-process the path matching engine output for the customized query results. We have developed techniques that differ in the degree to which they push work down into the path matching engine, and techniques that optimize the post-processing of the engine output. Our experimental results show that the best performance is provided by the technique that exploits the shared path matching to the fullest extent, when assisted with sophisticated DTD-based query optimizations.

We have also proposed an initial set of techniques that enable the sharing of the post-processing across queries. When the best performing technique described above is augmented with sharing post-processing, it achieves excellent scalability improvements, allowing the processing of a hundred thousand queries in less than half a second.