University of Leicester

informatics

The SiXML Project


Overview


The aim of this project is to provide libraries that let XML developers process XML documents rapidly with a very small memory footprint. Unlike XML compression technologies, SiXML aims to achieve simultaneously the small memory footprint of compressed XML documents and XML processing speeds that are similar to, if not better than, those of standard XML libraries.

In other words, XML can be processed fast and without bloat. This surprising outcome is the result of a long-term research program funded by various funding agencies. You can read more about some of the underlying principles.

What's in a name?

SiXML stands for Succinct indexable XML, and is so named because SiXML is based on succinct data structures. The name also fits in the everyday sense: SiXML uses very little computer memory and is not verbose. The word "indexable" indicates that one can operate conveniently on XML documents stored in SiXML.
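
As a rough illustration of what "succinct" and "indexable" can mean for a tree, the sketch below stores the shape of a document tree as a balanced-parentheses bit sequence (two bits per node) and navigates it without any pointers. It is a toy under stated assumptions, not SiXML's implementation: all function names are hypothetical, the input is a nested tuple rather than parsed XML, and the matching-parenthesis lookup is a plain linear scan, where a real succinct structure would use a small auxiliary index to answer it quickly.

```python
def encode(tree):
    """Encode a (label, children) tree as parenthesis bits plus preorder labels."""
    bits, labels = [], []

    def visit(node):
        label, children = node
        bits.append(1)            # 1 = open parenthesis: entering the node
        labels.append(label)
        for child in children:
            visit(child)
        bits.append(0)            # 0 = close parenthesis: leaving the node

    visit(tree)
    return bits, labels


def find_close(bits, i):
    """Index of the close parenthesis matching the open one at i (linear scan)."""
    depth = 0
    for j in range(i, len(bits)):
        depth += 1 if bits[j] else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced parenthesis sequence")


def first_child(bits, i):
    """A node has a child exactly when its open bit is followed by another open bit."""
    return i + 1 if bits[i + 1] == 1 else None


def next_sibling(bits, i):
    """The next sibling, if any, opens right after this node's close parenthesis."""
    j = find_close(bits, i) + 1
    return j if j < len(bits) and bits[j] == 1 else None


def label_of(bits, labels, i):
    """Preorder rank of node i = number of open bits up to and including i, minus one."""
    return labels[sum(bits[: i + 1]) - 1]


def children(bits, labels, i):
    """Labels of the children of the node whose open parenthesis is at i."""
    out = []
    c = first_child(bits, i)
    while c is not None:
        out.append(label_of(bits, labels, c))
        c = next_sibling(bits, c)
    return out


# Hypothetical document shape used only for this illustration.
TREE = ("book", [("title", []),
                 ("chapter", [("para", []), ("para", [])]),
                 ("index", [])])
BITS, LABELS = encode(TREE)
```

Here a node is identified by the position of its open parenthesis, so `children(BITS, LABELS, 0)` lists the children of the root using only the bit sequence and the label array.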


SiXML Functionality


SiXDOM 1.2

Consists of the following components:

  • SiXDOM 1.2, an update of SDOM 1.0 (in C++) with improved parsing performance.
  • DOM API (Level 2 and partially 3).
  • Ported to 64-bit.
  • SWIG bindings for Java and Python.

The software can be downloaded here.

Note: At the moment, none of the above libraries support modification of the in-memory representation of the document.
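
To give a feel for the kind of read-only DOM access described above, here is a minimal sketch in Python. It uses the standard library's `xml.dom.minidom` purely as a stand-in, because this page does not give the module name or exact interface of the SiXDOM Python bindings; the assumption is only that a DOM-style binding exposes the usual calls such as `documentElement`, `childNodes`, `getAttribute` and `getElementsByTagName`. The sample document and helper names are made up for the illustration, and the code only reads the tree, in keeping with the note above.

```python
from xml.dom import minidom, Node

# Hypothetical sample document for the illustration.
SAMPLE = """<library>
  <book id="b1"><title>Succinct Trees</title></book>
  <book id="b2"><title>Prefix Sums</title></book>
</library>"""


def element_children(node):
    """Child elements only, skipping whitespace text nodes."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def text_of(node):
    """Concatenate the direct text-node children of an element."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == Node.TEXT_NODE)


def list_titles(xml_text):
    """Return (id, title) pairs using read-only DOM calls."""
    doc = minidom.parseString(xml_text)
    result = []
    for book in element_children(doc.documentElement):
        title = book.getElementsByTagName("title")[0]
        result.append((book.getAttribute("id"), text_of(title)))
    return result
```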

SDOM 1.0

C++ library supporting:

  • DOM Level 2 and partial Level 3 implementation, with namespace support.
  • DOM TreeWalker.

Note: At the moment, none of the above libraries support modification of the in-memory representation of the document.
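
For readers unfamiliar with the TreeWalker idea, the following sketch shows document-order traversal with a node filter, loosely modelled on `nextNode()` from DOM Level 2 Traversal. It is written in Python over `xml.dom.minidom` for brevity and is not the SDOM C++ interface: the class `SimpleTreeWalker`, its `next_node` method and the `accept` callback are hypothetical names, and rejected nodes are simply skipped while their descendants are still visited.

```python
from xml.dom import minidom, Node


class SimpleTreeWalker:
    """Walk the subtree under `root` in document order, returning accepted nodes."""

    def __init__(self, root, accept=lambda n: n.nodeType == Node.ELEMENT_NODE):
        self.root = root
        self.current = root
        self.accept = accept

    def _next_in_document_order(self, node):
        # Go down first, then sideways, then up until a sibling exists.
        if node.firstChild is not None:
            return node.firstChild
        while node is not None and node is not self.root:
            if node.nextSibling is not None:
                return node.nextSibling
            node = node.parentNode
        return None  # climbed back to the root: traversal is finished

    def next_node(self):
        """Advance to the next accepted node, or return None at the end."""
        node = self._next_in_document_order(self.current)
        while node is not None and not self.accept(node):
            node = self._next_in_document_order(node)
        if node is not None:
            self.current = node
        return node


def walk_tags(xml_text):
    """Collect the tag names of all elements below the document element."""
    doc = minidom.parseString(xml_text)
    walker = SimpleTreeWalker(doc.documentElement)
    tags = []
    node = walker.next_node()
    while node is not None:
        tags.append(node.tagName)
        node = walker.next_node()
    return tags
```

The walker keeps only a current position, so a traversal of this kind never needs to modify the in-memory document.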


SiXML Performance


SiXDOM 1.2 stores an XML document in-memory, using memory similar to the (on-disk) size of the file. It does not currently compress the textual data; were it to do so, the memory footprint would be even smaller.

For larger XML files (or for large collections of smaller XML files) the improved memory footprint gives a huge improvement in wall-clock time for both parsing and traversing, as described in the white paper "XXML: Handling Extra-Large XML Documents" below. Even when memory is not an issue, SiXML is only about twice as slow as Xerces.

Technical details can also be found in the papers below.


Downloads


SiXDOM 1.2

SiXDOM 1.2 can be downloaded from here. The SiXDOM library has been compiled for 64-bit Linux machines.

SDOM 1.0

Download libraries (version 1.0).


Funding


An LGIP intern, Andreas Poyias, will act as SiXDOM Technology Evangelist for six months from October 2012.

Funding specifically for SiXML has been provided by the University of Leicester's Enterprise and Business Development Office (EBDO):

  • "Pump Priming" money for new IP projects, March–July 2010.
  • Winner, "Technology Disclosure" competition, 2008.

SiXML is also partially supported by the Department of Computer Science at the University of Leicester, in particular through the time devoted by Stelios Joannou towards developing SiXML.

The principles behind SiXML were developed in a series of research projects.


Further Reading


Some of the ideas behind SiXML can be found here.

Papers

  1. A. Poyias.
    XXML: Handling Extra-Large XML Documents [PDF]
    SiXML Technology White Paper, 2013, University of Leicester.
  2. O. Delpratt.
    Space Efficient In-Memory Representation of XML Documents. [PDF]
    PhD Thesis, 2008, University of Leicester.
  3. O. Delpratt, R. Raman and N. Rahman.
    Engineering Succinct DOM. [PDF]
    In Proceedings of the 11th International Conference on Extending Database Technology (EDBT), Nantes, France, March 25–29, 2008.
    DOI: 10.1145/1353343.1353354
  4. O. Delpratt, N. Rahman and R. Raman.
    Compressed Prefix Sums. [PDF]
    In Proceedings of the 33rd International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2007), Harrachov, Czech Republic, January 20–26, 2007. Proceedings are in the Springer Lecture Notes in CS series v.4362, pp. 235-247. Springer.
    DOI: 10.1007/978-3-540-69507-3_19
  5. O. Delpratt, N. Rahman and R. Raman.
    Engineering the LOUDS Succinct Tree Representation. [PDF]
    In Proceedings of the 5th International Workshop on Experimental Algorithms (WEA 2006), Cala Galdana, Menorca, Spain, May 24–27, 2006. Proceedings are in the Springer Lecture Notes in CS series v.4007, pp. 134-145. Springer.
    DOI: 10.1007/11764298_12
  6. R. F. Geary, N. Rahman, R. Raman and V. Raman.
    A simple optimal representation for balanced parentheses. [PDF]
    Theoretical Computer Science 368 (2006), pp. 231-246.
    DOI: 10.1016/j.tcs.2006.09.014

Poster Presentations

Author: Stelios Joannou (sj148 at mcs.le.ac.uk), T: +44 (0)116 252 3883.
University of Leicester February 2013. Last modified: 8th February 2013, 11:21:33.