University of Leicester

computer science

The SiXML Project


Overview


The aim of this project is to provide libraries that let XML developers process XML documents rapidly while using a very small memory footprint. Unlike XML compression technologies, SiXML aims to achieve simultaneously the small memory footprint of compressed XML documents and XML processing speeds that are similar to, if not better than, those of standard XML libraries.

In other words, XML can be processed fast and without bloat. This surprising outcome is the result of a long-term research program funded by various funding agencies. You can read more about some of the underlying principles.

What's in a name?

SiXML stands for Succinct indexable XML, and is so named because SiXML is based on succinct data structures. It is also succinct in the everyday sense: it uses very little computer memory and is not verbose. The word "indexable" indicates that one can operate conveniently on XML documents stored in SiXML.
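To give a flavour of what a succinct data structure is, the following minimal sketch encodes a small document tree as a balanced-parentheses bit sequence (two bits per node) and navigates it without any pointers. It is an illustration only and is not taken from the SiXML source: the function names and the sample tree are hypothetical, and the matching-parenthesis search here is a naive linear scan, whereas practical succinct structures add small auxiliary indexes so that such queries are fast.

```python
def encode(tree):
    """Encode a (label, [children]) tree as parenthesis bits plus preorder labels."""
    bits, labels = [], []

    def visit(node):
        label, kids = node
        bits.append(1)          # opening parenthesis for this node
        labels.append(label)    # labels are kept in preorder
        for kid in kids:
            visit(kid)
        bits.append(0)          # closing parenthesis for this node

    visit(tree)
    return bits, labels


def find_close(bits, i):
    """Position of the closing parenthesis matching the opening one at i (naive scan)."""
    depth = 0
    for j in range(i, len(bits)):
        depth += 1 if bits[j] else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced parentheses")


def first_child(bits, i):
    """Node i has a child exactly when the next bit is an opening parenthesis."""
    return i + 1 if bits[i + 1] == 1 else None


def next_sibling(bits, i):
    """The next sibling, if any, opens right after node i closes."""
    j = find_close(bits, i) + 1
    return j if j < len(bits) and bits[j] == 1 else None


def label_of(bits, labels, i):
    """Preorder rank of node i = number of opening parentheses up to and including i."""
    return labels[sum(bits[: i + 1]) - 1]


def children(bits, labels, i):
    """Labels of the children of the node whose opening parenthesis is at i."""
    out = []
    c = first_child(bits, i)
    while c is not None:
        out.append(label_of(bits, labels, c))
        c = next_sibling(bits, c)
    return out


# Hypothetical sample document tree.
doc = ("book", [("title", []),
                ("chapter", [("para", []), ("para", [])]),
                ("index", [])])
bits, labels = encode(doc)
print(len(labels), "nodes stored in", len(bits), "bits of tree structure")
print(children(bits, labels, 0))
```

The tree shape costs two bits per node here, compared with several machine words per node in a pointer-based tree, which is the kind of saving the name "succinct" refers to.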


SiXML Functionality


SiXML 1.2

(To be released Spring 2012.) Consists of the following components:

  • SiXDOM 1.2, an update of SDOM 1.0 (in C++) with improved parsing performance.
  • DOM API (Level 2 and partially 3)
  • Only contains the static DOM methods
  • Ported to 64-bit
  • SWIG bindings

SDOM 1.0

C++ library supporting:

  • DOM Level 2 and partial Level 3 implementation, with namespace support.
  • DOM Treewalker.

Note: At the moment, none of the above libraries support modification of the in-memory representation of the document.


SiXML Performance


SDOM 1.0 stores an XML document in-memory, using less memory than the (on-disk) size of the file. Depending on whether textual data is stored compressed or not, the amount of main memory used by SDOM 1.0 varies from 15% to 30% of the file size (textual data stored compressed) and from 40% to 85% of the file size (textual data stored uncompressed).
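As a worked example of these percentages, the short sketch below estimates the range of main memory SDOM 1.0 might need for a file of a given size. The function name and the 200 MiB file size are hypothetical; only the percentage ranges come from the figures quoted above, and actual usage will depend on the document.

```python
def sdom_memory_range(file_size_bytes, compressed_text=True):
    """Return (low, high) estimated in-memory size in bytes.

    Uses the ranges quoted for SDOM 1.0: 15% to 30% of the file size with
    compressed textual data, 40% to 85% with uncompressed textual data.
    """
    low_pct, high_pct = (15, 30) if compressed_text else (40, 85)
    return (file_size_bytes * low_pct // 100,
            file_size_bytes * high_pct // 100)


MIB = 1024 * 1024
file_size = 200 * MIB  # hypothetical 200 MiB XML file

for compressed in (True, False):
    low, high = sdom_memory_range(file_size, compressed)
    mode = "compressed" if compressed else "uncompressed"
    print(f"text {mode}: {low / MIB:.0f} to {high / MIB:.0f} MiB")
```

For the hypothetical 200 MiB file this gives roughly 30 to 60 MiB with compressed text and 80 to 170 MiB with uncompressed text.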

Particularly important is the speed: traversing the document and accessing attributes and textual data are typically only about twice as slow as with Xerces (implemented in C++). Of course, for larger documents, where Xerces strains the memory capacity of the computer, SDOM is much faster.

More details can be found in the papers below.


Downloads


SiXML 1.2

To be released in Spring 2012.

SDOM 1.0

Download libraries (version 1.0).


Funding


Funding specifically for SiXML has been provided by the University of Leicester's Enterprise and Business Development Office (EBDO):

  • Winner, "Technology Disclosure" competition, 2008.
  • "Pump Priming" money for new IP projects, March--July 2010.

SiXML is also partially supported by the Department of Computer Science at the University of Leicester.

The principles behind SiXML were developed in a series of research projects.


Further Reading


Some of the ideas behind SiXML can be found here.

Papers

  1. O. Delpratt.
    Space Efficient In-Memory Representation of XML Documents. [PDF]
    PhD Thesis, 2008, University of Leicester.
  2. O. Delpratt, R. Raman and N. Rahman.
    Engineering Succinct DOM. [PDF]
    In Proceedings of the 11th International Conference on Extending Database Technology (EDBT), Nantes, France, March 25--29, 2008.
    DOI: 10.1145/1353343.1353354
  3. O. Delpratt, N. Rahman and R. Raman.
    Compressed Prefix Sums. [PDF]
    In Proceedings of the 33rd International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2007), Harrachov, Czech Republic, January 20--26, 2007. Proceedings are in the Springer Lecture Notes in CS series v.4362, pp. 235-247. © Springer.
    DOI: 10.1007/978-3-540-69507-3_19
  4. O. Delpratt, N. Rahman and R. Raman.
    Engineering the LOUDS Succinct Tree Representation. [PDF]
    In Proceedings of the 5th International Workshop on Experimental Algorithms (WEA 2006), Cala Galdana, Menorca, Spain, May 24--27, 2006. Proceedings are in the Springer Lecture Notes in CS series v.4007, pp. 134-145. © Springer.
    DOI: 10.1007/11764298_12
  5. R. F. Geary, N. Rahman, R. Raman and V. Raman.
    A simple optimal representation for balanced parentheses. [PDF]
    Theoretical Computer Science 368 (2006), pp. 231-246.
    DOI: 10.1016/j.tcs.2006.09.014

Poster Presentations


Author: Stelios Joannou (sj148 at mcs.le.ac.uk), T: +44 (0)116 252 3883.
© University of Leicester February 2012. Last modified: 15th February 2012, 08:50:03.
CS Web Maintainer. This document has been approved by the Head of Department.