University of Leicester

Informatics

Data Structures

Associated Researchers: Prof. Raman, Dr Naila Rahman.

Ways of representing and efficiently manipulating data are central to many computer programs. Over the years, computer scientists have studied many different data structures for representing and manipulating data in a computer's main and secondary memory. These investigations have shown, both in theory and in practice, that the choice of data structure often has a considerable effect on algorithmic performance. Our work in data structures covers a wide variety of topics, with both theoretical (asymptotic performance) and applied (practical performance) aspects. The major themes are:

Exploiting word-level parallelism: Even ordinary "sequential" processors incorporate parallelism of various kinds. One fundamental form, word-level parallelism (WLP), exists at the level of hardware circuits and is what allows a processor to, say, add two 64-bit numbers in a single step. Recently there has been a great deal of activity in the underlying theory of trans-dichotomous RAM algorithms. For instance, our work in this area shows that by exploiting WLP, n keys can be sorted in O(n log log n) steps, rather than the O(n log n) steps required by classical comparison-based algorithms.
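
As a purely illustrative sketch of WLP (the names, field widths and example values are our own choices, not code from the work described above), the following C++ fragment packs eight 7-bit keys into one 64-bit word and compares all of them against a threshold with a handful of word operations; this packed-comparison trick underlies many trans-dichotomous algorithms.

    #include <cstdint>
    #include <cstdio>

    // Pack eight 7-bit keys into a 64-bit word: each 8-bit field holds a
    // "test" bit (bit 7) followed by the key in the low 7 bits.
    uint64_t pack8(const uint8_t keys[8]) {
        uint64_t w = 0;
        for (int i = 0; i < 8; ++i)
            w |= uint64_t(0x80 | (keys[i] & 0x7F)) << (8 * i);
        return w;
    }

    // Returns a word whose i-th test bit is 1 iff keys[i] >= y,
    // for all eight keys at once (one subtraction and one mask).
    uint64_t packed_ge(uint64_t packed, uint8_t y) {
        const uint64_t low  = 0x0101010101010101ULL;  // replicates y into every field
        const uint64_t test = 0x8080808080808080ULL;  // the eight test bits
        uint64_t Y = uint64_t(y & 0x7F) * low;
        return (packed - Y) & test;   // test bit survives exactly when key >= y
    }

    int main() {
        uint8_t keys[8] = {3, 17, 99, 42, 5, 120, 64, 64};
        uint64_t ge = packed_ge(pack8(keys), 64);
        // __builtin_popcountll is a GCC/Clang builtin; prints "4 keys are >= 64"
        printf("%d keys are >= 64\n", __builtin_popcountll(ge));
        return 0;
    }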

Data Structures for Hierarchical Memory: Computer memory is arranged hierarchically. The lowest levels of the hierarchy (such as magnetic disk or tape) have the greatest capacity, but are several orders of magnitude slower than the higher levels (such as CPU registers). For sufficiently large problem instances, the input must be stored in a relatively low level of the memory hierarchy, and algorithms and data structures have to be carefully designed to avoid unnecessary accesses to the lower levels. A classical example is the B-tree, which is used for large databases held on magnetic disk.
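
To make the disk-access argument concrete, the back-of-the-envelope sketch below (with assumed block, key and pointer sizes; none of these figures come from the page above) estimates a B-tree's fan-out from the disk block size and compares the number of block reads per search with that of a binary search tree.

    #include <cmath>
    #include <cstdio>

    int main() {
        const double block_bytes = 4096;   // assumed disk block size
        const double key_bytes   = 8;      // assumed key size
        const double ptr_bytes   = 8;      // assumed child-pointer size
        // Each internal node fills one block with keys and child pointers.
        const double fanout = block_bytes / (key_bytes + ptr_bytes);   // ~256
        const double n = 1e9;              // one billion keys
        // A search touches about log_B(n) blocks rather than log_2(n).
        printf("fan-out ~ %.0f\n", fanout);
        printf("~%.1f block reads per search vs ~%.1f for a binary search tree\n",
               std::log(n) / std::log(fanout), std::log(n) / std::log(2.0));
        return 0;
    }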

A slight disadvantage of algorithms optimised for the memory hierarchy is that they typically need platform-specific parameters, such as the sizes of the various levels of the hierarchy, coded into them. For example, a B-tree's fan-out depends upon the size of a disk block. An interesting new development is that of cache-oblivious algorithms, which give optimal performance in hierarchical memories without explicitly using any such parameters, relying instead on novel recursive algorithms and data layouts.
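
One well-known example of such a recursive data layout is the van Emde Boas layout of a static binary search tree. The sketch below is our own illustrative code, not code from the work described here: the tree is split at half its height, the top half is laid out first and the bottom subtrees follow, recursively. Searches in the resulting array incur O(log_B n) block transfers for any block size B, even though B appears nowhere in the code.

    #include <cstdio>
    #include <vector>

    // van Emde Boas (cache-oblivious) layout of a perfect binary tree whose
    // nodes are numbered 1..2^height - 1 in implicit heap order (node v has
    // children 2v and 2v+1).  Appends node numbers to 'out' in storage order.
    void veb_layout(int root, int height, std::vector<int>& out) {
        if (height == 1) {              // a single node: just emit it
            out.push_back(root);
            return;
        }
        int top = height / 2;           // height of the top recursive subtree
        int bottom = height - top;      // height of each bottom recursive subtree
        veb_layout(root, top, out);     // lay out the top subtree first
        // The bottom subtrees hang off the children of the top subtree's leaves.
        int first_leaf = root << (top - 1);
        for (int i = 0; i < (1 << (top - 1)); ++i) {
            int leaf = first_leaf + i;
            veb_layout(2 * leaf,     bottom, out);
            veb_layout(2 * leaf + 1, bottom, out);
        }
    }

    int main() {
        std::vector<int> out;
        veb_layout(1, 4, out);          // a 15-node tree of height 4
        for (int v : out) printf("%d ", v);  // 1 2 3 4 8 9 5 10 11 6 12 13 7 14 15
        printf("\n");
        return 0;
    }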

Succinct Data Structures: Conventional wisdom holds that memory is "cheap", and so one need not make any but the most high-level optimisations to reduce the memory requirements of data structures (PC users fed up with "code bloat" may beg to differ!). However, there are many situations where memory usage cannot be ignored. For instance, to index "unstructured" data such as human genome data, one requires a powerful indexing data structure known as the suffix tree. To index a string of n symbols, a suffix tree requires O(n) words of memory, which seems reasonable. In practice, however, the index is several times larger than the string being indexed, a significant problem if one wishes to distribute indexable data on a medium of limited capacity such as a CD-ROM. Succinct data structures reduce the space usage to essentially the information-theoretic minimum. Succinct data structures are now known for several problems; the current challenge is to maintain succinctness while making the data structures dynamic (i.e. able to process updates rapidly).
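
A basic building block of many succinct data structures is a bit vector supporting rank queries (how many 1 bits occur before a given position) using little space beyond the bits themselves. The simplified two-level sketch below is written for illustration only (the structure and constants are our own choices, not those of the research described above); a genuinely succinct variant tunes the directory sizes so that the extra space is o(n) bits.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Rank structure over a bit vector: precomputed counts per 512-bit
    // superblock and per 64-bit word, plus a popcount on the final word,
    // answer rank queries in O(1) time.
    class RankBitvector {
    public:
        explicit RankBitvector(const std::vector<uint64_t>& words) : bits(words) {
            uint32_t total = 0;
            uint16_t in_super = 0;
            for (size_t w = 0; w < bits.size(); ++w) {
                if (w % 8 == 0) {              // start of a 512-bit superblock
                    super.push_back(total);
                    in_super = 0;
                }
                block.push_back(in_super);
                uint16_t c = (uint16_t)__builtin_popcountll(bits[w]);
                in_super += c;
                total += c;
            }
        }

        // Number of 1 bits in positions [0, i); for simplicity assumes that
        // i is strictly less than the number of stored bits.
        uint32_t rank1(size_t i) const {
            size_t w = i / 64;
            uint64_t mask = (i % 64) ? ((1ULL << (i % 64)) - 1) : 0;
            return super[w / 8] + block[w] +
                   (uint32_t)__builtin_popcountll(bits[w] & mask);
        }

    private:
        std::vector<uint64_t> bits;    // the bit vector itself, 64 bits per word
        std::vector<uint32_t> super;   // 1s before each 512-bit superblock
        std::vector<uint16_t> block;   // 1s from superblock start to each word
    };

    int main() {
        std::vector<uint64_t> words(16, 0xAAAAAAAAAAAAAAAAULL);  // alternating bits
        RankBitvector rb(words);
        printf("%u\n", rb.rank1(1000));   // 500 ones among the first 1000 positions
        return 0;
    }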
