Ways of representing and efficiently manipulating data are central to many computer programs. Over the years, computer scientists have studied many different data structures for representing and manipulating data in a computer's main and secondary memory. These investigations have shown, both in theory and in practice, that the choice of data structure often has a considerable effect on algorithmic performance. Our work in data structures covers a wide variety of topics, with both theoretical (asymptotic performance) and applied (practical performance) aspects. The major themes are:
Exploiting word-level parallelism: Even ordinary "sequential" processors incorporate parallelism of various kinds. One fundamental form, word-level parallelism (WLP), exists at the level of hardware circuits and lets a processor, say, add two 64-bit numbers in a single step. Recently there has been a great deal of activity in the underlying theory of trans-dichotomous RAM algorithms. For instance, our work in this area shows that by exploiting WLP, n keys can be sorted in O(n log log n) steps, rather than the O(n log n) steps required by classical comparison-based algorithms.
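As a concrete (hypothetical) illustration of word-level parallelism, the sketch below compares eight small keys packed into a single 64-bit word using a constant number of word operations, rather than eight separate comparisons. This is a classic building block of trans-dichotomous RAM algorithms; the field width, function names, and test values are our own choices for illustration.

```python
# Illustrative sketch of word-level parallelism: compare eight 7-bit keys
# packed into one 64-bit word in O(1) word operations. Each 8-bit field
# holds a 7-bit key plus one spare "guard" bit.

WIDTH = 8                                                     # bits per field
GUARDS = sum(1 << (WIDTH * i + WIDTH - 1) for i in range(8))  # 0x8080...80

def pack(keys):
    """Pack eight keys (each < 128) into one 64-bit integer."""
    assert len(keys) == 8 and all(0 <= k < 128 for k in keys)
    word = 0
    for i, k in enumerate(keys):
        word |= k << (WIDTH * i)
    return word

def packed_ge(a, b):
    """For each field i, report whether a_i >= b_i, all in O(1) word ops.

    Setting the guard bit of every field in `a` ensures no borrow crosses
    a field boundary during the subtraction: field i of the result holds
    128 + a_i - b_i, whose guard bit survives exactly when a_i >= b_i.
    """
    diff = ((a | GUARDS) - b) & GUARDS
    return [(diff >> (WIDTH * i + WIDTH - 1)) & 1 == 1 for i in range(8)]

print(packed_ge(pack([5, 3, 7, 0, 100, 2, 64, 9]),
                pack([3, 5, 7, 1,  99, 2, 65, 0])))
```

In a language with true 64-bit words the same trick runs in a handful of machine instructions; repeated over many words, it is one way n keys can be processed in fewer than n comparison steps.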
Data Structures for Hierarchical Memory: Computer memory is arranged hierarchically. The lowest levels of the hierarchy (such as magnetic disk or tape) have the greatest capacity, but are several orders of magnitude slower than the higher levels (such as CPU registers). For sufficiently large problem instances, the input will need to be stored in a relatively low level of the memory hierarchy, and algorithms and data structures have to be carefully designed to prevent unnecessary accesses to the lower levels of the hierarchy. A classical example is a B-tree used for large databases on magnetic disk.
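To make the B-tree example concrete, here is a minimal sketch (our own illustration, following the textbook formulation with preemptive node splitting). With minimum degree t, every node holds up to 2t-1 keys, so a search touches only O(log_t n) nodes; choosing t so that one node fills one disk block is what keeps the number of slow disk accesses small.

```python
# Minimal B-tree sketch: each node stores many keys, so the tree is
# shallow and a search reads few nodes (i.e. few disk blocks).
import bisect

class BNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []      # empty for leaves
        self.leaf = leaf

class BTree:
    def __init__(self, t=64):   # t chosen so one node fills one disk block
        self.t = t
        self.root = BNode()

    def search(self, key, node=None):
        if node is None:
            node = self.root
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.leaf:
            return False
        return self.search(key, node.children[i])

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:   # root full: grow upward
            new_root = BNode(leaf=False)
            new_root.children.append(self.root)
            self._split(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def _split(self, parent, i):
        """Split the full child parent.children[i]; its median key moves up."""
        t, child = self.t, parent.children[i]
        right = BNode(leaf=child.leaf)
        parent.keys.insert(i, child.keys[t - 1])
        right.keys, child.keys = child.keys[t:], child.keys[:t - 1]
        if not child.leaf:
            right.children, child.children = child.children[t:], child.children[:t]
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, node, key):
        while not node.leaf:
            i = bisect.bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split(node, i)            # split before descending
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)
```

A production B-tree would store nodes in fixed-size disk pages and fetch them explicitly; the in-memory sketch above shows only the structural invariants that bound the tree's height.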
Succinct Data Structures: Conventional wisdom holds that memory is "cheap", and so one need not make any but the most high-level optimisations to reduce the memory requirements of data structures (PC users fed up with "code bloat" may beg to differ!). However, there are many situations where space is at a premium. For instance, to index "unstructured" data such as human genome data, one requires a powerful indexing data structure known as the suffix tree. To index a string of n symbols, a suffix tree requires O(n) words of memory, which seems reasonable. In practice, however, the index is several times larger than the string being indexed, a significant problem if one wishes to distribute indexable data on a medium with limited capacity such as a CD-ROM. Succinct data structures reduce space usage to close to the information-theoretic minimum, and are now known for several problems. The current challenge is to maintain succinctness while making data structures dynamic (i.e. able to process updates rapidly).