DATA STRUCTURES GALORE Jens Axel Søgaard
----------------------------------------------------------------------
_DATA STRUCTURES GALORE_ Jens Axel Søgaard
----------------------------------------------------------------------
----------------------------------------------------------------------
INTRODUCTION
----------------------------------------------------------------------
This library provides functional implementations of some commonly used
data structures. The data structures can be divided into:
- collections (_sets_, _bags_, _heaps_)
- associative collections (_finite maps_, _priority queues_)
- sequences (_stacks_, _queues_, _deques_)
[Although associative collections and sequences are included in the
PLanet-distribution, their implementations, though in working
order, should not be considered finished.]
The data structures are implemented in a purely functional way.
Inserting an element in, say, a collection will return a new data
structure containing both the element and all elements in the old
collection *without* destroying the old collection. The collections
can thus be used *persistently* opposed to the normal *ephemeral* way,
where an old collection is destroyed. This implies that the collection
is safe to use in multi threaded programs and web servlets.
Data structures can have multiple implementations, and each
implementation has its own module. A "no-fuss" module exists for each
data strucuture containing a reasonable default implemention for each
data structure is given - for those cases, where it is unimportant to
have specific time complexities.
The names of the individual operations have been carefully chosen to
have meaning for as many data structures as possible. Element
insertion is for example named "insert" in all data structures (some
data structures also provide alternative, more traditional
names). This choice makes it easy to replace an unfortunate choice of
data structure. Use (prefix ...) to avoid name conflicts working with
more than one data structure.
Many of the data structure implementations require an ordering of the
elements. The order is represented using compare function from
srfi-67.
All exported functions are checked using contracts, the error messages
are therefore considerably better than in the previous version of
this library.
Most algorithms are carefully explained by Chris Okasaki in the
delightful book "Purely Functional Data Structures".
[Terminology: bags are also called multi-sets; a heap is a priority
queue, where the priority is the element; in a priority queue the
element and priority are distinct; a deque is a double ended queue]
----------------------------------------------------------------------
INITIAL EXAMPLES
----------------------------------------------------------------------
EXAMPLE (Simple)
----------------
The operation empty makes a new heap with no elements.
;; A heap of integers
> (require (planet "heap.scm" ("soegaard" "galore.plt")))
> (find-min (insert 42 (insert 3 (heap 2 8 1))))
1
;; A leftist heap of strings
> (require (planet "leftist-heap.scm" ("soegaard" "galore.plt")))
> (find-min (insert "foo" (insert "bar" (empty))))
"bar"
;; Bags of integers
> (require (planet "bag.scm" ("soegaard" "galore.plt")))
> (elements (difference (union (bag 1 3 2)
(bag 1 2 4))
(bag 3)))
(4 2 2 1 1)
;; Sets of integers
(require (planet "set.scm" ("soegaard" "galore.plt")))
> (elements (difference (union (set 1 3 2)
(set 1 2 4))
(set 3)))
(1 2 4)
EXAMPLE (Heap Sort)
-------------------
A simple heap sort:
(require (planet "heap.scm" ("soegaard" "galore.plt")))
(define (heap-sort l)
(heap->list (list->heap l)))
(define (list->heap l)
; a more efficient list->heap is provided by the library
(foldl insert (empty) l))
(define (heap->list H)
(define (loop H l)
(if (empty? H)
(reverse l)
(loop (delete-min H) (cons (find-min H) l))))
(loop H '()))
(heap-sort '(3 1 4 1 5 9))
Eager comprehensions from srfi-42 are supported, so an alternative to
the above is:
(require (planet "heap.scm" ("soegaard" "galore.plt"))
(lib "42.ss" "srfi")
(lib "67.ss" "srfi"))
(define (heap-sort l)
(heap->list (list->heap l)))
(define (list->heap l)
(heap-ec default-compare (: x l) x))
(define (heap->list H)
(list-ec (: x H) x))
(heap-sort '(3 1 4 1 5 9))
EXAMPLE (User defined element type)
-----------------------------------
The above heaps used the compare function given by the parameter
current-compare to order the elements (because empty was given no
arguments). The start value of current-compare is default-compare,
which handles lists, booleans, characters, strings, symbols, numbers
and vectors.
To create a heap of user defined structs, we need to pass empty a
custom compare function.
(require (planet "heap.scm" ("soegaard" "galore.plt"))
(lib "67.ss" "srfi"))
(define-struct hiscore (name score) (make-inspector))
(define (hiscore-compare s1 s2)
(refine-compare
; highest scores first
(number-compare (hiscore-score s2) (hiscore-score s1))
; ties are sorted alphabetically after name
(string-compare (hiscore-name s1) (hiscore-name s2))))
(find-min (insert* (list (make-hiscore "Foo" 200)
(make-hiscore "Bar" 100)
(make-hiscore "Qux" 300))
(empty hiscore-compare)))
; => #(struct:hiscore "Qux" 300)
As an alternative one could extend the compare function
given by the current-compare parameter:
(let ([cmp (current-compare)])
(current-compare
(lambda (x1 x2)
(select-compare x1 x2
[hiscore? (hiscore-compare x1 x2)]
[else (cmp x1 x2)]))))
(find-min (insert* (list (make-hiscore "Foo" 200)
(make-hiscore "Bar" 100)
(make-hiscore "Qux" 300))
(empty)))
; => #(struct:hiscore "Qux" 300)
----------------------------------------------------------------------
COLLECTIONS
----------------------------------------------------------------------
Collections include sets, bags, and heaps. The following is a
summary of the operations collections have in common:
CONSTRUCTORS
------------
> empty : [cmp] -> col
return an object representing an empty X of elements order by the
compare function cmp, or the value of the parameter
current-compare.
> insert : elm col -> col
(insert x C) = {x} U C
> insert* : (list elm ...) col -> col
(insert xs c) = C U {x1, ...} , where xs = (list x1 ...)
> list->set : [cmp] (list elm) -> col
> list->bag : [cmp] (list elm) -> col
> list->heap : [cmp] (list elm) -> col
(list->set xs) = (insert* xs (empty))
(list->set cmp xs) = (insert* xs (empty cmp))
and similar for the others
> singleton : [cmp] elm -> col
(singleton x) = (insert x (empty))
(singleton cmp x) = (insert x (empty cmp))
> union : col col -> col
x in (union A B) <=> x in A or x in B
Futhermore shorthand for (insert* (list x1 ...) (empty)) is provided
in terms of:
> bag : (list elm) -> bag
> heap : (list elm) -> heap
> set : (list elm) -> set
DESTRUCTORS
-----------
> elements : col -> (list elm)
Returns a list of all occurrences of all elements in the
collection.
> find-min : col -> elm
The call (find-min C) returns an element x such that
x <= y for all y in C, where <= is determined by
C's compare function.
It is an error to use find-min on an empty collection.
> get : elm col -> elm
return an element from col equivalent to the given element, or #f
if no such element is found
> select : col -> elm
(select C) = x <=> x in C
Selects an element in the collection.
It is an error to pass an empty collecion to select.
DELETIONS
---------
> delete : elm col -> col
delete one occurence of the element from the collection
> delete-all : elm col -> col
delete all occurence of the element from the collection
> delete* : (list elm) col -> col
for each element in the list, delete one occurence of the
element from the collction
> delete-min : col -> col
delete one occurence of the minimal element from the collection
OBSERVERS
---------
> count : elm col -> natural-number
counts the number of occurences of the element in the collection
> empty? : col -> boolean
returns #t if the collection is empty,
otherwise #f is returned
> member? : elm col -> boolean
returns #t if the collection contains an occurrence of the element
> size : col -> natural-number
returns the number of elements in the collection
FOLDS
-----
> fold : (elm alpha -> alpha) alpha col -> alpha
Let the collection c contain the elements x1, x2, ..., xn, then
(fold kons knil c) = (kons xn ... (kons x2 (kons x1 knil)) ... )
TYPE PREDICATES
---------------
> set? : object -> boolean
> bag? : object -> boolean
> heap? : object -> boolean
Returns #t, if the object is a collection of the proper type.
EAGER COMPREHENSIONS
--------------------
Comprehensions
- - - - - - -
> (set-ec cmp <qualifier>* <expression>)
> (bag-ec cmp <qualifier>* <expression>)
> (heap-ec cmp <qualifier>* <expression>)
The collection of values obtained by evaluating <expression> once for
each binding in the sequence defined by the qualifiers. If there are
no qualifiers the result is the list with the value of
<expression>. The first argument cmp is the compare function used to
order the elements of the collection.
One has the following identity for set-ec (the actual implementation
is more efficent):
(set-ec cmp <qualifier>* <expression>)
= (list->set cmp (list-ec <qualifier>* <expression>))
Similar identities hold for bag-ec and heap-ec.
Generators
- - - - -
The generators :set, :bag, and :heap are defined.
Example
- - - -
> (elements (set-ec default-compare (:range i -5 5) (* i i)))
(0 1 4 9 16 25)
> (list-ec (: i (set 1 2 2 3)) i)
(1 2 3)
----------------------------------------------------------------------
HEAPS
----------------------------------------------------------------------
Heaps provide efficient access to the minimum element. Currently one
heap implementation is provided namely leftist heaps, which is
therefore also is the default heap implementation.
(require (planet "heap.scm" ("soegaard" "galore.plt")))
(require (planet "leftist-heap.scm" ("soegaard" "galore.plt")))
----------------------------------------------------------------------
SETS
----------------------------------------------------------------------
Besides the common collection operations sets also provide the
customary set operations. Most set implementations provide efficient
access to all elements.
There are two set implementations. One represents sets using red/black
trees, the uses sorted lists. The default set implementation is based
on red/black trees.
(require (planet "set.scm" ("soegaard" "galore.plt")))
(require (planet "red-black-tree-set.scm" ("soegaard" "galore.plt")))
(require (planet "list-set.scm" ("soegaard" "galore.plt")))
Set operations
- - - - - - - -
Besides the common operations sets also support the following
operations:
> difference : set set -> set
(difference A B) = {x in A | x not-in B}
> equal=? : set set -> boolean
The call (equal=? A B) returns true, if all elements of A are
members of B, and all elements of B are members of A.
> intersection : set set -> set
The call (intersection A B) returns the set of elements that are
members of both A and B. If x in A and y in B represents the same
element with respect to the compare function in question, it is
unspecified whether x or y is returned. (Use intersection/combiner
if you need a specific choice).
> set : elm ... -> set
(set x1 x2 ... xn) = (insert* (list x1 x2 ... xn) (empty))
> subset? : set set -> boolean
The call (subset? A B) returns #t if all elements of A are
members of B, otherwise #f is returned.
Sets support variations of the normal operations, that allows you to
control which element to keep in the case of duplicates. A combiner is
function of two arguments that receive two equivalent (with respect to
the compare function in question) and return the element that should
be kept.
The supported combiner operations are:
> insert/combiner (insert with combiner),
> insert*/combiner
> intersection/combiner
> list->set/combiner
> union/combiner
E.g. (insert*/combiner (list 1 2 3 4) (empty compare-mod2) max) will
result in the set {3, 4}. The combiner is called on the pairs 1 and 3,
and on 2 and 4.
> intersection/combiner : set set combiner -> set
Return the intersection of two sets. If x in A and y in B
represent the same element, then the element returned by the call
(c x y), where c is the combiner, is used as the representative.
----------------------------------------------------------------------
BAGS
----------------------------------------------------------------------
Bags (or multi-sets) keep a count of the number of occurences of each
element. Some bag implementations hold each individual element
inserted in the bag - the implementation provided by this library
keeps only one representative together with a count of how many times
it were inserted.
There are two bag implementations. One represents bags using red/black
trees, the uses sorted lists. The default bag implementation is based
on red/black trees.
(require (planet "bag.scm" ("soegaard" "galore.plt")))
(require (planet "red-black-tree-bag.scm" ("soegaard" "galore.plt")))
(require (planet "list-bag.scm" ("soegaard" "galore.plt")))
Bag Operations
- - - - - - - -
Besides the common operations bags also support the following
operations:
> bag : elm ... -> bag
(bag x1 x2 ... xn) = (insert* (list x1 x2 ... xn) (empty))
> difference : bag bag -> bag
(difference A B) = (fold delete A B)
> equal=? : bag bag -> boolean
The call (equal=? A B) returns true, if A is a subbag of B and B
is a subbag of A (see subbag?)
> fold/no : (elm number alpha -> alpha) alpha bag -> alpha
Like fold, but the combining function takes 3 arguments: an
element, the number of occurences of the element, and the
accumulator.
> intersection : bag bag -> bag
The call (intersection A B) returns a bag of elements that are
members of both A and B. The number of occurences of an element is
the least of the number of occurences in A and B. If x in A and y
in B represents the same element with respect to the compare
function in question, it is unspecified whether x or y is
returned. (Use intersection/combiner if you need a specific
choice).
> subbag? : bag bag -> boolean
The call (subbag? A B) returns #t if all elements of A are members
of B and the number of occurences of an element in A is less or
equal to the number of occurences of the element in B, otherwise #f
is returned.
Bags support the following combiner operations:
> insert/combiner
> insert*/combiner
> intersection/combiner
> list->set/combiner
> union/combiner
----------------------------------------------------------------------
TIME COMPLEXITIES
----------------------------------------------------------------------
The following table shows the worst case time complixity. The O( ) has
been omitted due to space considerations.
RB-Set RB-Bag Leftist-Heaps List-Set List-Bag
find-min 1 1 1 1 1
delete-min 1 1 1 1 1
get, member? log n log n n n n
insert log n log n log n n n
union n+m n+m log(n+m) n+m n+m
elements n n n 1 n
delete log n log n log n n n
delete-all log n log n log n n n
size n n n n n
bag, list->set n log(n) n log(n)
heap, list->bag n log(n) n log(n)
set, list->heap n
The operations: empty?, empty, singleton, and select are all O(1).
----------------------------------------------------------------------
EXAMPLES
----------------------------------------------------------------------
In the folder 'examples' you will find the following examples:
heap-sort.scm -- Heap example
queens.scm -- Heap example
knights-tour.scm -- Priority queue example
primes.scm -- Priority queue example
greedy-set-cover.scm -- Set example
matching-parenthesis.scm -- Stack Example
----------------------------------------------------------------------
DESIGN RATIONALE
----------------------------------------------------------------------
Argument order
--------------
The argument order was chosen to resemble the order used for list
operations.
(cons x xs) (insert x xs)
(car xs) (select xs)
(cdr xs) (delete-min xs)
(append xs ys) (insert* xs ys) or (union xs ys)
(delete x xs) (delete x xs) ; see srfi-1
The above choice makes it easy to use e.g. insert and delete in
folds.
----------------------------------------------------------------------
ACKNOWLEDGEMENTS
----------------------------------------------------------------------
- The red-black trees implementation started as a port of
Jean-Christophe Filliatre's Ocaml implementation.
- Guillaume Marceau
----------------------------------------------------------------------
REFERENCES
----------------------------------------------------------------------
[Ada] Adams, Stepehn, "Implementing Sets Efficiently in a Functional Language"
[Mar] Martinez, Conrado and Roura, Salvador: "Randomized Binary Search Trees"
[Oka] Chris Okasaki, "Purely Functional Data Structures"