Notes for week 8, Treaps

A treap is a data structure implementing the SSet interface: find(x), add(x), remove(x), first(), last(), prev(x), next(x).

Reminders:

BSTs (binary search trees) support this interface, but the cost per operation is O(n) in the worst case, O(log(n)) only in the average case.
'find' has two forms, findEQ(x) and find(x).
1. findEQ(x) is the same as in USet (hash tables). It returns the item 'null' if no item equal to x is found.
2. find(x) is the SSet operation that returns the smallest item y such that x ≤ y.
  1. y = x, if there is an equal item in the data,
  2. y is the next larger item than x in the data, if no item is equal to x, but
  3. y is 'null' if x is larger than every item in the data.
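
The three cases of find(x) can be sketched on a plain BST as follows. This is a minimal illustration, not the course's code: the Node struct is a stripped-down stand-in (int items, no priority or parent pointer), and returning a pointer, with nullptr playing the role of 'null', is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical minimal node: only the fields find() needs.
struct Node {
    int item;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int x) : item(x) {}
};

// Returns a pointer to the smallest stored item y with x <= y,
// or nullptr if every stored item is smaller than x.
const int* find(const Node* root, int x) {
    const Node* best = nullptr;      // smallest item >= x seen so far
    const Node* u = root;
    while (u != nullptr) {
        if (x < u->item) {
            best = u;                // u is a candidate; look for a smaller one
            u = u->left;
        } else if (u->item < x) {
            u = u->right;            // u is too small; answer lies to the right
        } else {
            return &u->item;         // case 1: an equal item exists
        }
    }
    // case 2 (next larger item) or case 3 (nothing large enough)
    return best != nullptr ? &best->item : nullptr;
}
```

For a tree holding 5, 10 and 20 (10 at the root), find(10) gives 10, find(7) gives 10, and find(21) gives nullptr.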

The name Treap is derived from TREe-heAP. It is simultaneously a binary search tree and a heap:
a random BST of user items (SSet interface) and
a heap (Priority Queue interface) of random values.

Note that the tree structure is that of a random binary tree, not a left-complete tree. So while it has the heap property with regard to the random values, it is NOT an array-stored left-complete binary tree (as in the binary heap implementation of a priority queue).

class Node {
    T item;  // ODS text calls this x.
    int p;   // priority, chosen at random when the node is created
    Node* parent;
    Node* left;
    Node* right;
    // ... constructors
};

Node notation for pictures: "(item, p)"

Properties of Treaps

A treap is a binary search tree with respect to the user items and a heap with respect to the randomly assigned priorities (a min-heap in these notes: no node has a smaller priority than its parent).
Therefore, the structure of the tree is the same as it would be if the items had been added to an ordinary (unbalanced) BST in order of increasing priority.
But the priority is randomly assigned. Therefore the tree is a random binary search tree.
In a random BST the expected length of a search path is O(log(n)) (more precisely, at most 2*ln(n) + O(1)).
Thus the expected cost of each of the BST operations (find(x), add(x), remove(x), first(), last(), prev(x), next(x)) is O(log(n)).
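
The two properties can be checked mechanically. The following is a sketch under stated assumptions: char items, explicit priorities, a Node without a parent pointer, and smaller priority values sitting nearer the root (a min-heap). The names checkSubtree and isTreap are made up for this illustration.

```cpp
#include <cassert>
#include <climits>

// Stand-in node for this sketch.
struct Node {
    char item;
    int p;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(char x, int prio) : item(x), p(prio) {}
};

// Checks the subtree rooted at u:
//  - BST property: every item lies strictly between lo and hi;
//  - heap property: no priority is smaller than the parent's priority.
bool checkSubtree(const Node* u, int lo, int hi, int parentP) {
    if (u == nullptr) return true;
    if (u->item <= lo || u->item >= hi) return false;
    if (u->p < parentP) return false;
    return checkSubtree(u->left, lo, u->item, u->p)
        && checkSubtree(u->right, u->item, hi, u->p);
}

bool isTreap(const Node* root) {
    return checkSubtree(root, INT_MIN, INT_MAX, INT_MIN);
}
```

A tree that is only a BST, or only a heap, fails the check. For example (k,7) with left child (h,9) whose right child is (c,15) is heap-ordered but not a BST, so isTreap returns false.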

bool add(T x) {
  if x is in the treap, return false,
  otherwise 
    1. make Node* u = new Node(x, rand())
    2. insert u in the treap so as to 
       a) preserve the heap property with respect to priority, and 
       b) the binary search tree property with respect to x.
}

Refine step 2:

    2. a) add u in the BST way (preserves BST property)
       b) bubble u up to restore the heap property 
          (but it's not your classic bubbling: u moves up by rotations,
          which keep the BST property, not by swapping with its parent)
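
A possible rendering of add in C++, as a sketch rather than the ODS implementation. Assumptions of this sketch: items are char, the priority is passed in as a parameter (the notes draw it with rand()) so that examples are repeatable, smaller priorities sit nearer the root, and the helper name lift is made up here.

```cpp
#include <cassert>

// Stand-in for the notes' Node with T = char.
struct Node {
    char item;
    int p;
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(char x, int prio) : item(x), p(prio) {}
};

struct Treap {
    Node* root = nullptr;

    Treap() = default;
    Treap(const Treap&) = delete;             // owns its nodes, so no copying
    Treap& operator=(const Treap&) = delete;
    ~Treap() { destroy(root); }

    bool add(char x, int p) {
        Node* prev = nullptr;
        Node* w = root;
        while (w != nullptr) {                // 2a) walk down the BST way
            if (x == w->item) return false;   // x is already in the treap
            prev = w;
            w = (x < w->item) ? w->left : w->right;
        }
        Node* u = new Node(x, p);
        u->parent = prev;
        if (prev == nullptr) root = u;
        else if (x < prev->item) prev->left = u;
        else prev->right = u;
        // 2b) bubble up: rotate while the parent's priority is larger
        while (u->parent != nullptr && u->parent->p > u->p) {
            if (u->parent->right == u) rotateLeft(u->parent);
            else rotateRight(u->parent);
        }
        return true;
    }

private:
    static void destroy(Node* u) {
        if (u == nullptr) return;
        destroy(u->left);
        destroy(u->right);
        delete u;
    }
    // Hang w where u currently hangs (under u's parent, or as the root).
    void lift(Node* u, Node* w) {
        w->parent = u->parent;
        if (u->parent == nullptr) root = w;
        else if (u->parent->left == u) u->parent->left = w;
        else u->parent->right = w;
    }
    void rotateLeft(Node* u) {                // u's right child moves up
        Node* w = u->right;
        lift(u, w);
        u->right = w->left;
        if (u->right != nullptr) u->right->parent = u;
        w->left = u;
        u->parent = w;
    }
    void rotateRight(Node* u) {               // u's left child moves up
        Node* w = u->left;
        lift(u, w);
        u->left = w->right;
        if (u->left != nullptr) u->left->parent = u;
        w->right = u;
        u->parent = w;
    }
};
```

Adding (k,7), (c,15), (h,9), (t,8), (d,2) in that order leaves d at the root with c as its left child and k as its right child, and h and t below k.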

Example:

Treap<char> S;
Step 1. S.add(k);  // add Node (k,7) [where 7 is random int]

    (k,7)

Step 2. S.add(c);  // add Node (c,15) [where 15 is random int]

    (k,7)
     /
  (c,15)

Step 3. S.add(h);  // add Node (h,9) 

Heap bubble step:  
    (k,7)   →     (k,7)
     /             /       
  (c,15)        (h,9)     
     \              \    
    (h,9)         (c,15)
XXXXXX Wrong: c is now in the right subtree of h, which breaks the BST property. Try again...


    (k,7)   →     (k,7)
     /             /
  (c,15)        (h,9)
     \            / 
    (h,9)     (c,15)

We've just done a rotation (rotate left at c): h moves up, c becomes its left child, and the BST property still holds.
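
A sketch of the two rotations on nodes with parent pointers. Assumptions of this sketch: char items, the root is passed by reference so it can be updated when the rotated node was the root, and the helper name lift is made up here. ODS 7.2 has its own version.

```cpp
#include <cassert>

// Stand-in for the notes' Node with T = char.
struct Node {
    char item;
    int p;
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(char x, int prio) : item(x), p(prio) {}
};

// Hang w where u currently hangs (under u's parent, or as the root).
static void lift(Node*& root, Node* u, Node* w) {
    w->parent = u->parent;
    if (u->parent == nullptr) root = w;
    else if (u->parent->left == u) u->parent->left = w;
    else u->parent->right = w;
}

// Rotate left at u: u's right child w moves up, u becomes w's left child.
void rotateLeft(Node*& root, Node* u) {
    Node* w = u->right;
    assert(w != nullptr);            // needs a right child to rotate up
    lift(root, u, w);
    u->right = w->left;              // items between u and w stay between them
    if (u->right != nullptr) u->right->parent = u;
    w->left = u;
    u->parent = w;
}

// Rotate right at u: u's left child w moves up, u becomes w's right child.
void rotateRight(Node*& root, Node* u) {
    Node* w = u->left;
    assert(w != nullptr);            // needs a left child to rotate up
    lift(root, u, w);
    u->left = w->right;
    if (u->left != nullptr) u->left->parent = u;
    w->right = u;
    u->parent = w;
}
```

With (k,7) at the root, (c,15) as its left child and (h,9) as the right child of c, the call rotateLeft(root, c) produces the tree of Step 3; rotateRight(root, h) then undoes it.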

Step 4. S.add(t);  // add Node (t,8); no rotation needed, since 8 > 7

    (k,7)   →      (k,7)
     /              / \
  (h,9)         (h,9)  (t,8)
   /              /
 (c,15)       (c,15)

Step 5. S.add(d);  // add Node (d,2); rotate left at c, then right at h, then right at k

    (k,7)     →     (k,7)    →     (k,7)      →      (d,2)
     /  \            / \            / \               / \
  (h,9) (t,8)   (h,9)   (t,8)    (d,2) (t,8)      (c,15) (k,7)
    /             /               / \                     / \
 (c,15)       (d,2)           (c,15) (h,9)            (h,9) (t,8) 
    \          /
   (d,2)     (c,15)

Note: The same tree results if we add the nodes in order of increasing priority:

(d,2), (k,7), (t,8), (h,9), (c, 15)

(d,2) → (d,2) → (d,2)   →   (d,2)     →    (d,2)
          \        \           \            / \
          (k,7)   (k,7)       (k,7)    (c,15)  (k,7)
                      \        /  \             /  \
                     (t,8)  (h,9) (t,8)      (h,9) (t,8)

...with no bubbling up happening. Look at rotations in ODS 7.2 (u/w swap in picture would be good).
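
The claim that both insertion orders give the same tree can be checked with a short program. This sketch uses a compact recursive add without parent pointers, which is a different representation from the Node in these notes; the names insert, shape and destroy are made up for the illustration, and priorities are passed in explicitly instead of being random.

```cpp
#include <cassert>
#include <string>

struct Node {
    char item;
    int p;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(char x, int prio) : item(x), p(prio) {}
};

// Recursive add: insert the BST way, then on the way back up rotate the
// child above u if the child's priority is smaller.
// Returns the new root of the subtree.
Node* insert(Node* u, char x, int p) {
    if (u == nullptr) return new Node(x, p);
    if (x < u->item) {
        u->left = insert(u->left, x, p);
        if (u->left->p < u->p) {          // rotate right at u
            Node* w = u->left;
            u->left = w->right;
            w->right = u;
            return w;
        }
    } else if (x > u->item) {
        u->right = insert(u->right, x, p);
        if (u->right->p < u->p) {         // rotate left at u
            Node* w = u->right;
            u->right = w->left;
            w->left = u;
            return w;
        }
    }
    return u;                             // also reached when x is a duplicate
}

// Writes the tree as "(item left right)", with "." for an empty subtree.
std::string shape(const Node* u) {
    if (u == nullptr) return ".";
    return std::string("(") + u->item + " " + shape(u->left) + " "
         + shape(u->right) + ")";
}

void destroy(Node* u) {
    if (u == nullptr) return;
    destroy(u->left);
    destroy(u->right);
    delete u;
}
```

Inserting in the order k, c, h, t, d and in the order d, k, t, h, c (with the priorities from the example) gives the same shape string in both cases: "(d (c . .) (k (h . .) (t . .)))".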

Removing an item in a treap

T remove(T x) {
  if no item equal to x is in the treap, return null,
  otherwise 
    1. find the node u containing the item equal to x.
    2. trickle u down until it is a leaf, so as to
       a) fix the heap property with respect to priority, and 
       b) maintain the binary search tree property with respect to x.
    3. snip u off and return its item.
}

Refine step 2:

    2. trickle down: repeat until u is a leaf:
        if u has no left child,
             rotate left at u (u trickles down to the left).
        else if u has no right child,
             rotate right at u (u trickles down to the right).
        else if the left child's priority is less than the right child's,
             rotate right at u.
        else
             rotate left at u.

Example:

S.remove(k);  // rotate left at k (t has the smaller priority), rotate right at k, then snip k off

    (d,2)     →     (d,2)    →     (d,2)      →      (d,2)
     /  \            /  \            / \               / \
  (c,15) (k,7)   (c,15) (t,8)    (c,15) (t,8)      (c,15) (t,8)
          / \           /                /                 / 
      (h,9) (t,8)    (k,7)            (h,9)             (h,9)
                     /                   \
                  (h,9)                 (k,7)

Conclusion: Treaps are a good implementation of the SSet interface.
Operations find(x), add(x), remove(x) run in expected time O(log(n)).

"Computer science is not really about computers -- and it's not about computers in the same sense that physics is not really about particle accelerators, and biology is not about microscopes and Petri dishes...and geometry isn't really about using surveying instruments. Now the reason that we think computer science is about computers is pretty much the same reason that the Egyptians thought geometry was about surveying instruments: when some field is just getting started and you don't really understand it very well, it's very easy to confuse the essence of what you're doing with the tools that you use."
Hal Abelson (1986)

Didn't the phone company need SSet operations before the age of computers?
lots of find(x) -- look-up-account is part of handling every call.
some add(x) -- new phone service.
some remove(x) -- cancelled phone service (moved, bankrupt, ...)
monthly or annual traversal (billing cycle, print phone book)
How did they do it prior to the computer age?

Doesn't the Internal Revenue Service (IRS) need SSet operations?
How did they do it prior to the computer age?
Discuss IRS recordkeeping with a view to their SSet issues.
What scale? Frame the questions. Estimate the answers.
What categories of recordkeeping users do they have?
What are the needs in the various categories (again, frame questions, estimate answers)?
Design a recordkeeping system that they could have used prior to the computer age. Consider how the SSet operations might have been done.
What might force their recordkeeping to change in the future?

clicker questions.