Notes for week 6 first lecture.  hash table organization
----[ Mon, March 17 ]----
hash tables:  Separate Chaining, Linear Probing, basic organization of hash functions.
- 
The USet interface is supported.
With this setup,
 
typedef <Some record type on which == and hashCode are defined> T;
T null = ...;
T del = ...; // used in linear probing
ChainedHashTable<T> H(null);     // array of lists of T (various
    or                           // lengths of lists)
LinearHashTable<T> H(null, del); // array of T (with gaps among
                                 // the items)
T x; 
one can call the USet member functions thusly:
T y = H.find(x);   // y is the found record such that 
                   // y == x or y is null.
bool b = H.add(x); // x is added or b is false.
T y = H.remove(x); // y is the removed record such that 
                   // y == x or nothing is removed and y is null.
int n = H.size(); // return current size.
- 
Let w = 32 be the bit length of a computer word (w = 64 on some machines).
 Let W = 2^w.
 
 
- 
Let D be the size of the hash table.
In hashFromInt() we apply a formula for going from a hashCode, which is an unsigned int in the range 0..W-1, to a hash table index in the range 0..D-1.
For the sake of speed of hashFromInt(), we require that the hash table length is a power of 2, D = 2^d for some d < w.
 For example D = 1024 (d = 10), or D = 32768 (d = 15).
 
 
- 
Chained hash table also maintains the invariant 2^d/4 ≤ n ≤ 2^d.
 
 
- 
Linear hash table also maintains the invariant 2^d/8 ≤ n ≤ 2^d/2.
 
 
-  
unsigned int hash(T x) { return hashFromInt(hashCode(x)); }
 
 
- 
unsigned int hashFromInt(unsigned int k)
 is part of the hash table implementation.
 
 
- 
unsigned int hashCode(T x)
 is provided by the user of the hash table, who also decides what type T actually is.
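One common formula for hashFromInt() is multiplicative hashing: multiply the hash code by a fixed odd w-bit constant z, keep the low w bits, then keep the top d bits of that. The sketch below assumes w = 32. The constant Z and the extra parameter d are illustrative assumptions, not necessarily what the course code does.

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative hashing sketch: maps a w-bit hash code k to an index in 0..2^d - 1.
// Assumptions: w = 32, 1 <= d < w, and Z is an arbitrary odd constant picked for
// illustration (a real table would typically choose z at random).
const unsigned W_BITS = 32;
const std::uint32_t Z = 0x9E3779B1u;  // odd; hypothetical choice

std::uint32_t hashFromInt(std::uint32_t k, unsigned d) {
    assert(d >= 1 && d < W_BITS);
    // 32-bit unsigned arithmetic wraps mod 2^32, so the "mod W" step is free.
    std::uint32_t product = Z * k;
    // Keep the top d bits, i.e. integer division by 2^(w-d).
    return product >> (W_BITS - d);
}
```

Because D is a power of 2, the whole computation is one multiply and one shift, with no division or modulus instruction. This is the speed reason for requiring D = 2^d.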
- 
Separate Chaining, ODS 5.1, is one way to deal with hash collisions.
Implementation is in ChainedHashTable.h
- 
Theorem (see 5..2 - however I do NOT expect you to know the proof of 5..2).
If unsigned ints hashed are uniformly random, and n < array length is maintained by resizing when necessary, then the expected chain length (length of a list in the array of lists) is at most 1.
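A minimal sketch of separate chaining for int keys follows. It is not the code in ChainedHashTable.h. It assumes each key is its own hash code, uses k mod D as the index instead of hashFromInt(), and only grows the array (shrinking when n drops below D/4 is left out). The class name ChainedSketch and the method length() are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Separate-chaining sketch: an array of lists, doubled whenever n would exceed
// the array length. Assumption: index = key mod length, not hashFromInt().
class ChainedSketch {
    std::vector<std::list<int>> t;  // array of lists (the chains)
    std::size_t n = 0;              // number of stored items
    int null;                       // value returned when nothing is found

    std::size_t index(int x) const { return static_cast<unsigned>(x) % t.size(); }

    void resize(std::size_t newLen) {
        std::vector<std::list<int>> old(newLen);
        old.swap(t);                              // t is now the new, empty array
        for (auto& chain : old)
            for (int x : chain) t[index(x)].push_back(x);
    }

public:
    explicit ChainedSketch(int nullValue) : t(2), null(nullValue) {}

    int find(int x) const {
        for (int y : t[index(x)]) if (y == x) return y;
        return null;
    }
    bool add(int x) {
        assert(x != null);                        // null is reserved
        if (find(x) != null) return false;        // already present
        if (n + 1 > t.size()) resize(2 * t.size());
        t[index(x)].push_back(x);
        ++n;
        return true;
    }
    int remove(int x) {
        auto& chain = t[index(x)];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (*it == x) { int y = *it; chain.erase(it); --n; return y; }
        return null;                              // nothing removed
    }
    std::size_t size() const { return n; }
    std::size_t length() const { return t.size(); }
};
```

With n ≤ length maintained this way, the average chain length n/length never exceeds 1, which is the quantity the theorem bounds in expectation for each individual chain.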
- 
Linear probing is another way to deal with hash collisions.
Implementation is in LinearHashTable.h
-  storage is an array of T.  Contrast: in separate chaining it is an array of list of T.
-  Two special values, null and del, are used.
-  q is the number of non-nulls.
-  n is the number of user data entries (non-null and non-del).
-  t is the array of items of type T.
 
-  T find(T x) { return t[findIndex_x_or_null(x)]; }
-  A (private) helper function:
 int findIndex_x_or_null(T x) {
-    int i = hash(x);
-    search at positions j = i, i+1, i+2, ... (wrapping around at the end of the array) for an entry equal to x.
 stop if you find t[j] equal to x or find t[j] is null.
 (thus continue if t[j] is del or is any other T value.)
-    return j; 
 }
-  bool add(T x) { 
-    int j = findIndex_x_or_null(x);
-    if ( t[j] == x ) return false;
-    else {
-      j = index of the first null or del at positions hash(x), hash(x)+1, ...
-      if ( t[j] == null ) update n and q one way.
-      if ( t[j] == del ) update n and q another way.
-      Decide if you should resize based on n, q, and length.
-      t[j] = x.
-      return true;
 }
 }
-  T remove(T x) { 
-    int j = findIndex_x_or_null(x);
-    T y = t[j]; // return value.
-    if ( y == null ) return null; // x was not found: nothing is removed.
-    t[j] = del;
-    Update n and q.  Decide if you should resize.
-    return y;
 }
-  Here is a mnemonic for remembering how del is used: del is "owl awe" for "occupied when looking (find), available when entering (add)".
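The outline above can be filled in as follows for int keys. This is a sketch under assumptions, not the contents of LinearHashTable.h: hash(x) is x mod length rather than hashFromInt(hashCode(x)), the resize thresholds are one reasonable choice, and the names LinearSketch and findIndexXOrNull are made up here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear-probing sketch. Assumptions: hash = key mod length, and the table is
// rebuilt (dropping every del) when q would pass length/2 or n falls below length/8.
class LinearSketch {
    std::vector<int> t;   // array of items, holding null and del markers too
    std::size_t n = 0;    // user entries (non-null and non-del)
    std::size_t q = 0;    // non-null entries (user entries plus dels)
    int null, del;

    std::size_t hash(int x) const { return static_cast<unsigned>(x) % t.size(); }

    // Index of x, or of the first null met while probing from hash(x).
    // A del is "occupied when looking", so the scan walks past it.
    std::size_t findIndexXOrNull(int x) const {
        std::size_t j = hash(x);
        while (t[j] != null && t[j] != x) j = (j + 1) % t.size();
        return j;
    }

    // Rebuild into the smallest power-of-2 length >= 3n (at least 4).
    void resize() {
        std::size_t len = 4;
        while (len < 3 * n) len *= 2;
        std::vector<int> old(len, null);
        old.swap(t);
        q = n;                                     // no dels survive a rebuild
        for (int y : old)
            if (y != null && y != del) t[findIndexXOrNull(y)] = y;
    }

public:
    LinearSketch(int nullValue, int delValue)
        : t(4, nullValue), null(nullValue), del(delValue) {}

    int find(int x) const { return t[findIndexXOrNull(x)]; }

    bool add(int x) {
        assert(x != null && x != del);             // both markers are reserved
        if (find(x) != null) return false;         // already present
        if (2 * (q + 1) > t.size()) resize();      // keep q <= length/2
        std::size_t j = hash(x);                   // del is "available when entering"
        while (t[j] != null && t[j] != del) j = (j + 1) % t.size();
        if (t[j] == null) ++q;                     // a reused del is already counted in q
        ++n;
        t[j] = x;
        return true;
    }

    int remove(int x) {
        std::size_t j = findIndexXOrNull(x);
        if (t[j] == null) return null;             // nothing to remove
        int y = t[j];
        t[j] = del;                                // slot stays non-null, so q is unchanged
        --n;
        if (8 * n < t.size() && t.size() > 4) resize();
        return y;
    }

    std::size_t size() const { return n; }
};
```

Keeping q (not just n) at most half the length guarantees that at least one null remains in the array, so the probing loop in findIndexXOrNull always terminates.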
Theorem (see 5..2 - however I do NOT expect you to know the proof of 5..2).
If unsigned ints hashed are uniformly random, and n < half of array length is maintained by resizing when necessary, then the expected time of each find, add, remove operation is O(1).
Insert letters of "homecoming" in a hash table of size 16
then do some find's, remove's, add's.
Hash values: h-1 o-1 m-4 e-5 c-0 i-11 n-0 g-13 b-2 z-1
'+' = add, '-' = remove, '?' = find, '.' = null, '*' = del. 
loc:0 1 2 3 4 5 6 7 8 9 a b c d e f  return #probes
---------------------------------------------------
+h: . h . . . . . . . . . . . . . .    t       1
+o: . h o . . . . . . . . . . . . .    t       2
+m: . h o . m . . . . . . . . . . .    t       1
+e: . h o . m e . . . . . . . . . .    t       1
+c: c h o . m e . . . . . . . . . .    t       1
+o: c h o . m e . . . . . . . . . .    f       2
+m: c h o . m e . . . . . . . . . .    f       1
+i: c h o . m e . . . . . i . . . .    t       1
+n: c h o n m e . . . . . i . . . .    t       4
+g: c h o n m e . . . . . i . g . .    t       1
-o: c h * n m e . . . . . i . g . .     o      2
?n: c h * n m e . . . . . i . g . .     n      4
?b: c h * n m e . . . . . i . g . .     .      5
?z: c h * n m e . . . . . i . g . .     .      6
+b: c h b n m e . . . . . i . g . .    t       6
-g: c h b n m e . . . . . i . * . .     g      1
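The example can be replayed mechanically with the small program fragment below. It is only a sketch: the per-letter hash values are hard-coded from the list above, the table is fixed at 16 slots and never resizes, and the names TraceTable, probes, and replayExample are invented for the illustration. The probes field counts only the slots examined by the find step.

```cpp
#include <cassert>
#include <string>

// Replays the 16-slot example with '.' as null and '*' as del.
struct TraceTable {
    std::string t = std::string(16, '.');
    int probes = 0;   // slots examined by the most recent findIndex call

    // Hash values copied from the notes; any other letter is an error.
    static int hash(char x) {
        switch (x) {
            case 'h': case 'o': case 'z': return 1;
            case 'c': case 'n': return 0;
            case 'm': return 4;
            case 'e': return 5;
            case 'i': return 11;
            case 'g': return 13;
            case 'b': return 2;
        }
        assert(false && "letter not in the example");
        return 0;
    }

    // Stop at x or at null; walk past del and past other letters.
    int findIndex(char x) {
        int j = hash(x);
        probes = 1;
        while (t[j] != '.' && t[j] != x) { j = (j + 1) % 16; ++probes; }
        return j;
    }
    char find(char x) { return t[findIndex(x)]; }

    bool add(char x) {
        if (find(x) == x) return false;            // already present
        int j = hash(x);                           // first null or del is available
        while (t[j] != '.' && t[j] != '*') j = (j + 1) % 16;
        t[j] = x;
        return true;
    }
    char remove(char x) {
        int j = findIndex(x);
        char y = t[j];
        if (y != '.') t[j] = '*';                  // leave a del marker behind
        return y;
    }
};

// Runs the whole sequence from the notes and returns the final slot contents.
std::string replayExample() {
    TraceTable h;
    for (char x : std::string("homecoming")) h.add(x);
    h.remove('o');
    h.find('n'); h.find('b'); h.find('z');
    h.add('b');
    h.remove('g');
    return h.t;
}
```

Reading h.probes after each find reproduces the 4, 5, and 6 probes shown for ?n, ?b, and ?z, and the string returned by replayExample() matches the last row of the diagram.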
clickers