Notes for week 6 first lecture. hash table organization
----[ Mon, March 17 ]----
hash tables: Separate Chaining, Linear Probing, basic organization of hash functions.
-
The USet interface is supported.
With this setup,
typedef <Some record type on which == and hashCode are defined> T;
T null = ...;
T del = ...; // used in linear probing
ChainedHashTable<T> H(null); // array of lists of T (lists of various lengths)
or
LinearHashTable<T> H(null, del); // array of T (with gaps among the items)
T x;
one can call the USet member functions thusly:
T y = H.find(x); // y is the found record such that
// y == x or y is null.
bool b = H.add(x); // x is added or b is false.
T y = H.remove(x); // y is the removed record such that
// y == x or nothing is removed and y is null.
int n = H.size(); // return current size.
-
Let w = 32 be the bit length of a computer word (w = 64 on some machines).
Let W = 2^w.
-
Let D be the size of the hash table.
In hashFromInt() we apply a
formula for going from a hashCode, which is an unsigned int in the range 0..W-1, to a hash table index in the range 0..D-1.
For the sake of speed of hashFromInt(), we require that the hash table length
is a power of 2: D = 2^d for some d < w.
For example D = 1024 (d = 10), or D = 32768 (d = 15).
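The notes do not state the formula itself. One common choice that exploits D = 2^d is multiplicative hashing (ODS 5.1.1): multiply by an odd constant and keep the top d bits of the w-bit product. A minimal sketch, assuming w = 32; the constant z and the signature with an explicit d parameter are illustrative assumptions, not necessarily what the course code does.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: an arbitrary odd constant, chosen for illustration only.
const std::uint32_t z = 2654435769u;

// Sketch of hashFromInt for a table of length D = 2^d, with 1 <= d < 32.
std::uint32_t hashFromInt(std::uint32_t k, int d) {
    assert(d >= 1 && d < 32);
    // The unsigned product wraps around mod 2^32 (the "mod W" step);
    // the shift keeps its top d bits, so the result lies in 0..D-1
    // and no division is needed.
    return (z * k) >> (32 - d);
}
```

With d = 10 every result is below 1024, and with d = 15 every result is below 32768, matching the two example table sizes.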
-
Chained hash table also maintains the invariant 2^d/4 ≤ n ≤ 2^d.
-
Linear hash table also maintains the invariant 2^d/8 ≤ n ≤ 2^d/2.
-
unsigned int hash(T x) { return hashFromInt(hashCode(x)); }
-
unsigned int hashFromInt(unsigned int k)
is part of the hash table implementation.
-
unsigned int hashCode(T x)
is provided by the user of the hash table, who also decides what type T actually is.
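To illustrate the user's side of this split, here is a hypothetical record type with == and hashCode defined on it. The Student struct, its fields, and the multiply-by-31 string hash are assumptions made up for this example; they are not from the course code.

```cpp
#include <cassert>
#include <string>

// Hypothetical record type playing the role of T.
struct Student {
    unsigned int id;
    std::string name;
};

// Two records are equal when both fields match.
bool operator==(const Student& a, const Student& b) {
    return a.id == b.id && a.name == b.name;
}

// User-provided hashCode: maps a record to an unsigned int.
// Unsigned arithmetic wraps around, so the result stays in 0..W-1.
unsigned int hashCode(const Student& x) {
    unsigned int h = x.id;
    for (unsigned char c : x.name) h = 31u * h + c;
    return h;
}

// Usage: records that compare equal must produce equal hash codes.
void demo() {
    Student a{42, "ann"}, b{42, "ann"};
    assert(a == b && hashCode(a) == hashCode(b));
}
```

The one property the hash table relies on is the one checked in demo(): if x == y then hashCode(x) == hashCode(y).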
-
Separate Chaining, ODS 5.1, is one way to deal with hash collisions.
Implementation is in ChainedHashTable.h
-
Theorem (see 5..2 - however I do NOT expect you to know the proof of 5..2).
If unsigned ints hashed are uniformly random, and n < array length is maintained by resizing when necessary, then the expected chain length (length of a list in the array of lists) is at most 1.
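The array-of-lists organization can be sketched as follows. This is a simplified illustration, not the contents of ChainedHashTable.h: it stores plain unsigned ints, uses a fixed length D = 2^d with no resizing, returns bool from find and remove instead of a record, and hashes with an assumed multiplicative formula. The name ChainedSketch is hypothetical.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Separate-chaining sketch for unsigned int keys (assumes 1 <= d < 32).
struct ChainedSketch {
    int d;                                  // table length is 2^d
    std::vector<std::list<unsigned int>> t; // array of lists
    int n = 0;                              // number of stored keys

    explicit ChainedSketch(int d_) : d(d_), t(1u << d_) {}

    // Assumed hash: multiplicative hashing with an arbitrary odd constant.
    unsigned int hash(unsigned int k) const {
        return (2654435769u * k) >> (32 - d);
    }
    // Only the one chain selected by hash(x) is scanned.
    bool find(unsigned int x) const {
        for (unsigned int y : t[hash(x)]) if (y == x) return true;
        return false;
    }
    bool add(unsigned int x) {
        if (find(x)) return false;          // already present
        t[hash(x)].push_back(x);
        ++n;
        return true;
    }
    bool remove(unsigned int x) {
        auto& chain = t[hash(x)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (*it == x) { chain.erase(it); --n; return true; }
        }
        return false;                       // nothing removed
    }
};
```

Each operation costs time proportional to the length of one chain, which is why the bound on expected chain length matters.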
-
Linear probing is another way to deal with hash collisions.
Implementation is in LinearHashTable.h
- storage is an array of T. Contrast: in separate chaining it is an array of lists of T.
- Two special values, null and del, are used.
- q is the number of non-nulls.
- n is the number of user data entries (non-null and non-del).
- t is the array of items of type T.
- T find(T x) { return t[findIndex_x_or_null(x)]; }
- A (private) helper function:
int findIndex_x_or_null(T x) {
- int i = hash(x);
- search at positions j = i, i+1, i+2, ... (wrapping around to 0 at the end of the array) for an entry equal to x.
stop if you find t[j] equal to x or find t[j] is null.
(thus continue if t[j] is del or is any other T value.)
- return j;
}
- bool add(T x) {
- int j = findIndex_x_or_null(x);
- if ( t[j] == x ) return false;
- else {
- j = index of the first null or del among positions hash(x), hash(x)+1, ... // like findIndex_x_or_null, but this search also stops at del.
- if ( t[j] == null ) update n and q one way.
- if ( t[j] == del ) update n and q another way.
- Decide if you should resize based on n, q, and length.
- t[j] = x.
- return true;
}
}
- T remove(T x) {
- int j = findIndex_x_or_null(x);
- T y = t[j]; // return value.
- if ( y == null ) return y; // x is not present: nothing is removed.
- t[j] = del;
- Update n and q. Decide if you should resize.
- return y;
}
- Here is a mnemonic for remembering how del is used: del is "owl awe" for "occupied when looking (find), available when entering (add)".
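The find, add, and remove steps above can be put together in a small runnable sketch. It is not the contents of LinearHashTable.h: it assumes non-negative int keys, a fixed array length with no resizing, sentinels -1 and -2 standing in for null and del, and the hash k mod length. The names LinearSketch, NUL, and DEL are hypothetical, and the n and q updates shown are one consistent reading of the definitions of n and q given earlier.

```cpp
#include <cassert>
#include <vector>

const int NUL = -1; // plays the role of null
const int DEL = -2; // plays the role of del

struct LinearSketch {
    std::vector<int> t; // array of items, with gaps
    int n = 0;          // user entries (non-null and non-del)
    int q = 0;          // non-null entries (user entries plus dels)

    explicit LinearSketch(int length) : t(length, NUL) {}

    int hash(int x) const { return x % (int)t.size(); }

    // Stops at x or at null; del is "occupied when looking".
    int findIndex_x_or_null(int x) const {
        int j = hash(x);
        while (t[j] != x && t[j] != NUL) j = (j + 1) % (int)t.size();
        return j;
    }
    int find(int x) const { return t[findIndex_x_or_null(x)]; }

    bool add(int x) {
        if (t[findIndex_x_or_null(x)] == x) return false;
        // Second search: del is "available when entering".
        int j = hash(x);
        while (t[j] != NUL && t[j] != DEL) j = (j + 1) % (int)t.size();
        if (t[j] == NUL) {
            assert(q + 1 < (int)t.size()); // keep at least one null slot
            ++q;                           // a null slot becomes non-null
        }
        ++n;                               // one more user entry either way
        t[j] = x;
        return true;
    }
    int remove(int x) {
        int j = findIndex_x_or_null(x);
        int y = t[j];
        if (y == NUL) return NUL;          // not present: nothing removed
        t[j] = DEL;                        // slot stays non-null, so q is unchanged
        --n;
        return y;
    }
};
```

Keeping at least one null slot is what guarantees that both probe loops terminate; a real implementation would resize instead of asserting.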
Theorem (see 5..2 - however I do NOT expect you to know the proof of 5..2).
If unsigned ints hashed are uniformly random, and n < half of array length is maintained by resizing when necessary, then the expected time of each find, add, remove operation is O(1).
Example: insert the letters of "homecoming" into a hash table of size 16,
then do some finds, removes, and adds.
Hash values: h-1 o-1 m-4 e-5 c-0 i-11 n-0 g-13 b-2 z-1
'+' = add, '-' = remove, '?' = find, '.' = null, '*' = del.
loc:0 1 2 3 4 5 6 7 8 9 a b c d e f return #probes
---------------------------------------------------
+h: . h . . . . . . . . . . . . . . t 1
+o: . h o . . . . . . . . . . . . . t 2
+m: . h o . m . . . . . . . . . . . t 1
+e: . h o . m e . . . . . . . . . . t 1
+c: c h o . m e . . . . . . . . . . t 1
+o: c h o . m e . . . . . . . . . . f 2
+m: c h o . m e . . . . . . . . . . f 1
+i: c h o . m e . . . . . i . . . . t 1
+n: c h o n m e . . . . . i . . . . t 4
+g: c h o n m e . . . . . i . g . . t 1
-o: c h * n m e . . . . . i . g . . o 2
?n: c h * n m e . . . . . i . g . . n 4
?b: c h * n m e . . . . . i . g . . . 5
?z: c h * n m e . . . . . i . g . . . 6
+b: c h b n m e . . . . . i . g . . t 6
-g: c h b n m e . . . . . i . * . . g 1
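The trace can be replayed mechanically. The sketch below hard-codes the hash values listed above and uses '.' and '*' for null and del, as the table does. The struct Trace and the function replay are hypothetical names, and probes are counted only for the search for x (as in find and remove), where the count is unambiguous.

```cpp
#include <cassert>
#include <string>

struct Trace {
    std::string t = std::string(16, '.'); // 16 slots, all null
    int probes = 0; // probes used by the most recent search for x

    // Hash values copied from the notes.
    static int hash(char x) {
        switch (x) {
            case 'h': case 'o': case 'z': return 1;
            case 'c': case 'n': return 0;
            case 'b': return 2;
            case 'm': return 4;
            case 'e': return 5;
            case 'i': return 11;
            case 'g': return 13;
        }
        return 0; // other letters are not used by the trace
    }
    int findIndex_x_or_null(char x) {
        int j = hash(x);
        probes = 1;
        while (t[j] != x && t[j] != '.') { j = (j + 1) % 16; ++probes; }
        return j;
    }
    char find(char x) { return t[findIndex_x_or_null(x)]; }
    bool add(char x) {
        if (t[findIndex_x_or_null(x)] == x) return false;
        int j = hash(x);
        while (t[j] != '.' && t[j] != '*') j = (j + 1) % 16;
        t[j] = x;
        return true;
    }
    char remove(char x) {
        int j = findIndex_x_or_null(x);
        char y = t[j];
        if (y != '.') t[j] = '*';
        return y;
    }
};

// Runs the whole sequence of operations and returns the final table.
std::string replay() {
    Trace H;
    for (char c : std::string("homecoming")) H.add(c);
    H.remove('o');
    H.find('n'); H.find('b'); H.find('z');
    H.add('b');
    H.remove('g');
    return H.t;
}
```

replay() returns "chbnme.....i.*..", which matches the last row of the table; note that b lands in the slot freed by removing o.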
clickers