Vikas Agrawal, Robert V. Canning, Jr., Qiong Li
[Brassard and Bratley, Algorithmics, Theory and Practice, Prentice Hall, 1988]
Given an array A of n elements, a value x is a majority element of A if more than half of the elements of A have the value x. Note that there can be at most one majority element in an array.
Give a linear time algorithm for determining whether or not an array has a majority element. Assume that the only comparisons allowed between values are tests of equality. That is, you may not assume that an order relation exists between elements.
In order to determine whether or not there is a majority element in linear time we must find a way of determining a candidate element for testing as majority in a fast way. The idea here is to take array A and reduce it to a smaller size (up to n/2). Take that new array reduce it and continue until we have one element left. This element is compared to the values in array A and a determination of majority is made. The technique here is based on the theory that if an element is a majority in A then it will be a majority in subsequent reductions of the array. The method of reducing the array is to pair each odd element with its next even number neighbor (i+1). The paired elements are compared and if they match one of them is copied to the subarray the other discarded. If the pair does not match both elements are discarded.
n - is the number of elements in array A.
m - is the number of time the majority element appears in A.
mp - is the number of majority pairs in A.
np - is the number of non-majority pairs in A.
lm - Mixed pair in A with majority element as one of elements.
lp - Mixed pair no majority element.
To prove that the above is true, we must prove that the number of majority pairs in the reduced array is greater than the number of non-majority pairs in the reduced array (that is, we need to show that mp > np). The proof here is assuming even number of elements in the array. If there is an odd number of elements we check that element for majority if it is not majority it is deleted.
We know that m is greater than n/2 in A. Based on this and the above 2 relations we can conclude:
thus mp > np.
Algorithm:
The algorithm is described as a C program and functions/procedures. I assumed no particular data type for A.
T(n) = O(1) + O(n) + O(n) + O(n) + O(n/2) + T(n/2) = T(n/2) + O(n) = O(n)
Using a stack we can determine whether or not there is a candidate for majority element in O(n) time. The stack starts empty. The first item in the list is pushed onto the stack. Then the next item is obtained from the list. This item is compared to the item in the stack. If the two items match then the item is pushed onto the stack. If there is not a match the stack is popped. This essentially destroys the item from the stack and the item we are comparing it against. Anytime the stack is empty, the next list item is pushed onto the stack. This is done one time through the list of elements. If there is a majority element it will be on the stack. If there is no majority element the stack will be empty. It is possible to have no majority element and the stack still have an item on it. Therefore it is necessary, when the stack is non empty to check the stack item against the list to determine whether this candidate is majority. The stack can be empty, no majority element, or can contain one or more of the same item. This is based on there being > n/2 elements to be majority and <= n/2 elements in minority which could cancel out the majority item on the stack.
The algorithm shown is a based on using a stack data structure, it is assumed that the functions are "well known".
Statement of the proof: Suppose there exists a value E in array A such that E is the majority element of A then we claim that E is left in the stack after running the algorithm.
Proof:
The cardinality of the majority value E in the array is, say m. The cardinality of the non-majority values in the array is then n-m, say k. Then m > k, from the statement of the problem. As stated in the methodology of the algorithm given above, we would require m non-majority values to cancel the m majority values. But in this case, k non-majority values only cancel k majority values. Thus there would still be (m-k) majority elements. Now (m-k) > 0 as defined by the problem. Therefore the stack would have at least 1 value of the majority element. This is verified by checking the topmost element of the stack with all elements of the array to find out the number of times it actually occurs. Thus the statement is proved.
Analysis of Time:
- Stack created in O(1) time.
- Comparisions done between stack and array A takes time O(n).
- Checking candidate element for majority takes O(n) time.
O(1) + O(n) + O(n) = O(n) time.