Homework problem F

Problem F

Assigned Sept 25, due Oct 9.

This problem is CLR problem 9-2, part c.

Hint: Use select() to find the k-th smallest of the x's for various k's. Use binary search to home in on the k you want.
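
To illustrate what a single select() call provides, here is a minimal sketch in which std::nth_element from the C++ standard library stands in for the text's select(). The function name kthSmallest is hypothetical and k is counted from 1. This only shows the building block named in the hint, not a solution to part c.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the k-th smallest value of xs, with k counted from 1.
// Hypothetical helper: std::nth_element stands in for the text's select().
// xs is taken by value because nth_element reorders its input.
double kthSmallest(std::vector<double> xs, std::size_t k)
{
    assert(k >= 1 && k <= xs.size());
    std::nth_element(xs.begin(), xs.begin() + (k - 1), xs.end());
    return xs[k - 1];
}
```

After the call to std::nth_element, every element before position k-1 is no greater than xs[k-1] and every element after it is no less.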


As possibly useful material for this problem, and as an example of how I would write up exercise solutions, below is my solution to parts a and b of problem 9-2. I haven't restated the problem, so this will not make sense without reading 9-2 at the same time.

Let A be a dataset containing the elements xi and their associated weights wi, which sum to 1. Let n be the number of elements.

In this discussion I will use 1-based indexing so i runs from 1 to n and also the rank of an xi runs from 1 (smallest) to n (largest).

Part (a):
If xj is the (lower) median of the x's, then k = ⌊(n-1)/2⌋ of the x's are less than it, and K = ⌈(n-1)/2⌉ are greater than it.

We must show that the sum of the k weights of the lesser x's is less than 1/2, and the sum of the K weights of the greater x's is no more than 1/2. Since all the weights are 1/n, the sum of any j weights is j/n.

If n is odd then k = K = (n-1)/2, so the sums are k/n = K/n = 1/2 - 1/(2n). [corrected Sept 29 from "... 1/2 - (1/2)n."] Both are strictly less than 1/2, meeting the requirement.

If n is even then k+1 = K = n/2. The sum of the smaller is k/n = 1/2 - 1/n, which is less than 1/2, and the sum of the larger is K/n = 1/2, which is no more than 1/2, as required.
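
As a sanity check on the two cases above, the inequalities can be tested for many values of n. This is a minimal sketch with a hypothetical function name. It multiplies both inequalities through by 2n so that only integer arithmetic is needed.

```cpp
#include <cassert>

// Checks the two weighted-median inequalities for the lower median of n
// distinct values that all carry weight 1/n. Both sides are scaled by 2n
// so the comparisons are exact:
//   k/n <  1/2   <=>   2k <  n
//   K/n <= 1/2   <=>   2K <= n
bool lowerMedianIsWeightedMedian(int n)
{
    assert(n >= 1);
    int k = (n - 1) / 2;   // floor((n-1)/2): values below the median
    int K = (n - 1) - k;   // ceil((n-1)/2): values above the median
    return 2 * k < n && 2 * K <= n;
}
```

For example, n = 4 gives k = 1 and K = 2, so the sums are 1/4 and 1/2, matching the even case.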

Part (b): [with corrections of Sept 30]
Create a datatype consisting of (x, w) pairs. Such pairs are compared according to their x field, but when a pair is moved around, the w value goes along with the x value. For instance, as sketched in this C++ code:

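#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>
using namespace std;

//Let T be the type of the xi and R be the
//type of the numeric weights, wi (such as double).

//Define a less-than predicate function object on (x, w) items:
template <class T, class R>
struct less_item
{
	bool operator()(const pair<T, R>& a, const pair<T, R>& b) const
	{ return a.first < b.first; }
};

//D is passed by value, so the caller's data is not reordered.
//Assumes D is nonempty and that its weights sum to 1.
template <class T, class R>
T weightedMedian(vector<pair<T, R> > D)
{
// 1. Sort the dataset D according to the less_item compare function.
	sort(D.begin(), D.end(), less_item<T, R>());
// 2. Find where the weighted sum reaches 1/2.
	R sum = D[0].second;
	size_t i;	//number of items whose weights have been added to sum
	for (i = 1; sum < R(1)/2; ++i) sum += D[i].second;
	return D[i-1].first;	//the last item added, i.e. the item of rank i
}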
For any k, let sum_k denote the sum of the weights of the k smallest items, that is, the value of sum once k weights have been added (with sum_0 = 0). Let i be the rank of the returned item. We have sum_(i-1) < 1/2 <= sum_i. The first inequality says that the sum of the weights of the smaller elements is less than 1/2. It follows from the second inequality that the sum of the weights of the larger elements is 1 - sum_i <= 1/2. This demonstrates that the returned value is the weighted median.

The cost of the algorithm is the cost of sorting, O(n*lg(n)), plus O(n) for step 2. Thus the overall cost is O(n*lg(n)).

Remark: the loop terminates after no more than n iterations, because the sum of all n weights is 1, which is at least 1/2 and so satisfies the stopping condition.
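
The two inequalities used in the correctness argument can also be checked directly in O(n) time, without sorting. The sketch below is one way to test an implementation. The function name isWeightedMedian is hypothetical, and it assumes nonnegative double weights that sum to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns true if m satisfies the two weighted-median inequalities for data:
// the total weight of items with x < m is less than 1/2, and the total
// weight of items with x > m is no more than 1/2.
// Assumes nonnegative weights that sum to 1; does not check that m occurs
// in data.
bool isWeightedMedian(const std::vector<std::pair<double, double> >& data,
                      double m)
{
    assert(!data.empty());
    double below = 0.0, above = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i].first < m) below += data[i].second;
        else if (data[i].first > m) above += data[i].second;
    }
    return below < 0.5 && above <= 0.5;
}
```

Note that comparisons against 1/2 are sensitive to floating-point rounding, so test data with weights that are exactly representable (such as 0.125 or 0.25) gives the most predictable results.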