1) 
i) Sketch an algorithm for checkpointing/restarting a Condor job through a signal handler.
ii) Briefly describe what else (beyond the algorithm above) must be considered when checkpointing/restarting a parallel job
(starting from the checkpoint signal sent to the remote job(s), through to the restart(s)).

Reading: Condor Technical Summary, Section 6 on checkpointing. www.cs.wisc.edu/condor/doc/tech.ps
Help: for crt0, see http://www.calpoly.edu/cgi-bin/man-cgi?crt0+3
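
Hint : A toy sketch of the signal-handler pattern, in Python. This is NOT how Condor does it (Condor saves the whole process image; the sketch below only saves a dictionary the job manages itself). The class ToyJob, the on_step hook and the choice of signal are all made up for illustration. The point is the shape: the handler only records the request, the job saves its state at a safe point, and a restart reloads that state and continues.

```python
import os
import pickle
import signal


class ToyJob:
    """Hypothetical job that sums 0..n-1 and can checkpoint its own state."""

    def __init__(self, ckpt_path):
        self.ckpt_path = ckpt_path
        self.state = {"i": 0, "total": 0}   # everything needed to resume
        self.checkpoint_requested = False

    def on_signal(self, signum, frame):
        # Keep the handler tiny: just note that a checkpoint was requested.
        self.checkpoint_requested = True

    def save(self):
        # Write to a temporary file first, then rename, so a crash in the
        # middle of the write never leaves a half-written checkpoint.
        tmp = self.ckpt_path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(self.state, f)
        os.replace(tmp, self.ckpt_path)

    def restore(self):
        # Returns True if a checkpoint was found and loaded.
        if not os.path.exists(self.ckpt_path):
            return False
        with open(self.ckpt_path, "rb") as f:
            self.state = pickle.load(f)
        return True

    def run(self, n, on_step=None):
        """Returns True when finished, False if it stopped at a checkpoint."""
        while self.state["i"] < n:
            if on_step is not None:
                on_step(self)               # demo hook, e.g. to fake a signal
            if self.checkpoint_requested:   # safe point between work units
                self.save()
                return False
            self.state["total"] += self.state["i"]
            self.state["i"] += 1
        return True


def install_handler(job, signum):
    # The caller picks the signal number (an assumption of this sketch,
    # for example signal.SIGUSR1 on a Unix system).
    signal.signal(signum, job.on_signal)
```

On restart, the job would call restore() before run(); if restore() returns False it simply starts from the beginning. For part ii), think about what this sketch ignores when several such processes are talking to each other.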

2) Consider the Job and Resource ClassAds given below.
i)  Perform matchmaking and choose the best resource for the job, if any. Give reasons.
ii) If you did not get a match in (i), change one of the resource ClassAds so that a match is now possible.

Reading: Matchmaking: Distributed Resource Management for High Throughput Computing, Section 3.2 on Matchmaking & Claiming
www.cs.wisc.edu/condor/doc/hpdc98.ps
Attribute references: http://www-f9.ijs.si/~matevz/docs/condor-V6_4-Manual/2_5Submitting_Job.html#1382

Job1

[
	Type = “Job”;
	Executable = “image_sim”;
	Owner = “raju”;
	Requirements = (other.OpSys == “Linux” && other.DiskSpace > 140M && other.KFlops > 7000);
	Rank = (other.DiskSpace > 300M ? 10 : 1);
]

Machine1

[
	Type = “Machine”;
	OpSys = “Linux”;
	DiskSpace = 500M;
	AllowedUsers = {“raju”, “potter”, “granger”};
	Requirements = (IsMember(other.Owner, AllowedUsers));
]

Machine2

[

	Type = “Machine”;
	OpSys = “Linux”;
	KFlops = 8000;
	ResearchGroup = {“malfoy”, “weasley”, “raju”};
	Requirements = (IsMember(other.Owner, ResearchGroup));

]
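
Hint : A rough Python sketch of two-sided matching, in case the mechanics are unclear. It uses made-up ads (different from the ones above) and a simplification: a Requirements expression that touches an attribute the other ad does not define counts as "no match", which stands in for the ClassAd UNDEFINED value. The helper names (satisfied, rank, best_match) are invented for this sketch and are not Condor APIs.

```python
def satisfied(ad, other):
    """True only if ad's Requirements evaluates to True against other."""
    req = ad.get("Requirements")
    if req is None:
        return True                      # no constraint stated
    try:
        return req(ad, other) is True
    except KeyError:
        return False                     # referenced an undefined attribute


def rank(ad, other):
    """Preference of ad for other; an ad without a usable Rank scores 0."""
    fn = ad.get("Rank")
    if fn is None:
        return 0
    try:
        return fn(ad, other)
    except KeyError:
        return 0


def best_match(job, machines):
    # A match needs BOTH sides' Requirements to hold; among the matches
    # the job's Rank picks the winner.
    candidates = [m for m in machines if satisfied(job, m) and satisfied(m, job)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: rank(job, m))


# Hypothetical ads, for illustration only.
job = {
    "Owner": "alice",
    "Requirements": lambda my, other: other["Arch"] == "X86" and other["Memory"] >= 64,
    "Rank": lambda my, other: other["Memory"],
}
owner_allowed = lambda my, other: other["Owner"] in my["Allowed"]
machines = [
    {"Name": "a", "Arch": "X86", "Memory": 128, "Allowed": ["alice"], "Requirements": owner_allowed},
    {"Name": "b", "Arch": "X86"},                    # Memory undefined
    {"Name": "c", "Arch": "X86", "Memory": 256, "Allowed": ["bob"], "Requirements": owner_allowed},
    {"Name": "d", "Arch": "X86", "Memory": 96},      # no Requirements of its own
]

winner = best_match(job, machines)
print(winner["Name"] if winner else "no match")
```

Machine b is rejected by the job (an attribute it needs is missing), machine c rejects the job (owner not allowed), and of the remaining two the job's Rank prefers the one with more memory. Apply the same reasoning by hand to Job1, Machine1 and Machine2.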

3) Some light reading.
If you are a member of a research group that does heavy mathematical computation, you should read this article:
http://www.globus.org/about/news/nug30.html
It gives a glimpse of how Condor can help you!
(Just a note: around 50 machines in Smith basement are rarely used from 10 pm to 6 am every day...)

Happy Spring Break...