1) i) Sketch an algorithm for checkpointing and restarting a Condor job through a signal handler.
   ii) Briefly outline what additional issues, beyond those in the algorithm above, must be considered to checkpoint and restart a parallel job, covering everything from the checkpoint signal sent to the remote job(s) through to the restart(s).

   Reading: Condor Technical Summary, Section 6 (Checkpointing). www.cs.wisc.edu/condor/doc/tech.ps
   Help: for crt0, see http://www.calpoly.edu/cgi-bin/man-cgi?crt0+3

2) Consider the Job and Resource ClassAds given below.
   i) Do matchmaking and choose the best resource for the job, if any. Give reasons.
   ii) If you did not get a match in (i), change one of the resource ClassAds so that the job now gets a match.

   Reading: Matchmaking: Distributed Resource Management for High Throughput Computing, Section 3.2 (Matchmaking & Claiming). www.cs.wisc.edu/condor/doc/hpdc98.ps
   Attribute references: http://www-f9.ijs.si/~matevz/docs/condor-V6_4-Manual/2_5Submitting_Job.html#1382

   Job1
   [
     Type = "Job";
     Executable = "image_sim";
     Owner = "raju";
     Requirements = (other.OpSys == "Linux" && other.DiskSpace > 140M && other.KFlops > 7000);
     Rank = (other.DiskSpace > 300M ? 10 : 1);
   ]

   Machine1
   [
     Type = "Machine";
     OpSys = "Linux";
     DiskSpace = 500M;
     AllowedUsers = {"raju", "potter", "granger"};
     Requirements = (IsMember(other.Owner, AllowedUsers));
   ]

   Machine2
   [
     Type = "Machine";
     OpSys = "Linux";
     KFlops = 8000;
     ResearchGroup = {"malfoy", "weasley", "raju"};
     Requirements = (IsMember(other.Owner, ResearchGroup));
   ]

3) Some light reading. If you are a member of a research group that does heavy mathematical computations, you should read this article: http://www.globus.org/about/news/nug30.html
   It gives a glimpse of how Condor can help you! (Just a note: around 50 machines in the Smith basement are rarely used from 10 pm to 6 am every day.)

Happy Spring Break...
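As a rough aid for question 2, the symmetric evaluation described in the matchmaking paper can be modeled in a few lines. The Python sketch below is a hypothetical, simplified model, not Condor's matchmaker. Its assumptions: the ClassAd UNDEFINED value is a sentinel object; a reference to a missing attribute evaluates to UNDEFINED; `&&` uses three-valued logic; two ads match only if each ad's Requirements evaluates to exactly true against the other; sizes written as `140M` are stored as plain numbers in M units; and a non-numeric Rank counts as 0. The helper names (`ref`, `and3`, `is_member`, `matches`, `best_match`) are illustrative only.

```python
from typing import Any, Dict, List, Optional, Tuple


class _Undefined:
    """Sentinel standing in for the ClassAd UNDEFINED value (simplified model)."""

    def __repr__(self) -> str:
        return "UNDEFINED"


UNDEFINED = _Undefined()
Ad = Dict[str, Any]


def ref(ad: Ad, name: str) -> Any:
    # A reference to a missing attribute evaluates to UNDEFINED.
    return ad.get(name, UNDEFINED)


def eq(a: Any, b: Any) -> Any:
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a == b


def gt(a: Any, b: Any) -> Any:
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a > b


def and3(*vals: Any) -> Any:
    # Three-valued AND: any False wins, otherwise any UNDEFINED, otherwise True.
    if any(v is False for v in vals):
        return False
    if any(v is UNDEFINED for v in vals):
        return UNDEFINED
    return True


def is_member(x: Any, items: Any) -> Any:
    return UNDEFINED if x is UNDEFINED or items is UNDEFINED else x in items


# Sizes such as 140M are stored as plain numbers in M units (assumption).
job1: Ad = {
    "Type": "Job", "Executable": "image_sim", "Owner": "raju",
    "Requirements": lambda my, other: and3(
        eq(ref(other, "OpSys"), "Linux"),
        gt(ref(other, "DiskSpace"), 140),
        gt(ref(other, "KFlops"), 7000)),
    "Rank": lambda my, other: (
        UNDEFINED if ref(other, "DiskSpace") is UNDEFINED
        else (10 if ref(other, "DiskSpace") > 300 else 1)),
}

machine1: Ad = {
    "Type": "Machine", "OpSys": "Linux", "DiskSpace": 500,
    "AllowedUsers": ["raju", "potter", "granger"],
    "Requirements": lambda my, other: is_member(ref(other, "Owner"), ref(my, "AllowedUsers")),
}

machine2: Ad = {
    "Type": "Machine", "OpSys": "Linux", "KFlops": 8000,
    "ResearchGroup": ["malfoy", "weasley", "raju"],
    "Requirements": lambda my, other: is_member(ref(other, "Owner"), ref(my, "ResearchGroup")),
}


def matches(a: Ad, b: Ad) -> bool:
    # Symmetric match: each ad's Requirements must be exactly True against the other.
    return a["Requirements"](a, b) is True and b["Requirements"](b, a) is True


def rank_value(job: Ad, machine: Ad) -> float:
    r = job["Rank"](job, machine)
    # Assumption: a non-numeric Rank (e.g. UNDEFINED) counts as 0.
    return float(r) if isinstance(r, (int, float)) and not isinstance(r, bool) else 0.0


def best_match(job: Ad, machines: List[Tuple[str, Ad]]) -> Optional[str]:
    # Keep only mutual matches, then pick the one the job ranks highest.
    scored = [(rank_value(job, m), name) for name, m in machines if matches(job, m)]
    return max(scored)[1] if scored else None


if __name__ == "__main__":
    print(best_match(job1, [("Machine1", machine1), ("Machine2", machine2)]))  # None
    fixed = dict(machine1, KFlops=8000)  # illustrative change for part (ii)
    print(best_match(job1, [("Machine1", fixed), ("Machine2", machine2)]))  # Machine1
```

In this model, neither machine matches the ads as given: each one leaves one of Job1's conditions UNDEFINED (Machine1 has no KFlops, Machine2 has no DiskSpace), so Job1's Requirements never becomes true. Giving Machine1 a KFlops rating is one possible change for part (ii). Other changes are possible.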