Java on Chimera

I created this page for Java users to discuss what they have figured out about using Java on Chimera. Most will want to run multi-threaded Java programs on a single node, taking advantage of its 48 cores of shared-memory parallelism. There are also Java extensions for inter-node communication (MPJ-Express) that are like MPI, only better ;-)

Launching a Multi-threaded Java Program

Here is an example program:

/**
 * Compute \sum{i=0..bound-1}sin(i).  Divide up the work among threads.
 */
public class ThreadDemo {
    public static long bound = 50000000;
    public static int numThreads;

    /** Command-line argument is the number of threads to use. */
    public static void main(String args[]) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: ThreadDemo numThreads");
            return;
        }
        numThreads = Integer.parseInt(args[0]);

        Thread[] threads = new Thread[numThreads];
        double startTime = System.currentTimeMillis(), endTime, time;

        System.out.println("Main thread starting.");
        // Give each worker a contiguous block [i*bound/n, (i+1)*bound/n) and start it.
        for (int i = 0; i < numThreads; i++) {
            threads[i] = new Worker(i, i*bound/numThreads, (i+1)*bound/numThreads);
            threads[i].start();
        }
        System.out.println("All worker threads launched.  Waiting for them to complete...");
        // Wait for every worker to finish before stopping the clock.
        for (int i = 0; i < numThreads; i++) {
            threads[i].join();
        }
        endTime = System.currentTimeMillis();
        time = (endTime-startTime)/1000.0;
        System.out.println("All worker threads terminated: main thread exiting.");
        System.out.println("Total time (s): "+time);
    }
}

class Worker extends Thread {
    private int pid;
    private long first;
    private long lastPlusOne;

    Worker(int pid, long first, long lastPlusOne) {
        this.pid = pid;
        this.first = first;
        this.lastPlusOne = lastPlusOne;
    }

    public void run() {
        System.out.println("Worker thread "+pid+" has started summing from "+
                           first+" to "+(lastPlusOne-1));

        // The partial sum is computed but not reported; the loop is the timed work.
        double result = 0.0;

        for (long i = first; i < lastPlusOne; i++) {
            result += Math.sin((double)i);
        }
        System.out.println("Worker thread "+pid+" exiting.");
    }
}
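If you also want the sum itself, a variant built on java.util.concurrent may be more convenient. The sketch below is hypothetical (the class name ThreadPoolDemo and its methods are not on the cluster): it uses the same block decomposition as ThreadDemo, but runs the blocks in a fixed-size pool from Executors.newFixedThreadPool and collects each block's partial sum through a Future, so main can add them up. Like ThreadDemo, it creates exactly as many worker threads as requested; it does not by itself restrict which cores the OS runs them on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical variant of ThreadDemo: each task returns its partial sum. */
public class ThreadPoolDemo {

    /** Sum sin(i) for i in [first, lastPlusOne). */
    static double partialSum(long first, long lastPlusOne) {
        double result = 0.0;
        for (long i = first; i < lastPlusOne; i++) {
            result += Math.sin((double) i);
        }
        return result;
    }

    /** Split [0, bound) into numThreads contiguous blocks, as ThreadDemo does. */
    static double parallelSum(long bound, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Double>> parts = new ArrayList<>();
            for (int i = 0; i < numThreads; i++) {
                final long first = i * bound / numThreads;
                final long lastPlusOne = (i + 1) * bound / numThreads;
                parts.add(pool.submit(() -> partialSum(first, lastPlusOne)));
            }
            double total = 0.0;
            for (Future<Double> f : parts) {
                total += f.get();   // blocks until that task has finished
            }
            return total;
        } finally {
            pool.shutdown();        // let the pool's threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        int numThreads = args.length == 1 ? Integer.parseInt(args[0]) : 1;
        long start = System.currentTimeMillis();
        double sum = parallelSum(50000000L, numThreads);
        double seconds = (System.currentTimeMillis() - start) / 1000.0;
        System.out.println("Sum: " + sum + "  Total time (s): " + seconds);
    }
}
```

It takes the thread count as its single argument, just like ThreadDemo, so the same run scripts would work with the class name changed.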

I put the code in /usa/siegel/java and compiled it there.

In my home directory, I put this script, called run1, which I made executable:

#!/bin/bash
java -classpath /usa/siegel/java ThreadDemo 1

And then I put this script, called run5, also executable:

#!/bin/bash
java -classpath /usa/siegel/java ThreadDemo 5

(I know, not the most elegant; thanks for pointing that out.)
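To avoid keeping one script per thread count, the program could pick the count itself. A minimal sketch, assuming it is acceptable to fall back on the SLURM_CPUS_PER_TASK environment variable (which Slurm sets in the job only when --cpus-per-task is given) and, failing that, on Runtime.getRuntime().availableProcessors(). The class ThreadCount and its choose method are hypothetical helpers; ThreadDemo's main would call choose instead of insisting on an argument.

```java
/** Hypothetical helper: choose a thread count without one script per count. */
public class ThreadCount {

    /** Precedence: explicit argument, then SLURM_CPUS_PER_TASK, then all visible processors. */
    static int choose(String[] args, String slurmCpusPerTask, int availableProcessors) {
        if (args.length >= 1) {
            return Integer.parseInt(args[0]);
        }
        if (slurmCpusPerTask != null && !slurmCpusPerTask.trim().isEmpty()) {
            return Integer.parseInt(slurmCpusPerTask.trim());
        }
        return availableProcessors;
    }

    public static void main(String[] args) {
        int n = choose(args,
                       System.getenv("SLURM_CPUS_PER_TASK"),   // null if --cpus-per-task was not given
                       Runtime.getRuntime().availableProcessors());
        System.out.println("Using " + n + " threads");
    }
}
```

With something like this in place, a single script without a thread-count argument, submitted with sbatch --cpus-per-task=5, would run 5 threads.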

I then wrote this script to launch some tests in batch mode:

sbatch -p 64g -N 1 --cpus-per-task=5 --exclusive run5
sbatch -p 64g -N 1 -n 1 --cpus-per-task=1 --exclusive run1

I don't think the -n 1 or the --cpus-per-task options do anything, as I'll explain below. I was trying to use them to control how many cores (CPUs) are allocated to the JVM, but they don't seem to have that effect: the JVM seems able to use as many cores as it wants in either case. I can, however, control how many threads are spawned (5 or 1) through the command-line argument.
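One way to check what the JVM itself believes it may use inside a batch job is to print Runtime.getRuntime().availableProcessors() next to the relevant Slurm environment variables. The class CpuProbe below is a hypothetical diagnostic, not part of the original experiment. If it reports all 48 processors even under --cpus-per-task=1, that would be consistent with the observation above that these options are not limiting the JVM.

```java
/** Hypothetical diagnostic: what the JVM sees inside a Slurm job. */
public class CpuProbe {

    static String report() {
        // getenv returns null for variables Slurm did not set in this job.
        return "availableProcessors = " + Runtime.getRuntime().availableProcessors() + "\n"
             + "SLURM_CPUS_PER_TASK = " + System.getenv("SLURM_CPUS_PER_TASK") + "\n"
             + "SLURM_JOB_CPUS_PER_NODE = " + System.getenv("SLURM_JOB_CPUS_PER_NODE");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```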

Here is the result of running the above experiment. The first batch command yields this output:

Main thread starting.
Worker thread 0 has started summing from 0 to 9999999
Worker thread 4 has started summing from 40000000 to 49999999
All worker threads launched.  Waiting for them to complete...
Worker thread 3 has started summing from 30000000 to 39999999
Worker thread 2 has started summing from 20000000 to 29999999
Worker thread 1 has started summing from 10000000 to 19999999
Worker thread 0 exiting.
Worker thread 1 exiting.
Worker thread 3 exiting.
Worker thread 2 exiting.
Worker thread 4 exiting.
All worker threads terminated: main thread exiting.
Total time (s): 15.373

And the second batch command yields this:

Main thread starting.
All worker threads launched.  Waiting for them to complete...
Worker thread 0 has started summing from 0 to 49999999
Worker thread 0 exiting.
All worker threads terminated: main thread exiting.
Total time (s): 74.341

So the 5-thread run comes very close to the ideal 5x speedup.
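For the record, the speedup and parallel efficiency implied by the two timings above (T_1 from the 1-thread run, T_5 from the 5-thread run):

```latex
S_5 = \frac{T_1}{T_5} = \frac{74.341\ \text{s}}{15.373\ \text{s}} \approx 4.84,
\qquad
E_5 = \frac{S_5}{5} \approx 0.97
```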

Question: how can we actually limit (put an upper bound on) the number of cores the JVM can use? If I remove the SLURM arguments -n and/or --cpus-per-task, the results are unchanged, so those arguments do not seem to control it.