Getting acroread to work on OpenSolaris

August 10th, 2010

Acroread doesn’t work by default on OpenSolaris.  I got it to work on the b134 preview by giving it the privilege that it requires.

By default acroread (version 9.x on x86) will do the following.

terminate called after throwing an instance of 'RSException'

That doesn’t tell us much.  There are similar error messages reported for Linux also if you do a search.  To find the actual problem use the ppriv command like so,

% ppriv -e -D acroread
acroread[6810]: missing privilege "proc_priocntl" (euid = XXXX, syscall = 112) needed at secpolicy_setpriority+0x24
terminate called after throwing an instance of 'RSException'

So that tells us that it needs the proc_priocntl permission.  Now there are some pages that explain how to add that privilege for a user so they can run acroread.  That doesn’t fit my security model and has other problems.  So instead I gave the acroread command the privilege so any user can run it.  I couldn’t find this documented anywhere which is why I’m writing this now.

I modeled my solution after the way cdrecord is given permission to work.  I added the following line to /etc/security/exec_attr

Basic Solaris User:solaris:cmd:::/usr/local/bin/acroread.bin:privs=proc_priocntl

The acroread.bin file is actually a script that runs acroread.  We are giving it the proc_priocntl privilege.  /usr/local/bin/acroread is just a script that looks like the following:

#!/bin/sh
exec pfexec /usr/local/bin/acroread.bin "$@"

So that’s how to give acroread permission to work correctly so you don’t need to give users a privilege that they don’t need.
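As a rough sketch, the exec_attr entry above could be generated and checked from a script before appending it. The helper names (exec_attr_line, has_entry) are made up for illustration; the profile, path and privilege are the ones used in this post.

```shell
#!/bin/sh
# Sketch only: values are the ones from the post; helper names are hypothetical.
PROFILE="Basic Solaris User"
TARGET=/usr/local/bin/acroread.bin
PRIV=proc_priocntl

# Build an exec_attr line in the same layout as the entry shown above.
exec_attr_line() {
    echo "$1:solaris:cmd:::$2:privs=$3"
}

# Return 0 if the exact entry is already present in the given file.
has_entry() {
    fgrep -x "$(exec_attr_line "$PROFILE" "$TARGET" "$PRIV")" "$1" >/dev/null 2>&1
}

# Example (as root): append the entry only if it is missing.
# has_entry /etc/security/exec_attr ||
#     exec_attr_line "$PROFILE" "$TARGET" "$PRIV" >> /etc/security/exec_attr
```

Running 'ppriv -e -D acroread' again afterwards should show whether the missing-privilege message is gone.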

OpenSolaris roadmap1 (not)

June 14th, 2010

I was surprised to see the other day a blog titled “Open Solaris Roadmap”,
http://milek.blogspot.com/2010/06/open-solaris-roadmap.html

I was even more surprised to see that the source of the information was a link to my web page (for the DE-OSUG) to the pdf that Harry Foxwell presented to us (the Delaware OSUG) back in March!

Then a highly read blogger wrote about that blog entry, http://www.cuddletech.com/blog/pivot/entry.php?id=1132

Which led to a thread on the opensolaris discuss mailing list about an updated roadmap, http://opensolaris.org/jive/thread.jspa?threadID=130628&tstart=0

So to sum things up, there is no news on when the next OpenSolaris (2010.x) will be released.

Update:  an article by the Register http://www.theregister.co.uk/2010/06/21/oracle_opensolaris_solaris_plans/ (nothing new there).

Windows connector and Grub-like menu for Sun Rays

April 5th, 2010

I recently investigated setting up Sun Rays in kiosk mode using the Windows Sun Ray connector.  Using the connector from Solaris is pretty easy as the client program uttsc is basically an RDP client.  Actually, rdesktop could also be used from Solaris in the same way.  However, I had to set up kiosk mode so the Sun Ray would go directly to a Windows server.  I followed the docs on setting up kiosk mode (utconfig -k, it kept failing until I ran it under truss then it succeeded…).  Most of the docs talk about using a GUI, which I do not use, and about setting up kiosk mode for smart card or non-smart card users, etc.  What I needed to do was set up one specific Sun Ray to use kiosk mode, which was a little harder to find.  To do that I ran the following:

/opt/SUNWut/sbin/utkioskoverride -s kiosk -r pseudo.TOKEN -c uttsc

Where TOKEN is the MAC address for the specific Sun Ray.  I also had to run ‘utkiosk -i uttsc -f uttsc.conf’.  In uttsc.conf I had to add KIOSK_SESSION=uttsc and of course change KIOSK_SESSION_ARGS to set the Windows server to connect to.  This took me a little while to get used to, but it worked well.  I found this web site useful.

Then I was tasked to come up with a grub-like menu so users could select Solaris (or Linux) or Windows.  This was a bit more work.  Most of the solutions seem to talk about using the GUI and having the Sun Ray admin put things in to change what the Sun Ray does for different tokens.  I ended up taking ideas from two implementations to come up with a solution that provides a grub-like menu for Sun Rays.

One site that is recommended is http://blogs.sun.com/danielc/entry/meta_kiosk_how_to_run on “Meta Kiosk”  which I looked at and it gave me the idea of using Xephyr.  I also found http://blogs.sun.com/mplona/entry/customized_sun_ray_kiosk_sessions which gave me the idea of using zenity and gave some code to follow (I looked at menu2).  Neither solution did what I wanted so I wrote my own kiosk session script.  My proof-of-concept so far is much smaller than those solutions, but seems to work for me.  Here is my script:

#!/bin/sh
# install this as /etc/opt/SUNWkio/sessions/menu/menu

# Add a line to set the background image, then display the menu
choice=`/usr/bin/zenity  --width=200 --height=200 --list --column  "Select  Operating System" "Solaris" "Windows"` 

case "${choice}" in
    "Solaris")
        # This part was inspired by meta-kiosk-session's run-X11-session
        # set servername of course
        display=0
        while [ $display -ne 256 ]
        do
            display=`expr $display + 1`
            /usr/X11/bin/Xephyr :${display} -once -query ${servername} -fullscreen
            if [ $? -ne 1 ]
            then
                exit
            fi
        done
        ;;
    "Windows")
        # Need KIOSK_SESSION_ARGS set correctly for server to connect to
        . /etc/opt/SUNWkio/sessions/uttsc/uttsc
        exit
        ;;
esac

This can be used to connect to any Solaris or Linux system that would accept the X11 connection.  So instead of listing Operating systems maybe you would want to list several hosts for example.
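A minimal sketch of the "list several hosts" variation might look like the following. The host names are placeholders and the function names are invented for this example; the zenity and Xephyr options are the same ones used in the script above. Nothing runs until host_menu is called from the kiosk session script.

```shell
#!/bin/sh
# Hypothetical host list; replace with hosts that accept the X11 (XDMCP) query.
HOSTS="solaris1 linux1 linux2"

# Print the Xephyr command line for a given host ($1) and display number ($2).
xephyr_cmd() {
    echo "/usr/X11/bin/Xephyr :$2 -once -query $1 -fullscreen"
}

# Show the menu; prints the selected host (needs zenity and an X display).
choose_host() {
    # $HOSTS is left unquoted on purpose so each host becomes one list row.
    /usr/bin/zenity --width=200 --height=200 --list --column "Select Host" $HOSTS
}

# Try displays 1..256 until Xephyr exits with something other than 1,
# mirroring the loop in the script above.
host_menu() {
    host=`choose_host`
    [ -n "$host" ] || return 1
    display=0
    while [ $display -ne 256 ]
    do
        display=`expr $display + 1`
        `xephyr_cmd "$host" "$display"`
        if [ $? -ne 1 ]
        then
            return 0
        fi
    done
}
```

Keeping the command construction in its own function makes it easy to check what would be run without a Sun Ray in front of you.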

I called this the menu config and then changed the Sun Ray from using uttsc to menu with  utkioskoverride.  The script goes in  /etc/opt/SUNWkio/sessions/menu/ and the following goes in /etc/opt/SUNWkio/sessions/menu.conf

KIOSK_SESSION=menu
KIOSK_SESSION_LIMIT_VMSIZE=2000000
KIOSK_SESSION_ARGS=-t 1800 -- -r sound:low SERVERNAME
KIOSK_SESSION_EXEC=$KIOSK_SESSION_DIR/menu

I also loaded that file with ‘utkiosk -i menu -f menu.conf’.  And it works.  It presents a little menu where you can select an entry and then whatever is in the matching branch of the case statement runs.  I then specified some other Sun Rays to run in kiosk mode using this session config and they are being tested.  One bug found so far is that sound doesn’t work.  Also, USB devices do not really work using this approach.  For example, when a USB key drive is plugged in, the drive gets mounted by the random untrusted kiosk user and the user that logs on through Xephyr doesn’t have permission to it (maybe a workaround exists for that?).  But if those things are not important and you have the need for a grub-like menu to select the OS on a Sun Ray it seems to do the job.

I tested this on a Sun Ray server running SXCE b111 with version 4.1 of the Sun Ray server software and version 2.1 of the Windows connector (SRWC).

Small test on thor

March 19th, 2010

I just did a small speed test with gtar on an x4540 (thor) system that will become our replication server.  I wanted to see how much adding SSDs for the zil would help.  The system has 500GB disks arranged in four raidz2 vdevs of 10 disks each.  For my test I unpacked mysql-5.1.36.tar.gz twice in each configuration.

My first test was to do the test locally on the system to get a baseline.  I used ptime to time how long ‘gtar xzf’ on mysql-5.1.36.tar.gz took.  Locally with no ssd the times were 1.62 seconds and 1.56 seconds.  I then added two ssds as mirrored slogs and the local times were 1.49 and 1.50.  So I have baseline performance numbers to compare against and I then removed the log devices.  This test is using OpenSolaris b134, btw.

I then mounted the filesystem on the test x4600 with NFS over a 10 gigabit ethernet link.  The times to unpack the file with no ssd configured were 7:05.86 and 6:53.76 (around 7 minutes).  I then configured the mirrored slogs again and the times went down to 2:26.73 and 2:28.62, so under 2.5 minutes.  That’s a pretty good improvement.    I just did the same test again after adding part of the SSDs as L2ARC read caches.  I didn’t expect much change and there wasn’t as the times were 2:19.22 and 2:24.29.

The SSDs used were Crucial 256GB RealSSD C300.  They are MLC so they should be good for L2ARC, but they also improved the synchronous writes quite a bit.  I’d like to test a couple of SLC SSDs like the Intel X25-E which are supposed to be better for slogs.  I plan to use dedup on this system so the L2ARC should help.  If this system ends up serving out iSCSI the log devices for zil also should help.
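To put the timings above side by side, here is a small sketch that converts the ptime-style values to seconds and prints the ratio of the averages. The function names are invented for this example, and on Solaris the old /usr/bin/awk may need to be swapped for nawk.

```shell
#!/bin/sh
# Convert "M:SS.ss" (or plain "SS.ss") to seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%.2f\n", s
    }'
}

# speedup a1 a2 b1 b2: ratio of the average of (a1,a2) to the average of (b1,b2).
speedup() {
    LC_ALL=C awk -v a1="`to_seconds "$1"`" -v a2="`to_seconds "$2"`" \
                 -v b1="`to_seconds "$3"`" -v b2="`to_seconds "$4"`" \
        'BEGIN { printf "%.1f\n", (a1 + a2) / (b1 + b2) }'
}

# NFS over 10GbE: no slog vs. mirrored slog (times from the post)
speedup 7:05.86 6:53.76 2:26.73 2:28.62
# Local baseline: no slog vs. mirrored slog
speedup 1.62 1.56 1.49 1.50
```

With the numbers from this post the NFS case comes out at roughly a 2.8x improvement, while the local case barely changes.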

Xen (xVM) on OpenSolaris

March 19th, 2010

This post has more to do with using Jumbo frames with Xen (xVM) than with OpenSolaris actually I think.  I was getting a bit frustrated this morning trying to get a xen domain working on the x4600 test OpenSolaris server.  CentOS could not use the ethernet interface as it should.  I noticed this syslog message,

WARNING: xnbo_open_mac: mac device SDU too big (9000)

That led me to the bug report, http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6803634, which basically says xen doesn’t work with jumbo frames.  Apparently the bug fixed in that report is that a kernel panic no longer happens.  The x4600 has a myri10ge interface with jumbo frames enabled so I changed to using the onboard e1000g0 interface (changed it to use an mtu of 1500 with ‘dladm set-linkprop -p mtu=1500 e1000g0’) and xen domUs started to work.  This is a newer version of xen (3.4.2) also and I found I had to change the interface type in the xml file from ethernet to bridge also for it to work.

I then rebooted the x4600 with the myri10ge0 mtu set to 1500 and a quick test with netperf showed that throughput was cut in about half.  With jumbo frames I was seeing about 3.5-4.0Gbps between the x4600 and an x4540 also running OpenSolaris b134 with a myri10ge.  With jumbo frames turned off performance went down to 1.6-2.0Gbps, but the CentOS xen domain runs on the myri10ge interface now…

So it looks like our choices are to have OpenSolaris zones use the myri10ge interface with jumbo frames and the Linux and Windows xen domains use the e1000g interfaces (probably aggregate two of them) without jumbo frames or have all use the myri10ge interface without jumbo frames…
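The first option could be sketched as below. This only prints the dladm commands unless DRY_RUN=0 is set; the run and set_mtu helpers are made up for this example, and the 9000 value is taken from the SDU in the syslog warning above.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set the MTU on a link ($1) to a value ($2), then display the property.
set_mtu() {
    run dladm set-linkprop -p mtu="$2" "$1"
    run dladm show-linkprop -p mtu "$1"
}

set_mtu e1000g0 1500      # onboard link used by the xen domUs
set_mtu myri10ge0 9000    # 10 gig link kept on jumbo frames for zones
```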

I set up the myri10ge to use jumbo frames again, set up a CentOS domU to use e1000g0 with an mtu of 1500 and rebooted.  The domU came up fine, although I had to set the time correctly on both the bare-metal and domU because of the ntp problem with the xen hypervisor loaded.

Here are some notes on the virt-install command to install a paravirtual Linux domU.

virt-install --name xvm2 --disk path=/xen/xvm2/xvm.img,size=10,driver=file \
 --paravirt --os-type=linux --network bridge=e1000g0 mac=xx:xx:xx:xx:xx:xx \
 --nographics --ram 1024 --os-variant=rhel5 \
 --location [iso or http://location]

Then for CentOS anyway select an http install, etc…

More OpenSolaris experience

March 18th, 2010

Turns out the workaround I did for the ntpd problem in dom0 when running the xVM hypervisor doesn’t work consistently (or since we went to DST it is acting differently).  There is a combination of using ‘rtc -z US/Eastern’ and ‘rtc -z UTC’ where I can get ntpdate to set the time correctly and have ntpd keep it correct.  However, it seems non-deterministic whether it will work on reboot or not.  So now that we are in DST the clock may jump ahead 4 or 8 hours on reboot and need to be fixed.  The open bug for this is http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6908973.

Also, there is another thing I changed from the defaults on OpenSolaris.  Network automagic (nwam) is cool for a laptop where you may go back and forth between wired and wireless (I’ve used it some for my laptop), but I don’t see why that is the default and it’s recommended to keep it turned on.  For servers of course I want to use static IPs rather than dhcp.  NWAM does have a way to set static IPs (through /etc/nwam/llp), but I much prefer the old way of using /etc/hostname.INTERFACE files.  So I disabled the network/physical:nwam service and enabled network/physical:default to get the previous behavior.  I also have to put files first for hosts in /etc/nsswitch.conf for an nfs mount in /etc/vfstab to work which I didn’t need to before.
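A minimal sketch of that switch, printing the commands unless DRY_RUN=0 is set. The run helper and function name are invented for this example, and the interface name and file contents in the comments are placeholders.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn off NWAM and go back to the traditional network/physical service.
use_static_network() {
    run svcadm disable svc:/network/physical:nwam
    run svcadm enable svc:/network/physical:default
}

use_static_network

# Files to prepare by hand before the switch (contents are examples only):
#   /etc/hostname.e1000g0   -> the hostname or IP address for that interface
#   /etc/nsswitch.conf      -> hosts: files dns   (files listed first)
```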

Grub strangeness on CentOS

March 18th, 2010

I installed CentOS 5.4 on a Dell PowerEdge 6850 yesterday.  It was running SXCE b111, but it needed to run Matlab.  It would have been much easier to bring up a virtual machine, but it was preferred by the user to have Linux run on the bare-metal.  So I PXE booted the system, did an http install and then finished up customizations.  It’s set up for a serial console (all servers should – or use a service processor) and it would not get past loading grub stage2 and printing “Press any key to continue” a few times.  If I hit enter over the serial console it would proceed to boot, but would not reboot on its own.

I have installed CentOS on several other servers the past year using serial consoles (either actual serial ports or service processors) without a problem.  CentOS by default puts the following line in grub’s menu.lst file.

terminal --timeout=5 serial console

However for this system I had to change it to the following to get it to reboot on its own.

terminal --timeout=5 console serial

After that it reboots by itself fine.
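If this has to be done on more than one machine, the change could be scripted with sed. This is only a sketch; the function name is made up, and the menu.lst path in the comment is an assumption about where CentOS keeps the file.

```shell
#!/bin/sh
# Swap "serial console" to "console serial" on grub's terminal line.
# Reads menu.lst content on stdin and writes the result to stdout;
# all other lines pass through unchanged.
swap_terminal_order() {
    sed 's/^\(terminal .*\)serial console$/\1console serial/'
}

# Example (path is an assumption, check where your menu.lst lives):
# cp /boot/grub/menu.lst /boot/grub/menu.lst.orig
# swap_terminal_order < /boot/grub/menu.lst.orig > /boot/grub/menu.lst
```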

Sparse zones not supported in OpenSolaris!

March 15th, 2010

Zones (or containers) were first introduced in Solaris 10.  Once ZFS came out (using a zfs dataset per zone is much better than using UFS with zones) deploying multiple zones became a great way to provide several virtual machines on servers.  One of the features that I liked best about zones Sun called “sparse root zones” where most software is shared between the global (bare metal) zone and the non-global zones.  This was done by having read-only loopback mounts for /lib, /platform, /sbin and /usr (I also added /usr/local to the list, others may want /opt that way), which saved lots of disk space and makes maintaining zones a snap.  Using sparse root zones is optional and you could also do the other extreme where every zone has local copies of those mount points which is called a whole root zone.  Or maybe just /usr needs to be local and keep the others as read-only loopback mounts, the administrator could decide.

The OpenSolaris distribution does not support sparse root zones!  All zones need to be whole root zones.  This is because of the IPS package manager (pkg command).  See my previous blog entry for what I think of package managers.  This is a major defect and has been since IPS was added.  The reason sparse root zones are not supported is that the pkg command cannot manage software in all the zones on a system from the global zone.  Well, I don’t want it to.  I just want the software that is installed to be available with read-only loopback mounts in the zones.  The supported way is to maintain packages in each zone (treating each one like a little Linux machine essentially).  That would be fine if you have a need for such a setup, but I want to have 20-30 zones on a system with all the same /lib, /sbin, /platform and /usr, so there is no need to waste time managing #!$*&^# packages in each one.

ZFS dedup could be used to save space, when having several zones, but that’s not the same as each zone still needs to be managed and updated separately.  Have I mentioned I hate package managers?  Sparse zones work great, were a great idea and it goes against the Sun way to remove things.  One of the great things about Solaris is backwards compatibility, but with OpenSolaris the idea is to be more like Linux (in good and bad ways) by breaking what used to work great!

Oracle listen up.  Sparse root zones are a great feature and should not be dropped from the production release that follows Solaris 10.  They should be added back to OpenSolaris.  Do a quick web search for “opensolaris sparse zones not supported”  or check the zones mailing list and see how people have been complaining about this for quite awhile.

There is a workaround.  Simply edit the xml config file (the one with the dire warning to not edit – I have the tendency to like vi better than configuration editors and edit files like that often) and add in the sparse root zone entries (inherited-pkg-dir).  The zonecfg command will not allow you to add inherited-pkg-dir entries anymore (it even refused to add my /usr/local entry which pkg doesn’t need to be concerned with.  I could have added /usr/local a different way though using ‘filesystem special=…’.).  However, the backend will still parse the entries and the directories get mounted as desired.  Then shut down the zone and remove the files from /lib, /platform, /sbin and /usr to save lots of space, reboot the zone and test that things work.
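As a sketch, the entries to paste into the zone's xml file could be generated like this. The element syntax shown is an assumption based on how Solaris 10 writes sparse zone configs, so compare it against an existing sparse zone's xml file before using it; the function name is made up.

```shell
#!/bin/sh
# Print one inherited-pkg-dir element per directory given.
# ASSUMPTION: the attribute is named "directory", as in Solaris 10 zone xml files.
inherit_entries() {
    for dir in "$@"
    do
        echo "  <inherited-pkg-dir directory=\"$dir\"/>"
    done
}

# The four directories named in this post:
inherit_entries /lib /platform /sbin /usr
```

The output is meant to be pasted inside the zone element by hand, with the zone halted.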

I did some research on this first and other people suggested the same thing.  People then reply saying that it is not supported and that pkg will not work and unpredictable things will happen and…  Well, if I never run pkg in the zone what will break?  I do not need to run pkg in the zones.  I know how to manage things correctly myself better.  As long as the backend doesn’t get broken it should be fine.

You say that pkg needs to be run to update things in /etc and /var or whatever.  Yeah, whatever.  I never used the SVR4 package commands either and I could update systems running over 20 zones without a problem.  The only things I found that I ever had to update (between several different upgrades) were SMF things in /var/svc/ and /etc/svc/.   I think that will continue to work without running pkg (I much prefer sdist or rsync to keep files updated and in sync).

Another thing I found out that was different in OpenSolaris was that zones require two ZFS datasets instead of just one as I was used to (the root directory gets its own dataset now).  It took me a little while to figure out how to bring up a new zone by hand.  I prefer to know how things work and be able to do it by hand rather than just run some command to magically do everything.  It turned out that the zfs dataset for the root filesystem needs a couple of special entries.

One of the strangest things about this new root filesystem for zones was that it is a legacy mount.  I was curious how it gets mounted.  I unmounted the root filesystem from my test zone, disabled the zone service and then re-enabled it to find the filesystem was mounted.  I looked at the SMF startup script for the zones service and found that zoneadm was being called with the sysboot option for each zone.  That option isn’t documented in the help for zoneadm or the man page.  So I unmounted a zone root and ran zoneadm with the sysboot option for the zone to find the filesystem gets mounted.  That’s cool I thought.  Since the sysboot option is undocumented it probably shouldn’t be depended on as it may change.  I checked the zoneadm source and a comment says this is an undocumented interface for use from SMF.

Here’s an example of what a zone looks like now.

[/zones/template/]# df -h . root
Filesystem                      size   used  avail capacity  Mounted on
pool1/zones/template             47G   113M   911M    12%    /zones/template
pool1/zones/template/ROOT/zbe    47G   115M    45G     1%    /zones/template/root

And here are the local zfs entries.

# zfs get all pool1/zones/template pool1/zones/template/ROOT/zbe | grep local
pool1/zones/template/ROOT/zbe  canmount                        noauto                                local
pool1/zones/template/ROOT/zbe  org.opensolaris.libbe:active    on                                    local
pool1/zones/template/ROOT/zbe  org.opensolaris.libbe:parentbe  someUUID  local

Notice the zbe dataset that gets mounted as the zone root now.  The someUUID part is important (otherwise the zone will not boot).  To get that value install a zone with zonecfg the supported way and use the value it creates.  So the way I’m going to deploy zones on OpenSolaris systems is install one the supported way, wait awhile so it can stupidly download all the packages from a repository, set it up, then halt, set up the sparse root and reboot.  After setting up one, I can then create others by hand by creating clones of the zfs datasets, etc.
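Setting those three local properties on a hand-made zone root could be sketched like this. The dataset name for the new zone is a placeholder, someUUID stands in for the real value, and the run and set_zbe_props helpers are invented for the example; commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

ZBE=pool1/zones/newzone/ROOT/zbe   # hypothetical dataset for the new zone
PARENTBE=someUUID                  # copy the real value from a supported install, e.g.
# zfs get -H -o value org.opensolaris.libbe:parentbe pool1/zones/template/ROOT/zbe

# Apply the same local properties that the zfs get output above shows.
set_zbe_props() {
    run zfs set canmount=noauto "$1"
    run zfs set org.opensolaris.libbe:active=on "$1"
    run zfs set org.opensolaris.libbe:parentbe="$2" "$1"
}

set_zbe_props "$ZBE" "$PARENTBE"
```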

I also tested bringing up a zone root from SXCE b130 on OpenSolaris b134 and it worked fine.  I updated the SMF stuff so it was in line with b134.  I had to mess with the zfs parameters I mentioned above for it to boot after creating a dataset for root.

Anyway, I think I have found a way that I can support sparse root zones on OpenSolaris even if they are not officially supported.  I think using dedup may be useful for things that need to be local for zones, but I prefer the read-only loopback mounts for software in most cases.  The IPS package system needs to grow up and learn to deal with sparse root zones or just don’t use it!

On to OpenSolaris

March 12th, 2010

I’ve been working the past couple weeks testing out the preview releases of the next OpenSolaris distribution (b133 and b134).  While I am the leader of the Delaware OSUG, have used the live CD, given away lots of live CDs and tested OpenSolaris 2009.06 in VirtualBox I have been reluctant to go to the OpenSolaris distribution on “production” systems.  Since the last release of Solaris Express Community Edition (SXCE) was b130 a while back it is now time to make the leap from Solaris to OpenSolaris on our x86 systems.  I’ll go over some of the issues I’ve run into here.

First, I despise all package managers.  I much prefer to download the source, configure the way I want, compile and install in /usr/local/.  However, package managers are a necessary evil for a distribution.  Yes, the SVR4 package manager had some problems, but after doing a full install of Solaris or SXCE I removed packages I didn’t want installed and I didn’t need to deal with it again so I didn’t have a problem with it.  I guess I’m kind of old school in that I prefer to do things by hand, understand and know what’s going on rather than just type something and be glad it works.  The OpenSolaris IPS (pkg command) is very Linux like and Sun has succeeded in making OpenSolaris Linux like as a result (good and bad).  While pkg has some nice features it is not ready for a production system, which makes the OpenSolaris distribution not ready for production enterprise systems.

I planned on looking into OpenSolaris around now following the announcement of SXCE coming to an end.  The next “stable” release was to be 2010.02, but now should be 2010.03 with a scheduled release two weeks from today.  I started looking at the preview ISO of b133 from genunix.org and then tested upgrading to b134.  After some hurdles I have it mostly working at this point.

The first thing I didn’t like was that everything gets installed in / with no opportunity to create even a separate /var.  So I began experimenting with boot environments (beadm command).  Boot environments are pretty cool and I had been doing something similar by hand previously on SXCE and Solaris 10.  The beadm command does make handling multiple environments easier and more convenient.  However it doesn’t have a way to split / into multiple datasets that I could see so I did that by hand and found that creating a separate /var works fine.  However, creating a separate /usr no longer works (it did for SXCE through b130), so I have to give that up.  I like having separate filesystems for /, /var and /usr and have always done that on SunOS, Solaris, IRIX and Linux.  The trend has been to go to a single /, but the ZFS way is to create a dataset whenever and wherever you want or need one so this seems backwards…  Again I’m somewhat old school here in liking /usr separate, but at least I have /var in a separate dataset.  The beadm command works fine to create new BEs after the one you’re cloning has multiple datasets, so that is cool.  Also, I do not see a lot of reason to have swap space on a ZFS zvol and I prefer to use swap partitions still.  I like to create the root pool as a mirror and I’d rather have separate swap devices on each disk rather than mirror swap…  It would be nice to checksum swap, though.

I went through the services and disabled things that I do not want enabled, pretty similar to SXCE.  I also removed some packages with pkg, had to add some (like nis/server and the standard header files – why are they not installed by default???) and did other normal configuration like adding site packages.  I also did my standard security hardening very similar to SXCE (remove suid bits from programs that we don’t need them set for – but that breaks the packaging! boo hoo).

I started off testing on a desktop and found I had to add the following to /etc/user_attr to get gdm to have the shutdown and reboot buttons.

gdm::::type=normal;auths=solaris.system.shutdown

I took a look at the Automated Installer (AI) and decided that I’ll continue without it for now.  I have no problems cloning a boot environment with gtar as I’ve been doing that for many years.  I moved on to testing out b133 more on a Sun Fire X4600.  Many of the problems I ran into dealt with enabling xvm (xen).  I created a new boot environment for having xvm enabled (I liked how SXCE did this a little better).  I ended up not enabling milestone/xvm as that insisted on breaking my grub menu.lst file.  So I just enabled the xvm services and the hypervisor worked, except the clock was off on dom0 by five hours and ntpd kept going into maintenance mode.

Turns out there is an open bug on the ntpd/xvm problem, http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6908973

I found that the workaround was to do ‘rtc -z US/Eastern’ prior to doing ntpdate in the SMF script and then adding ‘rtc -z UTC’ after starting the daemon.  This works somehow, except some services that start up in between the two calls to rtc start up five hours off so I may need to look at adding some dependencies there.  Also, we have a myri10ge 10 gig ethernet card in this X4600.  The performance of it was terrible in b133, but I found that there were a couple of bugs fixed related to it in b134 and used ‘pkg image-update’ to update to b134, which fixed the performance issues.  Also, I found that pkg will follow symlinks that you put in (I know, don’t mess with the pkg manager!).  I would consider that a bug because I expected it to remove the symlink and re-install the file like SVR4 packaging would.  I replaced /bin/passwd with a symlink to a replacement I wrote that doesn’t need to be suid root and works for our environment and pkg turned on the suid bit!
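The order of operations in that rtc workaround can be sketched as follows. The NTP server name and the ntpd path are assumptions for illustration (check the real SMF start script for the actual daemon invocation), the helper names are made up, and commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

NTP_SERVER=ntp.example.com      # hypothetical server
NTPD=/usr/lib/inet/ntpd         # assumed path; take it from the SMF script

# The sequence described above: local-time rtc, ntpdate, start daemon, UTC rtc.
start_ntp_with_rtc_fix() {
    run rtc -z US/Eastern
    run ntpdate "$NTP_SERVER"
    run "$NTPD"
    run rtc -z UTC
}

start_ntp_with_rtc_fix
```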

I also ran into the bmc device driver not being included because of legal issues, http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6799081

The bmc driver is used by ipmitool to access the service processor, which we run daily in a cron job to check the service processor logs.  I fixed this by adding the bmc driver from b130 as the bug report mentions and running ‘add_drv bmc; devfsadm’.  I wonder how many builds this will work for?  The other workaround mentioned is going over the network interface instead…

The other major complaint with OpenSolaris is that sparse root zones are not supported.  I found a solution to that and I’ll save the details of the complaint and solution for a blog entry next week.

I think I now have the X4600 mostly set up with OpenSolaris to the point where we can deploy it once 2010.03 is released.  I may be missing some needed packages though, #!&#!! package manager!  After clearing these hurdles I think OpenSolaris will be fine for us, but there is something nice about Solaris even if some of the packages aren’t open source.  I’m all for open source, but having something that works is important.  I think Oracle needs to release something that is more up to date than Solaris 10 (more recent features), but OpenSolaris is not ready for enterprise customers yet.  I still need to test if the Sun Ray server software works…  I also need to test domU xvm domains, but I’m assuming they should work ok.

MPICH2 1.2.1 compile on Solaris

February 2nd, 2010

I recently upgraded mpich2 to version 1.2.1 on Solaris systems (sparc and x86).  There were some problems initially where some header files were not found.  I eventually fixed the problem by fixing a bashism that was in one of the configure scripts.  In src/mpid/ch3/configure I changed the line

export MY_EXPORTED_LIBS="$LIBS"

to be the two lines

MY_EXPORTED_LIBS="$LIBS"
export MY_EXPORTED_LIBS

I found this out after doing some web searches which led me to

https://lists.mcs.anl.gov/mailman/htdig/mpich-discuss/2010-January/006388.html

which offered that fix.   After that fix it compiled fine.  I use the Sun Studio compilers and here is the configure line:

./configure --prefix=/usr/local/mpich2 --enable-sharedlibs=solaris-cc --enable-threads --enable-cxx --disable-f77 --disable-f90

I tested that “hello world” worked on multiple clients for all three platforms (Solaris sparc and x86 plus Linux x86_64) I compiled it on.
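For reference, the two-line form of the export fix above can be checked quickly in any Bourne-style shell. The LIBS value here is just an example, not what the mpich2 configure script actually sets.

```shell
#!/bin/sh
# Example value only; configure computes the real one.
LIBS="-lsocket -lnsl"

# Portable form: assign first, then export by name.
# (The one-line "export NAME=value" form is the one changed in the fix above.)
MY_EXPORTED_LIBS="$LIBS"
export MY_EXPORTED_LIBS

# A child shell sees the exported value.
sh -c 'echo "child sees: $MY_EXPORTED_LIBS"'
```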