Zones (or containers) were first introduced in Solaris 10. Once ZFS came out (using a ZFS dataset per zone is much better than using UFS with zones), deploying multiple zones became a great way to provide several virtual machines on one server. One of the features I liked best about zones was what Sun called “sparse root zones”, where most software is shared between the global (bare metal) zone and the non-global zones. This was done with read-only loopback mounts for /lib, /platform, /sbin and /usr (I also added /usr/local to the list; others may want /opt that way), which saved lots of disk space and made maintaining zones a snap. Using sparse root zones was optional. You could also go to the other extreme, where every zone has local copies of those directories, which is called a whole root zone. Or you could make just /usr local and keep the others as read-only loopback mounts. The administrator could decide.
The OpenSolaris distribution does not support sparse root zones! All zones need to be whole root zones. This is because of the IPS package manager (the pkg command). See my previous blog entry for what I think of package managers. This is a major defect and has been since IPS was added. The reason sparse root zones are not supported is that the pkg command cannot manage software in all the zones on a system from the global zone. Well, I don’t want it to. I just want the software that is installed in the global zone to be available in the zones through read-only loopback mounts. The supported way is to maintain packages in each zone (essentially treating each one like a little Linux machine). That would be fine if you have a need for such a setup, but I want to have 20-30 zones on a system, all with the same /lib, /sbin, /platform and /usr, so there is no need to waste time managing #!$*&^# packages in each one.
ZFS dedup could be used to save space when running several zones, but that’s not the same thing, because each zone still needs to be managed and updated separately. Have I mentioned I hate package managers? Sparse root zones work great and were a great idea, and removing things goes against the Sun way. One of the great things about Solaris is backwards compatibility, but with OpenSolaris the idea seems to be to become more like Linux (in good and bad ways) by breaking what used to work great!
Oracle, listen up. Sparse root zones are a great feature and should not be dropped from the production release that follows Solaris 10. They should be added back to OpenSolaris. Do a quick web search for “opensolaris sparse zones not supported” or check the zones mailing list and see how people have been complaining about this for quite a while.
There is a workaround. Simply edit the zone’s XML config file (the one with the dire warning not to edit it; I tend to like vi better than configuration editors and edit files like that often) and add the sparse root zone entries (inherited-pkg-dir). The zonecfg command will not allow you to add inherited-pkg-dir entries anymore. It even refused to add my /usr/local entry, which pkg doesn’t need to be concerned with, although I could have added /usr/local a different way using ‘filesystem special=…’. However, the backend still parses the entries, and the directories get mounted as desired. Then shut down the zone, remove the files from /lib, /platform, /sbin and /usr under the zone’s root to save lots of space, boot the zone and test that things work.
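A rough sketch of what those entries look like, assuming the element syntax is still the one Solaris 10 wrote into the zone’s XML file (an inherited-pkg-dir element with a directory attribute). The helper name ipd_entries is made up for this example, and it only prints the lines; they still have to be pasted inside the zone element by hand.

```shell
#!/bin/sh
# Print one inherited-pkg-dir element per directory given as an argument.
# Assumption: the element syntax matches what Solaris 10 stored in the
# zone's XML config file. ipd_entries is a hypothetical helper name.
ipd_entries() {
    for dir in "$@"; do
        printf '  <inherited-pkg-dir directory="%s"/>\n' "$dir"
    done
}

# The four directories a Solaris 10 sparse root zone shared with the global zone.
ipd_entries /lib /platform /sbin /usr
```

Printing the lines rather than editing the file in place keeps the risky part (touching a file that says not to edit it) a deliberate manual step.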
I did some research on this first, and other people suggested the same thing. People then reply that it is not supported, that pkg will not work, that unpredictable things will happen and… Well, if I never run pkg in the zone, what will break? I do not need to run pkg in the zones. I know how to manage things correctly myself. As long as the backend doesn’t get broken, it should be fine.
You say that pkg needs to be run to update things in /etc and /var or whatever. Yeah, whatever. I never used the SVR4 package commands either, and I could update systems running over 20 zones without a problem. The only things I ever had to update (across several different upgrades) were SMF things in /var/svc/ and /etc/svc/. I think that will continue to work without running pkg (I much prefer sdist or rsync to keep files updated and in sync).
Another thing I found that is different in OpenSolaris is that zones require two ZFS datasets instead of just the one I was used to (the root directory gets its own dataset now). It took me a little while to figure out how to bring up a new zone by hand. I prefer to know how things work and be able to do it by hand rather than just run some command to magically do everything. It turned out that the ZFS dataset for the root filesystem needs a couple of special entries.
One of the strangest things about this new root filesystem for zones is that it is a legacy mount. I was curious how it gets mounted. I unmounted the root filesystem of my test zone, disabled the zones service and then re-enabled it, and found the filesystem was mounted again. I looked at the SMF startup script for the zones service and found that zoneadm was being called with the sysboot option for each zone. That option isn’t documented in the help for zoneadm or in the man page. So I unmounted a zone root and ran zoneadm with the sysboot option for that zone, and the filesystem got mounted. That’s cool, I thought. Since the sysboot option is undocumented, it probably shouldn’t be depended on, as it may change. I checked the zoneadm source, and a comment says this is an undocumented interface for use from SMF.
Here’s an example of what a zone looks like now.
[/zones/template/]# df -h . root
Filesystem                     size   used  avail capacity  Mounted on
pool1/zones/template            47G   113M   911M    12%    /zones/template
pool1/zones/template/ROOT/zbe   47G   115M    45G     1%    /zones/template/root
And here are the local zfs entries.
# zfs get all pool1/zones/template pool1/zones/template/ROOT/zbe | grep local
pool1/zones/template/ROOT/zbe  canmount                        noauto    local
pool1/zones/template/ROOT/zbe  org.opensolaris.libbe:active    on        local
pool1/zones/template/ROOT/zbe  org.opensolaris.libbe:parentbe  someUUID  local
Notice the zbe dataset that now gets mounted as the zone root. The someUUID part is important (otherwise the zone will not boot). To get that value, install a zone the supported way with zonecfg and use the value it creates. So the way I’m going to deploy zones on OpenSolaris systems is to install one the supported way, wait a while so it can stupidly download all the packages from a repository, set it up, halt it, set up the sparse root and boot it again. After setting up that one, I will create the others by hand by cloning the ZFS datasets, etc.
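For a dataset created by hand, the three local properties from the zfs get output have to be set before the zone will boot. Here is a minimal dry-run sketch that only prints the zfs set commands so they can be reviewed first. The function name zbe_prop_cmds and the newzone dataset are made up for this example, and someUUID remains a placeholder for the value taken from a zone installed the supported way.

```shell
#!/bin/sh
# Dry run: print the zfs commands that set the local properties seen on
# the template's zbe dataset. Nothing is executed here.
# Arguments: the zone root dataset, then the parentbe UUID.
# zbe_prop_cmds is a hypothetical helper name.
zbe_prop_cmds() {
    dataset=$1
    uuid=$2
    echo "zfs set canmount=noauto $dataset"
    echo "zfs set org.opensolaris.libbe:active=on $dataset"
    echo "zfs set org.opensolaris.libbe:parentbe=$uuid $dataset"
}

# Hypothetical new zone; someUUID stands in for the real value.
zbe_prop_cmds pool1/zones/newzone/ROOT/zbe someUUID
```

Once the printed commands look right, they can be run by hand as root or piped to sh.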
I also tested bringing up a zone root from SXCE b130 on OpenSolaris b134, and it worked fine. I updated the SMF stuff so it was in line with b134. After creating a dataset for the root, I had to mess with the ZFS properties I mentioned above before it would boot.
Anyway, I think I have found a way to support sparse root zones on OpenSolaris even though they are not officially supported. I think dedup may be useful for things that need to be local to a zone, but I prefer read-only loopback mounts for software in most cases. The IPS package system needs to grow up and learn to deal with sparse root zones, or else just don’t use it!