Booting With PXE Linux And Frisbee

Booting with PXE and Frisbee

Although the operating system resides on each node's local drive, Chimera's nodes are configured to boot from the network as the first stage of a multi-stage boot process. This makes it easy to implement alternate boot schemes and to re-image disks. This page describes the boot sequence.

PXE boot and DHCP

When a node completes its POST, it starts the PXE boot process. It sends a DHCP request, which the head node answers. The reply gives the client its IP address and tells it to download a boot program (pxelinux.0) via TFTP from the head node, which also runs a TFTP server. The relevant dhcpd.conf sections, including a typical node entry, are as follows:

# For first onboard gigE
subnet 10.20.2.0 netmask 255.255.255.0
{
  option domain-name-servers 10.20.2.10;
  option routers 10.20.2.10;
  option broadcast-address 10.20.2.255;
  option log-servers 10.20.2.10;
  option ntp-servers 10.20.2.10;
  next-server 10.20.2.10;
  filename "pxelinux.0";
}
host node00
{
  hardware ethernet 00:30:48:fe:9b:14;
  fixed-address node00;
}
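With more than a handful of nodes, writing a host block by hand for each one becomes tedious. The following minimal Python sketch emits host entries in the same style as node00's. It is an illustration, not part of Chimera's actual setup, and the node01 name and MAC are hypothetical placeholders. Because fixed-address is given as a hostname, dhcpd on the head node must be able to resolve that name (for example through DNS or /etc/hosts).

```python
"""Generate ISC dhcpd host entries for a list of cluster nodes.

Hypothetical helper: only node00's MAC is taken from the dhcpd.conf
fragment above; any other names/MACs are placeholders.
"""

import re

# Colon-separated, lowercase MAC address (input is lowercased first).
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def host_entry(name: str, mac: str) -> str:
    """Return one dhcpd.conf host block in the same layout as node00's."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"bad MAC address for {name}: {mac!r}")
    # fixed-address uses the hostname, so it must resolve on the head node.
    return (
        f"host {name}\n"
        "{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {name};\n"
        "}\n"
    )


def host_entries(nodes: dict[str, str]) -> str:
    """Concatenate host blocks for every (name, MAC) pair, sorted by name."""
    return "".join(host_entry(name, mac) for name, mac in sorted(nodes.items()))


if __name__ == "__main__":
    nodes = {
        "node00": "00:30:48:fe:9b:14",
        "node01": "00:30:48:FE:9B:15",  # hypothetical node and MAC
    }
    print(host_entries(nodes), end="")
```

The output can be appended to dhcpd.conf (or pulled in with an include statement), followed by a restart of dhcpd.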

PXELinux

As the configuration above shows, the node first requests "pxelinux.0" (located in /tftpboot on Chimera's head node) from the TFTP server. PXELinux is a small network boot loader that fetches its configuration via TFTP, so it can easily be reconfigured. Full documentation is available at http://syslinux.zytor.com/pxe.php.

Configuration files are located in /tftpboot/pxelinux.cfg. If no more specific configuration file exists for the client, PXELinux loads the file named "default". This file defines four boot modes: local, plus the three Frisbee modes described below under Frisbee. A mode is chosen by entering its name at the PXELinux prompt. If no mode is chosen, PXELinux loads the configured DEFAULT mode, which boots from the local disk.
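The phrase "more specific configuration file" refers to PXELinux's file search order. Per the Syslinux documentation, PXELinux first tries the client UUID (if the firmware reports one). It then tries "01-" followed by the MAC address in lowercase, dash-separated form. Next it tries the IPv4 address as uppercase hex, removing one trailing digit at a time. It falls back to "default" last. This Python sketch lists the candidates for a node. node00's MAC comes from dhcpd.conf, but the 10.20.2.100 address is a hypothetical example.

```python
"""List the pxelinux.cfg/ file names PXELinux tries, most specific first."""

import ipaddress
from typing import Optional


def pxelinux_config_candidates(mac: str, ip: str, uuid: Optional[str] = None) -> list[str]:
    """Return config file names in the order PXELinux looks for them."""
    candidates = []
    if uuid:
        # Client UUID, if the PXE firmware provides one (lowercase hex).
        candidates.append(uuid.lower())
    # "01" is the ARP hardware type for Ethernet; MAC is lowercase, dash-separated.
    candidates.append("01-" + mac.lower().replace(":", "-"))
    # IPv4 address as 8 uppercase hex digits, then drop one trailing digit at a time.
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    for length in range(len(hex_ip), 0, -1):
        candidates.append(hex_ip[:length])
    # Fallback when nothing more specific exists.
    candidates.append("default")
    return candidates


if __name__ == "__main__":
    # node00's MAC from dhcpd.conf; the IP address here is hypothetical.
    for name in pxelinux_config_candidates("00:30:48:fe:9b:14", "10.20.2.100"):
        print("pxelinux.cfg/" + name)
```

In practice, a single node can be given a different boot configuration (for example, one that defaults to catch) by placing a file named after its MAC, such as pxelinux.cfg/01-00-30-48-fe-9b-14, alongside "default". The other nodes are not affected.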

Frisbee

The other three PXELinux boot modes (throw, catch, and play) interact with the Frisbee disk imaging system. All three load a Linux kernel and an initramfs image via TFTP. (Note: the kernel and initramfs are not particularly optimized for size; if re-imaging will be frequent, they probably can and should be made smaller.) The kernel is the standard CentOS kernel. The initramfs contains Busybox, several support binaries and libraries, Netcat (nc), the Frisbee disk imaging client, and an init script that invokes them according to the arguments passed to the kernel.

Kernel arguments

The following kernel arguments are processed by the init script in the Frisbee initramfs image:

  • mode: one of
    • catch: receive a Frisbee image from the server
    • throw: send a Frisbee image to the server
    • play: start a shell in the init environment
  • disk: disk device or partition to catch or throw (e.g., /dev/sda or /dev/sda1)
  • server: IP address of the Frisbee server
  • port: server port number to catch from or throw to
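The following Python sketch shows the kind of checking the init script has to do with these arguments. It assumes they appear on the kernel command line as key=value pairs (e.g. mode=catch). That format is an assumption, and the real init script is a Busybox shell script whose behavior may differ. On a running node, the string would come from /proc/cmdline.

```python
"""Hypothetical sketch of Frisbee init-script argument handling.

Assumes key=value kernel arguments (mode=..., disk=..., server=..., port=...);
this is an illustration, not the actual Busybox init script.
"""

VALID_MODES = {"catch", "throw", "play"}
FRISBEE_KEYS = {"mode", "disk", "server", "port"}


def parse_frisbee_args(cmdline: str) -> dict[str, str]:
    """Extract and validate the Frisbee arguments from a kernel command line."""
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in FRISBEE_KEYS:
            args[key] = value  # a later occurrence overrides an earlier one
    mode = args.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if mode in ("catch", "throw"):
        # Imaging modes need to know what to image and where the server is.
        missing = [k for k in ("disk", "server", "port") if k not in args]
        if missing:
            raise ValueError(f"mode={mode} also needs: {', '.join(missing)}")
        if not args["port"].isdigit():
            raise ValueError(f"port must be numeric, got {args['port']!r}")
    return args


if __name__ == "__main__":
    example = "initrd=initramfs.img mode=catch disk=/dev/sda1 server=10.20.2.10 port=9999"
    print(parse_frisbee_args(example))
```

Unrelated arguments (such as initrd= or ro) are ignored. Play mode only needs mode, which matches its use as a debugging shell.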

Modes of operation

As mentioned, four modes of operation may be selected at the PXELinux prompt: the three Frisbee modes below, plus local (the default), which boots the node from its local disk.
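As an illustration, a pxelinux.cfg/default file offering these four modes could be generated as shown below. The directives (DEFAULT, PROMPT, TIMEOUT, LABEL, KERNEL, APPEND, LOCALBOOT) are standard PXELinux. However, the kernel and initramfs file names, the disk, the port, and the timeout are hypothetical placeholders, and the key=value argument format is an assumption. The server address is the next-server from dhcpd.conf.

```python
"""Hypothetical generator for /tftpboot/pxelinux.cfg/default (four boot modes)."""

HEAD_NODE = "10.20.2.10"          # next-server from dhcpd.conf
KERNEL = "vmlinuz-frisbee"        # hypothetical file name under /tftpboot
INITRD = "initramfs-frisbee.img"  # hypothetical file name under /tftpboot


def frisbee_label(mode: str, disk: str = "/dev/sda1", port: int = 9999) -> str:
    """One LABEL stanza that boots the Frisbee kernel in the given mode."""
    args = [f"initrd={INITRD}", f"mode={mode}"]
    if mode in ("catch", "throw"):
        args += [f"disk={disk}", f"server={HEAD_NODE}", f"port={port}"]
    return f"LABEL {mode}\n  KERNEL {KERNEL}\n  APPEND {' '.join(args)}\n"


def default_config(timeout_tenths: int = 50) -> str:
    """Full config: boot the local disk unless another label is entered."""
    parts = [
        "DEFAULT local\n",
        "PROMPT 1\n",                    # show the boot: prompt
        f"TIMEOUT {timeout_tenths}\n",   # in units of 1/10 second
        "\nLABEL local\n  LOCALBOOT 0\n",
    ]
    for mode in ("catch", "throw", "play"):
        parts.append("\n" + frisbee_label(mode))
    return "".join(parts)


if __name__ == "__main__":
    print(default_config(), end="")
```

Generating the file from one place keeps the kernel arguments for the three Frisbee labels consistent. For example, changing the port means editing a single default instead of three APPEND lines.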

Play

This mode is included primarily for debugging. After the network is set up, a shell is started in the initramfs environment.

Throw

This mode copies a disk image to the server. Once networking is set up, the Frisbee disk archiver, imagezip, reads the selected disk or partition, and its output is piped to the server via netcat (nc). Naturally, the server must already be listening:

nc -l 9999 > /frisbee/node00.img-2011-01-22

Port 9999 is only an example (it must match the port argument passed to the client), and the output is redirected to the appropriate disk image file.
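For readers who want to see what the server side of a throw actually does, here is a minimal Python stand-in for the nc listener above. It accepts one TCP connection and copies everything received into a file until the client closes the connection. The receive_image function is a hypothetical helper written for illustration. It does not interpret the imagezip format and is not part of Frisbee.

```python
"""Minimal stand-in for `nc -l 9999 > image`: receive one TCP stream into a file."""

import shutil
import socket


def receive_image(port: int, path: str, host: str = "", ready=None) -> int:
    """Accept one connection on (host, port), stream it to path, return bytes written.

    If given, ready(actual_port) is called once the socket is listening,
    which is useful when port=0 lets the OS pick a free port.
    """
    with socket.create_server((host, port)) as srv:
        if ready is not None:
            ready(srv.getsockname()[1])
        conn, _peer = srv.accept()
        with conn, conn.makefile("rb") as src, open(path, "wb") as dst:
            # Copy in 1 MiB chunks until the client closes its end (EOF).
            shutil.copyfileobj(src, dst, length=1 << 20)
            return dst.tell()
```

Calling receive_image(9999, "/frisbee/node00.img-2011-01-22") would mirror the nc command above. Like nc, it exits after a single client, so each node being thrown needs its own listener (or a separate port).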

Catch

In this mode, the client requests a disk image from frisbeed, the Frisbee server daemon, running on the server. In most cases, many clients request the image at the same time. Naturally, frisbeed must be running before the clients can connect:

/frisbee/frisbeed -W 500000000 -dd -p 9999 -m 224.0.0.1 /frisbee/node00_sda1-2011-01-22.img

The "-W" option sets the rate limit in bytes/second, and "-p" gives the port number. "-dd" produces copious debugging output (mainly useful as a sign that something is happening). "-m" sets the multicast address to use. Addresses other than 224.0.0.1 don't appear to work, which would be a problem on a shared network: 224.0.0.1 is the all-hosts group, so every multicast-capable host on the local segment receives the traffic. The final argument is the imagezip image to send to clients.