open source virtualization notes from no cloud in particular: Quick Provisioning with KVM and Xen

One of the most compelling features of virtualization for me is the speed and flexibility in provisioning systems. There are a number of logical steps you can perform quite easily in order to get systems up really quickly. Just this week on an average KVM host with 10k RPM disks I brought up a minimal Fedora 13 VM ready to go in DNS on the network in just over 30 seconds: 14 to build, 18 to boot. It still brings a smile to my face. :) So, let's get started.

There are two basic components to a quick provisioning process:

1. Source preparation - This is the piece that takes an existing system and packages it up (what I'll call bundling) such that it can be quickly placed onto the disk of new VMs. As you'll see, I use trusty old tar to do this.

2. VM Build Process - This is the big piece that does "the rest." Everything that a new VM needs will be done here in a completely automated "one click" process. At a high level, MACs and IPs will be assigned, disk device(s) will be allocated, filesystem(s) will be made, the bundle will be extracted onto the new filesystems, local system files will be modified, DNS will be updated, and the VM will be booted.

Source Preparation

This section is going to be short. Once I have a machine in a state where it's ready to be used as a template for future VMs, I run a bundle process on it. The meat of the script to perform this is:

bundle=/tmp/$name.tar.gz
cd /
tar czpf $bundle --exclude "./tmp/*" \
  --exclude ./lost+found \
  --one-file-system . 2>&1 | grep -v 'socket ignored'

I can get away with this because I build my template systems on a single filesystem to make this process a snap. The operating system goes in one filesystem and data will be mounted separately (likely with logical volume management) so the --one-file-system option will ignore it. Simply adjust the tar command if you have broken out /var, /boot, and others so that you don't miss any required files.
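As a rough sketch, the same tar invocation can be wrapped in a function that takes the source root and the output path as arguments, which makes it easy to try against a scratch directory before pointing it at /. The name make_bundle and its two arguments are hypothetical, chosen only for this illustration, and GNU tar is assumed.

```bash
#!/bin/bash
# Sketch only: make_bundle and its arguments are hypothetical names.
# Assumes GNU tar. Bundles the single filesystem rooted at $1 into tarball $2.
make_bundle()
{
  root=$1
  bundle=$2

  # -C switches to the source root so member names start with "./"
  # --one-file-system keeps separately mounted data out of the bundle
  tar -C "$root" -czpf "$bundle" \
    --exclude './tmp/*' \
    --exclude './lost+found' \
    --one-file-system . 2>&1 | grep -v 'socket ignored'

  # the pipeline's exit status is grep's, so verify the archive directly
  tar -tzf "$bundle" >/dev/null
}
```

Calling make_bundle / /tmp/$name.tar.gz would then behave much like the snippet above, with the added check that the resulting archive is readable.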

The actual script that I use to perform this process does more than just this, but not all that much. Besides spending time getting options, printing usage information, and providing a means to update itself, it transfers the resulting tar file up to a web service that I have running on a host serving as a bundle repository. That way, my bundles get "published" and can be used across the hosting environment. Note that one large side benefit of the script's simplicity is that bundles can be used for both Xen and KVM virtual machines...it will be up to the provisioning process to make the minor adjustments to account for hypervisor differences.

VM Build Process

As I just mentioned, the bundles I create can be used with both Xen and KVM. When using libvirt as a middleman between configuration and hypervisor there are very few differences in the provisioning processes. I'm pretty anal so I do have separate programs for Xen and KVM, but that's by no means required and I'll probably just combine the two at some point.

One of the differences between Xen and KVM is that Xen has the ability to assign logical volumes as individual disk partitions in the guest VM. For example, /dev/vg_xen/testvm-root can be presented to the VM named testvm as /dev/xvda1. This is great, because a filesystem can be created on that volume on the Xen host, quickly mounted up, and then the bundle can be untarred directly into it. When the VM boots, it'll see that filesystem as /dev/xvda1, its root filesystem.
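For illustration, that Xen flow might be sketched roughly as follows. The function name xen_disk_build, the run helper, the DRY_RUN switch, and the -swap volume suffix are assumptions made for this sketch (only the testvm-root naming comes from the example above). With DRY_RUN=1 the commands are printed rather than executed, which is handy for eyeballing the plan without root access.

```bash
#!/bin/bash
# Sketch only: xen_disk_build, run, and DRY_RUN are hypothetical names.
# With DRY_RUN=1, commands are printed instead of executed.
run()
{
  if [ "${DRY_RUN:-0}" -eq 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# usage: xen_disk_build <vm> <vg> <fs> <root MB> <swap MB> <bundle>
xen_disk_build()
{
  vm=$1 vg=$2 fs=$3 size_root=$4 size_swap=$5 bundle=$6

  # one logical volume per guest partition, so no partition table is needed
  run lvcreate -L "${size_root}m" -n "${vm}-root" "$vg" || return 3
  run lvcreate -L "${size_swap}m" -n "${vm}-swap" "$vg" || return 3

  run mkswap "/dev/$vg/${vm}-swap"
  run "mkfs.$fs" -q "/dev/$vg/${vm}-root"

  # mount the root volume on the host and unpack the bundle straight into it
  run mkdir -p "/mnt/$vm"
  run mount "/dev/$vg/${vm}-root" "/mnt/$vm"
  run tar -C "/mnt/$vm" -xzf "$bundle"
  run umount "/mnt/$vm"
}
```

Setting DRY_RUN=1 and running xen_disk_build testvm vg_xen ext3 4096 512 /tmp/bundle.tar.gz prints the eight commands in order without touching any disks.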

From my research and testing, KVM does not have this ability. Instead, logical volumes can only be presented as whole disks, so /dev/vg_kvm/testvm-vda could only show up as /dev/vda in the VM. This brings the (minor) annoyance of having to partition the disk from the KVM host first and then play the kpartx game before unbundling. I'll do that in the example below since this method could also be used for Xen.

Ok, with that out of the way, let's get down to it. I find that it's easiest to start with a list of all the information I'll need for a fully functional VM and then go from there:

VM name (obviously)
MAC address: Not absolutely required but I like to assign these programmatically.
IP address: Again, not absolutely required since you could just use DHCP, but I like to assign these and possibly push the assignments to DHCP.
Bridge: You'll need to know what bridge to use for your VM's network.
Disk Device: This piece can be huge ... it can be something as simple as a file path or it can become a big process to generate an iSCSI LUN on a remote server and connect to it from the provisioning host. In this example, I'll use a local logical volume group. In future blog entries I'll probably talk a lot about iSCSI.
CPU count
Memory
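As a minimal sketch of what assigning a MAC programmatically could look like, here is one possible approach that derives the address from the VM name. It is not taken from the real scripts: gen_mac and the hashing idea are assumptions for illustration. The 52:54:00 prefix is the one conventionally used for QEMU/KVM guests, and 00:16:3e is the one reserved for Xen.

```bash
#!/bin/bash
# Sketch only: gen_mac is a hypothetical helper, not from the real scripts.
# Derives a stable MAC from the VM name so a rebuilt VM keeps its address.
# usage: gen_mac <vm name> [prefix]
gen_mac()
{
  name=$1
  prefix=${2:-52:54:00}   # QEMU/KVM prefix; pass 00:16:3e for Xen guests

  # use the first 6 hex digits of the name's md5 as the last three octets
  hash=$(printf '%s' "$name" | md5sum | cut -c1-6)
  echo "$prefix:$(echo "$hash" | sed 's/../&:/g; s/:$//')"
}
```

Since only 24 bits of the hash are used, two names could in principle collide, so anything beyond a sketch should check the result against the MACs already in use.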

For the purposes of this blog entry, I'm going to chop the stuff that isn't necessarily all that interesting: mac/ip/bridge assignment and cpu/memory definition. So for simplicity I'll take these items as variables used by a bash function named vm_build. I'll just spit out the function (from the KVM build process) in chunks and discuss along the way. Where my "real" scripts do more I'll try to make a comment.

vm_build()
{
  vm=$1

  ### do name validation in real script

  echo
  echo "=== Creating new linux domain: $vm ==="

  echo Bundle: $bundle
  echo Memory: $mem
  echo CPUs: $cpu
  echo MAC: $mac
  echo IP:  $ip
  echo Bridge:  $bridge
  echo Volume Group:  $vg
  echo Root Filesystem Size: $size_root
  echo Swap Size: $size_swap
  echo Filesystem Type: $fs
  echo Boot on Completion: $boot

  xml=/etc/libvirt/qemu/${vm}.xml
  sed "s/\$NAME/$vm/" $BASE_XML | \
  sed "s/\$MEM/$mem/" | \
  sed "s/\$CPU/$cpu/" | \
  sed "s/\$IP/$ip/" | \
  sed "s/\$BRIDGE/$bridge/" | \
  sed "s/\$MAC/$mac/" > $xml

This is just displaying the variables that will be defined by the time the function is ready to build a VM. You could take these as command-line options or have functions to pick them or, like I do, use a mixture of both. I generally take bundle, memory, cpu, size_root, size_swap, and fs as command-line options since they're a unique decision for each VM. The network info and disk volume get generated programmatically since it would be annoying to manually create unique MACs and manually pick open IP addresses. :)
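To make the "pick an open IP" idea concrete, here is a small sketch that returns the first unassigned address in a range. It assumes a hypothetical flat file listing the addresses already handed out, one per line. The name next_free_ip and that file format are illustrative only; a real environment might consult DNS or DHCP leases instead.

```bash
#!/bin/bash
# Sketch only: next_free_ip and the "used addresses" file are hypothetical.
# usage: next_free_ip <network prefix> <first host> <last host> <used file>
next_free_ip()
{
  prefix=$1 first=$2 last=$3 used=$4

  host=$first
  while [ "$host" -le "$last" ]; do
    candidate="$prefix.$host"
    # -x matches whole lines and -F disables regex, so "." stays literal
    if ! grep -qxF "$candidate" "$used"; then
      echo "$candidate"
      return 0
    fi
    host=$((host + 1))
  done

  echo "error: no free address in $prefix.$first-$last" >&2
  return 1
}
```

Something like ip=$(next_free_ip 192.0.2 10 50 /path/to/used-ips) would then set $ip before vm_build is called.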

Once those variables are in place, start feeding them into a template libvirt xml file (defined as $BASE_XML) which we'll use to define the VM. Here's a link to this template:

base_kvm_linux.xml
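Since the template itself is only linked, here is a heavily trimmed, hypothetical stand-in showing the kind of placeholders the sed substitutions expect, along with a single-sed variant of the substitution. The XML below is an assumption for illustration and not a copy of base_kvm_linux.xml; render_xml is likewise a made-up name, and the $IP placeholder is left out for brevity.

```bash
#!/bin/bash
# Sketch only: this template is a hypothetical stand-in, not base_kvm_linux.xml.
BASE_XML=$(mktemp)
cat >"$BASE_XML" <<'EOF'
<domain type='kvm'>
  <name>$NAME</name>
  <memory>$MEM</memory>
  <vcpu>$CPU</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
    <boot dev='hd'/>
  </os>
  <devices>
    <disk type='block' device='disk'>
      <source dev='$ROOT'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <mac address='$MAC'/>
      <source bridge='$BRIDGE'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
EOF

# render_xml (hypothetical name) fills in every placeholder with one sed call.
# "|" is the delimiter so device paths containing "/" need no escaping.
render_xml()
{
  sed -e 's|\$NAME|'"$vm"'|g' \
      -e 's|\$MEM|'"$mem"'|g' \
      -e 's|\$CPU|'"$cpu"'|g' \
      -e 's|\$MAC|'"$mac"'|g' \
      -e 's|\$BRIDGE|'"$bridge"'|g' \
      -e 's|\$ROOT|'"$lv_root"'|g' "$BASE_XML"
}
```

With the variables set, render_xml > /etc/libvirt/qemu/${vm}.xml would produce a definition that virsh define can load, assuming the values are valid for the host.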

Now to the dirty part ... the actual disk provisioning.

  echo creating logical volumes...

  lv_size=$((size_root + size_swap))

  lv_root=/dev/$vg/$vm
  lvcreate -L ${lv_size}m -n $lv_root
  if [ $? -ne 0 ]; then
    echo "error: could not create root volume for $vm...aborting!"
    return 3
  fi
  sed -i "s@\$ROOT@$lv_root@" $xml

  parted -s $lv_root mktable msdos
  parted -s -- $lv_root mkpart primary ext2 0 $size_root
  parted -s -- $lv_root mkpart primary linux-swap $size_root -1
  parted -s -- $lv_root set 1 boot on

  # brutal hack workaround since parted creates weird mapper entries
  mapper_name=`echo $vm | sed 's/-/--/g'`
  dmsetup remove /dev/mapper/*${mapper_name}*p1
  dmsetup remove /dev/mapper/*${mapper_name}*p2

  kpartx -a -p P $lv_root
  part_root=/dev/mapper/${vm}P1
  part_swap=/dev/mapper/${vm}P2

  mkswap $part_swap

  echo building root filesystem...

  mkfs.$fs -q $part_root
  mkdir -p /mnt/${vm}
  mount $part_root /mnt/${vm}

  echo extracting operating system...
  tar -C /mnt/$vm -xzf $bundle
  sync # gotta sync otherwise the grub will occasionally fail???

The above allocates a logical volume, partitions it, plays device mapper games to expose the partitions on the KVM host, makes swap on one partition, creates a filesystem (type specified by $fs - I use ext3 and ext4 as needed) on the other and mounts it, and finally extracts the bundle. It's a lot of work for really not all that much activity. As I mentioned before, with Xen this *can* be simplified to create two logical volumes: one for root and one for swap. The mkfs will run on the root logical volume and that volume will be mounted. It eliminates all the partitioning steps and device mapper complexity.
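The device mapper games are easier to follow once you know the naming rule: LVM names the mapper device for a volume VG-LV and doubles any hyphen that appears inside the volume group or volume name, which is why the workaround escapes hyphens in $vm. A tiny sketch of that rule (dm_name is a hypothetical helper, not part of the build script):

```bash
#!/bin/bash
# Sketch only: dm_name is a hypothetical helper illustrating LVM's naming rule.
# usage: dm_name <volume group> <logical volume>
dm_name()
{
  # hyphens inside either name are doubled; a single hyphen joins the two
  vg_part=$(printf '%s' "$1" | sed 's/-/--/g')
  lv_part=$(printf '%s' "$2" | sed 's/-/--/g')
  echo "${vg_part}-${lv_part}"
}

dm_name vg_kvm test-vm   # prints vg_kvm-test--vm
```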

From here, we go to the local modifications. For simplicity's sake again, I'll just assume that we only build Red Hat based systems. Debian would have its configuration files in other locations.

  # update the hostname
  sed -i "s/HOSTNAME=.*/HOSTNAME=${vm}.${dns_domain}/" \
    /mnt/$vm/etc/sysconfig/network

  # change the grub boot device
  sed -i 's@root=[^ ]\{1,\} @root=/dev/vda1 @' \
    /mnt/$vm/boot/grub/grub.conf

  # create a base fstab using the virtio vda devices
  cat >/mnt/$vm/etc/fstab <<+++
/dev/vda1 / $fs defaults,noatime 1 1
/dev/vda2 swap swap defaults 0 0
tmpfs  /dev/shm tmpfs   defaults       0 0
devpts /dev/pts devpts  gid=5,mode=620 0 0
sysfs  /sys     sysfs   defaults       0 0
proc   /proc    proc    defaults       0 0
+++

  # clear out the MAC address and set the IP address
  sed -i '/^HWADDR/d' \
    /mnt/$vm/etc/sysconfig/network-scripts/ifcfg-eth0
  sed -i "s/^IPADDR=.*/IPADDR=$ip/" \
    /mnt/$vm/etc/sysconfig/network-scripts/ifcfg-eth0

  # for now, just remove persistent rules files
  rm -f /mnt/$vm/etc/udev/rules.d/70-persistent-net.rules

  # disable selinux
  if [ -f /mnt/$vm/etc/sysconfig/selinux ]; then
    echo 'SELINUX=disabled' > /mnt/$vm/etc/sysconfig/selinux
  fi

Mostly it's just a matter of making sure that the system boots up on the device partitions that KVM is presenting (Xen will use xvda rather than vda) and that our eth0 network interface comes up with its own MAC address and IP. SELinux is disabled since it will hose stuff up in the event that our template VM had it enabled.
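One subtlety in the grub.conf edit is that the pattern expects a space after the root= value, so a kernel line that ends with root=... would be left untouched. As a sketch, here is a slightly more forgiving variant and its effect on a made-up kernel line. The sample line and the fix_grub_root name are assumptions for illustration.

```bash
#!/bin/bash
# Sketch only: fix_grub_root is a hypothetical helper; reads grub.conf on stdin.
# usage: fix_grub_root <new root device>
fix_grub_root()
{
  # [^ ]* stops at the next space or at end of line, whichever comes first
  sed "s@root=[^ ]*@root=$1@"
}

echo 'kernel /boot/vmlinuz ro root=/dev/xvda1' | fix_grub_root /dev/vda1
# prints: kernel /boot/vmlinuz ro root=/dev/vda1
```

For a Xen build the same helper would be called with /dev/xvda1 instead.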

Now we need to make sure that the new VM will actually boot with grub. In Xen this is not at all required since it uses the beautiful concept of pygrub, but as far as I am aware KVM has no such ability and it requires that a boot loader be installed on each VM's root volume.

  mount --bind /dev /mnt/$vm/dev
  mount -t proc none /mnt/$vm/proc
  
  cat >/mnt/$vm/vm-grub <<EOF
#!/bin/bash
ln -s $part_root ${lv_root}1
grub <<+++
device (hd0) $lv_root
root (hd0,0)
setup (hd0)
+++
rm -f ${lv_root}1
EOF

  chmod 755 /mnt/$vm/vm-grub
  echo chroot /mnt/$vm /vm-grub
  chroot /mnt/$vm /vm-grub

  umount /mnt/$vm/proc
  umount /mnt/$vm/dev
  umount /mnt/$vm
  rm -rf /mnt/$vm

  kpartx -d -p P $lv_root

If there is a better, simpler way to do this, please, I am all ears. I struggled through this piece the longest in the conversion of my Xen provisioning scripts to KVM and would really love to clean this up. I mean, I love taking any chance I can to use chroot and all, but one command sure would be nice. :)

Anyway, finally, we get to the definition of the VM and its registration with DNS.

  echo -n Defining VM...
  virsh define $xml &>/dev/null
  if [ $? -eq 0 ]; then
    echo "success!"
    echo Configuring VM to autostart...
    virsh autostart $vm
    if [ "$boot" -eq 1 ]; then
      virsh start $vm
    fi
  else
    echo "failure!"
  fi

  ### call DNS registration in the real script

  echo
  echo "=== Creation of $vm complete ==="
}

And that's it! Not so bad, eh? :) The VM is built and started up if desired. When using ext4 and a compressed bundle of 400MB or so, the fastest I've provisioned a machine on 10k RPM disks was 14 seconds. I'd love to bring this number down to single digits so perhaps a nice RAID of 15k RPM disks is in my future.
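For the DNS registration step that the function leaves as a comment, one option is a dynamic update. This is only a sketch, assuming a BIND-style server that accepts dynamic updates via nsupdate. The function name dns_register_cmds, the server name, and the default TTL are placeholders, and the real script may do something entirely different.

```bash
#!/bin/bash
# Sketch only: dns_register_cmds is a hypothetical helper that prints the
# nsupdate commands for a new A record; it does not contact any server itself.
# usage: dns_register_cmds <fqdn> <ip> <dns server> [ttl]
dns_register_cmds()
{
  fqdn=$1 ip=$2 server=$3 ttl=${4:-3600}

  # delete any stale A record first, then add the new one in the same update
  cat <<EOF
server $server
update delete $fqdn A
update add $fqdn $ttl A $ip
send
EOF
}
```

The output is meant to be piped into nsupdate, for example dns_register_cmds ${vm}.${dns_domain} $ip ns1.example.com | nsupdate -k /path/to/keyfile, where the server name and key file path are placeholders.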

I hope this proves useful and I'd absolutely love to receive any and all feedback. Until next time!

Sunday, September 5, 2010
