Saturday, September 11, 2010

KVM Installation on RHEL5 / CentOS5 and RHEL6

Since my last blog entry, I've received a couple of emails asking about getting started with KVM. Compared to Xen, which requires a slight change in mindset since it's actually its own kernel, the KVM server install process is quite painless. Install the packages, configure your bridge, and away you go.

Package installation on Red Hat Enterprise / CentOS 5.4+

At a minimum, you'll need the KVM group and the kvm-tools package, but some others can be incredibly handy. libguestfs is a "library for accessing and modifying virtual machine disk images." Its big tool is guestfish, which allows for a whole host of interactive VM fun. The author works for Red Hat and has a great blog, which I've been following for some time: Richard WM Jones. EPEL (Extra Packages for Enterprise Linux) is a Fedora project that brings a massive number of useful packages to RHEL and CentOS. I couldn't live without it anymore.

yum -y groupinstall KVM
yum -y install kvm-tools libguestfs-tools virt-top e4fsprogs
rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm
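
Just to give a taste of guestfish: pointed at an existing guest disk image, it can list and mount the guest's filesystems without booting the VM. The image path below is purely a placeholder, so adjust it to one of your own (shut down) guest images.

# poke at a guest image without booting it (path is hypothetical)
guestfish -a /var/lib/libvirt/images/testvm.img <<+++
run
list-partitions
+++
# from there you can mount a partition and cat/edit files inside the guest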

Package installation on Red Hat Enterprise 6 (Beta 2 as of this writing)

Red Hat has changed it up a bit in RHEL6. I'm not sure why the KVM stack isn't in a single package group (perhaps I'm missing something), but it's still easy enough. Note that, at least in the beta, the repo needs to be enabled after OS installation; otherwise, yum commands will report that there are no packages available. Also keep in mind that once the full version of RHEL6 has been released, there may be changes: the base repo will likely be enabled by default, and the EPEL URL will no doubt change.

# enable the rhel-beta repo
vim /etc/yum.repos.d/rhel-beta.repo
# set enabled=1 in the rhel-beta block
# fyi, I also enable rhel-beta-optional
yum groupinstall -y 'Virtualization Platform' \
  'Virtualization Client' \
  'Virtualization Tools'
yum -y install python-virtinst libguestfs-tools virt-top \
  e4fsprogs qemu-kvm-tools
rpm -Uvh http://download.fedora.redhat.com/pub/epel/beta/6/x86_64/epel-release-6-4.noarch.rpm
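
Regardless of version, a quick sanity check never hurts before moving on: the kvm modules should be loaded and /dev/kvm should exist. If not, double-check that VT-x/AMD-V is enabled in the BIOS.

# kvm plus kvm_intel or kvm_amd should show up
lsmod | grep kvm
ls -l /dev/kvm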

Bridging Configuration

Every libvirt-enabled system I've used has installed a NAT-forwarding "virtual network" by default. In server scenarios I have never had a use for this, so I remove it first.

# if you haven't rebooted since the package install
# you'll need to start libvirtd
service libvirtd start
virsh net-list
virsh net-destroy default
virsh net-undefine default
virsh net-list

The next step is to create the bridge that will be presented to your guests. Since it's rather common these days to bond multiple network interfaces together for (at the very least) redundancy, I'll also show that piece here. The network interface "flow" for a single adapter will look like this now: eth0 -> bond0 -> br0. Obviously, you'll want at least a second adapter, but if you get started this way with only one it will be very easy to add the second.

I actually do all this with a custom script out of my kickstarts, but getting the process down manually can be extremely helpful.

# create the bridge
cat > /etc/sysconfig/network-scripts/ifcfg-br0 <<+++
DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.0.7
NETMASK=255.255.255.0
+++

# create the bond and add it to the bridge
cat > /etc/sysconfig/network-scripts/ifcfg-bond0 <<+++
DEVICE=bond0
ONBOOT=yes
BRIDGE=br0
BONDING_OPTS='mode=1 miimon=100'
+++

Now all that remains is to update the individual Ethernet adapter configuration files so they become slaves of the bond. You'll need at least one interface in the bond, of course. :) A sample file could look like this:

# /etc/sysconfig/network-scripts/ifcfg-eth0
# be sure to keep the correct HWADDR
DEVICE=eth0
HWADDR=00:19:B9:F3:F3:F3
ONBOOT=yes
MASTER=bond0
SLAVE=yes
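
When the second adapter shows up, its configuration is nearly identical. Something like the following is all it takes (the HWADDR shown matches the eth1 address from the bonding output further down; substitute your adapter's real MAC):

# /etc/sysconfig/network-scripts/ifcfg-eth1
# substitute the adapter's actual HWADDR
DEVICE=eth1
HWADDR=00:19:B9:F4:F4:F4
ONBOOT=yes
MASTER=bond0
SLAVE=yes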

The next step is to restart your network and validate the changes. A serial console comes in very handy for situations like these where you'll (at least briefly) lose network connectivity. In the worst case, if the configuration is botched, you'll be off the network until you fix it. That can be slightly problematic with remote servers. :)

# restart the network from the serial console
# you *did* configure a serial console, right? :)
root# service network restart
root# ip -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo
7: br0    inet 192.168.0.7/24 brd 192.168.0.255 scope global br0

# examine your bond
# note that I haven't plugged in eth1 yet
root# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:f3:f3:f3

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:19:b9:f4:f4:f4
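
It's also worth a quick peek at the bridge itself. brctl (from the bridge-utils package) will confirm that bond0 has actually joined br0:

# verify the bridge membership
root# brctl show
# bond0 should appear in the interfaces column for br0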

From here, I do like to reboot in order to make sure everything will come up cleanly before relying on it. I'm a paranoid guy, though, and it's not absolutely necessary. :) Otherwise, you should be ready to install and run some guests. I hope this helps, so please let me know either way. Until next time!

Sunday, September 5, 2010

Quick Provisioning with KVM and Xen

One of the most compelling features of virtualization for me is the speed and flexibility in provisioning systems. There are a number of logical steps you can perform quite easily in order to get systems up really quickly. Just this week, on an average KVM host with 10k RPM disks, I brought up a minimal Fedora 13 VM ready to go in DNS on the network in just over 30 seconds: 14 to build, 18 to boot. It still brings a smile to my face. :) So, let's get started.

There are two basic components to a quick provisioning process:

1. Source preparation - This is the piece that takes an existing system and packages it up (what I'll call bundling) such that it can be quickly placed onto the disk of new VMs. As you'll see, I use trusty old tar to do this.

2. VM Build Process - This is the big piece that does "the rest." Everything that a new VM needs will be done here in a completely automated "one click" process. At a high level, MACs and IPs will be assigned, disk device(s) will be allocated, filesystem(s) will be made, the bundle will be extracted onto the new filesystems, local system files will be modified, DNS will be updated, and the VM will be booted.

Source Preparation

This section is going to be short. Once I have a machine in a state where it's ready to be used as a template for future VMs, I run a bundle process on it. The meat of the script to perform this is:

# $name (the bundle's name) comes from the script's options
bundle=/tmp/$name.tar.gz
cd /
# archive the root filesystem only, preserving permissions
tar czpf $bundle --exclude "./tmp/*" \
  --exclude ./lost+found \
  --one-file-system . 2>&1 | grep -v 'socket ignored'

I can get away with this because I build my template systems on a single filesystem, which makes this process a snap. The operating system goes in one filesystem and data will be mounted separately (likely with logical volume management), so the --one-file-system option will ignore it. Simply adjust the tar command if you have broken out /var, /boot, and others so that you don't miss any required files.
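
For example, if /boot and /var live on their own filesystems, something along these lines (a sketch, adjust the paths to your layout) would pull them into the bundle as well:

cd /
tar czpf $bundle --exclude "./tmp/*" \
  --exclude ./lost+found \
  --one-file-system . ./boot ./var 2>&1 | grep -v 'socket ignored'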

The actual script that I use to perform this process does more than just this, but not all that much. Besides spending time getting options, printing usage information, and providing a means to update itself, it transfers the resulting tar file up to a web service that I have running on a host serving as a bundle repository. That way, my bundles get "published" and can be used across the hosting environment. Note that one large side benefit of the script's simplicity is that bundles can be used for both Xen and KVM virtual machines; it will be up to the provisioning process to make the minor adjustments that account for hypervisor differences.
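
The publishing step itself doesn't have to be fancy. A hypothetical example, where bundles.example.com stands in for whatever your bundle repository happens to be:

# hypothetical publish step -- bundles.example.com is a stand-in for your own repository
scp $bundle bundles.example.com:/var/www/html/bundles/
# or, if the repository speaks HTTP PUT:
# curl -T $bundle http://bundles.example.com/bundles/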

VM Build Process

As I just mentioned, the bundles I create can be used with both Xen and KVM. When using libvirt as a middleman between configuration and hypervisor, there are very few differences in the provisioning processes. I'm pretty anal, so I do have separate programs for Xen and KVM, but that's by no means required, and I'll probably just combine the two at some point.

One of the differences between Xen and KVM is that Xen has the ability to assign logical volumes as individual disk partitions in the guest VM. For example, /dev/vg_xen/testvm-root can be presented to the VM named testvm as /dev/xvda1. This is great, because a filesystem can be created on that volume on the Xen host, quickly mounted up, and then the bundle can be untarred directly into it. When the VM boots, it'll see that filesystem as /dev/xvda1, its root filesystem.

From my research and testing, KVM does not have this ability. Instead, logical volumes can only be presented as whole disks, so /dev/vg_kvm/testvm-vda could only be mounted as /dev/vda in the VM. This presents the (minor) annoyance of having to partition the disk from the KVM host first and then play the kpartx game before unbundling. I'll do that in the example below since this method could also be used for Xen.

Ok, with that out of the way, let's get down to it. I find that it's easiest to start with a list of all the information I'll need for a fully functional VM and then go from there:

  • VM name (obviously)
  • MAC address: Not absolutely required, but I like to assign these programmatically (a quick sketch follows this list).
  • IP address: Again, not absolutely required since you could just use DHCP, but I like to assign these and possibly push the assignments to DHCP.
  • Bridge: You'll need to know what bridge to use for your VM's network.
  • Disk Device: This piece can be huge ... it can be something as simple as a file path, or it can become a big process to generate an iSCSI LUN on a remote server and connect to it from the provisioning host. In this example, I'll use a local logical volume group. In future blog entries I'll probably talk a lot about iSCSI.
  • CPU count
  • Memory
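
Speaking of programmatic MAC assignment, here's a quick sketch. My real scripts also record each assignment so a MAC is never reused, but the generation itself is a one-liner using the conventional QEMU/KVM 52:54:00 prefix (Xen's is 00:16:3e):

# generate a random MAC in the QEMU/KVM 52:54:00 range
mac=$(printf '52:54:00:%02x:%02x:%02x' \
  $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256)))
echo $mac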

For the purposes of this blog entry, I'm going to chop the stuff that isn't necessarily all that interesting: MAC/IP/bridge assignment and CPU/memory definition. So for simplicity I'll take these items as variables used by a bash function named vm_build. I'll just spit out the function (from the KVM build process) in chunks and discuss along the way. Where my "real" scripts do more, I'll try to make a comment.

vm_build()
{
  vm=$1

  ### do name validation in real script

  echo
  echo "=== Creating new linux domain: $vm ==="

  echo Bundle: $bundle
  echo Memory: $mem
  echo CPUs: $cpu
  echo MAC: $mac
  echo IP:  $ip
  echo Bridge:  $bridge
  echo Volume Group:  $vg
  echo Root Filesystem Size: $size_root
  echo Swap Size: $size_swap
  echo Filesystem Type: $fs
  echo Boot on Completion: $boot

  xml=/etc/libvirt/qemu/${vm}.xml
  sed "s/\$NAME/$vm/" $BASE_XML | \
  sed "s/\$MEM/$mem/" | \
  sed "s/\$CPU/$cpu/" | \
  sed "s/\$IP/$ip/" | \
  sed "s/\$BRIDGE/$bridge/" | \
  sed "s/\$MAC/$mac/" > $xml

This just displays the variables that will be defined by the time we're ready to build a VM. You could take these as command-line options or have functions to pick them or, like I do, use a mixture of both. I generally take bundle, memory, cpu, size_root, size_swap, and fs as command-line options since they're a unique decision for each VM. The network info and disk volume get generated programmatically, since it would be annoying to create unique MACs and pick open IP addresses by hand. :)

Once those variables are in place, we start feeding them into a template libvirt XML file (pointed to by $BASE_XML), which we'll use to define the VM. Here's a link to this template:

base_kvm_linux.xml
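
To give a feel for what the sed chain is filling in, here's a minimal sketch of such a template. This is my rough approximation rather than the linked file: $NAME, $MEM, $CPU, $MAC, $BRIDGE, and $ROOT are the placeholders used above and below (libvirt's <memory> element is in KiB), and I'm assuming $IP lands somewhere informational like the description.

<domain type='kvm'>
  <name>$NAME</name>
  <description>ip=$IP</description>
  <memory>$MEM</memory>
  <vcpu>$CPU</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features><acpi/></features>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <source dev='$ROOT'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <mac address='$MAC'/>
      <source bridge='$BRIDGE'/>
      <model type='virtio'/>
    </interface>
    <console type='pty'/>
  </devices>
</domain>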

Now to the dirty part ... the actual disk provisioning.

  echo creating logical volumes...

  lv_size=$((size_root + size_swap))

  lv_root=/dev/$vg/$vm
  lvcreate -L ${lv_size}m -n $vm $vg
  if [ $? -ne 0 ]; then
    echo "error: could not create root volume for $vm...aborting!"
    return 3
  fi
  sed -i "s@\$ROOT@$lv_root@" $xml

  parted -s $lv_root mktable msdos
  parted -s -- $lv_root mkpart primary ext2 0 $size_root
  parted -s -- $lv_root mkpart primary linux-swap $size_root -1
  parted -s -- $lv_root set 1 boot on

  # brutal hack workaround since parted creates weird mapper entries
  mapper_name=`echo $vm | sed 's/-/--/g'`
  dmsetup remove /dev/mapper/*${mapper_name}*p1
  dmsetup remove /dev/mapper/*${mapper_name}*p2

  kpartx -a -p P $lv_root
  part_root=/dev/mapper/${vm}P1
  part_swap=/dev/mapper/${vm}P2

  mkswap $part_swap

  echo building root filesystem...

  mkfs.$fs -q $part_root
  mkdir -p /mnt/${vm}
  mount $part_root /mnt/${vm}

  echo extracting operating system...
  tar -C /mnt/$vm -xzf $bundle
  sync # gotta sync otherwise the grub will occasionally fail???

The above allocates a logical volume, partitions it, plays device mapper games, makes swap on one partition, creates a filesystem (type specified by $fs; I use ext3 and ext4 as needed) on the other and mounts it locally on the KVM host, and finally extracts the bundle into it. It's a lot of work for really not all that much activity. As I mentioned before, with Xen this *can* be simplified to create two logical volumes: one for root and one for swap. The mkfs runs on the root logical volume and that volume is mounted directly, which eliminates all the partitioning steps and device mapper complexity (a rough sketch follows).
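
For reference, here's roughly what that simplified Xen-style allocation looks like. It's just a sketch and the LV names are hypothetical; in the Xen domain config the two volumes then get presented to the guest as xvda1 and xvda2:

# simplified Xen variant: one LV per guest "partition", no parted/kpartx needed
lvcreate -L ${size_root}m -n ${vm}-root $vg
lvcreate -L ${size_swap}m -n ${vm}-swap $vg
mkswap /dev/$vg/${vm}-swap
mkfs.$fs -q /dev/$vg/${vm}-root
mkdir -p /mnt/$vm
mount /dev/$vg/${vm}-root /mnt/$vm
tar -C /mnt/$vm -xzf $bundle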

From here, we go to the local modifications. For simplicity's sake again, I'll just assume that we only build Red Hat-based systems; Debian would have its configuration files in other locations.

  # update the hostname
  sed -i "s/HOSTNAME=.*/HOSTNAME=${vm}.${dns_domain}/" \
    /mnt/$vm/etc/sysconfig/network

  # change the grub boot device
  sed -i 's@root=[^ ]\{1,\} @root=/dev/vda1 @' \
    /mnt/$vm/boot/grub/grub.conf

  # create a base fstab using the virtio vda devices
  cat >/mnt/$vm/etc/fstab <<+++
/dev/vda1 / $fs defaults,noatime 1 1
/dev/vda2 swap swap defaults 0 0
tmpfs  /dev/shm tmpfs   defaults       0 0
devpts /dev/pts devpts  gid=5,mode=620 0 0
sysfs  /sys     sysfs   defaults       0 0
proc   /proc    proc    defaults       0 0
+++

  # clear out the template's MAC address and set the new IP address
  sed -i '/^HWADDR/d' \
    /mnt/$vm/etc/sysconfig/network-scripts/ifcfg-eth0
  sed -i "s/^IPADDR=.*/IPADDR=$ip/" \
    /mnt/$vm/etc/sysconfig/network-scripts/ifcfg-eth0

  # for now, just remove persistent rules files
  rm -f /mnt/$vm/etc/udev/rules.d/70-persistent-net.rules

  # disable selinux
  if [ -f /mnt/$vm/etc/sysconfig/selinux ]; then
    echo 'SELINUX=disabled' > /mnt/$vm/etc/sysconfig/selinux
  fi

Mostly it's just a matter of making sure that the system boots up on the device partitions that KVM is presenting (Xen will use xvda rather than vda) and that our eth0 network interface comes up with its own MAC address and IP. SELinux is disabled since, if the template VM had it enabled, the freshly extracted files won't have proper security contexts and that will hose stuff up at first boot.

Now we need to make sure that the new VM will actually boot with grub. In Xen this is not at all required since it uses the beautiful concept of pygrub, but as far as I am aware KVM has no such ability and it requires that a boot loader be installed on each VM's root volume.

  mount --bind /dev /mnt/$vm/dev
  mount -t proc none /mnt/$vm/proc
  
  cat >/mnt/$vm/vm-grub <<EOF
#!/bin/bash
ln -s $part_root ${lv_root}1
grub <<+++
device (hd0) $lv_root
root (hd0,0)
setup (hd0)
+++
rm -f ${lv_root}1
EOF

  chmod 755 /mnt/$vm/vm-grub
  echo chroot /mnt/$vm /vm-grub
  chroot /mnt/$vm /vm-grub

  # remove the helper script so it doesn't ship inside the guest
  rm -f /mnt/$vm/vm-grub

  umount /mnt/$vm/proc
  umount /mnt/$vm/dev
  umount /mnt/$vm
  rm -rf /mnt/$vm

  kpartx -d -p P $lv_root

If there is a better, simpler way to do this, please, I am all ears. I struggled with this piece the longest in the conversion of my Xen provisioning scripts to KVM and would really love to clean it up. I mean, I love taking any chance I can to use chroot and all, but one command sure would be nice. :)

Anyway, finally, we get to the definition of the VM and its registration with DNS.

  echo -n Defining VM...
  virsh define $xml &>/dev/null
  if [ $? -eq 0 ]; then
    echo "success!"
    echo Configuring VM to autostart...
    virsh autostart $vm
    if [ "$boot" -eq 1 ]; then
      virsh start $vm
    fi
  else
    echo "failure!"
  fi

  ### call DNS registration in the real script

  echo
  echo "=== Creation of $vm complete ==="
}
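
To tie it together, a hypothetical invocation might look like this. Every value here is made up, and in my real scripts most of them come from command-line options or are generated automatically:

# hypothetical invocation of vm_build -- all values are examples
bundle=/tmp/fedora13-minimal.tar.gz   # a previously created bundle
BASE_XML=/root/base_kvm_linux.xml     # the libvirt XML template
dns_domain=example.com
mem=524288                            # memory, in whatever unit the template expects
cpu=1
mac=52:54:00:12:34:56
ip=192.168.0.50
bridge=br0
vg=vg_kvm
size_root=4096                        # MB
size_swap=512                         # MB
fs=ext4
boot=1
vm_build testvm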

And that's it! Not so bad, eh? :) The VM is built and, if desired, started up. When using ext4 and a compressed bundle of 400MB or so, the fastest I've provisioned a machine on 10k RPM disks was 14 seconds. I'd love to bring this number down to single digits, so perhaps a nice RAID of 15k RPM disks is in my future.

I hope this proves useful and I'd absolutely love to receive any and all feedback. Until next time!