Friday, August 8, 2008

don't forget mkinitrd

Most Linux system administrators are well aware of the benefits of LVM. With online resizing of filesystems and migration of data from one disk to another, it's fantastic. But don't assume that your system will do everything for you.
If your system has one disk that you used as a physical volume (PV) in a volume group (VG), where your root partition is stored as a logical volume (LV), you can easily add a new disk. Add the disk physically, boot up, and use pvcreate and vgextend to include the new disk in the existing volume group. Just don't reboot yet!
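As a sketch, assuming the new disk shows up as /dev/sdb and your volume group has the default RHEL/CentOS name (both are assumptions; adjust for your system), the sequence looks like this:

```shell
# Create an LVM physical volume on the new disk (device name assumed)
pvcreate /dev/sdb
# Add it to the existing volume group (VolGroup00 is the RHEL/CentOS default)
vgextend VolGroup00 /dev/sdb
# Verify that the extra space is now visible
vgdisplay VolGroup00
```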

Your system requires a manual rebuild of its initial RAM disk (initrd):
/sbin/mkinitrd -f /boot/initrd-2.6.18-92.1.10.el5.img 2.6.18-92.1.10.el5

If you forget to do this, your system will not boot, because it won't find all the components necessary to activate the VG it needs to access the root filesystem. Symptoms: kernel loads, initrd loads, root filesystem can't be mounted because the volume group doesn't exist.
If it has already happened (and that's why Google sent you to this blog entry), don't despair: it's an easy thing to fix with the "linux rescue" option of your RHEL/CentOS installation CD or DVD.
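A minimal sketch of that fix from rescue mode (the kernel version and image name are the ones from the example above; substitute your own):

```shell
# Boot the installation media and type "linux rescue" at the prompt.
# The rescue environment mounts your installation under /mnt/sysimage.
chroot /mnt/sysimage
# Rebuild the initrd so it knows about all physical volumes in the VG
/sbin/mkinitrd -f /boot/initrd-2.6.18-92.1.10.el5.img 2.6.18-92.1.10.el5
exit
reboot
```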

how many physical hosts do you buy: what MS sales didn't tell you

One of the first steps in a virtualization project is building a list of workloads that will get virtualized, with a measurement or estimate of the resources they will need: X MHz and Y MB per workload. Sum everything up, and let's say you get 30 GHz of CPU power and 20 GB of RAM.

The hardware you'd like to run all those virtual machines on can handle two CPUs (dual-socket), four cores each (quad-core). That means that every physical server will give you between 20 and 25 GHz of CPU power. For memory, you'll buy 12 GB of RAM in each server.

So the plan is to buy two of those servers, right?

Well, as long as your infrastructure is 100% healthy and running OK, two servers will do the job just fine. You've got enough resources, with a bit of headroom for overhead and future growth. But what happens when one of the physical servers is down? Think of hardware problems, think of virtualization software upgrades, think of patching the hypervisor.

Then the available resources are down to 20 GHz and 12 GB of RAM. For CPU, 20 GHz means that every application will get roughly a third less than desired, and will therefore run a bit slow, probably noticeable for users. Is that acceptable in these cases?

And last but not least, memory. Temporarily, you've got just 12 GB of RAM available, and your VMs need 20 GB. Did you know that what happens next depends on which hypervisor you've chosen?

  1. With Hyper-V or Xen, you're in trouble: with 12 GB, you can start only about 60% of your VMs, and the rest stay down.
  2. With ESX and ESXi, you can start all your VMs, and just as with CPU, there aren't enough resources, so everything will slow down a bit. But it will run. This trick is called "memory overcommitment".
Conclusion: if you choose Hyper-V or Xen, you'll need three of those servers to keep running your business. With VMware, you have the option of buying three and continuing without any speed impact, or buying just two servers and living with the temporary loss of performance.
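The arithmetic above can be checked with a quick shell calculation (the numbers are the hypothetical figures from this example: 30 GHz / 20 GB of demand, 20 GHz / 12 GB per host):

```shell
#!/bin/sh
# Demand from the workload inventory (example figures)
need_ghz=30
need_gb=20
# Capacity of one physical server (conservative end of the CPU estimate)
host_ghz=20
host_gb=12

hosts=2
# Resources left when one of the two hosts is down
avail_ghz=$(( (hosts - 1) * host_ghz ))
avail_gb=$(( (hosts - 1) * host_gb ))

# Percentage of demand that the surviving host can serve
cpu_pct=$(( 100 * avail_ghz / need_ghz ))
ram_pct=$(( 100 * avail_gb / need_gb ))
echo "CPU: ${cpu_pct}% of demand"   # 66% -> every VM runs about a third slower
echo "RAM: ${ram_pct}% of demand"   # 60% -> without overcommitment, 40% of the VMs stay down
```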

Now let me guess: did your Microsoft sales guy tell you about this difference?

Monday, August 4, 2008

are you still using RHEL 2.1?

Are you still using Red Hat Enterprise Linux 2.1, or CentOS 2.1? Then this news is of great importance to you: the planned End-Of-Life for these products is approaching! After May 2009, there will be no more security updates for RHEL 2.1, nor support from Red Hat.

It's not really news, as the lifecycle of RHEL products has always been clearly announced and published. If you missed all that, this is the time to start planning an upgrade. Your Red Hat subscription gives you the right to use the newer versions of RHEL, so upgrading is all you need to do.

Need help planning an upgrade to RHEL 3, 4 or 5? I can recommend some experienced consultants! ;-)

Friday, August 1, 2008

choosing hardware for an ESX testlab

When you're shopping for ESX servers to build your next production cluster, you know where to look: VMware's hardware compatibility lists are frequently updated and contain everything you need to know. But what about test labs? When you don't care too much about "is it supported", but rather ask yourself "does it work" and "can I buy something cheaper"?

I asked myself the same questions when I built my own testlab. And this was my choice: two identical PCs, each equipped with
  • MSI MS7345 motherboard
  • Intel Q6600 CPU
  • 8 GB of RAM (four times 2 GB)
  • Promise SATA 300 TX2plus PCI controller
  • 80 GB SATA hard drive (the smallest I could get)
  • a dual-port Intel gigabit card
Add a third PC, similar but with more disk space, running CentOS 5 with IET as an iSCSI server, and you get a 19 GHz, 16 GB VI3 cluster with several hundred GB of storage space. Perfect for testing, and without it costing an arm and a leg.
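As a sketch, exporting a disk with IET on that CentOS 5 box comes down to a couple of lines in /etc/ietd.conf (the target name and device path below are made up for illustration; adjust them for your setup):

```
# /etc/ietd.conf -- iSCSI Enterprise Target configuration (example)
Target iqn.2008-08.lab.example:storage.vmfs1
    # Export a whole block device to the ESX hosts
    Lun 0 Path=/dev/sdb,Type=fileio
```

After editing the file, restart the target service (the init script name may vary with how you installed IET) and point the ESX software iSCSI initiator at the box.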

how do I enable EVC when VC is running in a VM?

So you're running the new ESX 3.5 Update 2, and you want to try the new EVC feature on your cluster. And you've found it under "edit settings" on your cluster object...
And then VirtualCenter refuses to enable it for you, because there is still at least one VM powered on in the cluster. Of course there is, because your VirtualCenter runs in a VM, in that same cluster. Catch-22?

Well, getting round this is a bit weird, but nevertheless possible. Here's what I did on my two-node test cluster:
Step one: evacuate one ESX host in your cluster. Put it in maintenance mode, and move it out of the cluster, right under your datacenter object.
Step two: manually migrate (VMotion via drag-and-drop, for example) your VirtualCenter VM (running on a host in the cluster) to the host that is now outside the cluster. Do this with all other VMs that are still running in the cluster.
Step three: enable EVC on your cluster. This now works, because the cluster no longer contains any running VMs.
Step four: migrate the VMs back to a host within the cluster (again, drag and drop).
And finally: put the host back into the cluster.