KVM vs. XenServer vs. VMware Memory Overcommitment

KVM claims to support 3 different kinds of memory overcommitment (I wouldn’t count live migration as memory management). Here is my comparison of these features with those of its competitors, based on what has been written about them (I don’t have any direct experience with KVM, and have not played with XenServer 5.6’s balloon driver so far):

Swapping

Since KVM-based VMs run as processes, a large amount of swap configured on the host server will allow pages in the VMs to get swapped out. This seems inefficient, since pages in the VM which are already file-backed may get paged to swap and be doubly committed to disk, but it should be usable in pre-production environments with many idle and completely unused VMs.
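
Because each VM is an ordinary process, the host’s usual per-process accounting applies to it. Here is a minimal sketch of checking how much of a VM has been swapped out, assuming a kernel new enough to report a VmSwap line in /proc/<pid>/status (older kernels do not) and assuming the VM processes have "qemu" in their name. Both function names are mine, for illustration only.

```shell
# Print the VmSwap value (in kB) from a /proc/<pid>/status-style file.
# Prints 0 if the field is absent (older kernels do not report it).
vm_swap_kb() {
    awk '/^VmSwap:/ { print $2; found = 1 } END { if (!found) print 0 }' "$1"
}

# Report swap usage for every process whose name matches "qemu".
# The name match is an assumption; adjust it to whatever your distribution
# calls the KVM process (for example "qemu-kvm" or "kvm").
report_qemu_swap() {
    for pid in $(pgrep qemu); do
        echo "pid $pid: $(vm_swap_kb "/proc/$pid/status") kB swapped"
    done
}
```

Running report_qemu_swap on the host prints one line per VM process.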

As far as I know VMware does not support this operation. This mechanism should be highly stable in KVM since it leverages the VM-as-process model and the underlying code has been debugged for literally decades.

It looks like the community Xen 4.0 hypervisor supports swap-to-disk for HVM based guests and not PV guests, but XenServer 5.6 does not yet support this.

Page Sharing or Memory De-Duplication

VMware patented this process. The base Linux kernel added a similar feature, KSM, in 2.6.32, along with ksmd, a kernel thread that can de-duplicate memory across different processes. As KVM VMs run as processes, KVM immediately benefits from this.

Again it looks like the community Xen 4.0 hypervisor supports KSM for HVM guests, but not PVs and this feature is not present in XenServer 5.6.

Since KSM is relatively new code in the Linux kernel, it is going to be less tested than VMware’s implementation. The KSM code also uses a slightly different algorithm than VMware’s and avoids the use of hashes to compare pages, which may impact performance (conceivably it could impact performance positively, since it avoids the computationally heavier hash in favor of just doing memcmp).
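
KSM exposes its counters under /sys/kernel/mm/ksm, so you can see how much it is actually saving on a host. Here is a minimal sketch, assuming a kernel built with KSM; the function name and its optional arguments are mine and exist only to make the arithmetic easy to check. The pages_sharing counter is the number of extra mappings that point at an already-shared page, so multiplying it by the page size approximates the memory saved.

```shell
# Estimate memory saved by KSM, in KiB, from the kernel's KSM counters.
# $1: directory holding the counters (default: /sys/kernel/mm/ksm)
# $2: page size in bytes (default: the running system's page size)
ksm_saved_kb() {
    dir=${1:-/sys/kernel/mm/ksm}
    page_size=${2:-$(getconf PAGESIZE)}
    sharing=$(cat "$dir/pages_sharing")
    echo $(( sharing * page_size / 1024 ))
}
```

On a host, ksm_saved_kb with no arguments reads the live counters. Scanning only happens while /sys/kernel/mm/ksm/run contains 1.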

Ballooning

VMware has supported dynamic ballooning for a long time. XenServer 5.6 has recently added dynamic ballooning, although the balloon driver in the Xen hypervisor has been present since before 2005. The only new addition in XenServer 5.6 is the xenballoond guest daemon, which dynamically tweaks memory ballooning.

KVM also has memory ballooning, and it has been back-ported to the 2.6.18-194.el5 RHEL/CentOS 5.5 kernel. So far I can’t find any information on support for automatic dynamic balancing of memory ballooning the way that VMware or XenServer 5.6 do.

Scorecard

VMware is clearly mature and just works. The lack of swapping VMs out to swap on the host is probably not a significant gap for VMware, given that memory de-duplication and ballooning work reasonably well there.

XenServer 5.6 relies entirely on a newly introduced balloon driver, and I expect that heavier workloads could expose some performance instability in xenballoond under edge conditions. XenServer seems to be lagging behind. While the Xen community hypervisor looks to be able to easily leverage swapping and KSM page de-duplication on HVM guests, the architecture of Xen PVs means that any swapping or page de-duplication needs to be patched directly into the underlying Xen hypervisor.

KVM benefits from its design in being able to leverage kernel swapping of VM pages and KSM page de-duplication. Swapping VM pages to disk should be remarkably stable, should be useful to massively overcommit memory in pre-production environments, and should compete with VMware’s balloon driver. The KSM page de-duplication is not yet mature code, but inclusion in the vanilla Linux kernel should rapidly increase the maturity of the codebase and allow it to compete with VMware’s page de-duplication. Once KSM and the KVM balloon driver mature, KVM should be on equal footing with VMware.

XenServer 5.6 extending disk on a Linux VM

Three things need to happen to increase the disk available to a Linux VM:

  1. Increase the disk size available to the VM in XenCenter
  2. Increase the size of the VM partition in the VM’s partition table
  3. Grow the size of the ext3 filesystem in the VM

First of all, the VM needs to be shut down in order to extend the size of the drive in XenCenter (I don’t know how to do this on-the-fly).

On the storage tab of the VM, click on properties and increase the size of the storage in XenCenter.

Increase the size of the Linux / partition using fdisk. The utility parted should also be able to resize partitions, and is much smarter about it, since it can move filesystem data around as well, but parted does not allow you to edit partitions that are in use. In this case, since I want to edit /dev/xvda3, which is my / partition and sits at the end of the virtual disk, I only need to extend the partition, not move it. The new partition must start at the same cylinder as the old one (104 here), or the filesystem will not be found. On the running VM use fdisk to delete and re-create the partition with a larger end:

# fdisk /dev/xvda

The number of cylinders for this disk is set to 5221.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/xvda: 42.9 GB, 42949672960 bytes
255 heads, 63 sectors/track, 5221 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System
/dev/xvda1   *           1          38      305203+  83  Linux
/dev/xvda2              39         103      522112+  82  Linux swap / Solaris
/dev/xvda3             104        4177    32724405   83  Linux

Command (m for help): d
Partition number (1-4): 3

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (104-5221, default 104): 104
Last cylinder or +size or +sizeM or +sizeK (104-5221, default 5221): 5221

Command (m for help): p

Disk /dev/xvda: 42.9 GB, 42949672960 bytes
255 heads, 63 sectors/track, 5221 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System
/dev/xvda1   *           1          38      305203+  83  Linux
/dev/xvda2              39         103      522112+  82  Linux swap / Solaris
/dev/xvda3             104        5221    41110335   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
#

The last warning is accurate: you need to reboot the VM again for the kernel to pick up the changes to the partition table. Once the VM has rebooted, you can run resize2fs in the running VM. With no size argument it notices the difference between the partition size and the size of the filesystem and extends the filesystem to fill the partition:

# resize2fs /dev/xvda3
resize2fs 1.39 (29-May-2006)
Filesystem at /dev/xvda3 is mounted on /; on-line resizing required
Performing an on-line resize of /dev/xvda3 to 10277583 (4k) blocks.
The filesystem on /dev/xvda3 is now 10277583 blocks long.
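
As a sanity check, the block counts printed by fdisk and resize2fs above can be reproduced from the cylinder numbers. This is just a sketch of the arithmetic; the function name is mine, and the 8225280 bytes-per-cylinder figure comes from the "Units" line in the fdisk output.

```shell
# Size of a partition in 1 KiB blocks, given inclusive start/end cylinders
# and the bytes-per-cylinder figure from fdisk's "Units" line.
partition_kib() {
    start=$1; end=$2; bytes_per_cyl=$3
    echo $(( (end - start + 1) * bytes_per_cyl / 1024 ))
}

partition_kib 104 4177 8225280    # old /dev/xvda3 -> 32724405
partition_kib 104 5221 8225280    # new /dev/xvda3 -> 41110335

# resize2fs reports 4k blocks, so divide by 4 and drop the remainder.
echo $(( $(partition_kib 104 5221 8225280) / 4 ))    # -> 10277583
```

The three results match the Blocks column for the old and new partitions and the final filesystem size reported by resize2fs.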

The need to reboot the VM twice here is a little bit annoying; there should be a better way to do this on-the-fly. It should also be straightforward to replace the use of the GUI with xe commands, and if you’re using ext3 volumes in XenServer and have access to the VHD files, it should be possible to do the partitioning and filesystem resizing from the dom0 as well (although doing this on-the-fly would require some kind of co-operation with the VM kernel, and probably requires the VM to be shut down anyway).
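
For the xe route, here is an untested sketch of what the GUI step might look like from the console. It only prints the commands rather than running them, so you can review them first. The function name is mine, and I’m assuming that xe vdi-resize takes the new size in bytes via disk-size. You would look up the VDI uuid yourself, for example with xe vbd-list vm-name-label=<name> params=vdi-uuid,device.

```shell
# Print (not run) the xe commands that would grow a VM's virtual disk.
# $1: VM name-label, $2: VDI uuid, $3: new disk size in bytes
print_resize_commands() {
    echo "xe vm-shutdown vm=\"$1\""
    echo "xe vdi-resize uuid=$2 disk-size=$3"
    echo "xe vm-start vm=\"$1\""
}

# Example with a placeholder uuid and the 42949672960-byte size from above.
print_resize_commands myvm VDI-UUID-HERE 42949672960
```

The fdisk and resize2fs steps inside the VM are unchanged.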

XenServer 5.6 thin provisioning with ext3

XenServer 5.6 allows admins a choice between 3 different kinds of volume management: LVM, LVHD or ext3. With the default in XenServer 5.6 of LVHD you gain quick snapshots and have thin provisioning of snapshots and suspended virtual machines, but running virtual machines have 100% of their disk allocation counted against the disk usage. In order to get thin provisioning of running VMs you need to build/rebuild your SRs as ext3 volumes. You lose rapid snapshots in the process. I am also not sure that this meets everyone’s definition of “thin provisioning”, since this is just lazy allocation of blocks on ext3. If you fill up the disk on the VM and then delete a large amount of data, I don’t believe the disk usage counted against your virtual machine will shrink. Still, with most server images in the Enterprise being nearly un-utilized, this should be effective, particularly if you are good about log rotation and don’t let your partitions fill up.

In order to convert the default local storage volume on a XenServer 5.6 host you need to use the console xe utilities to destroy and recreate the SR. This is destructive to VMs on the host, so these instructions assume a newly built XenServer 5.6. Adapting them to add a new drive to a host and create a new SR with ext3 using ‘xe sr-create’ with these arguments is also straightforward. If you’ve already got VMs on the SR you’ll need to migrate them off and migrate them back one way or another. Don’t try this for the first time on a VM host that you care about, particularly if you aren’t skilled with the command line.
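
To illustrate what I mean by lazy allocation, here is a small demonstration using an ordinary sparse file. This is an analogy for the general idea, not a claim about how XenServer lays out its VHD files, and it assumes GNU dd, stat and du and a filesystem that supports sparse files.

```shell
# Create a file with a 100 MiB apparent size without writing any data blocks.
demo=$(mktemp)
dd if=/dev/zero of="$demo" bs=1 count=0 seek=104857600 2>/dev/null

apparent_bytes=$(stat -c %s "$demo")        # size the file claims to be
allocated_kib=$(du -k "$demo" | cut -f1)    # space actually allocated on disk

echo "apparent: $apparent_bytes bytes, allocated: $allocated_kib KiB"
rm -f "$demo"
```

The allocated figure only grows as data is written, and deleting files inside a guest filesystem does not hand those blocks back, which is the caveat above.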

First there’s a default template in XenServer 5.6 which needs to be removed from the storage:

# xe vbd-list
uuid ( RO)             : f5c9f545-2019-7299-be87-fc7ef00be1e2
          vm-uuid ( RO): e2ad0921-dea8-5a1a-77e8-d3257fdcf48d
    vm-name-label ( RO): XenServer Transfer VM 5.6.0-31124p
         vdi-uuid ( RO): c3a8d327-2036-4ce2-9946-f0522f7572f4
            empty ( RO): false
           device ( RO):
# xe template-uninstall template-uuid=e2ad0921-dea8-5a1a-77e8-d3257fdcf48d
The following items are about to be destroyed
VM : e2ad0921-dea8-5a1a-77e8-d3257fdcf48d (XenServer Transfer VM 5.6.0-31124p)
VDI: c3a8d327-2036-4ce2-9946-f0522f7572f4 (XenServer Transfer VM system disk)
Type 'yes' to continue
yes
All objects destroyed

If you really needed that template, you don’t have it anymore. You’ll have to figure out how to get it back. I’m not sure what its purpose is. It is installed by default on all new XenServer 5.6 images, so you should be able to export it from a fresh install and re-import it to fix this, but I’m not going to offer instructions on how to do that, and haven’t tested it.

Next, find the uuid of the Local Storage SR:

# xe sr-list name-label="Local storage"
uuid ( RO)                : dacfea90-263e-0811-ab88-22f01b89b1b4
          name-label ( RW): Local storage
    name-description ( RW):
                host ( RO): vmhost.example.com
                type ( RO): lvm
        content-type ( RO): user

Then find the PBD that is attached to that:

# xe pbd-list sr-uuid=dacfea90-263e-0811-ab88-22f01b89b1b4
uuid ( RO)                  : daabdf71-641c-900b-3451-bd5c70675fab
             host-uuid ( RO): 23d8a9a0-a317-47a5-a1e6-858ab120b57b
               sr-uuid ( RO): dacfea90-263e-0811-ab88-22f01b89b1b4
         device-config (MRO): device: /dev/disk/by-id/scsi-36001c230bd1017000e4f2ee6554b21c8-part3
    currently-attached ( RO): true

Then unplug the PBD:

# xe pbd-unplug uuid=daabdf71-641c-900b-3451-bd5c70675fab

Now destroy the SR:

# xe sr-destroy uuid=dacfea90-263e-0811-ab88-22f01b89b1b4

Now you can create the SR. I’ve been using servers that have /dev/sda, so the storage partition is /dev/sda3. If you’re doing this on a SATA system (ick) you might have to use /dev/hda3 here, or on an HP probably /dev/cciss/c0d0p3. If you have FibreChannel or iSCSI-attached disk on a SAN you’re on your own to figure out what your block device is.

# xe sr-create content-type=user type=ext device-config:device=/dev/sda3 shared=false name-label="Local storage"
76ec3072-ae85-cd38-e363-34cf6b63d520

This command will take some time to return as it creates the SR.
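
The lookup, unplug, destroy and create steps above can be chained into one function using xe’s --minimal flag, which prints just the uuid. This is a sketch, not something I’d run blind: it assumes a standalone host with exactly one SR named "Local storage" and exactly one PBD attached to it, and the function name is mine. It destroys everything on the SR.

```shell
# Recreate the "Local storage" SR as type ext on the given block device.
# DESTROYS every VDI on the existing SR. $1: block device, e.g. /dev/sda3
recreate_local_sr_as_ext() {
    device=$1
    # --minimal prints only the uuid (comma-separated if there are several).
    sr_uuid=$(xe sr-list name-label="Local storage" --minimal)
    [ -n "$sr_uuid" ] || { echo "no SR named Local storage" >&2; return 1; }
    pbd_uuid=$(xe pbd-list sr-uuid="$sr_uuid" --minimal)
    [ -n "$pbd_uuid" ] || { echo "no PBD found for SR $sr_uuid" >&2; return 1; }

    xe pbd-unplug uuid="$pbd_uuid" &&
    xe sr-destroy uuid="$sr_uuid" &&
    xe sr-create content-type=user type=ext \
        device-config:device="$device" shared=false \
        name-label="Local storage"
}
```

Calling recreate_local_sr_as_ext /dev/sda3 prints the uuid of the new SR, as the manual sr-create does.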

You now probably want to tune the reserved space down on the ext3 partition to make more of it available. The filesystem reserves 5% of the storage to make block allocation and defragmentation more efficient, but you probably want to manage that yourself (set monitoring alarms at 95% and migrate VMs off if the storage gets above 95%).

The block device to tune is not /dev/sda3, but you can find it from df -k:

# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              4128448   3214896    703840  83% /
none                    384512         0    384512   0% /dev/shm
/opt/xensource/packages/iso/XenCenter.iso
                         44410     44410         0 100% /var/xen/xc-install
/dev/mapper/XSLocalEXT--76ec3072--ae85--cd38--e363--34cf6b63d520-76ec3072--ae85--cd38--e363--34cf6b63d520
                     279556112    191652 265163836   1% /var/run/sr-mount/76ec3072-ae85-cd38-e363-34cf6b63d520

Use tune2fs against that really ugly block device name to set the reserve to 0%:

# tune2fs -m 0 /dev/mapper/XSLocalEXT--76ec3072--ae85--cd38--e363--34cf6b63d520-76ec3072--ae85--cd38--e363--34cf6b63d520
tune2fs 1.39 (29-May-2006)
Setting reserved blocks percentage to 0% (0 blocks)
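
To see how much space that recovers, subtract the Used and Available columns from the total in the df output above (taken before running tune2fs). A quick sketch of the arithmetic, with a function name of my own choosing:

```shell
# Reserved space implied by a df line: total - used - available (1K blocks).
reserved_kib() {
    echo $(( $1 - $2 - $3 ))
}

reserved=$(reserved_kib 279556112 191652 265163836)
echo "$reserved KiB reserved, about $(( reserved / 1024 )) MiB"
# -> 14200624 KiB reserved, about 13867 MiB
```

That is roughly 13.5 GiB handed back to the SR on this host.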

You should now be able to see the new “Local storage” device in XenCenter and can set it as the default storage location for new VMs. You will also see VHDs associated with your VMs showing up in the /var/run/sr-mount/[...etc...] directory.

GPO at the Warhawk

Went out with an armada of divers that hit Discovery Bay last weekend.

My strobe showed up in the mail today, but I didn’t have it for this dive, so the flash shadow was annoying me again. This shot is pretty nice, though.

Wolfie at Alki Pipeline

A couple of shots of a wolfie at Alki Pipeline. The lack of a diffuser on the S90 housing is causing the right-hand-side shadow. The AF35 strobe that I’m waiting to get right now should solve that problem.

Cisco IOS Router Setup

I’ve been a Unix SA/SE for about 16 years and my hands-on knowledge of IOS has always been limited due to limited console time on Cisco routers. However, now I’m studying to get a CCNA. Certificates are kinda lame, but I’ve run into times when it would be useful.

This is going to be a growing list of all the global configuration commands that I come up with that are useful for setting up a router/switch first-time (or for enforcing policy on all routers/switches). It is going to start out fairly sparse.

Basic

hostname <routername>
ip domain-name <dns name>

Sets the hostname and domain name.

Convenience

line console 0
  logging synchronous

Sets synchronous output on the console.

Security

enable password foo
enable secret bar

Sets the enable password. Only “enable secret” should be used, since it stores the password in the config as a hash rather than in clear text.

service password-encryption

Sets up weak password encryption to obscure passwords in router config.

line vty 0 4
  login
  password foo
  logging synchronous

Sets synchronous output on the first 5 telnet vtys and sets a login password for the terminal.

banner login #

    Authorized uses only.  All activity may be monitored and reported.

#

Set a multi-line banner displayed before the password prompt for telnet.

banner motd #

    Authorized uses only.  All activity may be monitored and reported.

#

Set a multi-line banner displayed before the password prompt for telnet *and* on console login (better).

Logging

archive
  log config
    logging enable
    logging size 200
    notify syslog contenttype plaintext
    hidekeys

Sets an archive history of router configuration commands

Time

clock timezone UTC 0

Set the timezone of the router manually.

clock set 02:11:25 Feb 15 2010
clock update-calendar

This is not entered in configuration mode, and sets the software clock and then writes to the hardware clock.

ntp server 10.1.1.1
ntp server 10.1.1.2 prefer
ntp server 10.1.1.3
ntp update-calendar

Set the router to be an NTP client, and use NTP to sync the hardware clock.

DNS

ip name-server 10.1.1.1
ip name-server 10.1.1.2

Sets nameservers for DNS queries

ip domain-lookup

Enable DNS lookups. Network engineers often disable this (no ip domain-lookup) to keep command typos from being looked up in DNS, but that globally disables DNS lookups inside commands as well.

Spanning Tree

spanning-tree mode rapid-pvst

Use Rapid-PVST by default everywhere.

SNMP
TACACS

MISC

ip subnet-zero

Allow subnet zero ip addresses.

system mtu jumbo 9000

Set jumbo frames on 3750/3560/49xx switches.

Diving at Redondo

El Nino: Likely Peaking

An entirely normal El Nino SST decline should occur soon.

Year-to-Year weekly comparison of El Nino cycles

This is a comparison of the past 7 El Nino events for which weekly El Nino 3.4 data is available:

http://www.cpc.noaa.gov/data/indices/wksst.for

All that this picture intends to show is that this El Nino is running stronger than 4 out of the past 6 El Ninos, and that every single one of the prior El Ninos had peaked by this time, so it is unlikely that the current one will strengthen.  It is also likely that the current El Nino will start to cool.  It would not be too surprising if roughly 3 months from now there is a transition to ENSO-neutral SSTs (+0.5C or cooler) in the equatorial Pacific.

I fully expect to hear wails from the denialsphere at that time that any weakness in the El Nino means that global warming is wrong and we’re heading for a new ice age, when all that will be occurring is a normal transition after an El Nino.  This graph attempts to show that El Nino cycles are simply behaving normally and any decline from this peak would be normal.

The warming pulse from the El Nino will still take 4 months to arrive, as will the transition to ENSO-neutral or weak La Nina conditions, so I still expect that 2010 will set a global surface temperature record.

The graph does show that there is at least a slight chance that the El Nino might “linger” the way that the 1991 El Nino did.  The current event was a bit of a late bloomer, with a rapid run-up, and appears to be peaking later than the 1994/2002/2004/2006 El Ninos and visually appears more similar to 1991.  If I had to bet, though, I’d bet we’re closer to +0.5C than +1.5C 3 months from now.

First Post!

This is my first ever blog posting.

UPDATE:  after fixing my httpd.conf file to fix the 404 error on trying to set up pretty permalinks, everything seems to be working again.

UPDATE:  reCAPTCHA/MailHide and Akismet seem to be set up and working correctly, so this blog should be reasonably spam-free now.  reCAPTCHA is turned off for registered users, and we’ll see how that goes…
