Showing posts from 2011

Too much redundancy will kill you

A customer asked me to verify their vSphere implementation. Everything looked perfectly redundant, in the traditional elegant way: cross over between layers to avoid single points of failure. I had to break the bad news: too much redundancy can mean NO redundancy.
In this case, the host has 4 network interfaces (two dual-port cards). VMs connect to a vSwitch that has redundant uplinks over vmnic0 and vmnic2 (one port of each card). A second vSwitch for the storage traffic has the same level of redundancy, using vmnic1 and vmnic3. Looking good.
Then the physical level. 4 host interfaces, 2 interconnected network switches. The traditional |X| design connects the two interfaces of every card to different switches. Looking good.
But looking at both configurations together, you'll see that both uplinks of each vSwitch end up on the same physical switch. The sum of two crossed redundancy configurations equals no redundancy at all.
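The combined effect is easy to check mechanically. Below is a minimal sketch, assuming the vmnic layout described above; the switch names (switchA, switchB) and the function name are made up for illustration.

```python
# vSwitch uplinks as described in the text.
VSWITCH_UPLINKS = {
    "vSwitch0": ["vmnic0", "vmnic2"],  # VM traffic, one port of each card
    "vSwitch1": ["vmnic1", "vmnic3"],  # storage traffic, the other port of each card
}

# |X| cabling: the two ports of each card go to different physical switches.
# The switch names are hypothetical.
CABLING = {
    "vmnic0": "switchA",  # card 1, port 1
    "vmnic1": "switchB",  # card 1, port 2
    "vmnic2": "switchA",  # card 2, port 1
    "vmnic3": "switchB",  # card 2, port 2
}


def single_switch_vswitches(uplinks, cabling):
    """Return {vswitch: switch} for every vSwitch whose uplinks all share one physical switch."""
    result = {}
    for vswitch, nics in uplinks.items():
        switches = {cabling[nic] for nic in nics}
        if len(switches) == 1:
            result[vswitch] = next(iter(switches))
    return result


for vswitch, switch in single_switch_vswitches(VSWITCH_UPLINKS, CABLING).items():
    print(f"{vswitch}: all uplinks on {switch}, no switch redundancy")
```

With this model, swapping the two cables of the second card (vmnic2 to switchB, vmnic3 to switchA) makes the function return an empty result, while every card still connects to both switches.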
Enabling CDP or LLDP can help you identify this problem, as you can identify o…

vCenter Appliance and underscores in hostnames

Found out the hard way: don't use underscores in hostnames. They are not valid in hostnames (RFC 952 and RFC 1123 allow only letters, digits and hyphens), and they break things. In this case: joining the vCenter Server Appliance (VCSA) to an Active Directory domain doesn't work if the hostname of the appliance contains an underscore (_). It also doesn't work if the hostname is "localhost".
If your appliance uses DHCP, the appliance gets its hostname through reverse DNS. So in that case, it _is_ a freaking DNS problem.
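A quick pre-flight check can catch both cases before attempting the domain join. This is only a sketch: the function name and messages are made up, and it checks the general hostname syntax rules plus the "localhost" case mentioned above, not whatever the appliance itself validates.

```python
import re

# One label of a hostname: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen (RFC 952 / RFC 1123).
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def hostname_problems(hostname):
    """Return a list of human-readable problems; an empty list means the name looks fine."""
    problems = []
    name = hostname.rstrip(".")
    if name.lower() == "localhost":
        problems.append("hostname is 'localhost'")
    if len(name) > 253:
        problems.append("hostname is longer than 253 characters")
    for label in name.split("."):
        if not _LABEL.fullmatch(label):
            problems.append(f"invalid label: {label!r}")
    return problems


# Example hostnames are hypothetical.
for candidate in ("vcsa01.example.com", "vcenter_01.example.com", "localhost"):
    print(candidate, hostname_problems(candidate) or "ok")
```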

vSphere5 nested virtualization as seen in /proc/cpuinfo

I won't blog about the whole vhv.allow="true" procedure here; that's been covered elsewhere. But what does nested virtualization change in a VM? Well, the set of exposed CPU features changes:
A regular 64-bit Linux VM sees
# grep flags /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc up arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm ida arat
A 64-bit VM with nested virtualization enabled sees the additional vmx flag:
# grep flags /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc up arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervis…
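Spotting the difference between two long flag lists by eye is error-prone, so a set difference helps. A minimal sketch follows; the two sample lines are deliberately abbreviated versions of the listings above, and the function name is my own.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


# Abbreviated sample lines, shortened from the listings above for illustration.
regular = "flags : fpu vme pclmulqdq ssse3 hypervisor lahf_lm"
nested = "flags : fpu vme pclmulqdq vmx ssse3 hypervisor lahf_lm"

added = cpu_flags(nested) - cpu_flags(regular)
print(sorted(added))
```

On real systems, the text passed to cpu_flags would be the contents of /proc/cpuinfo saved from each VM.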

SSH cipher speed

When setting up backups over SSH (e.g. rsnapshot with rsync over SSH), it's important to know that the default SSH cipher isn't necessarily the fastest one. In this case, the CPU-based encryption is the performance bottleneck, and making it faster means getting faster backups.
A test copying a 440 MB file between a fast Xeon machine (fast enough not to be a bottleneck) and an Atom-based NAS shows that the arcfour family of ciphers is clearly the fastest in this setup:
cipher         real time    user time    bandwidth
arcfour        0m9.639s     0m7.423s     45.7 MB/s
arcfour128     0m9.751s     0m7.483s     45.1 MB/s
arcfour256     0m9.856s     0m7.764s     44.7 MB/s
blowfish-cbc   0m13.093s    0m10.909s    33.6 MB/s
aes128-cbc     0m22.565s    0m20.129s    19.5 MB/s
aes128-ctr     0m25.400s    0m22.951s    17.3 MB/s
aes192-ctr     0m28.047s    0m25.771s    15.7 MB/s
3des-cbc       0m51.067s    0m48.018s    8.6 MB/s
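The bandwidth column is simply the file size divided by the real time. The sketch below approximately reproduces it, assuming the file was exactly 440 MB; the helper names are my own, and small rounding differences against the table are to be expected.

```python
import re

FILE_SIZE_MB = 440  # size of the test file, from the text


def seconds(duration):
    """Convert a `time`-style duration such as '0m9.639s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def bandwidth_mb_s(real_time, size_mb=FILE_SIZE_MB):
    """Average throughput in MB/s for a transfer that took `real_time`."""
    return size_mb / seconds(real_time)


arcfour = bandwidth_mb_s("0m9.639s")
aes128_ctr = bandwidth_mb_s("0m25.400s")
print(f"arcfour:    {arcfour:.1f} MB/s")
print(f"aes128-ctr: {aes128_ctr:.1f} MB/s")
print(f"speedup:    {arcfour / aes128_ctr:.2f}x")
```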
The default configuration of OpenSSH uses aes128-ctr, so changing the cipher to arcfour gets me a roughly 2.6-fold increase in bandwidth here! Use the "Ciphers" keyword in .ssh/config or the "-c" …

Dell's R210-II as vSphere home lab server

My VI3 and vSphere4 home lab consisted of whitebox PCs. For VI3 I used MSI-based no-name machines, for vSphere4 I used the Shuttle SX58j3. For the new vSphere5 generation, I wanted some real server hardware. Because of shallow depth requirements, the choice of rackmount servers was limited. I picked the Dell PowerEdge R210 II instead of the sx58j3 because:
- on the vSphere HCL (the sx58j3's won't boot vSphere5 RC !)
- Sandy Bridge low TDP CPUs available (I got the E3-1270)
- onboard dual BCM5716 nics support iSCSI offload (aka "dependent HW iSCSI")
- IPMI built-in (not tested yet)
- dense: 1U (the sx58j3 is about 4 units, but can fit 2 in 19")
- one free PCIe slot (The sx58j3 has 2 slots, but needs a VGA card)
- not incredibly expensive (up to 16GB RAM)
On the downside:
- only one free PCIe slot (maxing out on GbE NICs needs an expensive quad-port card)
- incredibly expensive with more RAM (with 32GB RAM it's 3x the price of a 16GB config)
- can't buy without at least one disk. I'll be running from…

HTTPS SSL stops working because of old libraries

At a customer site, a Linux workstation suddenly refused to open HTTPS sites. I verified that the packages for both the browser (konqueror) and the libraries (kde, openssl) were recent; everything looked good, but it still didn't work. This blog post documents that checking the new software isn't enough: in this case, removing old openssl compatibility libraries solved the problem. The kio_http helper is not linked with openssl directly, and for some reason it must have tried to open one of the old openssl versions that were also installed. After erasing all versions between 0.9.5a and 0.9.6b, keeping the current 0.9.8e, konqueror had no problems opening HTTPS sites anymore.

Home lab switch

My home lab got upgraded with a new gigabit switch recently. Main improvement I wanted over the old Linksys SLM2024 I had: Cisco Discovery Protocol.
Based on that requirement and the budget, I selected the Cisco SG300-28 Small Business managed switch. The web interface is clearly improved compared to the SLM2024, and CDP is a real treat. Both vSphere ESXi and the cdpr utility under Linux decode the CDP information nicely. CDP is a great help in finding patch cabling errors!

Logitech diNovo Mini keyboard lacks F-keys

I thought the Logitech diNovo Mini keyboard would be a perfect keyboard to keep in my basement rack for occasional maintenance activities on my Linux and vSphere servers. Turns out the diNovo Mini lacks F keys. Not even Fn-[number] will send the correct keycode. What a disappointment. The larger (but still small) diNovo Edge has function keys, but is far less suited to be left in a dusty environment like a basement rack.
Does anyone know of a better solution?

Weird vmnic numbering

After installing new Intel quad port ethernet cards in vSphere ESXi machines, I had to figure out which physical port mapped to which vmnic number. Strange though it may sound, the mapping turned out to be as follows (top to bottom as seen on the back of the card):

A: vmnic2
B: vmnic3
C: vmnic0
D: vmnic1

However, the PCI layout of most quad port cards makes this easier to understand: a quad port card is implemented as two dual port cards behind a PCI bridge chip. While enumerating the PCI bus, the VMkernel can find one bus first, enumerate the devices on it, then find the second bus, and enumerate the devices there.
In this case, the bottom bus was found first, and the vmnics on it were counted top to bottom (vmnic0 and vmnic1). Then the top bus was found, and again the vmnics on it were counted top to bottom (vmnic2 and vmnic3).
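The numbering can be modelled as sorting the ports by PCI address and counting upwards. This is only a sketch of the idea, not how the VMkernel actually implements it; the PCI addresses below are invented so that the bottom bus sorts first, as it did on these cards.

```python
# Hypothetical PCI addresses (bus, device, function); only their relative order matters.
# Ports A and B sit on the "top" bus, C and D on the "bottom" bus,
# and the bottom bus is the one that gets discovered first.
PORTS = {
    "A": (0x06, 0, 0),
    "B": (0x06, 0, 1),
    "C": (0x05, 0, 0),
    "D": (0x05, 0, 1),
}


def assign_vmnics(ports):
    """Number NICs in enumeration order: bus first, then device and function."""
    ordered = sorted(ports, key=lambda label: ports[label])
    return {label: f"vmnic{index}" for index, label in enumerate(ordered)}


mapping = assign_vmnics(PORTS)
for label in sorted(mapping):
    print(f"{label}: {mapping[label]}")
```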

When marketing and technical information meet: Hyper-V

While reading an article about Hyper-V per-VM CPU settings, I saw this in the FAQ:

[BEGIN QUOTE]
Why do you use percentage for the limit and reserve – and not MHz / GHz?

Many people find it easier to think in MHz / GHz rather than percentage of a physical computer. They also argue that using a percentage means that as you move a virtual machine from computer to computer you may get different amounts of resource depending on the underlying capability.

This is something that has been discussed extensively on the Hyper-V team, and while I do believe there is some merit in this approach, there are a number of reasons why we chose to use a percentage instead. Two key ones are:

Predictable mobility

If all your virtual machines have a reserve of 10% – you know that you can run 10 of them on any of your servers. The same would not be true if they all had a reserve of 250Mhz. Given how important virtual machine mobility is to our users – we believe that this is something that needs to be easy to…

Every error is a DNS error.

A newly installed RHEL5 machine in an existing network. Users opening Firefox on the machine got the error "The bookmarks and history system will not be functional". The googlesphere suggested renaming places.sqlite and such, but that didn't help. Things began to clear up when I found errors on the NFS server that exports the home directory: "lockd: failed to monitor newmachine.companydomain". I checked the nfslock service, but it was running fine. Configuration files for NFS and autofs were identical to those on other machines that didn't show the problem. Then, like a bolt of lightning, it hit me: I had forgotten to create a reverse DNS entry for the new machine's IP address. Forward DNS was OK, but reverse wasn't. That caused the NFS lock error, and that caused the Firefox error... The old saying is confirmed once more: every error is a DNS error.
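A forward/reverse consistency check would have caught this in seconds. Here is a minimal sketch using the standard resolver calls from Python's socket module; the function name is my own, and the resolver functions are parameters only so that the check can be exercised without a real DNS server.

```python
import socket


def reverse_dns_matches(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return True if the hostname's IP address resolves back to the same hostname."""
    try:
        address = forward(hostname)
        name, aliases, _addresses = reverse(address)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return False
    candidates = {name.lower().rstrip(".")}
    candidates.update(alias.lower().rstrip(".") for alias in aliases)
    return hostname.lower().rstrip(".") in candidates
```

Calling reverse_dns_matches("newmachine.companydomain") on the NFS server would return False as long as the reverse entry is missing.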

Link aggregation and VLANs on QNAP with firmware 3.4.0

The new QNAP firmware (3.4.0) supports 802.1q VLAN tagging, but the web interface doesn't let you create multiple interfaces in different VLANs on the same physical interface. In the case of link aggregation (LACP 802.3ad, for example), that means only 1 VLAN and 1 IP address can be used. Fortunately, QNAP allows full access to the underlying Linux system. Adding a VLAN interface goes like this (the example uses VLAN 234):
# /usr/local/bin/vconfig add bond0 234
# ifconfig bond0.234 broadcast netmask
Of course, this change is not permanent: a reboot will not automatically start this interface. I'll blog about making it permanent later.
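The ifconfig call needs an address, a broadcast address and a netmask, and the latter two follow from the address and prefix length. The sketch below builds both command lines; the function name is made up, and 192.0.2.10/24 is a documentation address, not one from my network.

```python
import ipaddress


def vlan_commands(parent, vlan_id, cidr):
    """Build the vconfig and ifconfig command lines for a VLAN interface on `parent`."""
    iface = ipaddress.ip_interface(cidr)
    network = iface.network
    return [
        f"/usr/local/bin/vconfig add {parent} {vlan_id}",
        f"ifconfig {parent}.{vlan_id} {iface.ip} "
        f"broadcast {network.broadcast_address} netmask {network.netmask}",
    ]


# Hypothetical address; bond0 and VLAN 234 are taken from the example above.
for command in vlan_commands("bond0", 234, "192.0.2.10/24"):
    print(command)
```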

software RAID on old vs. new CPUs

The Linux kernel has several implementations of its software RAID calculations, and selects the one that benchmarks fastest on your CPU. Isn't that always the same one then? No, definitely not. Newer CPUs have additional instructions that help speed things up. And it's not just clock speed that matters; memory bandwidth plays an important role too.
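The selection itself boils down to "measure every candidate, keep the fastest". A minimal sketch of that idea follows; the throughput figures are made up, and reading "X% slower" as relative to the raid5 throughput is my assumption.

```python
def pick_fastest(benchmarks):
    """Return the name of the implementation with the highest measured throughput."""
    return max(benchmarks, key=benchmarks.get)


def percent_slower(slow, fast):
    """How much slower `slow` is than `fast`, as a percentage of `fast`."""
    return (fast - slow) / fast * 100


# Made-up throughput figures in MB/s, for illustration only.
raid6_candidates = {"mmxx2": 900.0, "sse1x2": 1500.0, "sse2x2": 4200.0, "sse2x4": 5000.0}
raid5_throughput = 8000.0

best = pick_fastest(raid6_candidates)
slowdown = percent_slower(raid6_candidates[best], raid5_throughput)
print(f"raid6 would use {best}, {slowdown:.1f}% slower than raid5")
```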

On an old Pentium II Xeon 450 MHz, raid5 uses p5_mmx, and raid6 uses mmxx2. Software raid6 calculations are 72% slower than raid5.
On a Pentium IV Xeon 1.5 GHz, raid5 uses pIII_sse, and raid6 uses sse2x2. Software raid6 calculations are 12% slower than raid5.
On an AMD Athlon XP2000+ (1.6 GHz), raid5 uses pIII_sse, raid6 uses sse1x2. Software raid6 calculations are 42% faster than raid5.
On 64-bit systems, the relevant instructions have not differed between generations so far:
On an AMD Athlon64 XP3400 (2.4 GHz), raid5 uses generic_sse, raid6 uses sse2x4 (raid6 44% slower than raid5).
On a Xeon 5160 3GHz, raid5 uses generic_sse, raid6 uses sse2x4 (raid6 15% slower than…