Tuesday, October 18, 2011

Too much redundancy will kill you

A customer asked me to verify their vSphere implementation. Everything looked perfectly redundant, in the traditional elegant way: cross over between layers to avoid single points of failure. I had to break the bad news: too much redundancy can mean NO redundancy.
In this case: the host has 4 network interfaces (2x dual-port cards). VMs connect to a vSwitch, which has redundancy over vmnic0 and vmnic2 (using 1 port of each card). Another vSwitch carries the storage traffic with the same level of redundancy, using vmnic1 and vmnic3. Looking good.
Then the physical level. 4 host interfaces, 2 interconnected network switches. The traditional |X| design connects the two interfaces of every card to different switches. Looking good.

But looking at both configurations together, you'll see that each vSwitch ends up with both of its uplinks on the same physical switch: vmnic0 and vmnic2 land on one switch, vmnic1 and vmnic3 on the other. The sum of two crossed redundancy configurations equals no redundancy at all.
Enabling CDP or LLDP helps you spot this problem, because it shows for every interface which physical switch it connects to. In this case the CDP physical switch identifier was the same on vmnic0 and vmnic2, and again the same on vmnic1 and vmnic3.
I advised changing the cabling to four straight || || connections, vmnic0 and vmnic1 to the left switch and vmnic2 and vmnic3 to the right switch. That re-introduces the redundancy they thought they had.
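To make the reasoning concrete, here is a minimal Python sketch of the check. It is only an illustration, not a tool I ran at the customer: the names vSwitch0/vSwitch1, card1/card2 and left/right are assumptions for the example, and the NIC-to-switch tables stand in for what CDP or LLDP would report. For each vSwitch it lists every component (card or physical switch) that all of its uplinks share.

```python
# Hypothetical inventory: vSwitch names, card names and switch names are
# made up for illustration; only the vmnic layout follows the post.
VSWITCH_UPLINKS = {
    "vSwitch0": ["vmnic0", "vmnic2"],  # VM traffic
    "vSwitch1": ["vmnic1", "vmnic3"],  # storage traffic
}
NIC_CARD = {
    "vmnic0": "card1", "vmnic1": "card1",
    "vmnic2": "card2", "vmnic3": "card2",
}

# Crossed |X| cabling: the two ports of every card go to different switches.
CROSSED = {"vmnic0": "left", "vmnic1": "right",
           "vmnic2": "left", "vmnic3": "right"}
# Straight || || cabling: both ports of a card go to the same switch.
STRAIGHT = {"vmnic0": "left", "vmnic1": "left",
            "vmnic2": "right", "vmnic3": "right"}


def single_points_of_failure(uplinks, nic_card, nic_switch):
    """Return, per vSwitch, the components that all of its uplinks share."""
    result = {}
    for vswitch, nics in uplinks.items():
        shared = []
        for label, mapping in (("card", nic_card), ("switch", nic_switch)):
            components = {mapping[nic] for nic in nics}
            if len(components) == 1:
                # Every uplink depends on this one component.
                shared.append(f"{label}:{components.pop()}")
        result[vswitch] = shared
    return result


if __name__ == "__main__":
    for name, cabling in (("crossed", CROSSED), ("straight", STRAIGHT)):
        print(name, single_points_of_failure(VSWITCH_UPLINKS, NIC_CARD, cabling))
```

With the crossed cabling every vSwitch reports one shared physical switch; with the straight cabling both lists come back empty.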


pvaneynd said...

That is actually quite funny.

deinoscloud said...

Nice catch. Many designs have this kind of error...

Obviously you would have to make the change at the vSS level if your hosts are blades, since the X is physically wired into the backplane of the chassis: NICPort1->SW1, NICPort2->SW2