Monday, March 4, 2013

nomodeset can break Xorg monitor probing

Some time ago, a CentOS install on a new Dell workstation needed the "nomodeset" kernel parameter to get the graphical login screen to display correctly. This was with a Radeon FirePro 2260 graphics card.
After applying the CentOS 6.3 updates, Xorg no longer detected the LCD panel's resolution correctly. A 1680x1050 panel would get a 1280x1024 resolution. In the days of digital DVI connections, DDC probing and whatnot, this was unusual and surprising.
A lot of searching and testing led to the cause: the nomodeset parameter broke Xorg probing. Rebooting without the nomodeset parameter worked (none of the earlier graphical problems came back) and fixed the Xorg resolution probing.
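A quick way to see whether a machine is still booting with the parameter is to look at the running kernel's command line. This is a minimal sketch; the function name has_nomodeset is made up for this example, and it only reports the parameter, it does not change the boot configuration.

```shell
#!/bin/sh
# has_nomodeset CMDLINE
# Succeeds if "nomodeset" appears as a separate word in CMDLINE.
# (has_nomodeset is a hypothetical helper name, not an existing tool.)
has_nomodeset() {
    case " $1 " in
        *" nomodeset "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a live Linux system, feed it the running kernel's command line.
if [ -r /proc/cmdline ] && has_nomodeset "$(cat /proc/cmdline)"; then
    echo "nomodeset is active; Xorg may not probe the monitor correctly"
fi
```

If it reports the parameter, removing nomodeset from the kernel line in the boot loader configuration and rebooting is the test described above.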

Saturday, January 5, 2013

Synology RS3413xs+ tech notes

The newest addition to my home lab is a Synology RS3413xs+ NAS. While installing it, I came across a couple of details that I didn't know before buying it. So for other people thinking of buying this unit, here's what I found out:


  • If you add network interfaces in the available PCIe slot, they might be numbered _before_ the four onboard interfaces. They were in my case. So onboard 1-4 are eth2-5, and add-on interfaces 1-2 are eth0-1. 
  • the SSD cache feature only works with identical drives in both cache slots. You can buy two 120GB SSDs, but you can't just add one 240GB SSD. Except if you configure it manually through the CLI, and want to work without Synology support.
  • as explained in an earlier post, there's no multiple-VLAN-over-one-interface support in the GUI, but you can work around that in the CLI
  • the DSM web interface counts VLAN-tagged packets twice in its "Total Network" graph. The per-interface/per-bond counters are correct however. PS that looks like the bug I solved three years ago in dstat 0.7.0!
  • a Synology RAID group is used as an LVM volume group. Volumes and block-based iSCSI LUNs you create afterwards are implemented as LVM logical volumes. File-based iSCSI LUNs are just placed on formatted volumes like other files.
  • the SSD cache can only be used for one LVM logical volume! Read on for a manual workaround.
  • activating or deactivating the SSD cache for a volume means stopping all services temporarily.
  • both SSDs are configured as a software RAID0 volume, with 64KB segments.
  • the SSD partitions aren't aligned at all. Makes sense I guess. The regular disk partition for data is aligned at a 512MB boundary. PS the swap partition is aligned at a 128MB boundary, and the DSM root partition is aligned at 128KB.
  • Synology implements its SSD cache feature using the "flashcache" driver in Linux (the one Facebook developed). Flashcache has three caching modes (writeback, writethrough, writearound) of which Synology currently uses writearound in DSM 4.1. Just like writethrough, this only accelerates read performance, as is clearly indicated in the Synology documentation. If you insist on having a write cache as well - with all the consequences that brings! - you could manually change this mode to writeback. Not supported, of course. See the flashcache doc for details on the three modes.
  • if you absolutely need SSD cache for multiple volumes, another manual tweak is possible: dividing your SSDs into multiple partitions, making different md RAID0 devices from those, and activating those as flashcache for multiple volumes.
Get info from your own Synology device using:
# fdisk -u -l /dev/sdk; fdisk -u -l /dev/sdl 
(sdk and sdl are the two SSDs in a 10-bay Synology, where sda..sdj are the 10 regular disks)
# cat /proc/mdstat
# dmsetup table cachedev_0
# dmsetup status cachedev_0
# vgdisplay -v vg1
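To check the alignment of a partition yourself, take the start sector that fdisk -u -l prints and test it against the boundary you care about. This is a small sketch, assuming 512-byte sectors unless you pass another size; is_aligned is a made-up helper name and the sector numbers in the usage lines are illustrative, not taken from my unit.

```shell
#!/bin/sh
# is_aligned START_SECTOR BOUNDARY_BYTES [SECTOR_SIZE]
# Succeeds if the partition start falls exactly on a multiple of BOUNDARY_BYTES.
# SECTOR_SIZE defaults to 512 bytes (an assumption; check your fdisk output).
is_aligned() {
    sector_size=${3:-512}
    [ $(( ($1 * $sector_size) % $2 )) -eq 0 ]
}

# Illustrative values only: sector 2048 sits on a 1 MiB boundary, sector 63 does not sit on a 4 KiB one.
is_aligned 2048 $((1024 * 1024)) && echo "start sector 2048: 1 MiB aligned"
is_aligned 63 4096 || echo "start sector 63: not 4 KiB aligned"
```

On a shell with 32-bit arithmetic, very large start sectors could overflow the multiplication, so treat the result with care there.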

Tuesday, January 1, 2013

Multiple VLANs on a Synology NAS

Synology, like other SOHO/SMB NAS vendors, touts VLAN functionality with their current DSM 4.1 software. However, the web interface just lets you specify one VLAN tag to use over each eth interface (or bond interface).

Manual approach

However, the busybox environment that you can ssh into as root (after enabling ssh through the web interface) has all the tools you need to use multiple VLANs over one link (eth or bond).
First you insert the 802.1q module into the Linux kernel:
 /sbin/lsmod | /bin/grep -q 8021q || /sbin/insmod /lib/modules/8021q.ko
Then you add each VLAN you need to every interface (bond0 in this example):
 /sbin/vconfig add bond0 4
And finally you can configure IP addresses on every interface.vlan combination (bond0.4 in this example):
 /sbin/ifconfig bond0.4 192.168.4.1 broadcast 192.168.4.255 netmask 255.255.255.0
The same type of script would work on a QNAP NAS too, by the way. They offer 8021q.ko and vconfig in their command-line environment as well.
Packets from the bond0 interface leave the device untagged, packets from the bond0.4 interface leave with a tag specifying VLAN 4.
Be aware that these settings only last until the next reboot.
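Since the settings are gone after a reboot, it helps to keep the commands in a small script. The sketch below only prints the commands for a list of VLAN IDs so you can review them first. The function name vlan_commands is made up, and the addressing scheme 192.168.<vlan>.1/24 is an assumption that simply extends the example above (it only makes sense for VLAN IDs up to 255); adjust it to your own network.

```shell
#!/bin/sh
# vlan_commands PARENT VLAN_ID...
# Prints (does not run) the commands to create tagged interfaces on PARENT.
# Assumption: each VLAN <id> gets address 192.168.<id>.1/24, as in the example above.
vlan_commands() {
    parent=$1
    shift
    # Load the 802.1q module only if it is not loaded yet.
    echo "/sbin/lsmod | /bin/grep -q 8021q || /sbin/insmod /lib/modules/8021q.ko"
    for id in "$@"; do
        echo "/sbin/vconfig add $parent $id"
        echo "/sbin/ifconfig $parent.$id 192.168.$id.1 broadcast 192.168.$id.255 netmask 255.255.255.0"
    done
}

# Dry run for VLANs 4 and 5 on bond0; pipe the output to sh once it looks right.
vlan_commands bond0 4 5
```

How to run such a script automatically at boot depends on the DSM version, so that part is left out here.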

Synology approach (future?)

Synology has its own set of utilities that are used by the web interface to manage devices. The network interface settings are managed by /usr/syno/sbin/synonet. This utility sets up bonded interfaces, IP addresses, and VLAN entries. However, the utility has the same limitations as the web interface (for unknown reasons): creating a VLAN unconfigures the untagged interface you're working on, and you can't add a second VLAN on the same interface.
It would be nice if synonet could get multi-VLAN support, as all the necessary options seem to be there already. Feature request, Synology?

Sunday, December 23, 2012

Buying the right NAS device for your home lab.

Buying the right NAS device for a vSphere home lab is not an easy task. This blog post documents the decision process you should go through IMHO.

First, decide which data you are going to put on it. Lots of people buy a NAS for secondary data only (i.e. backups), but in a home lab, there's probably primary data too. How important is the data, and do you require a backup of this primary data?

Then, think about the volume of data you need. Is it 1TB, more like 5TB, or rather 10TB?

Number three, protection level. No one wants to lose data, but how badly? Surviving one disk failure is a minimum, but a RAID5 set enters its "danger zone" when that happens. That means an additional failure will make you lose all the data on the set. The danger zone ends after you've replaced the failed disk and its contents have been rebuilt. RAID6 only enters the danger zone when a second device fails before the first is rebuilt. Know your danger zone!

A fourth decision is speed. Bandwidth is a concern to some, but on a Gbit switch, a device with 4 or more disks can often saturate that bandwidth. Multiple Gbit links can help if more bandwidth is needed. But the most important performance indicator is IOPS. Knowing how many IOPS you want is extremely difficult, but once you arrive at a figure, getting the IOPS is a matter of spreading your data over enough individual disks. One WD Caviar Red drive can do about 112 write IOPS or 45 read IOPS of 4 KB. Caching can greatly improve host-facing IOPS as well. This article gives a great view on the world of disk bandwidth, IOPS and latency.
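As a rough back-of-the-envelope sketch of the "spread over enough disks" step: divide the IOPS you want by what one disk delivers and round up. The target of 500 IOPS below is a made-up number, disks_needed is a hypothetical helper, and the calculation deliberately ignores RAID write penalties and caching, so read the result as a lower bound.

```shell
#!/bin/sh
# disks_needed TARGET_IOPS PER_DISK_IOPS
# Prints the number of disks needed, rounded up (integer ceiling division).
# Ignores RAID overhead and cache effects: a lower bound, not a sizing tool.
disks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical target of 500 IOPS, using the per-disk figures quoted above.
echo "at 45 IOPS per disk:  $(disks_needed 500 45) disks"
echo "at 112 IOPS per disk: $(disks_needed 500 112) disks"
```

With these inputs that comes to 12 and 5 disks respectively, which is why the bay count matters more than the capacity per disk.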

You should also know which protocols your NAS will need to speak, but as most do CIFS, NFS and iSCSI anyway, most use types are covered. If you need specialty features like replication, filter on that too. Also, is your device really supported? The actual support might not matter for a home lab, but it's the strongest statement you can get that it will work.

Conclusion: in most environments, this is going to lead to a NAS configuration with a high number of slots (forget the 2 to 4 bay models), and relatively small disks in those slots. And that is ... a lot more expensive than just adding 3TB drives until you reach the volume you need. As always, there's no such thing as a free lunch: you'll get what you pay for.

Friday, October 5, 2012

Boot device priority in a vSphere VM

While playing around with the bios.bootDeviceClasses parameter (as shown in this example), we found out that

  1. a device not specified in allow: would still be used if all "allow:"ed devices are unusable (no CD connected, no PXE server found, etc.)
  2. a device specified in deny: would still be used if all other devices are unusable.
So contrary to what the documentation suggests, "allow:" will just move certain devices to the front of the boot device list, and "deny:" moves those devices to the end of the list.
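The behavior we observed can be modeled as a simple reordering of the boot list. This is only a sketch of what we saw, not of how the BIOS in the VM implements it; the function name effective_order, the device class names (hd, cd, net) and the default order are assumptions for the sake of the example.

```shell
#!/bin/sh
# effective_order "DEFAULT ORDER" MODE "CLASSES"
# MODE is "allow" or "deny"; CLASSES is a space-separated list of device classes.
# allow: listed devices move to the front; deny: listed devices move to the end.
# Nothing is ever removed from the list, which matches what we observed.
effective_order() {
    front=""
    back=""
    for dev in $1; do
        case " $3 " in
            *" $dev "*) listed=1 ;;
            *) listed=0 ;;
        esac
        case "$2:$listed" in
            allow:1|deny:0) front="$front $dev" ;;
            *) back="$back $dev" ;;
        esac
    done
    # Unquoted on purpose: collapses the extra spaces between words.
    echo $front $back
}

# Assumed default order "hd cd net" (hypothetical).
effective_order "hd cd net" allow "cd"   # cd first, the rest still present
effective_order "hd cd net" deny "hd"    # hd last, but still bootable
```

In both cases every device stays in the list, so a VM can still end up booting from a device you thought you had excluded.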

Hope this can help other people trying to make sense of setting boot order in a VM to achieve a specific behavior. In our case: get a VM to reliably boot from CD for automated deployment using the SDK.

Tuesday, October 18, 2011

Too much redundancy will kill you

A customer asked me to verify their vSphere implementation. Everything looked perfectly redundant, in the traditional elegant way: cross over between layers to avoid single points of failure. I had to break the bad news: too much redundancy can mean NO redundancy.
In this case: the host has 4 network interfaces (2x dual port card). VMs connect to a vSwitch, which has redundancy over vmnic0 and vmnic2 (using 1 port of each card). Another vSwitch for the storage traffic, same level of redundancy, using vmnic1 and vmnic3. Looking good.
Then the physical level. 4 host interfaces, 2 interconnected network switches. The traditional |X| design connects the two interfaces of every card to different switches. Looking good.

But looking at both configurations together, you'll see that every vSwitch gets connected to one physical switch. The sum of two crossed redundancy configurations equals no redundancy at all.
Enabling CDP or LLDP can help you identify this problem, as you can identify on every interface which physical switch it connects to. In this case the CDP physical switch identifier was the same on vmnic0 and vmnic2, and again the same on vmnic1 and vmnic3.
I advised changing the cabling to four straight || || connections, vmnic0 and vmnic1 to the left switch and vmnic2 and vmnic3 to the right switch. That re-introduces the redundancy they thought they had.
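The check itself is easy to write down once you have the CDP or LLDP neighbor of every vmnic: count how many different physical switches the uplinks of one vSwitch reach. A minimal sketch follows; distinct_switches is a made-up helper and the switch names "left" and "right" stand in for whatever identifiers CDP reports.

```shell
#!/bin/sh
# distinct_switches "UPLINKS" "CABLING"
# UPLINKS: space-separated vmnics of one vSwitch.
# CABLING: space-separated vmnic=switch pairs, e.g. as read from CDP/LLDP.
# Prints how many different physical switches the uplinks reach.
distinct_switches() {
    seen=""
    for nic in $1; do
        for pair in $2; do
            [ "${pair%%=*}" = "$nic" ] || continue
            sw=${pair#*=}
            case " $seen " in
                *" $sw "*) ;;                 # switch already counted
                *) seen="$seen $sw" ;;
            esac
        done
    done
    set -- $seen
    echo $#
}

# Hypothetical switch names; the cabling follows the two designs described above.
crossed="vmnic0=left vmnic1=right vmnic2=left vmnic3=right"    # |X| cabling
straight="vmnic0=left vmnic1=left vmnic2=right vmnic3=right"   # || || cabling

distinct_switches "vmnic0 vmnic2" "$crossed"    # prints 1: no switch redundancy
distinct_switches "vmnic0 vmnic2" "$straight"   # prints 2: redundant again
```

Any vSwitch for which this prints 1 depends on a single physical switch, however redundant each layer looks on its own.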

Monday, September 12, 2011

vCenter Appliance and underscores in hostnames

Found out the hard way: don't use underscores in hostnames. The hostname rules (RFC 952 and RFC 1123) don't allow them, and they break things. In this case: joining the vCenter Server Appliance (VCSA) to an Active Directory domain doesn't work if the hostname of the appliance contains an underscore (_). It also doesn't work if the hostname is "localhost".
If your appliance uses DHCP, the appliance gets its hostname through reverse DNS. So in that case, it _is_ a freaking DNS problem.
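A quick sanity check before attempting the domain join could look like the sketch below. It is not a full RFC validator: it only rejects the two cases from this post (an underscore, or the name "localhost") plus any other character outside letters, digits, hyphens and dots. The function name valid_hostname and the example names are made up.

```shell
#!/bin/sh
# valid_hostname NAME
# Succeeds if NAME is non-empty, is not "localhost", and contains only
# letters, digits, hyphens and dots (so no underscores).
# Deliberately simple: label length and hyphen placement are not checked.
valid_hostname() {
    case "$1" in
        ""|localhost) return 1 ;;
        *[!A-Za-z0-9.-]*) return 1 ;;
    esac
    return 0
}

# Hypothetical appliance names.
for name in vcsa01.example.com vcsa_01.example.com localhost; do
    if valid_hostname "$name"; then
        echo "$name: ok"
    else
        echo "$name: will cause trouble"
    fi
done
```

On an appliance that uses DHCP, run the check against the name that reverse DNS returns for its address, since that is the hostname it will pick up.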