HS22 + BOFM ... oh boy

First of all, I'm going to assume you know what blades are.

For those of you who aren't IBM believers (as I am): the HS22 is the latest Intel blade available for IBM BladeCenters. It uses the new Nehalem processors and is the successor of the HS21 (XM) series. It takes smaller memory DIMMs, so you can fit 12 DIMMs per blade (allowing up to 96 GB per blade). Warning though: you'll need to fill both CPU sockets to use them all, because the first 6 slots are managed by the memory controller on socket 1 and the last 6 slots by the memory controller on socket 2. This means the HS22s are in effect NUMA servers. On the image below you'll also see that there are 3 channels per memory controller. One remark: the more memory you put on each channel, the slower it will run (1 DIMM per channel = 1333 MHz max, 2 DIMMs per channel = 1066 MHz max). So balance your memory between your channels/sockets!
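To make the "balance your memory" advice concrete, here is a minimal Python sketch. It only uses the two speed figures quoted above. The round-robin spreading and the rule that the fullest channel limits the whole layout are my assumptions for illustration, not something taken from IBM documentation.

```python
# Max memory speed by DIMMs per channel (figures quoted in the text above).
MAX_SPEED_MHZ = {1: 1333, 2: 1066}

SOCKETS = 2
CHANNELS_PER_SOCKET = 3


def spread_dimms(dimm_count):
    """Spread DIMMs round-robin over all channels of both sockets.

    Returns a list with the number of DIMMs on each of the 6 channels.
    """
    channels = [0] * (SOCKETS * CHANNELS_PER_SOCKET)
    for i in range(dimm_count):
        channels[i % len(channels)] += 1
    return channels


def max_speed_mhz(channels):
    """Assumption: the fullest channel sets the speed limit for the layout."""
    fullest = max(channels)
    if fullest not in MAX_SPEED_MHZ:
        raise ValueError("unsupported layout: %r" % (channels,))
    return MAX_SPEED_MHZ[fullest]


six = spread_dimms(6)      # one DIMM on every channel
twelve = spread_dimms(12)  # two DIMMs on every channel
```

With 6 DIMMs every channel holds one DIMM and the layout can still run at the 1333 MHz maximum. As soon as one channel holds two DIMMs, this sketch reports 1066 MHz.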

So enough about the HS22. I have been implementing BOFM in combination with boot from SAN (BFS) on the HS22. Boot from SAN does what it says: you no longer have internal disks, you boot from a LUN instead. BOFM (BladeCenter Open Fabric Manager) is a technology that lets you virtualize "network IDs". You can put virtual MAC addresses and virtual WWPNs on blades. This is defined at the chassis slot level, meaning that you can pull a blade out, put a spare blade in, and it will inherit the virtual addresses of the old blade. What is even more powerful: you can configure BFS in BOFM too. So you can pull a blade out, put one of the same type in, and it will boot exactly like the old blade, as if it were a clone.

So how do you start? Update, upgrade whatever component is involved :
-AMM latest firmware level
-HBA on the latest firmware level (Qlogic in my case)
-uEFI on the latest firmware level (the new and improved BIOS)
-IMM on the latest firmware level (the new and improved bmc+cKVM)

Be sure they are on the latest level. IBM is releasing a lot of these firmware updates and they seem to include BOFM fixes all the time. For example, if your QLogic adapter BIOS is constantly being disabled (effect: BFS is disabled), you'll need to update from 1.43 to 1.46.

Next up, create an initial configuration in the AMM (Blade > BOFM > Initial configuration). For the BladeCenter H, here is a tip on which port goes to which I/O module (assuming you are using MSIMs):
-port 1 is onboard ethernet 0 and goes to i/o 1 (must be an ethernet switch)
-port 2 is onboard ethernet 1 and goes to i/o 2 (must be an ethernet switch)
-port 3 goes to i/o 3
-port 4 goes to i/o 4
-port 5 goes to i/o 7 msim 1
-port 6 goes to i/o 8 msim 1
-port 7 goes to i/o 9 msim 2
-port 8 goes to i/o 10 msim 2
So when you create the initial configuration, check which kind of switch (FC, Ethernet, ...) is in which I/O module and link it to the right port. Notice that I/O 5 and 6 are not mentioned here. That is because a blade can never talk to these modules directly. They are uplink switches, used when there are full-blown switches in the horizontal I/O slots (instead of MSIMs).
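The port list above fits nicely in a small lookup table. The Python sketch below derives the type of each blade port from what is plugged into the I/O bays. The bay contents in the example are hypothetical, just one layout that would match Ethernet on ports 5 and 7 and FC on ports 6 and 8.

```python
# Blade port -> I/O bay, as listed above for a BladeCenter H with MSIMs.
PORT_TO_IO_BAY = {1: 1, 2: 2, 3: 3, 4: 4, 5: 7, 6: 8, 7: 9, 8: 10}


def port_types(bay_types):
    """Return {port: switch type} for every blade port.

    bay_types maps an I/O bay number to the kind of switch in that bay.
    Bays that are not listed are reported as "empty".
    """
    return {
        port: bay_types.get(bay, "empty")
        for port, bay in PORT_TO_IO_BAY.items()
    }


# Hypothetical chassis layout, for illustration only.
example_bays = {1: "Ethernet", 2: "Ethernet",
                7: "Ethernet", 8: "FC", 9: "Ethernet", 10: "FC"}
example_ports = port_types(example_bays)
```

Bays 5 and 6 never appear in the table, which matches the remark that a blade cannot talk to those modules directly.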

The result of an initial configuration is a csv. It contains something like

// Blade Center
//IP ,Type (Center) ,Mode
,BladeCenter ,apply

//IP ,Type (Slot) ,Slot ,Mode ,Profile
,Slot ,1 ,enable ,"ibmblade1"

//IP ,Type ,Slot ,Offset ,Port ,MAC_1 ,VLAN1 ,MAC_2 ,VLAN2
,Ethernet ,1 ,0 ,1 ,00:21:5e:94:ff:54 ,0
,Ethernet ,1 ,0 ,2 ,00:21:5e:94:ff:56 ,0
,Ethernet ,1 ,0 ,5 ,00:21:5e:44:aa:8e ,0
,Ethernet ,1 ,0 ,7 ,00:21:5e:44:aa:8f ,0

//IP ,Type ,Slot ,Priority ,WWPN ,LUN
,FCTarget ,1 ,first ,50:06:0e:80:12:02:aa:10 ,0
,FCTarget ,1 ,second ,50:06:0e:80:12:02:aa:11 ,0

//IP ,Type ,Slot ,Offset ,Port ,WWNN ,WWPN ,Boot-order
,FC ,1 ,0 ,6 ,,21:00:00:1b:36:93:77:6c ,first
,FC ,1 ,0 ,8 ,,21:01:00:1b:36:b3:77:6c ,second

First of all, the simplest line of all: ,BladeCenter ,apply
This says that BOFM will be applied to the chassis on this IP.

Next: ,Slot ,1 ,enable ,"ibmblade1"
This says you want to enable BOFM on slot 1 of the chassis. The name ibmblade1 is yours to choose :). If you put disable here, BOFM will ignore all the other lines for that slot. If you retrieve the current configuration afterwards, those extra lines will be gone. Remember that.

Next: ,Ethernet ,1 ,0 ,1 ,00:21:5e:94:ff:54 ,0
In order of appearance:
-IP of the blade chassis
-saying that you are configuring an Ethernet port
-slot 1
-offset 0. Normally this is always 0. If you have double-wide blades, meaning the blade is using multiple slots, this number goes up when you are configuring a port that sits on the board in the next slot. For example, in an HS21 with a MIO, the onboard ports on the MIO will be addressed as slot x (x = the blade's slot), offset 1.
-the port number (port 1 here, see the port list above)
-the virtual address (fake MAC)
-the PXE VLAN (must be under 20)

Next: ,FC ,1 ,0 ,6 ,,21:00:00:1b:36:93:77:6c ,first
-IP of the blade chassis
-saying that you are configuring a Fibre Channel port
-slot 1
-offset 0. Same as above
-the port number (port 6 here)
-the WWNN of the blade. This is a server-based address that should be the same on all of the FC ports. On QLogic cards it is generated, so you should leave it empty
-the virtual address (WWPN)
-the boot order. Basically part of the boot from SAN setting. "First" means the port uses the first boot target you configured (see below), "second" the second target you defined. "Both" means the port can boot from both targets. In that case your WWPN needs to be able to see the WWPNs of both targets, and the storage has to be partitioned correctly.

Lastly: ,FCTarget ,1 ,first ,50:06:0e:80:12:02:aa:10 ,0
This is the boot LUN target configuration
-IP of the blade chassis
-saying you are defining a LUN (FCTarget)
-slot number
-first or second boot LUN
-WW(P/N)N of your storage processor. Ask your SAN-man. You can check these ports/LUNs in the QLogic adapter by using the scan utility
-LUN 0 on this storage processor.
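To tie the field lists above together, here is a small Python sketch that turns the data lines of such a CSV into dictionaries. The column names are my own labels, derived from the comment headers in the file. This is an illustration of the layout described in this post, not an official parser, and the real format may contain line types I haven't shown.

```python
# Column labels per line type (my own naming, based on the // headers).
COLUMNS = {
    "BladeCenter": ["ip", "type", "mode"],
    "Slot": ["ip", "type", "slot", "mode", "profile"],
    "Ethernet": ["ip", "type", "slot", "offset", "port",
                 "mac_1", "vlan_1", "mac_2", "vlan_2"],
    "FCTarget": ["ip", "type", "slot", "priority", "wwpn", "lun"],
    "FC": ["ip", "type", "slot", "offset", "port",
           "wwnn", "wwpn", "boot_order"],
}


def parse_bofm(text):
    """Parse BOFM CSV text into a list of dicts, skipping // comment lines."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        fields = [f.strip().strip('"') for f in line.split(",")]
        if len(fields) < 2 or fields[1] not in COLUMNS:
            raise ValueError("unrecognised BOFM line: " + raw)
        # zip stops at the shorter list, so optional trailing columns
        # (MAC_2, VLAN2) are simply left out when absent.
        rows.append(dict(zip(COLUMNS[fields[1]], fields)))
    return rows


SAMPLE = """\
// Blade Center
,BladeCenter ,apply
,Slot ,1 ,enable ,"ibmblade1"
,Ethernet ,1 ,0 ,1 ,00:21:5e:94:ff:54 ,0
,FCTarget ,1 ,first ,50:06:0e:80:12:02:aa:10 ,0
,FC ,1 ,0 ,6 ,,21:00:00:1b:36:93:77:6c ,first
"""

rows = parse_bofm(SAMPLE)
```

Note how the empty WWNN on the FC line comes out as an empty string, and the empty IP column at the start of every line does too.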

OK, once this configuration is done you can apply it. First create a requirements report to check that your config is correct, then apply it. You will need to power down the blades to apply BOFM.

One thing I noticed: you can change the configuration of one blade and apply it without interfering with the other blades. Good practice here is:
-retrieve your config from the AMM
-change one blade
-create a requirements report and apply it. This will show that the configuration is only applied/changed on one blade
Do not make a configuration with only the lines for your blade. Do retrieve the latest BOFM config from the AMM, so you are sure that you won't interfere with the other blades.
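Before uploading an edited file, you could double-check yourself which slots you actually touched. The Python sketch below compares the config retrieved from the AMM with your edited copy and lists the slots whose lines differ. It assumes the CSV layout shown earlier (slot number in the third column of every slot-level line). The two sample configs are made up for the example.

```python
def lines_by_slot(config_text):
    """Group the slot-level lines of a BOFM CSV by slot number."""
    slots = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        fields = tuple(f.strip() for f in line.split(","))
        if len(fields) < 3 or fields[1] == "BladeCenter":
            continue  # chassis-level line, not tied to a slot
        slots.setdefault(fields[2], set()).add(fields)
    return slots


def changed_slots(old_text, new_text):
    """Return the slot numbers whose lines differ between two configs."""
    old = lines_by_slot(old_text)
    new = lines_by_slot(new_text)
    differing = [s for s in old.keys() | new.keys() if old.get(s) != new.get(s)]
    return sorted(differing, key=int)


# Made-up example: only the MAC of slot 2 is edited.
retrieved = """\
,BladeCenter ,apply
,Slot ,1 ,enable ,"ibmblade1"
,Ethernet ,1 ,0 ,1 ,00:21:5e:94:ff:54 ,0
,Slot ,2 ,enable ,"ibmblade2"
,Ethernet ,2 ,0 ,1 ,00:21:5e:94:ff:58 ,0
"""
edited = retrieved.replace("00:21:5e:94:ff:58", "00:21:5e:94:ff:5a")
touched = changed_slots(retrieved, edited)
```

If the result lists more slots than the one you meant to change, stop and retrieve a fresh config before applying anything.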

OK, once you are all done, it might still not boot. You should enable the QLogic adapter BIOS first. Boot the blade and press Ctrl+Q when the QLogic adapter is being discovered. Select your adapters one by one and enable the BIOS in the configuration settings menu.

Once this is done, reboot your blade:
-If everything works, the default PXE boot will be disabled by the blade. If you see your blade PXE booting, you have a problem
-Check that a LUN is showing when the QLogic adapter is detected (the point where you pressed Ctrl+Q before, but don't press it this time). You should see some message like "replaced c: drive to drive d:" plus the LUN
-If it doesn't work, you can always use the scan fibre utility in the QLogic HBAs (Ctrl+Q) to check if you can see the storage processors and the LUNs. Ask your SAN-man for help.

My final opinion: the technology is cool. Switching blades will switch their identities. However, I found that BOFM is not always stable if you are not on the latest level! Remember: update, update, update.

VMware ESX (+HA) and correct DNS/NTP settings: a must, must

Today I was at a customer site that runs ESX 4.0 hosts (and 3.5 hosts), all connected to vCenters. A strange issue occurred on both clusters: one of the ESX hosts got disconnected and didn't want to reconnect. Trying to connect directly with the VI client didn't work either, and the web service seemed to be down. On the ESX 4.0 host I got the famous 503 Service Unavailable error (...)

My first thought: let's restart mgmt-vmware (it even hung on the 3.5 ESX). This didn't help :(. The customer then suggested rebooting the server. Although not hopeful it would help, he did it anyway. After the reboot, still the same issue.

After some reading I found some SSL errors in the logs. However, this didn't make sense at all. Some more reading led me to some guy saying that correcting his DNS settings fixed everything. So I started the typical nslookup
> esxhostname
> esxhostname.f.q.d.n
> ip
The result was... well, timeouts all over the place. The network team had given us new DNS IPs, but before we were able to change them they had already shut down the old servers. The result was an unmanageable server.
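If you have more than a couple of hosts, the same check can be scripted from an admin workstation. Here is a minimal Python sketch that tries to resolve a list of names and records which ones fail. Note that `socket.gethostbyname` goes through the system resolver (so it also honours /etc/hosts), which is not exactly what nslookup does. The host names in the usage line are placeholders.

```python
import socket


def check_dns(names, resolve=socket.gethostbyname):
    """Try to resolve every name; map it to its IP, or None on failure.

    resolve can be swapped out, which makes the function easy to try
    without a working DNS server.
    """
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror and socket timeouts are subclasses of OSError
            results[name] = None
    return results


def failed(results):
    """Names that did not resolve, in sorted order."""
    return sorted(name for name, ip in results.items() if ip is None)
```

Usage would be something like `failed(check_dns(["esx01", "esx01.mydomain.com"]))`. An empty list means every name resolved.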

So I logged on to the console and changed the DNS settings in /etc/resolv.conf manually. Restarting the mgmt-vmware service now worked (service mgmt-vmware restart). However, the VI client still could not connect and the vCenter connection was not restored. Restarting the webAccess service did the final trick (service webAccess restart). We did a final reboot to check if the error would reoccur, but luckily it didn't.

Next up was configuring the ntp server. This should have been simple, ...

First of all, start by setting the right time manually. This is something I learned the hard way (checking the logs). If you have too much skew (an offset of more than 1000 seconds or something), the NTP service won't synchronize. It thinks the offset is too big and abnormal. Result: it discards the offset, no NTP sync :(
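As a tiny illustration of that rule, here is a Python sketch that tells you whether the clock is too far off for ntpd to fix it by itself. The 1000 second limit is the figure mentioned above (it matches ntpd's default panic threshold). Where the reference time comes from is left up to you, the function just compares two timestamps.

```python
from datetime import datetime, timedelta

# Offset above which ntpd refuses to sync by default (see text above).
PANIC_THRESHOLD = timedelta(seconds=1000)


def needs_manual_set(local_time, reference_time, threshold=PANIC_THRESHOLD):
    """True if the clock is too far off and should be set by hand first."""
    return abs(local_time - reference_time) > threshold


reference = datetime(2009, 11, 1, 12, 0, 0)
slightly_off = reference + timedelta(seconds=90)
way_off = reference - timedelta(hours=2)
```

Here `needs_manual_set(slightly_off, reference)` is False, while a host that is two hours behind needs its time set manually before NTP will take over.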

Second, enable the NTP client before configuring it. This will open up the proper ports in the firewall.

Then configure the ntp service to start automatically and configure your ntp servers.

Lastly you might see that ESX is not synchronizing. Strange behaviour. I solved it by doing
$ service ntpd stop
$ ntpd -d -q
$ service ntpd start
(a trick I learned here)

After all the clocks were done I checked everything was in sync. I opened up putty sessions to all the esx hosts and issued
$ watch "date"
This will execute date every 2 seconds and show its output. Makes it easy to compare ;)

Then finally I tried to enable HA. The client had tried this before, but obviously with bad time parameters and DNS settings the process failed. This time the process started pushing the HA clients and everything ran smoothly. So before you start trying HA, consider the above.

One of my big concerns though is that ESX is really really really really sensitive to the uptime and reachability of your DNS server. I know you can supply 2 DNS servers, but still, this is scary stuff. I was wondering (and if I have the time to try it, I will post some results) whether I could make a DNS-less ESX setup.

The first tryout would be testing if /etc/hosts is sufficient to replace DNS. What you will need to do is add a line for each ESX/VC on every host
ip.addr.put.first hostname hostname.fqdn

For example, on esx01.mydomain.com with 2 other ESX hosts, esx02 and esx03, I would get something like
127.0.0.1 localhost
ip.addr.of.esx01 esx01 esx01.mydomain.com
ip.addr.of.esx02 esx02 esx02.mydomain.com
ip.addr.of.esx03 esx03 esx03.mydomain.com
ip.addr.of.vcenter vcenter vcenter.mydomain.com

Then on the esx hosts check the nsswitch.conf file and check that files are before dns.
>hosts: files dns
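Checking that order by eye on every host gets boring fast, so here is a small Python sketch that does it on the text of an nsswitch.conf. It only looks at the hosts: line and only checks that files comes before dns. Reading the file from each host is up to you.

```python
def files_before_dns(nsswitch_text):
    """True if the hosts: line lists 'files' ahead of 'dns'.

    Also True when 'files' is listed and 'dns' is absent.
    False when there is no hosts: line or no 'files' source at all.
    """
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("hosts:"):
            continue
        sources = line[len("hosts:"):].split()
        if "files" not in sources:
            return False
        if "dns" not in sources:
            return True
        return sources.index("files") < sources.index("dns")
    return False


good = "passwd: files\nhosts: files dns\n"
bad = "hosts: dns files  # wrong order\n"
```

For the two samples, `files_before_dns(good)` is True and `files_before_dns(bad)` is False.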

So if that works correctly, we would have an offline-DNS solution (assuming we also tested HA by unplugging an ESX host and everything). However, it would be a lot of work if you have a lot of hosts.

You could then make a Perl script (assuming you are going to run Perl on vCenter) that pulls this config from a central location every 10 minutes, compares it with the current file (md5 checksum?) and replaces it if necessary. By now you must be thinking I'm insane: I'm going to replace a 40 year old system with a script. What if the central location goes down? No biggie, you still have your local configuration. And let's be honest, if your DNS servers are going down, you won't be busy installing new ESX hosts.
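To show what I mean, here is a rough sketch of the compare-and-replace step. I wrote it in Python instead of Perl, purely as an illustration. It assumes the central copy is reachable as a file path (a mounted share, for example), which is my assumption and not a tested setup. Scheduling it every 10 minutes would be a job for cron or a similar scheduler.

```python
import hashlib
from pathlib import Path


def md5_of(data):
    """Hex md5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def sync_hosts(central, local):
    """Replace local with central if their checksums differ.

    Returns True if the local file was replaced. If the central copy
    cannot be read or is empty, the local file is left untouched.
    """
    central, local = Path(central), Path(local)
    try:
        new_data = central.read_bytes()
    except OSError:
        return False  # central location is down: keep the local config
    if not new_data.strip():
        return False  # never replace a working file with an empty one
    if local.exists() and md5_of(local.read_bytes()) == md5_of(new_data):
        return False  # nothing changed
    # Write next to the target first, then swap, so a crash halfway
    # never leaves a half-written hosts file behind.
    tmp = local.with_name(local.name + ".new")
    tmp.write_bytes(new_data)
    tmp.replace(local)
    return True
```

The important design choice is the early return when the central copy is unreachable: the whole point is that a host keeps working from its last known good file.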

*Big fat warning: the last part is purely theoretical and I have not tested it. I wrote this at home in my La-Z-Boy :)


Skype and VLC on Ubuntu 9.10

VLC is my all time favorite mediaplayer on Windows simply because it plays everything without interfering with the current installed codecs. This is why I wanted to install it on Ubuntu...

Same goes for Skype. Skype is a nice app if you like to call foreign countries without having to pay too much. I loved it on Windows and I need it on Linux.

But if you open up Synaptic, you won't find either of them in the default repositories. This is because neither is open source software (or they use non-open-source components). For Skype you can download a separate deb package from the Skype website. However, this will force you to update Skype manually etc. So I started googling for a repository.

Medibuntu was the answer. You can find more info here. Adding the repository is easy. You'll just need to enter the following in a terminal (Applications > Accessories > Terminal)

sudo wget http://www.medibuntu.org/sources.list.d/$(lsb_release -cs).list --output-document=/etc/apt/sources.list.d/medibuntu.list && sudo apt-get -q update && sudo apt-get --yes -q --allow-unauthenticated install medibuntu-keyring && sudo apt-get -q update

Big fat warning though:
Medibuntu's repository is deactivated by upgrading to a newer Ubuntu release, so you should run this command again after the release upgrade.

Afterwards you can do your sudo apt-get install vlc or sudo apt-get install skype