Getting firmware version information on ESXi and GNU/Linux

October 10, 2012, 9:21 am

You may not always have the convenient option of installing vendor-specific hardware management agents or extensions on ESXi hosts or physical servers, for example with appliance-ish OSes like the Check Point SPLAT/Gaia platform (which is just a custom RHEL descendant), or you may run into a server without these tools installed. So how can you still query firmware information on such systems directly from the command line? I will outline a couple of ways to obtain that information here.
The example information captured here is from HP Proliant Servers (since G5), but most of it should work in similar ways with other hardware platforms too. Unless noted otherwise, the example commands here should work regardless of whether you have CIM providers or hardware management agents installed or not.

Getting firmware info on ESXi

The tool /usr/lib/vmware/vm-support/bin/swfw.sh, available since ESXi 5.x, will dump most of the hardware and firmware information you probably need. If you have the HP CIM providers (or probably those of other vendors) installed, it will list detailed information in a vendor-specific namespace section such as root/hpq.
But even if you don’t have them installed, it collects generic firmware information in namespace root/cimv2 for BIOS, local RAID controllers, ILO, NICs and more.

BIOS:
While it is possible to query the BIOS version remotely via PowerCLI and it’s even displayed in the vSphere Client GUI under Processor Information, I haven’t found a way to query the BIOS version from the console directly except for /usr/lib/vmware/vm-support/bin/swfw.sh. In the dump it will look like this:

in root/hpq:
SMX_SystemFirmware.InstanceID="HPQ:SMX_SystemFirmware:1"
InstanceID = HPQ:SMX_SystemFirmware:1
ReleaseDate = 20110502000000.000000+000
ClassificationDescriptions = { System Firmware, }
Classifications = { 11, }
Manufacturer = HP
VersionString = 2011.05.02

in root/cimv2:
OMC_SMASHFirmwareIdentity.InstanceID="34.0"
InstanceID = 34.0
IsLargeBuildNumber = true
VersionString = P62
ReleaseDate = 20110505000000.000000+000
Name = System BIOS
Manufacturer = HP
IsEntity = true
ElementName = System BIOS
Caption = System BIOS

Note that the HP-CIM dump will list the active as well as backup ROM version, so don’t confuse them.
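Since the dump is rather long, a small filter can condense it to just the version strings. This is only a minimal sketch: it assumes the dump keeps the "Key = Value" layout shown above, that every instance starts with a line containing ".InstanceID=", and that swfw.sh writes to standard output. The function name swfw_versions is made up for this example.

```shell
# Hypothetical helper: condense a swfw.sh dump (read from stdin) into
# "Name: VersionString" lines. Assumes the "Key = Value" layout shown above.
swfw_versions() {
    awk '
        # Print the instance collected so far, then reset for the next one.
        function emit() {
            if (ver != "") {
                label = (name != "") ? name : "(unnamed)"
                print label ": " ver
            }
            name = ""; ver = ""
        }
        /\.InstanceID=/           { emit(); next }   # a new instance begins
        /^[ \t]*Name = /          { sub(/^[ \t]*Name = /, ""); name = $0; next }
        /^[ \t]*VersionString = / { sub(/^[ \t]*VersionString = /, ""); ver = $0; next }
        END                       { emit() }
    '
}
```

Usage would be something like: /usr/lib/vmware/vm-support/bin/swfw.sh | swfw_versions
Instances that carry no Name property show up as "(unnamed)".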

Local RAID-Controller:
For HP SmartArray based RAID Controllers, which all standard HP Proliant Server local SAS controllers are based on, information is also available in the driver’s procnode:

# cat /proc/driver/cciss/cciss0
cciss0: HP Smart Array P400i Controller
Board ID: 0x3235103c
Firmware Version: 7.22
-
# cat /proc/driver/hpsa/hpsa0
hpsa0: HP Smart Array P410i Controller
Board ID: 0x3245103c
Firmware Version: 5.70
Driver Version: HP HPSA Driver (v 5.0.0-17vmw)
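If you don’t know which of the two drivers a host uses, you can simply try both procnodes. A minimal sketch, assuming the "Firmware Version:" line looks like in the output above; the function name smartarray_fw is hypothetical.

```shell
# Hypothetical helper: print "<file>: <firmware version>" for every Smart Array
# proc node passed as an argument; files that do not exist are skipped.
smartarray_fw() {
    for node in "$@"; do
        [ -r "$node" ] || continue
        ver=$(sed -n 's/^[[:space:]]*Firmware Version:[[:space:]]*//p' "$node")
        [ -n "$ver" ] && echo "$node: $ver"
    done
    return 0
}
```

Usage: smartarray_fw /proc/driver/cciss/cciss* /proc/driver/hpsa/hpsa*
A glob that matches nothing is passed on literally and then skipped by the readability check.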

With the HP CIM-providers and swfw.sh, you can even get the local hard disk firmware version and drive type info:

SMX_SADiskDriveFirmware.InstanceID="HPQ:SMX_SADiskDriveFirmware-3QQ1ASPK0000994695QR"
InstanceID = HPQ:SMX_SADiskDriveFirmware-3QQ1ASPK0000994695QR
IsEntity = true
IdentityInfoType = { CIM:SoftwareFamily, HPQ:SoftwareCategory, }
IdentityInfoValue = { HPQ:DF0450B8054, Storage Device, }
TargetTypes = { DF0450B8054, }
ClassificationDescriptions = { Disk Drive Firmware, }
Classifications = { 10, }
Manufacturer = HP
VersionString = HPD6
IsLargeBuildNumber = false
HealthState = 5
StatusDescriptions = { Disk Drive Firmware Status: OK, }
OperationalStatus = { 2, }
Name = Disk Drive Firmware

NICs:
We can easily use ethtool to access the NIC firmware version:

# ethtool -i vmnic1
driver: bnx2
version: 2.0.15g.v50.11-5vmw
firmware-version: bc 1.9.6
bus-info: 0000:05:00.0
-
# ethtool -i vmnic2
driver: e1000e
version: 1.1.2-NAPI
firmware-version: 5.11-2
bus-info: 0000:0b:00.0
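With more than a handful of NICs, a one-line summary per interface is easier to read. The following is just a sketch that assumes the "key: value" format of ethtool -i shown above; the function name nic_fw_summary is made up.

```shell
# Hypothetical helper: reduce "ethtool -i" output (stdin) to one line of the
# form "<driver> <driver version> fw=<firmware version>".
nic_fw_summary() {
    awk -F': ' '
        $1 == "driver"           { drv = $2 }
        $1 == "version"          { ver = $2 }
        $1 == "firmware-version" { fw  = $2 }
        END { print drv, ver, "fw=" fw }
    '
}
```

Usage: for nic in vmnic1 vmnic2; do printf '%s: ' "$nic"; ethtool -i "$nic" | nic_fw_summary; done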

ILO:
I am not aware of a way to check the ILO version without relying on the hponcfg tool provided by the HP Offline Utilities bundle or through swfw.sh. While the utilities bundle is a separate package from the actual CIM bundle, it may not work without it.

via swfw.sh in root/hpq with CIM-providers:
SMX_MPFirmware.InstanceID="HPQ:SMX_MPFirmware:1"
InstanceID = HPQ:SMX_MPFirmware:1
IsEntity = false
ReleaseDate = 20120716000000.000000+000
IdentityInfoType = { CIM:SoftwareFamily, }
IdentityInfoValue = { HPQ:RI7, }
ClassificationDescriptions = { HP Management Processor Firmware, }
Classifications = { 10, }
Manufacturer = Hewlett-Packard
VersionString = 2.12
IsLargeBuildNumber = false
MinorVersion = 12
MajorVersion = 2
HealthState = 5
StatusDescriptions = { Management Processor Firmware Status: OK, }
OperationalStatus = { 2, }
Name = Integrated Lights Out 2 (iLO 2)
Caption = Management Processor Firmware
Description = HP Management Processor Firmware
ElementName = RI7
-
via swfw.sh in root/cimv2 without CIM-providers:
OMC_MCFirmwareIdentity.InstanceID="46.10000"
InstanceID = 46.10000
IsEntity = true
Classifications = { 8,}
IsLargeBuildNumber = false
OperationalStatus = { 0,}
Description = BMC Firmware (node 0) 46:10000
Caption = BMC Firmware (node 0) 46:10000
ElementName = BMC Firmware (node 0) 46:10000
Manufacturer = Hewlett-Packard
Name = Baseboard Management Controller
VersionString = 2.12
EnabledState = 0
-
# /opt/hp/tools/hponcfg -g
HP Lights-Out Online Configuration utility
Version 4.0-10 (c) Hewlett-Packard Company, 2011
Firmware Revision = 2.09 Device type = iLO 2 Driver name = hpilo

HBAs:
Information on QLogic and Emulex based Fibre Channel HBAs is available in procnodes:

# cat /proc/scsi/qla2xxx/5
QLogic PCI to Fibre Channel Host Adapter for HPAE311A:
FC Firmware version 5.03.15 (496), Driver version 901.k1.1-14vmw
Host Device Name vmhba2
BIOS version 2.16
FCODE version 2.03
EFI version 2.22
Flash FW version 5.03.15
[...]

Here’s a nice little PowerCLI script to query HBA firmware info too:
http://communities.vmware.com/message/2178785#2178785
Going one step further, with swfw.sh and the CIM providers, you can even get the HP Bladesystem Onboard Administrator firmware version from within an ESXi blade:

SMX_BladeEnclosureFW.InstanceID="HPQ:SMX_BladeEnclosureFW:1"
InstanceID = HPQ:SMX_BladeEnclosureFW:1
IsEntity = false
IdentityInfoType = { CIM:SoftwareFamily, }
IdentityInfoValue = { HPQ:OA, }
ClassificationDescriptions = { HP Server Blade Enclosure Firmware, }
Classifications = { 10, }
Manufacturer = Hewlett-Packard
VersionString = 3.56

Getting firmware info on a physical GNU/Linux host

For the most part this is similar to what we can do on ESXi (or rather the other way around).

BIOS:
The BIOS version can be queried via dmidecode:

# dmidecode -t bios
BIOS Information
Vendor: HP
Version: P68
Release Date: 05/05/2011

Local RAID-Controller:
For HP SmartArray based RAID Controllers (cciss), firmware version info should be available in one of the following locations depending on server generation or OS:

# cat /proc/driver/cciss/cciss0
cciss0: HP Smart Array P410i Controller
Board ID: 0x3245103c
Firmware Version: 5.70
-
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 03 Id: 00 Lun: 00
Vendor: HP Model: P410i Rev: 5.12
-
# cat /sys/class/scsi_host/host0/firmware_revision
5.12

NICs:
Using ethtool again to access the NIC firmware version:

# ethtool -i eth2
driver: bnx2
version: 1.7.9-1
firmware-version: 5.2.3
bus-info: 0000:04:00.0
-
# ethtool -i eth5
driver: e1000
version: 7.6.12-NAPI
firmware-version: 5.12-2
bus-info: 0000:12:00.1

ILO:
Besides using the hponcfg tool, it actually appears to be possible with dmidecode. In my case it displays a “Firmware Revision” at the end of the BIOS section, which seems to match the ILO version:
On an ILO2 server version 2.09, dmidecode displays “Firmware Revision: 2.9” (the leading zero of the minor number is missing).
On an ILO3 server version 1.28, dmidecode displays “Firmware Revision: 1.28” correctly.
Here is also an interesting script you can try on your ILO network.
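To compare the dmidecode value with what hponcfg or the ILO web interface reports, the minor number can be zero-padded again. This is a sketch under a loud assumption: that ILO versions always use a two-digit minor number, which matches the two servers above but is not something I have verified in general. Both function names are hypothetical.

```shell
# Hypothetical helper: pull the "Firmware Revision" value out of
# "dmidecode -t bios" output read from stdin.
bios_fw_revision() {
    sed -n 's/^[[:space:]]*Firmware Revision:[[:space:]]*//p'
}

# Hypothetical helper: zero-pad the minor part of such a value.
# ASSUMPTION: iLO versions always use a two-digit minor number.
pad_fw_revision() {
    echo "$1" | awk -F. '{ printf "%d.%02d\n", $1, $2 }'
}
```

Usage: pad_fw_revision "$(dmidecode -t bios | bios_fw_revision)"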

HBAs:
On recent Proliant G7 servers with 8Gb QLogic-based HBAs, the HBA firmware version does not seem to be easily accessible via procnodes anymore (the way it is on ESXi), but the info is scattered across /sys files:

# cat /sys/class/scsi_host/host4/model_name
HPAK344A
# cat /sys/class/scsi_host/host4/model_desc
HP 8Gb Single Channel PCI-e 2.0 FC HBA
# cat /sys/class/scsi_host/host4/fw_version
5.06.03 (90d5)
# cat /sys/class/scsi_host/host4/optrom_fw_version
4.04.04 128
# cat /sys/class/scsi_host/host4/optrom_efi_version
2.05
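Instead of reading each file by hand, a short loop can collect these attributes for all SCSI hosts. A minimal sketch, assuming the attribute file names shown above (other HBA drivers may expose different ones); the function name hba_fw_info and its optional directory argument are made up for this example.

```shell
# Hypothetical helper: print the firmware-related attributes of every SCSI host
# below the given directory (default: /sys/class/scsi_host).
hba_fw_info() {
    base=${1:-/sys/class/scsi_host}
    for host in "$base"/host*; do
        [ -d "$host" ] || continue
        for attr in model_name model_desc fw_version optrom_fw_version optrom_efi_version; do
            # Not every driver exposes every attribute, so skip missing files.
            [ -r "$host/$attr" ] || continue
            echo "${host##*/} $attr: $(cat "$host/$attr")"
        done
    done
    return 0
}
```

Called without an argument, it walks /sys/class/scsi_host; hosts without any of these files (such as the local RAID controller) simply produce no output.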

Disk firmware info is likely only available with specific RAID-controller utilities (e.g. hpacucli) or management agents, which makes perfect sense though.

↧

Check Point R75.45 released

October 15, 2012, 7:19 am

Last week, Check Point released the first major update to R75.40, the version that introduced Gaia. R75.45 mainly brings enhancements and fixes for the new Gaia platform:
R75.45 Release Notes
R75.45 Resolved Issues
R75.45 Known Limitations

R75.45 includes this R75.40 hotfix.
Among the new features, support for 6in4 tunneling and policy-based routing appear to be the most intriguing (at least to me).
The long list of fixes addresses important issues such as long policy install time, kernel memory leaks or IPS not exiting bypass mode again after returning to normal CPU utilization.
Only direct upgrades from R75.40 are supported, no earlier versions.

I haven’t tried this new release yet, but given the list of fixes and the likely immaturity of the still young Gaia platform, I’ll check it out sooner rather than later.

↧

Re-Attaching a previously detached storage device

November 2, 2012, 2:51 am

A while ago we were getting rid of some old, now unused datastores on our FC-storage, so we proceeded to unmount and remove the VMFS datastore, and then detach the LUN device through the vSphere client as described in Unpresenting a LUN in ESXi 5.x. On that note, hooray for no more claimrule mongering since ESXi 5.x!
After the whole process, I noticed that on some hosts the pseudo-device LUN 0 of an EVA storage system was also unmounted. There is no detach-option for these controllers, so I have no idea how it happened. Rescans were no good and it didn’t cause any issues, but I’d rather have it accessible again. The only problem was: There is no attach option available for these devices.

So in the end, I knew I had to resort to esxcli to re-attach the device properly.
First, checking out the device details and confirming it is turned off (Status):

# esxcli storage core device list -d naa.50001fe150081230
 naa.50001fe150081230
 Display Name: HP Fibre Channel RAID Ctlr (naa.50001fe150081230)
 Has Settable Display Name: true
 Size: 0
 Device Type: RAID Ctlr
 Multipath Plugin: NMP
 Devfs Path:
 Vendor: HP
 Model: HSV200
 Revision: 6240
 SCSI Level: 5
 Is Pseudo: true
 Status: off
 Is RDM Capable: true
 Is Local: false
 Is Removable: false
 Is SSD: false
 Is Offline: false
 Is Perennially Reserved: false
 Thin Provisioning Status: unknown
 Attached Filters:
 VAAI Status: unknown
 Other UIDs: vml.020c00000050001fe150081230485356323030

After reviewing the available commands in the esxcli storage core namespace, I ran the following 2 commands:

# esxcli storage core device set -d naa.50001fe150081230 --state=on
# esxcli storage core device setconfig -d naa.50001fe150081230 --detached=false

By the way, the latter command failed at first when I ran it without having executed the former one:
Device configuration failed: Sysinfo error on operation returned status : Not ready. Please see the VMkernel log for detailed error information
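For several hosts or devices, the two calls can be wrapped so the order is always right. This is only a sketch built from the two commands above; the function names reattach_device and device_status are hypothetical, and the Status parsing assumes the list output format shown earlier.

```shell
# Hypothetical wrapper around the two esxcli calls above. The order matters:
# setconfig fails with "Not ready" while the device is still turned off.
reattach_device() {
    dev=$1
    esxcli storage core device set -d "$dev" --state=on &&
    esxcli storage core device setconfig -d "$dev" --detached=false
}

# Hypothetical helper: print the "Status:" field (on/off) of a device.
device_status() {
    esxcli storage core device list -d "$1" |
        sed -n 's/^[[:space:]]*Status:[[:space:]]*//p'
}
```

Usage: reattach_device naa.50001fe150081230 && device_status naa.50001fe150081230
The sed pattern is anchored to the start of the line, so the "Thin Provisioning Status" and "VAAI Status" fields are not picked up.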

Now confirm that the device status is set to on again:

# esxcli storage core device list -d naa.50001fe150081230
 naa.50001fe150081230
 Display Name: HP Fibre Channel RAID Ctlr (naa.50001fe150081230)
 Has Settable Display Name: true
 Size: 0
 Device Type: RAID Ctlr
 Multipath Plugin: NMP
 Devfs Path: /vmfs/devices/genscsi/naa.50001fe150081230
 Vendor: HP
 Model: HSV200
 Revision: 6240
 SCSI Level: 5
 Is Pseudo: true
 Status: on
 Is RDM Capable: true
 Is Local: false
 Is Removable: false
 Is SSD: false
 Is Offline: false
 Is Perennially Reserved: false
 Thin Provisioning Status: unknown
 Attached Filters:
 VAAI Status: unknown
 Other UIDs: vml.020c00000050001fe150081230485356323030

The vmkernel log also indicates that the device was turned on, and in the vSphere Client it is listed as attached again. Done.

2012-10-26T14:06:36.423Z cpu5:89724)ScsiDevice: 1268: Device naa.50001fe150081230  has been turned on administratively.
2012-10-26T14:06:36.549Z cpu7:151813)Vol3: 647: Couldn't read volume header from control: Invalid handle
2012-10-26T14:06:36.549Z cpu7:151813)FSS: 4333: No FS driver claimed device 'control': Not supported
2012-10-26T14:06:36.578Z cpu1:151813)VC: 1449: Device rescan time 64 msec (total number of devices 13)
2012-10-26T14:06:36.578Z cpu1:151813)VC: 1452: Filesystem probe time 46 msec (devices probed 7 of 13)

↧

Invoke-VMScript issue: “The guest operations agent could not be contacted”

December 3, 2012, 11:42 am

The Invoke-VMScript PowerCLI function provides a cool way to execute scriptcode inside guests through the VMware tools, on Linux and Windows VMs (as long as a couple of prerequisites are met). While playing with it a while ago, I noticed that on many VMs, it wouldn’t run properly, instead just spilling the error “The guest operations agent could not be contacted”. To fix this, the VMware Tools service inside the guest had to be restarted. Not thinking too much of the oddness, I left it at that since I didn’t have a real use case for the Invoke-VMScript function.
This changed later on, when I ran into the same issue again, which finally made me dig a bit deeper.

Quite a few other people have reported this issue on the VMTN forums over time, but no solution was posted. A reference to Backup Exec initiated backup operations made me curious though – had I not fought with another problem induced by it already?

I then was able to track it down to the Reload Virtual Machine operation, which Backup Exec triggers each time before removing the snapshot of a backup too. But not only that, unlike my previous issue, this problem is fully reproducible manually without any involvement of Backup Exec:

1. Confirm Invoke-VMScript works currently (restart VMware Tools if not):
Invoke-VMScript -ScriptText 'ipconfig' -VM $vm -GuestCredential $creds
--> Executes OK

2. On the ESXi shell of the host running the VM (I actually don’t know any other way to trigger the Reload-VM task manually), get the VM ID (or VMX path) with vim-cmd vmsvc/getallvms | grep -i [MyVMName]

3. Now perform a Reload-VM operation via vim-cmd vmsvc/reload [ID]
The vmware.log file of the VM will now record the following messages:

2012-11-19T15:56:59.760Z| vmx| VmdbPipeStreamsOvlError Couldn't read: OVL_STATUS_EOF, (11) Resource temporarily unavailable.
2012-11-19T15:56:59.760Z| vmx| VmdbCnxDisconnect: Disconnect: closed pipe for pub cnx '/db/connection/#6/' (-32)
2012-11-19T15:56:59.760Z| vmx| VmdbDbRemoveCnx: Removing Cnx from Db for '/db/connection/#6/'
2012-11-19T15:56:59.771Z| vmx| SOCKET 2 (78) recv detected client closed connection
2012-11-19T15:56:59.771Z| vmx| Vix: [1889886 mainDispatch.c:2984]: VMAutomation: Connection Error (4) on connection 1.
2012-11-19T15:56:59.818Z| vmx| VmdbAddConnection: cnxPath=/db/connection/#b/, cnxIx=3
2012-11-19T15:56:59.866Z| vmx| Vix: [1889886 vigorCommands.c:940]: VigorHotPlugManagerEndBatchCommandCallback: vmdbErr = -24

4. Try running an Invoke-VMScript command again and it will fail with the dreaded “The guest operations agent could not be contacted”:

D:\> Invoke-VMScript -ScriptText 'ipconfig' -VM $vm -GuestCredential $creds

Invoke-VMScript : 19.11.2012 16:57:06    Invoke-VMScript        The guest operations agent could not be contacted.
At line:1 char:16
+ Invoke-VMScript <<<<  -ScriptText 'ipconfig' -VM $vm -GuestCredential $creds
    + CategoryInfo          : NotSpecified: (:) [Invoke-VMScript], ViError
    + FullyQualifiedErrorId : Client20_VmGuestServiceImpl_RunScriptInGuest_ViError,VMware.VimAutomation.ViCore.Cmdlets.Commands.InvokeVmScript

5. Restart the VMware Tools service in the VM and Invoke-VMScript will work again until the next Reload-VM takes place. vmware.log messages during tools restart:

2012-11-19T15:57:41.261Z| vcpu-0| TOOLS autoupgrade protocol version 0
2012-11-19T15:57:41.261Z| vcpu-0| TOOLS ToolsCapabilityGuestTempDirectory received 0
2012-11-19T15:57:41.265Z| vcpu-0| GuestRpc: Reinitializing Channel 0(toolbox)
2012-11-19T15:57:41.265Z| vcpu-0| GuestRpc: Channel 0 reinitialized.
2012-11-19T15:57:43.180Z| vcpu-0| GuestRpc: Channel 0, guest application toolbox.
2012-11-19T15:57:43.186Z| vcpu-0| TOOLS ToolsCapabilityGuestTempDirectory received 1 C:\Windows\TEMP\vmware-SYSTEM
2012-11-19T15:57:43.201Z| vcpu-0| TOOLS autoupgrade protocol version 2
2012-11-19T15:57:43.202Z| vcpu-0| TOOLS ToolsCapabilityGuestConfDirectory received C:\ProgramData\VMware\VMware Tools
2012-11-19T15:57:43.203Z| vcpu-0| TOOLS Received tools.set.version rpc call, version = 8389.
2012-11-19T15:57:43.203Z| vcpu-0| ToolsSetVersionWork did nothing; new tools version (8389) matches old Tools version
2012-11-19T15:57:43.204Z| vcpu-0| TOOLS unified loop capability requested by 'toolbox'; now sending options via TCLO
2012-11-19T15:57:43.205Z| vcpu-0| Guest: toolbox: Version: build-652272
2012-11-19T16:00:28.567Z| vcpu-0| TOOLS unified loop capability requested by 'toolbox-ui'; now sending options via TCLO
2012-11-19T16:00:28.567Z| vcpu-0| GuestRpc: Channel 6, guest application toolbox-ui.
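Steps 2 and 3 can be combined on the ESXi shell for repeated testing. A minimal sketch, assuming that vim-cmd vmsvc/getallvms prints the VM ID in the first column and the VM name in the second, and that the VM name contains no spaces; the function names vmid_by_name and reload_vm are made up.

```shell
# Hypothetical helper: look up the VM ID for an exact VM name in the output of
# "vim-cmd vmsvc/getallvms" (stdin). ASSUMPTION: the ID is the first column and
# the name the second, and the name contains no spaces.
vmid_by_name() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Hypothetical wrapper: trigger the Reload-VM task for a VM given by name.
reload_vm() {
    id=$(vim-cmd vmsvc/getallvms | vmid_by_name "$1")
    [ -n "$id" ] || { echo "VM not found: $1" >&2; return 1; }
    vim-cmd vmsvc/reload "$id"
}
```

Unlike the grep -i in step 2, the lookup matches the name exactly, so "vm1" does not also hit "vm10".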

This is perfectly reproducible with Windows and Linux VMs on ESXi 5.0 and 5.1 with the respective PowerCLI and VMware Tools versions, and probably on 4.x too (I must have been on 4.1 when I first played with Invoke-VMScript).

This issue was confirmed by VMware support too, also hinting at the following hostd.log messages:

TS [XXX info 'vm:/vmfs/volumes/<UUID>/<VM>/<VM>.vmx' opID=XXX] CheckVMStateForGuestOperation: GuestOps are not ready.
TS [XXX info 'Default' opID=XXX] AdapterServer caught exception: vim.fault.GuestOperationsUnavailable

Unfortunately, it appears that there is no workaround other than restarting VMware Tools in the guest for now. As VMware support pointed out, this known issue is supposed to be fixed with the soon-ish releases of 5.0 Update 2 and 5.1 Update 1.

↧

Control vCenter performance counter collection and get back VM IOPS statistics

December 17, 2012, 8:34 am

After updating from vCenter 4.1 to 5.0 quite some time ago, I noticed how my custom scripts collecting historical VM-level performance statistics suddenly logged zero-values for the IOPS counters datastore.numberReadAveraged.average and datastore.numberWriteAveraged.average of all VMs. ಠ_ಠ
Turns out someone decided those statistics weren’t worth collecting in the long run anymore with 5.0 or something and simply moved them to the vCenter statistics collection level 3, which is more of a verbose troubleshooting level with lots of usually uninteresting counters.
VMware also does not recommend using a statistics level greater than 2, except for short-term data collection.

Querying for these metrics in any rollup interval yields nothing; the values are only available in realtime mode:

Get-VM vm1 | Get-StatType -Interval 1800 | ? {$_ -match "datastore.number(Read|Write)Averaged.average" }
-

Get-VM vm1 | Get-StatType -Realtime | ? {$_ -match "datastore.number(Read|Write)Averaged.average" }
datastore.numberReadAveraged.average
datastore.numberWriteAveraged.average

This naturally raised the question if it’s possible to just re-configure the counters to belong to a lower statistics level or have some more fine-grained control over this in general.
Unfortunately, it seemed like I was out of luck. I thought about opening a Support Request, or rather a “Feature Request”, but ended up procrastinating and forgetting about it, not really expecting much from a mere Feature Request case anyway.

But while engaging support for some other problem, I asked about this as a side note, and the support engineer immediately recalled two KB articles:
Change the collection level for Storage DRS and SIOC data counters in vSphere 5.0 by using the Level Mapping Utility
http://kb.vmware.com/kb/2014382
Change the collection level for Storage DRS and SIOC data counters in vSphere 5.0 Update 1 by using the Level Mapping Utility
http://kb.vmware.com/kb/2009532

These articles provide a small PowerShell script, the “LevelMappingUtility”, which allows you to change the statistics level assignment of any performance counter, or at least it seems so. Exactly what I was looking for!
First we can confirm the current statistics level of all counters and we’ll see how the “PerDeviceLevel” (per VM in my case) of the IOPS counters I’m interested in is at 3. Not so good:

Get-PxCounterLevelMapping | ? {$_.Name -match "datastore.number(Read|Write)Averaged.average" }
Name                                  AggregateLevel PerDeviceLevel Server
----                                  -------------- -------------- ------
datastore.numberReadAveraged.average  1              3              vcenter.local
datastore.numberWriteAveraged.average 1              3              vcenter.local

The Set-PxCounterLevelMapping function, which actually changes the statistics levels, can read the counters and their target levels from a CSV file. The CSV file provided by default will change the statistics level for a lot of other counters too:

#TYPE VMware.VimAutomation.PowerCliExtensions.CounterLevelMapping,,,
 Name,AggregateLevel,PerDeviceLevel,Server
 disk.numberReadAveraged.average,1,1,example.server.com
 disk.numberWriteAveraged.average,1,1,example.server.com
 virtualDisk.numberReadAveraged.average,1,1,example.server.com
 virtualDisk.numberWriteAveraged.average,1,1,example.server.com
 virtualDisk.totalReadLatency.average,1,1,example.server.com
 virtualDisk.totalWriteLatency.average,1,1,example.server.com
 datastore.numberReadAveraged.average,1,1,example.server.com
 datastore.numberWriteAveraged.average,1,1,example.server.com
 datastore.totalReadLatency.average,1,1,example.server.com
 datastore.totalWriteLatency.average,1,1,example.server.com
 datastore.datastoreIops.average,1,1,example.server.com
 datastore.sizeNormalizedDatastoreLatency.average,1,1,example.server.com
 datastore.datastoreReadIops.latest,1,1,example.server.com
 datastore.datastoreReadOIO.latest,1,1,example.server.com
 datastore.datastoreWriteIops.latest,1,1,example.server.com
 datastore.datastoreWriteOIO.latest,1,1,example.server.com
 datastore.siocActiveTimePercentage.average,1,1,example.server.com
 datastore.datastoreVMObservedLatency.latest,1,1,example.server.com
 datastore.datastoreMaxQueueDepth.latest,1,1,example.server.com
 disk.maxQueueDepth.average,1,1,example.server.com
 disk.deviceLatency.average,1,1,example.server.com

As I’m only interested in two distinct counters for now and don’t want to blow up my database too much, I simply removed all other lines and ran it with that:

Import-csv -Path D:\LevelMappingUtility\counter.csv | Set-PxCounterLevelMapping
Updating datastore.numberReadAveraged.average in server vcenter.local
Updating datastore.numberWriteAveraged.average in server vcenter.local

Get-PxCounterLevelMapping | ? {$_.Name -match "datastore.number(Read|Write)Averaged.average" }
Name                                  AggregateLevel PerDeviceLevel Server
----                                  -------------- -------------- ------
datastore.numberReadAveraged.average  1              1              vcenter.local
datastore.numberWriteAveraged.average 1              1              vcenter.local
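Trimming the CSV file does not have to be done by hand. The sketch below is one way to do it on any machine with a POSIX shell and awk; it keeps the #TYPE line, the header, and the two counters I care about. The sample file carries example.server.com in the Server column while my output shows vcenter.local, so the sketch also rewrites that column; whether the utility strictly requires this is an assumption on my part. The function name trim_counter_csv and its parameter are hypothetical.

```shell
# Hypothetical helper: reduce the LevelMappingUtility CSV (stdin) to the two
# datastore IOPS counters and write the given vCenter name into the Server column.
trim_counter_csv() {
    # $1: vCenter name for the Server column (made-up parameter, required)
    awk -F, -v OFS=, -v server="$1" '
        /^#TYPE/ { print; next }                 # keep the type line untouched
        { sub(/^[ \t]+/, "") }                   # drop leading whitespace
        $1 == "Name" { print; next }             # keep the header line
        $1 ~ /^datastore\.number(Read|Write)Averaged\.average$/ { $4 = server; print }
    '
}
```

Usage: trim_counter_csv vcenter.local < counter.csv > counter-trimmed.csv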

After a brief period, when the first rollup tasks on the database have finished, you should be able to access the statistics not only in real-time mode, but permanently in the respective interval slot too:

Get-VM vm1 | Get-StatType -Interval 1800 | ? {$_ -match "datastore.number(Read|Write)Averaged.average" }
datastore.numberReadAveraged.average
datastore.numberWriteAveraged.average

Finally!

You could also use that tool to remove some (for you) uninteresting counters from the lower statistics levels to keep the vCenter DB compact and tidy, but I’ll leave it at that.

↧

January (or February?) HP ESXi updates

January 14, 2013, 8:47 am

Attention: [Update 16.01.2013]
HP actually pulled the updates (which were titled “February” updates) from their VIBs Depot site and purged the references from the depot metadata indexes as well. I’m not sure what’s going on but you won’t be able to apply these updates (via Update Manager) unless you downloaded them already. But even if you did, you should refrain from using these bundles at this time. Unfortunately there seems to be no way of properly removing them from Update Manager if it pulled the metadata already.
[/Update]

[Update 21.02.2013]
HP re-released the VIBs available at http://vibsdepot.hp.com/hpq/feb2013/
[/Update]

[Update 23.02.2013]
(Thanks to milanod for the hint in the comments)
HP actually removed the re-released updates from the vibsdepot yet again?!
The updated bundles are still listed on the software/support/drivers lists for Proliant Servers though:
http://www.hp.com/swpublishing/MTX-c22b0c1988f147308f06bb4ab9 hp-HPUtil-esxi5.0-bundle-1.4-15.zip
http://www.hp.com/swpublishing/MTX-01441a612d354aba868f22f96a hp-esxi5.0uX-bundle-1.4-16.zip
I’m speechless in the face of this unprecedented fail.
[/Update]

[Update 25.02.2013]
Uh-oh, the updates SEEM to be back at http://vibsdepot.hp.com/hpq/feb2013/. File dates are from Jan 4th and the bundles’ md5sums exactly match the ones from the initial release mid-January (which this post was about). So if there really was a bug with the release, it must still be there.
Taking bets on how long it’ll take HP to offline them again.
[/Update]

[Update 22.04.2013]
(Thanks to Wu in the comments)
The issue with the SmartArray warning which this bundle brought us has been fixed in a recent update.
[/Update]

After some very minor updates back in October that did not come with release notes, it’s time for another round of updates to the ESXi HP extensions and other stuff. Unfortunately, we don’t seem to be getting release notes or general info now either.
But these updates are publicly available on http://vibsdepot.hp.com/hpq/feb2013/ already and your VMware Update Manager should have already picked them up if you set it up to use the HP VIB depot.

Since HP is so kind as to not provide release notes, we can only guess about actual fixes or improvements, but we can at least check which of the VIBs contained in the offline bundles really do provide updates (spoiler: not that much).

First let’s have a look at what our current HP VIB versions on an ESXi host are:
(Hint: You can of course do that via PowerCLI centrally too.)

# esxcli software vib list | grep -i hp
char-hpcru                     5.0.3.09-1OEM.500.0.0.434156          Hewlett-Packard  PartnerSupported  2012-11-20
char-hpilo                     500.9.0.0.9-1OEM.500.0.0.434156       Hewlett-Packard  PartnerSupported  2012-11-12
hp-ams                         500.9.2.0-11.434156                   Hewlett-Packard  PartnerSupported  2012-11-12
hp-smx-provider                500.03.01.10.2-434156                 Hewlett-Packard  VMwareAccepted    2012-11-20
hpacucli                       9.20-9.0                              Hewlett-Packard  PartnerSupported  2012-11-12
hpbootcfg                      01-01.02                              Hewlett-Packard  PartnerSupported  2012-11-12
hponcfg                        04-00.10                              Hewlett-Packard  PartnerSupported  2012-11-12
hpnmi                          2.0.11-434156                         hp               PartnerSupported  2012-11-12

This system was installed with the HP custom ESXi 5.1 ISO 5.32.5. I omitted some irrelevant VIBs such as the “hp-build”, “scsi-hpsa” or P2000 VAAI bundle from the esxcli output, as those are not part of the updates. All custom VIB versions of this and other HP images are also listed here.
The easiest way to check for differences is uploading the new bundle to one of your hosts (you can also point esxcli to a Bundle-URL if your ESXi host has internet access) and then use esxcli to compare the content with the actually installed VIB versions like this:

1. HP ESXi offline bundle aka HP CIM providers aka hardware management extensions
# esxcli software sources vib list -d /tmp/hp-esxi5.0uX-bundle-1.4-16.zip
Name             Version                          Vendor           Creation Date  Acceptance Level  Status
---------------  -------------------------------  ---------------  -------------  ----------------  ---------
char-hpcru       5.0.3.09-1OEM.500.0.0.434156     Hewlett-Packard  2012-08-30     PartnerSupported  Installed
char-hpilo       500.9.0.0.9-1OEM.500.0.0.434156  Hewlett-Packard  2011-10-07     PartnerSupported  Installed
hp-smx-provider  500.03.02.00.23-434156           Hewlett-Packard  2012-12-03     VMwareAccepted    Update
hp-ams           500.9.3.0-13.434156              Hewlett-Packard  2012-11-30     PartnerSupported  Update

2. HP Utility bundle containing a few tools for configuring ILO, HP RAID-Controllers, BIOS boot order from the ESXi shell
# esxcli software sources vib list -d /tmp/hp-HPUtil-esxi5.0-bundle-1.4-15.zip
Name        Version                          Vendor           Creation Date  Acceptance Level  Status
----------  -------------------------------  ---------------  -------------  ----------------  ---------
hponcfg     04-00.10                         Hewlett-Packard  2011-11-13     PartnerSupported  Installed
hpacucli    9.40-12.0                        Hewlett-Packard  2012-12-13     PartnerSupported  Update
char-hpilo  500.9.0.0.9-1OEM.500.0.0.434156  Hewlett-Packard  2011-10-07     PartnerSupported  Installed
hpbootcfg   01-01.02                         Hewlett-Packard  2012-04-05     PartnerSupported  Installed

3. HP NMI bundle for non-maskable interrupt support in case of hardware failures
# esxcli software sources vib list -d /tmp/hp-nmi-esxi5.0-bundle-2.1-2.zip
Name   Version        Vendor  Creation Date  Acceptance Level  Status
-----  -------------  ------  -------------  ----------------  ---------
hpnmi  2.0.11-434156  hp      2011-07-29     PartnerSupported  Installed

4. HP Agentless Monitoring Service (AMS) bundle (Gen8 servers)
# esxcli software sources vib list -d /tmp/hp-ams-esxi5.0-bundle-9.3.0-14.zip
Name            Version                          Vendor           Creation Date  Acceptance Level  Status
--------------  -------------------------------  ---------------  -------------  ----------------  ---------
char-hpcru      5.0.3.09-1OEM.500.0.0.434156     Hewlett-Packard  2012-08-30     PartnerSupported  Installed
char-hpilo      500.9.0.0.9-1OEM.500.0.0.434156  Hewlett-Packard  2011-10-07     PartnerSupported  Installed
hp-smx-limited  500.03.02.00.25-434156           Hewlett-Packard  2012-12-03     VMwareAccepted    Update
hp-ams          500.9.3.0-13.434156              Hewlett-Packard  2012-11-30     PartnerSupported  Update

This parses the metadata of the offline bundle provided, listing its contents. Of particular interest is the “Status” column, indicating whether the corresponding VIB is newer than the currently installed version (it will list “Not Installed” if there is no matching VIB yet).
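To see only the VIBs that would actually be updated, the listing can be filtered on that last column. A minimal sketch, assuming the column layout shown above; the function name vibs_with_updates is made up.

```shell
# Hypothetical helper: from "esxcli software sources vib list" output (stdin),
# print name and version of every VIB whose Status column reads "Update".
vibs_with_updates() {
    awk '$NF == "Update" { print $1, $2 }'
}
```

Usage: esxcli software sources vib list -d /tmp/hp-esxi5.0uX-bundle-1.4-16.zip | vibs_with_updates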
To get even more detailed information about a VIB, for example whether it requires maintenance mode for installation, you can run the vib get command.

Most of the information shown is taken from the vmware.xml file inside metadata-hp-nmi-esxi5.0-bundle-2.1-2.zip:
# esxcli software sources vib get -d /tmp/hp-nmi-esxi5.0-bundle-2.1-2.zip
hp_bootbank_hpnmi_2.0.11-434156
   Name: hpnmi
   Version: 2.0.11-434156
   Type: bootbank
   Vendor: hp
   Acceptance Level: PartnerSupported
   Summary: hpnmi for ESXi 5.0
   Description: hpnmi handler for ESXi 5.0 Server
   ReferenceURLs:
   Creation Date: 2011-07-29
   Depends: uwglibc-2.5-34-1, uwglibc64-2.5-34-1, libvmkuser-5.0.0-1, uwvmkcall-5.0.0-1
   Conflicts:
   Replaces:
   Provides:
   Maintenance Mode Required: False
   Hardware Platforms Required: HP, hp, Hewlett-Packard, Hewlett-Packard Company
   Live Install Allowed: True
   Live Remove Allowed: True
   Stateless Ready: True
   Overlay: False
   Tags:
   Payloads: hpnmi
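
If you only care about one of these fields, for instance before scheduling an installation, it can be pulled out of the vib get output directly. The helper below is a sketch under the assumption that the output keeps the "Key: value" layout shown above; vib_field is a hypothetical name, and for bundles with several VIBs it only reports the first match.

```shell
# vib_field: hypothetical helper that prints the value of one
# "Key: value" line from "esxcli software sources vib get" output on stdin.
vib_field() {
    # $1 is the field label exactly as printed, e.g. "Maintenance Mode Required"
    awk -F': ' -v key="$1" '
        { sub(/^[ \t]+/, "", $1) }            # drop the leading indentation
        $1 == key { print $2; exit }          # first matching VIB only
    '
}

# Usage on a host (bundle path taken from the example above):
#   esxcli software sources vib get -d /tmp/hp-nmi-esxi5.0-bundle-2.1-2.zip \
#       | vib_field "Maintenance Mode Required"
```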

So in a nutshell, the current January/February bundles update only a subset of VIBs, as usual, with no major version number leaps. The HP NMI bundle, for example, was not updated. That is nothing new, but it can be confusing when HP gives the “outer” offline bundle a higher version number even though it contains an identical binary VIB.

As usual, it’s enough to configure your VMware Update Manager extension baseline to consist of the “HP ESXi 5.0 Complete Bundle Update 1.6” package only, as it contains all VIBs of the 4 bundles (this is the metadata zipfile making up this package btw.).

The scsi-hpsa SmartArray RAID Controller Driver has been updated to 5.0.0-40.1 (19 Feb 2013) too:
hpsa-500-5.0.0-offline_bundle-933277.zip

Beyond ESXi bundle updates

Apart from these updates to ESXi bundles, there are also a number of new firmware releases (mainly BIOS, it seems). Note that there is no new HP Service Pack for Proliant release yet, though it’s about time for another one.
There are no Bladesystem Virtual Connect, Onboard Administrator or iLO updates either.
Remember that you can now also update some firmware on an ESXi host online without having to boot update DVDs or such.
Excerpts from a few recent firmware updates:

* RECOMMENDED * HP BladeSystem c-Class Virtual Connect Support Utility 1.7.1
Users are required to use VCSU 1.7.1 or higher if updating to Virtual Connect 3.70 or higher

** CRITICAL ** Online ROM Flash Component for VMware ESXi – EF0300FARMU, EF0450FARMV, EF0600FARNA drive
HPD6 (17 Dec 2012) (Those are 450GB SAS disks we have in our P4800 G2 / MDS600 storage nodes)
This firmware prevents a rare condition that may occur during a WRITE SAME command sequence that may result in incorrect data being written to the hard drive. The WRITE SAME command may be used during RAID ARRAY parity initialization. – Did I read WRITE SAME? Also sounds like a VAAI primitive.

* RECOMMENDED * Systems ROMPaq Firmware Upgrade for HP ProLiant BL460c/WS460c Gen8 (I31) Servers (For USB Key-Media)
2012.12.14 (A) (11 Jan 2013)
Problems Fixed:
Resolved an issue where the system may experience a performance issue, usually seen in a degradation of network throughput, after updating to the 08/20/2012 revision of the System ROM. This issue only exists with the 08/20/2012 revision of the System ROM.
Resolved an issue where no message was displayed and no Integrated Management Log (IML) entry is logged for certain memory errors that result in DIMMs not being usable. This issue would look like the operating system having access to less memory than is actually installed without any error indicated.
Removed the Advanced ROM-Based Setup Utility (RBSU) option to disable Data Direct I/O (DDIO). It is no longer recommended that users disable this option due to the negative impacts on system performance. For systems that had previously disabled Data Direct I/O, the option will remain disabled. Defaults must be restored on the system to re-enable this functionality for this situation.
Resolved a rare issue where the system may experience a temporary loss of video, such as a blank screen on the local monitor and iLO Remote Console, if a key is pressed during POST during Option ROM Execution.
Resolved an issue where the order in which processors are presented to the Operating System may change across multiple system boots.
Resolved an issue where the HP ProLiant WS460c Gen8 may experience an Illegal OP-Code Red Screen error condition when the system sits at a Non-System Disk Error state at the end of POST for an extended period of time. This issue does not affect HP ProLiant BL460c Gen8 servers.
Enhancements:
Optimized the memory settings to improve the reliability of the memory system.
Added a ROM-Based Setup Utility (RBSU) option for HP Option ROM Prompting.
Added the latest product names of optional expansion cards and updated language translations (for non-English modes) in the ROM-Based Setup Utility (RBSU).

* RECOMMENDED * Systems ROMPaq Firmware Upgrade for HP ProLiant DL380 G7 (P67) Servers (For USB Key-Media)
2012.12.02 (A) (11 Jan 2013)
Problems Fixed:
Resolved an issue where the ROM Based Setup Utility (RBSU) Command Line Interface (CLI) would not set the QPI Bandwidth Optimization (RTID) feature properly.
Enhancements:
Optimized the memory settings to improve the reliability of the memory system.

* RECOMMENDED * Systems ROMPaq Firmware Upgrade for HP ProLiant BL490c G7 (I28) Servers
2012.12.03 (A) (11 Jan 2013)
Problems Fixed:
Resolved an issue where the ROM Based Setup Utility (RBSU) Command Line Interface (CLI) would not set the QPI Bandwidth Optimization (RTID) feature properly.
Enhancements:
Optimized the memory settings to improve the reliability of the memory system.

Oh, and no Emulex be2net CNA firmware update either, though judging by this it won’t be long before one is publicly available.

http://www.hp.com/swpublishing/MTX-c22b0c1988f147308f06bb4ab9


iLO Login check script

January 14, 2013, 10:30 am

We recently replaced the local iLO account logins with LDAP authentication against our AD. That is cool, but logins with my AD account sometimes worked and sometimes didn’t, because not every system was configured properly for LDAP authentication.

Instead of checking logins on dozens of servers manually (with the nice iLO failed login delay), I took a stab at analyzing the login procedures and scripting the logins myself.
So I came up with this horrible piece of bash script doing exactly that. I tested it against iLO versions 1, 2, 3 and 4, and it worked with all of them (versions 1 and 2 share one login procedure, versions 3 and 4 another). Running it requires one argument: a file containing the iLO hostnames or IPs to connect to.
Here’s the script on pastebin with formatting: http://pastebin.com/i2Y0xSTQ:

#!/bin/bash

if [ $# -eq 0 ]
then
        echo "No arguments supplied. Expecting a file with a list of ILO-IPs/DNS names to connect to. E.g. run ./ilocheck.sh /tmp/ilo-list"
        exit 1
fi

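echo "Enter FULL AD-Account DN (required for ILO1/2) or local account name: (EX: CN=adminuser,OU=departmen1,OU=top,DC=domain,DC=local)"
# -r keeps backslashes in the input literal
read -er userdn
# printf avoids echo's option handling; quoting preserves spaces in the DN
userdn64=$( printf '%s' "$userdn" | base64 -w 0 )
echo "Enter password:"
read -ers pw
pw64=$( printf '%s' "$pw" | base64 -w 0 )

# Quote "$@" so list file names containing spaces survive
sort -- "$@" | while read -r ilo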
do
        ilourl="https://$ilo"
        echo -e "\nChecking ILO Interface on $ilourl..."
        curl -ks "$ilourl" | if grep -Pq "HP Integrated Lights-Out( 2)? Login"
        then
                echo "$ilourl is an ILO2 or ILO1 System"
                curl -ks "$ilourl/login.htm" | grep -A1 "sessionkey=" | grep -Po '\w[^\"]+' > /tmp/ilotemp
                sessionkey=$( awk 'FNR == 2 {print}' /tmp/ilotemp )
                sessionindex=$( awk 'FNR == 4 {print}' /tmp/ilotemp )
                curl -ks "$ilourl/index.htm" --header "Cookie: hp-iLO-Login=$sessionindex:$userdn64:$pw64:$sessionkey" --header "Referer: $ilourl/login.htm" | if grep -q "has detected a failed login attempt"
                then
                        echo "Login on $ilourl NOT successful."
                else
                        echo "Login on $ilourl successful."
                fi

        else
                curl -ks "$ilourl" | if grep -Pq "iLO [34]"
                then
                        echo "$ilourl is an ILO3 or ILO4 System"
                        curl -ks "$ilourl/json/login_session" -X POST --data "{\"method\":\"login\",\"user_login\":\"$userdn\",\"password\":\"$pw\"}" | if grep -q "JS_ERR_NO_PRIV"
                        then
                                echo "Login on $ilourl NOT successful."
                        else
                                echo "Login on $ilourl successful."
                        fi
                else
                        echo "ILO Interface of $ilourl unreachable or not found"
                fi
        fi
done

Example script run:

$ ./ilo.sh ilos.txt
Enter FULL AD-Account DN (required for ILO1/2) or local account name: (EX: CN=adminuser,OU=departmen1,OU=top,DC=domain,DC=local)
CN=myusername,OU=departmen1,OU=top,DC=domain,DC=local
Enter password:

Checking ILO Interface on https://10.88.1.13...
https://10.88.1.13 is an ILO2 or ILO1 System
Login on https://10.88.1.13 successful.

Checking ILO Interface on https://10.89.4.46...
https://10.89.4.46 is an ILO3 or ILO4 System
Login on https://10.89.4.46 successful.

Checking ILO Interface on https://ilo-server74.ilo.local...
https://ilo-server74.ilo.local is an ILO3 or ILO4 System
Login on https://ilo-server74.ilo.local NOT successful.

Checking ILO Interface on https://ilo-server94.ilo.local...
https://ilo-server94.ilo.local is an ILO2 or ILO1 System
Login on https://ilo-server94.ilo.local NOT successful.

It doesn’t matter whether you want to check local iLO accounts or LDAP/AD accounts; the script works with both. But be aware that LDAP authentication on iLO 1 and 2 requires you to specify the full Distinguished Name of your account on the iLO login page or in this script, e.g. something like “CN=adminuser,OU=departmen1,OU=top,DC=domain,DC=local”.
You also need to enter the full DN when connecting to iLO 1/2 with Firefox, for example, but not with IE, where an ActiveX plugin takes care of transforming your short user name into the DN.

Here are a few interesting points I dug up during this:
The iLO1 and iLO2 login mechanism seems a bit dumb and clumsy. It wants you to connect to /login.htm, where JavaScript generates a cookie with a sessionkey and a sessionindex attribute. The actual login form then sends this cookie, including your base64’d username and password, to /index.htm via an HTTP GET. This GET also MUST contain a Referer header pointing to the login URL (e.g. “Referer: https://ilo.host/login.htm”) or it won’t accept your login.
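
The cookie layout used in the script for iLO 1/2 can be isolated into a small function, which makes it easier to see what is actually sent. This is just a sketch of what the script already does; the function name ilo12_cookie and the sample values in the usage comment are made up.

```shell
# ilo12_cookie: hypothetical helper that assembles the login cookie the way
# the script does. Arguments: sessionindex, user name or DN, password, sessionkey.
ilo12_cookie() {
    # base64 -w 0 (GNU coreutils) disables line wrapping for long DNs
    user64=$(printf '%s' "$2" | base64 -w 0)
    pw64=$(printf '%s' "$3" | base64 -w 0)
    printf 'hp-iLO-Login=%s:%s:%s:%s\n' "$1" "$user64" "$pw64" "$4"
}

# Usage, with sessionindex and sessionkey scraped from /login.htm as in the script:
#   curl -ks "$ilourl/index.htm" \
#        --header "Cookie: $(ilo12_cookie "$sessionindex" "$userdn" "$pw" "$sessionkey")" \
#        --header "Referer: $ilourl/login.htm"
```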

iLO3 and iLO4 use proper HTTP POSTs here, but seem to lack proper dynamic attributes on the login form to prevent XSRF.
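
One caveat with the iLO 3/4 branch of the script: user name and password are pasted into the JSON body as they are, so a password containing a double quote or a backslash would produce invalid JSON. The following is a sketch of how the body could be built more defensively; both function names are my own, and control characters are deliberately not handled.

```shell
# json_escape: escape backslashes and double quotes for use inside a JSON string.
# Control characters are not handled; this is only a sketch.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# ilo34_login_body: hypothetical helper building the same POST body the
# script sends to /json/login_session. Arguments: user name, password.
ilo34_login_body() {
    printf '{"method":"login","user_login":"%s","password":"%s"}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# Usage:
#   curl -ks "$ilourl/json/login_session" -X POST --data "$(ilo34_login_body "$userdn" "$pw")"
```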

During testing I stumbled upon a few “Directory connection limit reached” errors, caused by the script not ending its sessions properly (I could include that in the script as well, maybe another time). If you open too many sessions to one iLO, you need to wait a while for them to time out.


nscd DNS caching and postfix

March 8, 2013, 3:40 am

A few of our mail gateway servers running postfix/policyd-weight/amavis/spamassassin generate a lot of DNS queries to our DNS servers at times.
I’m not particularly concerned about that myself but there were some discussions about whether we should or how we could decrease the volume of DNS queries.

One suggestion for an easy and convenient solution was to just install the Name Service Cache Daemon (nscd) to cache responses locally on the mail server. They implemented this quickly on one of the servers but it didn’t really seem to work: the server still generated loads of queries, and the statistics output of nscd -g always displayed a 0% cache hit rate and 0 cache hits on positive entries.
So they just as quickly abandoned the idea without digging deeper and more or less forgot about the whole plan, as it wasn’t like we had any real issues in the first place.
Another option discussed was setting up dedicated caching-only resolvers (on the hosts themselves), which wouldn’t have been difficult either.

Fast forward a few months: the so-called “issue” of too many DNS queries recently came up again, and I decided to check nscd myself to find out why it supposedly wouldn’t work.

Even without really knowing anything about nscd, once it is installed and started it’s easy to verify with tools like ping and tcpdump whether DNS queries are actually leaving your host. Note that bind-tools like nslookup, dig or host always query DNS servers directly, so you won’t see any effect of nscd when running these.
There’s not really anything to configure to make it work; the standard /etc/nscd.conf should work fine (a change to my default RHEL/CentOS /etc/nsswitch.conf was not necessary either):

$ grep hosts /etc/nscd.conf
        enable-cache            hosts           yes
        positive-time-to-live   hosts           3600
        negative-time-to-live   hosts           20
        suggested-size          hosts           211
        check-files             hosts           yes
        persistent              hosts           yes
        shared                  hosts           yes
        max-db-size             hosts           33554432

$ grep hosts /etc/nsswitch.conf
     hosts:      files dns
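
The ping/tcpdump check mentioned above can be made a bit more objective by counting the queries that actually hit the wire. The sketch below assumes the usual one-line-per-packet text output of tcpdump -n; the function name count_dns_queries and the capture file path are made up.

```shell
# count_dns_queries: hypothetical helper. Reads "tcpdump -n" text output on
# stdin and counts the DNS queries (any record type) sent for a host name.
count_dns_queries() {
    # tcpdump prints queries as "... A? www.example.com. (33)";
    # "|| true" keeps the exit status clean when the count is 0
    grep -c -F -- "? $1. " || true
}

# Terminal 1 (as root):  tcpdump -n -l -i any udp port 53 | tee /tmp/dns.txt
# Terminal 2:            ping -c 1 www.google.com    # run it several times
# Afterwards:            count_dns_queries www.google.com < /tmp/dns.txt
```

With working caching, the count should stay the same while the ping is repeated within the positive-time-to-live window.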

The cached values are stored in a binary file, /var/db/nscd/hosts; you can run strings on it to see which names are stored within:

# strings /var/db/nscd/hosts | grep -P '[\w-]+\.\w+' | sort -u
 google.com
 localhost4.localdomain4
 localhost.localdomain
 www.google.com

However, I was puzzled as to why the nscd statistics (nscd -g) kept indicating 0 cache hits while it obviously did properly cache and return hosts:

# nscd -g | grep 'hosts cache' -A 22
 hosts cache:
 yes  cache is enabled
 yes  cache is persistent
 no  cache is shared
 211  suggested size
 216064  total data pool size
 2256  used data pool size
 3600  seconds time to live for positive entries
 20  seconds time to live for negative entries
 0  cache hits on positive entries
 0  cache hits on negative entries
 55  cache misses on positive entries
 10  cache misses on negative entries
 0% cache hit rate
 17  current number of cached values
 18  maximum number of cached values
 2  maximum chain length searched
 0  number of delays on rdlock
 0  number of delays on wrlock
 0  memory allocations failed
 yes  check /etc/hosts for changes

After a bit of googling I found this comment, which explains why this happened:
John on August 13th, 2008: Adam, you cache is (probably) working, but nscd shows a 0% cache hit rate because you have the “shared” option turned on. That allows clients to directly search the nscd cache themselves instead of asking the nscd daemon; as a side effect, nscd can’t collect statistics about these search.
To verify that nscd is working, temporarily set “shared no”, restart nscd, and then cause some lookups. You should start to see a high cache hit rate %. Don’t forget to set “shared yes” again because it is much faster.

After setting the “shared” option to “no” in /etc/nscd.conf and restarting nscd, I was now seeing statistics on cache hits too! Confusing bummer.
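
Flipping the option back and forth by hand gets tedious, so here is a sketch of a helper that does it. It assumes GNU sed (for -i) and the nscd.conf layout shown above; set_hosts_shared is a made-up name.

```shell
# set_hosts_shared: hypothetical helper that rewrites the "shared hosts" line
# of an nscd.conf-style file in place. $1 = file, $2 = yes or no.
set_hosts_shared() {
    # only the hosts line is touched; passwd/group/services stay as they are
    sed -i "s/^\([[:space:]]*shared[[:space:]]\{1,\}hosts[[:space:]]\{1,\}\).*/\1$2/" "$1"
}

# Usage while testing:
#   set_hosts_shared /etc/nscd.conf no  && service nscd restart
#   ... cause some lookups ...
#   nscd -g | grep 'hosts cache' -A 22 | grep 'cache hit'
#   set_hosts_shared /etc/nscd.conf yes && service nscd restart
```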

Having verified nscd indeed worked I turned to postfix, which still showed no signs of taking advantage of nscd. Turns out the postfix default directives always query DNS servers directly:
$ postconf | grep host_lookup
lmtp_host_lookup = dns
smtp_host_lookup = dns
From the Postfix documentation for smtp_host_lookup:
What mechanisms the Postfix SMTP client uses to look up a host’s IP address. This parameter is ignored when DNS lookups are disabled (see: disable_dns_lookups).
dns: Hosts can be found in the DNS (preferred).
native: Use the native naming service only (nsswitch.conf, or equivalent mechanism).

(dns also means /etc/hosts entries are ignored by postfix too.)

So changing these options from “dns” to “native” and restarting postfix fixed this: postfix now queried nscd, which returned cached values accordingly.
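
For reference, the change can be made with postconf -e instead of editing main.cf by hand, and a small filter shows whether any lookup parameter is still set to plain “dns”. This is a sketch; dns_only_lookups is a hypothetical helper name, and the restart command assumes a RHEL/CentOS style init script.

```shell
# dns_only_lookups: hypothetical helper. Reads "postconf" output on stdin and
# prints every *_host_lookup parameter that is still set to plain "dns".
dns_only_lookups() {
    awk -F' = ' '$1 ~ /_host_lookup$/ && $2 == "dns" { print $1 }'
}

# On the mail server (postconf -e writes the settings to main.cf):
#   postconf -e 'smtp_host_lookup = native' 'lmtp_host_lookup = native'
#   service postfix restart
#   postconf | dns_only_lookups      # should print nothing afterwards
```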

It should be noted, however, that besides postfix, other components like policyd-weight/amavis/spamassassin still generate a lot of direct DNS queries without using nscd.
I haven’t found a way to change this in a similar manner yet; if you have any idea, please share it.


April HP ESXi bundle update fixes SmartArray warning

April 22, 2013, 2:39 am

After the ridiculous mess HP caused with their last updates to the custom ESXi extensions back in January/February, HP released new updates to the HP CIM providers a few days ago.
This update fixes the issue that was probably responsible for all of these woes: HP SmartArray RAID Controllers displaying a random warning message.
From the release notes:
Version: 1.4.5 (15 Apr 2013) hp-esxi5.0uX-bundle-1.4.5-3.zip
Version: 9.3.5 (15 Apr 2013) hp-ams-esxi5.0-bundle-9.3.5-3.zip
- Smart Array Controller incorrectly reports Degraded status: Fixed issue where HP Insight Management WBEM Providers were incorrectly reporting a ‘Degraded’ status for the Smart Array Controllers. This caused VMware vSphere Management Console under Health Status category to display a ‘Warning’ Status (Yellow Exclamation) for the HP Smart Array Controllers. HP System Insight Manager and Insight Control for vCenter and other users of the HP Insight Management WBEM Providers would also report the Smart Array Controller as ‘Degraded’.
- Smart Array physical drive incorrectly reports OK status: Fixed issue where HP Insight Management WBEM Providers were not correctly reporting a not-OK status for a removed drive connected to a Smart Array Controller. This caused HP System Insight Manager, VMware vSphere Management Console, Insight Control for vCenter and other users of the HP Insight Management WBEM Providers to report the physical drive as OK after it was removed.
- Server reports incorrect processor Model: Fixed issue where HP Insight Management WBEM Providers would report an incorrect Processor Model for ProLiant servers and blades. For example: “Intel(R) Family: Intel(R) Xeon(TM) 2.5GHz (x86 Family 179 Model 125 Stepping 7)” is reported instead of the correct “Intel(R) Family: Intel(R) Xeon(TM) 2.5GHz (x86 Family 179 Model 45 Stepping 7)”.

Note that the HP vibsdepot metadata was not updated with this bundle yet (good job again, HP), so this update won’t appear in Update Manager automatically if you set it up to connect to the vibsdepot. You can still import the bundle manually in Update Manager or install it directly onto a host though.

[Update 26.04.2013]

With the release of ESXi 5.1 Update 1 HP also now provides a full repository release including the Utility Bundle and NMI driver (which actually do not contain any updated VIBs):
http://vibsdepot.hp.com/hpq/apr2013/

But HP still hasn’t updated the main metadata XML so this won’t show up in UM.
Currently, if you want these updates in UM automatically, you need to configure this additional Download Source URL:
http://vibsdepot.hp.com/hpq/apr2013/index.xml

To continue the confusing version management, the Complete Bundle is now at version “04.25.13”.

[/Update]

Changes in the HP ESXi offline bundle aka HP CIM providers aka hardware management extensions:
# esxcli software sources vib list -d /tmp/hp-esxi5.0uX-bundle-1.4.5-3.zip
Name             Version                          Vendor           Creation Date  Acceptance Level  Status
---------------  -------------------------------  ---------------  -------------  ----------------  ---------
hp-ams           500.9.3.5-02.434156              Hewlett-Packard  2013-02-21     PartnerSupported  Update
hp-smx-provider  500.03.02.10.4-434156            Hewlett-Packard  2013-03-25     VMwareAccepted    Update
char-hpcru       5.0.3.09-1OEM.500.0.0.434156     Hewlett-Packard  2012-08-30     PartnerSupported  Installed
char-hpilo       500.9.0.0.9-1OEM.500.0.0.434156  Hewlett-Packard  2011-10-07     PartnerSupported  Installed

Changes in the HP Agentless Monitoring Service (AMS) bundle (Gen8 servers):
# esxcli software sources vib list -d /tmp/hp-ams-esxi5.0-bundle-9.3.5-3.zip
Name            Version                          Vendor           Creation Date  Acceptance Level  Status
--------------  -------------------------------  ---------------  -------------  ----------------  ---------
hp-ams          500.9.3.5-02.434156              Hewlett-Packard  2013-02-21     PartnerSupported  Update
char-hpcru      5.0.3.09-1OEM.500.0.0.434156     Hewlett-Packard  2012-08-30     PartnerSupported  Installed
char-hpilo      500.9.0.0.9-1OEM.500.0.0.434156  Hewlett-Packard  2011-10-07     PartnerSupported  Installed
hp-smx-limited  500.03.02.10.3-434156            Hewlett-Packard  2013-03-25     VMwareAccepted    Update

Changes in the HP NMI driver bundle (none, for a very long time):
# esxcli software sources vib list -d /tmp/hp-nmi-esxi5.0-bundle-2.1-2.zip
Name   Version        Vendor  Creation Date  Acceptance Level  Status
-----  -------------  ------  -------------  ----------------  ---------
hpnmi  2.0.11-434156  hp      2011-07-29     PartnerSupported  Installed

Changes in the HP Utilities bundle (none compared to the February release):
# esxcli software sources vib list -d /tmp/hp-HPUtil-esxi5.0-bundle-1.4-15.zip
Name        Version                          Vendor           Creation Date  Acceptance Level  Status
----------  -------------------------------  ---------------  -------------  ----------------  ---------
hponcfg     04-00.10                         Hewlett-Packard  2011-11-13     PartnerSupported  Installed
hpacucli    9.40-12.0                        Hewlett-Packard  2012-12-13     PartnerSupported  Installed
char-hpilo  500.9.0.0.9-1OEM.500.0.0.434156  Hewlett-Packard  2011-10-07     PartnerSupported  Installed
hpbootcfg   01-01.02                         Hewlett-Packard  2012-04-05     PartnerSupported  Installed

Neither the HP ESXi Utilities Offline Bundle nor the NMI Sourcing Driver received updates ~~and a new custom ESXi image with this bundle was not provided (as of now).~~
Update: With the release of ESXi 5.1 U1 the latest bundles are part of a new custom HP ISO.
There are no new relevant Proliant firmware releases either.

Just on a side note:
HP Emulex be2net based NICs got another firmware update last month too:
Version 4.1.450.1707 (25 Mar 2013) OneConnect-Flash-4.1.450.1707.iso
1. Fixed VMware PSOD seen on AMD based servers with BE2 based cards (24508)
2. Fixed FW memory leak with multiple logins to a redirected target (31144)


ESXi 5.1 Update 1 and vCenter 5.1 Update 1 released – grab your fixes

April 26, 2013, 3:34 am

The long anticipated first major Update bundle of ESXi and vCenter 5.1 has finally been released. Download them at the usual place.
The insane list of fixes confirms my gut feeling again that unfortunately, many VMware products only start getting usable after the first (or sometimes even the second) Update bundle. (Remember vCenter 5.1a and 5.1b or the loads of support alerts?)

ESXi 5.1 Update 1

Go and check the release notes. Seriously.
No real new features or enhancements have been added apart from a few new supported Guest OSes.
But a huge number of important issues and bugs have been fixed, a few of which had been awaited for a long time. Here are some excerpts highlighting some of the important or interesting fixes:
https://www.vmware.com/support/vsphere5/doc/vsphere-esxi-51u1-release-notes.html#resolvedissues

ESXi 5.x host appears disconnected in vCenter Server and logs the ramdisk (root) is full message in the vpxa.log file
If Simple Network Management Protocol (SNMP) is unable to handle the number of SNMP trap files (.trp) in the /var/spool/snmp folder of ESXi, the host might appear as disconnected in vCenter Server. You might not be able to perform any task on the host.
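
On a host that has not been updated yet, a quick look at how many trap files have piled up can tell you whether you are close to this condition. A minimal sketch, assuming the spool folder named in the release note; count_trap_files is a made-up helper name.

```shell
# count_trap_files: hypothetical helper counting *.trp files in a directory
# (defaults to the spool folder named in the release note).
count_trap_files() {
    find "${1:-/var/spool/snmp}" -name '*.trp' | wc -l
}

# Usage on the ESXi shell:
#   count_trap_files
```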

Use of the invoke-vmscript command displays an error
When you use the invoke-vmscript PowerCLI command scripts on a virtual machine, the script fails with the following error message:
The guest operations agent could not be contacted.
– Mentioned this issue here –

ESXi hosts might fail with a purple diagnostic screen when you attempt to plug in or unplug a keyboard or mouse through a USB port
When you attempt to plug in or unplug a keyboard or a mouse through the USB port, the ESXi host might fail with the following error message:
PCPU## locked up. Failed to ack TLB invalidate.

Component-based logging and advanced configurations added to hostd log level
To avoid difficulties in getting appropriate logs during an issue, this release introduces component-based logging by dividing the loggers into different groups and prefixing them. Also, new advanced configuration allows you to change hostd log’s log level without restarting.

ESXi hosts might fail if hostd-worker thread consumes 100% CPU resources
Under sufficiently high workload on the ESXi host, hostd-worker thread might get stuck consuming 100% CPU while fetching the virtual machine screenshot file for vCloud Director UI. This issue might result in the failure of the ESXi host.

Long running vMotion operations might result in unicast flooding
When using the multiple-NIC vMotion feature with vSphere 5, if vMotion operations continue for a long time, unicast flooding is observed on all interfaces of the physical switch. If the vMotion takes longer than the ageing time that is set for MAC address tables, the source and destination host start receiving high amounts of network traffic.

ESXi host stops responding with a purple diagnostic screen during arpresolve
The ESXi host might stop responding during arpresolve and display a purple diagnostic screen
— ARP is serious business. Is that some single-frame layer 2 DoS vector? –

Network connectivity on IPv6 virtual machines not working with VMXNET3
When more than 32 IPv6 addresses are configured on a VMXNET3 interface, the unicast and multicast connectivity to some of those addresses are lost.

Virtual machine might lose network connectivity from external environment after vMotion with vNetwork Distributed Switch environment
A virtual machine might lose network connectivity from the external environment after vMotion with vNetwork Distributed Switch environment.

Attempts to apply host profile might fail with an error message indicating that the CIM indication subscription cannot be deleted

Hardware Status tab might stop displaying host health status
On an ESXi 5.1 host, Small-Footprint CIM Broker daemon (sfcbd) might fail frequently and display CIM errors. As a result, Hardware Status tab might stop displaying host health status and syslog.log might have error message similar to the following:
Timeout (or other socket error) sending request to provider.

Unable to delete files from the VMFS directory after one or more files are moved to it
After moving one or more files into a directory, an attempt to delete the directory or any of the files in directory might fail.

Accessing corrupted metadata on VMFS3 volume might result in ESXi host failure
If a file’s metadata is corrupted on a VMFS3 volume, ESXi host might fail with a purple diagnostic screen while trying to access the file. VMFS file corruption is extremely rare but might be caused by external storage issues.

Adding new ESXi host to a High Availability cluster and subsequently reconfiguring the cluster might result in the failure of any other host in the cluster with purple diagnostic screen
When a new ESXi host is added to a High Availability (HA) cluster and the HA cluster is subsequently reconfigured, any other host in the existing HA cluster might fail with a purple diagnostic screen

When the quiesced snapshot operation fails the redo logs are not consolidated
When you attempt to take a quiesced snapshot of a virtual machine, if the snapshot operation fails towards the end of its completion, the redo logs created as part of the snapshot are not consolidated. The redo logs might consume a lot of datastore space.

iSCSI LUNs do not come back online after recovering from the APD state
After recovering from the All-Paths-Down (APD) state, iSCSI LUNs do not come up until a host reboot. This issue occurs on Broadcom iSCSI offload-enabled adapters configured for iSCSI.

ESXi host might fail with a purple diagnostic screen if you run the vmware-vimdump command from DCUI
When you run the vmware-vimdump command from Direct Console User Interface (DCUI), the ESXi host might fail with a purple diagnostic screen. This might also result in missed heartbeat messages. This issue does not occur when the command is run by connecting through an SSH console.
– What. –

Reinstallation of ESXi 5.1 does not remove the Datastore label of the local VMFS of an earlier installation
Reinstallation of ESXi 5.1 with an existing local VMFS volume retains the Datastore label even after the user chooses the overwrite datastore option to overwrite the VMFS volume.

Microsoft Windows Deployment Services (WDS) might fail to PXE boot virtual machines that use the VMXNET3 network adapter
Attempts to PXE boot virtual machines that use the VMXNET3 network adapter by using the Microsoft Windows Deployment Services (WDS) might fail with

resxtop fails when upgraded from vSphere 5.0 to vSphere 5.1
In vSphere 5.1, SSL certification checks are set to ON. This might cause resxtop to fail in connecting to hosts and displays an exception message similar the following:
HTTPS_CA_FILE or HTTPS_CA_DIR not set.

VMRC and vSphere Client might stop responding when connected to a failed virtual machine
On an ESXi 5.1 host, VMware Remote Console (VMRC) and vSphere Client might stop responding when connected to a failed virtual machine or virtual machine with failed VMware Tools.

Time synchronization with the ESXi server might result in an unexpected reboot of the guest operating system when an ESXi host is configured as an NTP server
When an ESXi host is configured as an Network Time Protocol (NTP) server, the guest operating system might unexpectedly reboot during time synchronization with the ESXi host. This issue occurs when the virtual machine monitoring sensitivity level is set to High on a High Availability cluster and das.iostatsInterval option is set to False.

VMware Tools might fail while taking a quiesced snapshot of a virtual machine
If non-executable files are present in the backupScripts.d folder, VMware Tools might fail while taking a quiesced snapshot of a virtual machine.

After VMware Tools installation the guest operating system name changes from Microsoft Windows Server 2012 (64-bit) to Microsoft Windows 8 (64-bit)
After you create Microsoft Windows Server 2012 (64-bit) virtual machines and install VMware Tools, the guest operating system name changes from Microsoft Windows Server 2012 (64-bit) to Microsoft Windows 8 (64-bit).

VMware Tools might leak memory in Linux guest operating system
When multiple VLANs are configured for network interface in Linux guest operating system, VMware Tools might leak memory.

On an ESX/ESXi host earlier than version 5.1, upgrading only VMware Tools to version 5.1 results in a warning message
On an ESX/ESXi host earlier than version 5.1 and with a virtual machine running Windows guest operating system, if you upgrade only VMware Tools to version 5.1, a warning message similar to the following might be displayed in Windows Event Viewer:
[ warning] [vmusr:vmusr] vmware::tools::UnityPBRPCServer::Start: Failed to register with the host!

Attempts to install VMware Tools might fail with Linux kernel version 3.7
VMware Tools drivers are not compiled as the VMware Tools installation scripts are unable to identify the new kernel header path with Linux kernel version 3.7. This might cause VMware Tools installation to fail.

Customization of guest operating system might fail when deployed from some non-English versions of Windows guest operating system templates
Customization of the guest operating system might fail when deployed from some non-English versions of Windows guest operating system templates, such as the French version of Microsoft Windows 7, the Russian version of Microsoft Windows 7 and the French version of Microsoft Windows Server 2008 R2. This issue occurs when the VMware Tools service vmtoolsd.exe fails.

Virtual machines with vShield Endpoint Thin Agent might encounter performance-related problems when you copy network files to or from a CIFS share
You might encounter performance-related problems with virtual machines while copying network files to or from a Common Internet File System (CIFS) share.
This issue occurs when virtual machines running vShield Endpoint Thin Agent available from the VMware Tools bundle are used.

Be aware of the lengthy list of still present known issues too:
https://www.vmware.com/support/vsphere5/doc/vsphere-esxi-51u1-release-notes.html#knownissues

Contrary to what’s stated in the release notes, the HCL does not display information for ESXi 5.1 Update 1 yet. But the Product Interoperability Matrix has been updated.
It shows that the 5.1 U1 VMware Tools are supposedly not supported for 5.0 hosts, which is odd since they are still listed as supported on 4.1 U3 and 4.0 U4. The 5.1 U0 Tools were fully supported on 5.0 hosts too. The information is probably just not updated properly yet.

vCenter Server 5.1 Update 1

Go and check the release notes. Seriously.
A couple of new enhancements have been added with Update 1, namely:

vCenter Server is now supported on Windows Server 2012

vCenter Server now supports the following databases:
    Microsoft SQL Server 2012
    Microsoft SQL Server 2008 R2 SP2

vCenter Server now supports customization of the following guest operating systems:
    Windows 8
    Windows Server 2012
    Ubuntu 12.04
    RHEL 5.9
(Actually, Windows 8 and 2012 were supported previously too – though there was a bug with the customization process)

vCenter Essentials no longer enforces vRAM usage limit of 192 GB
With vSphere 5.1 Update 1, the Essentials and Essentials Plus licenses no longer restrict virtual machine power-on operations when the vRAM usage limit of 192 GB is met

Like with ESXi 5.1 U1, there is a long list of fixed bugs and issues. A couple of serious security vulnerabilities, especially in the vCenter Appliance department, have been fixed too. Here are a few taken from the release notes:
https://www.vmware.com/support/vsphere5/doc/vsphere-vcenter-server-51u1-release-notes.html#resolvedissues

Backup of the Inventory Service database fails
A backup operation of the Inventory Service database while the Inventory Service is running fails due to a bad_certificate error.
– Certificate issues in VMware land – I’m lovin’it. –

Unable to add ESXi 5.1 hosts to existing vSphere Distributed Switch versions 4.0, 4.1, and 5.0 in vCenter Server 5.1 with compatibility issue
When you upgrade from vCenter Server 4.0, 4.1, or 5.0 to vCenter Server 5.1, adding ESXi 5.1 hosts to the existing vSphere Distributed Switch (vDS) versions 4.0, 4.1, and 5.0 might fail. However, if you create new vDS switch versions 4.0, 4.1, 5.0 or 5.1 after upgrading to vCenter Server 5.1, you will be able to add ESXi 5.1 hosts.

vCenter Server when deployed in an environment that uses Active Directory (AD) with anonymous LDAP binding enabled doesn’t properly handle login credentials
In this environment, authenticating to vCenter Server with a valid user name and a blank password might be successful even if a non-blank password is required for the account.
The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the name CVE-2013-3107 to this issue.

vCenter VAMI UI allows arbitrary code execution
The vCenter Server Appliance (vCSA) VAMI web interface contains a vulnerability that allows an authenticated remote attacker to upload files to an arbitrary location creating new files or overwriting existing files. Replacing certain files may result in a denial of service condition or code execution. In the default vCSA setup, authentication to vCSA is limited to root since root is the only defined user.

Storage operations such as cold migration, storage vMotion, and cloning of virtual machine fail at 99%
Storage operations, including cold migration, storage vMotion, and cloning of a virtual machine with IDE disks and change block tracking enabled might fail at 99% when you use vCenter Server 5.1 to manage ESX/ESXi 4.x hosts.
An error message similar to the following is displayed:
A general system error occurred: Configuration information is inaccessible.

Single Sign On (SSO) upgrade from vCenter Server 5.1 0a to vCenter Server 5.1 0b does not replace the sspiservice.exe and rsautil.cmd files
After you upgrade SSO from vCenter Server 5.1 0a to vCenter Server 5.1 0b, the sspiservice.exe and rsautil.cmd files are not replaced. When you run the rsautil -v command, the resulting version number is of vCenter Server 5.1 0a.

Unable to edit the settings of a virtual machine that is a member of a datastore cluster
In vCenter Server, you cannot edit the settings of a virtual machine that is a member of a datastore cluster by using a user account that does not have the permission to configure a datastore cluster in vCenter Server but has full permissions at the virtual machine level. The following error message is displayed:
Permission to perform this operation was denied.
You do not hold privilege “Datastore cluster > Configure a datastore cluster on Datastore cluster Cluster Name”

Unable to see CPU, memory, disk metrics on the cluster view in the performance chart
The CPU, memory, disk, and network metrics are absent from the cluster view under the performance option of the Advanced tab in vSphere Client.
– Whoa I always wondered where these stats were on the cluster level. –

vCenter Server becomes non responsive when validating permissions for unknown objects
When you move a host that has permissions on child entities or itself out of a cluster without removing permissions and also remove the host from the vCenter inventory, the vCenter Server freezes and becomes non responsive whenever periodic validation takes place.

vCenter Server 5.1 service fails to start after you restart the server
The vCenter Server service fails to start if the Single Sign On service cannot connect to the database. This occurs when the Single Sign On service starts before the database service.

Updating the Base DN for groups when editing an identity source is not working correctly
In the Edit Identity Source dialog box for SSO, after you change the information for Base DN for groups such that it differs from that for the Base DN for users, save the changes, and return to the Edit Identity Source dialog box again, you notice that the Base DN for groups text box displays the same value as the Base DN for users. This issue does not occur when you initially add the identity source.

Performance history for past year might contain only 30 days of information in vSphere 5.1
When you attempt to view the vSphere performance history for past year in the past year view tab, you can see only one month of performance history.

Logging in to vCenter Server through the vSphere Web Client fails if you specify a non-ASCII user name
If you provide a valid vCenter Server user name composed of non-ASCII characters and attempt to log in using the vSphere Web Client, the login attempt fails with the following error:
Provided credentials are not valid.

vSphere 5.1 Web Client advanced performance charts are slow to display
In the vCenter 5.1 web client, the advanced performance charts take several minutes to load.
– Yeah, I really hope the dogslowness has been fixed. Maybe this fixes this annoying for input string error too?–

Attempt to install Red Hat Enterprise Linux using physical media on Windows XP fails when the installer is running
When you connect to any ESX/ESXi host through vSphere Client running on Windows XP, attempts to install Red Hat Enterprise Linux by using client-side physical media redirected from vSphere Client running on Windows XP fail when the installer is running.
– This specific issue made my day. –

Virtual Machine snapshot size (GB) and VM Total Size on Disk (GB) alarms are triggered incorrectly
In vCenter Server 5.1 you can configure the vCenter Server to send alarms when virtual machine size on disk exceeds a limit or when virtual machine snapshot size exceeds a limit. These alarms are falsely triggered when the virtual machine size or snapshot size are within the set limits. The alarm for virtual machine snapshot size might even be triggered when no snapshot exists.

Cloning a virtual machine through vSphere Client or vSphere Web Client causes the resulting virtual machine to have its disk pointing back to the source virtual machine’s disk
In vCenter Server, when you clone a virtual machine through vSphere Client or vSphere Web Client, you have the option to edit the hardware of the destination virtual machine. If you choose to edit a disk and adjust its size, the disk of the resulting virtual machine points back to the source virtual machine disk. This results in the destination virtual machine using the source virtual machine disk.

Virtual machines are unable to connect to the network after reverting to a snapshot
In vCenter Server, when a snapshot is reverted and virtual machine is powered on, Connect At Power On is selected on the network adapter but Connected is deselected. If you select the Connected check box an error similar to the following is displayed:
Invalid Configuration for device 0.

vSphere 5 Storage vMotion is unable to rename virtual machine files on completing migration
In vCenter Server, when you rename a virtual machine in the vSphere Client, the VMDK disks are not renamed following a successful Storage vMotion task. When you perform a Storage vMotion task for the virtual machine to have its folder and associated files renamed to match the new name, the virtual machine folder name changes, but the virtual machine file names do not change.
This issue is resolved in this release. To enable this renaming feature, you need to configure the advanced settings in vCenter Server and set the value of the provisioning.relocate.enableRename parameter to true.
– Remember having to set this manually. –

Unable to configure High Availability on vCenter Server 5.1 with only IPv6 networks
You cannot configure High Availability (HA) on vCenter Server 5.1 that uses only an IPv6 environment, as vCenter Server does not take IPv6 addresses (from default gateway or given by you through the vSphere Client) as isolation addresses.

Virtual machine power on fails with InsufficientFailoverResources if High Availability Admission Control is disabled
After all the hosts are inaccessible and only one host is left in a High Availability (HA) or Distributed Resource Scheduler (DRS) cluster, when you attempt to power on any more virtual machines even with admission control set to OFF, the following error message is displayed:
Insufficient resources

Check out the even longer list of currently known issues too:
https://www.vmware.com/support/vsphere5/doc/vsphere-vcenter-server-51u1-release-notes.html#knownissues

It completely flew under my radar until Dennis commented on it, but there is finally a VMware Update Manager Web Client Plugin available with this release. There are separate Release notes for Update Manager 5.1 Update 1 available which I didn’t notice at all and expected to be mentioned in the vCenter notes.

Miscellaneous updated stuff

ESXi HP extensions
Meanwhile, HP has also updated their April vibsdepot release and now provides a full repository release including the Utility Bundle and NMI driver: http://vibsdepot.hp.com/hpq/apr2013/
But they still haven’t updated the main metadata XML, so you still won’t get this in Update Manager automatically. Currently you need to configure this additional Download Source URL manually:

http://vibsdepot.hp.com/hpq/apr2013/index.xml

To continue the confusing version management, the “Complete Bundle” is now at version “04.25.13”.

A new customized HP ISO featuring ESXi 5.1 U1 as well as the April HP updates is also available already:
https://my.vmware.com/web/vmware/details?downloadGroup=HP-ESXI-5.1.0U1-GA-25APR2013&productId=327

vMA Update
Also, if you haven’t noticed until recently like me, VMware also released an update to the vSphere Management Assistant (vMA) a few weeks ago:
https://my.vmware.com/web/vmware/details?downloadGroup=VMA51P01&productId=327
Updating will be a major PITA because of course, there is no real upgrade path – you need to deploy a completely fresh Virtual Appliance from scratch again.

VDDK 5.1.1
Also of note, the VDDK has been updated too:
http://www.vmware.com/support/developer/vddk/vddk-511-releasenotes.html

The initial 5.1 release of the VDDK apparently contained some serious bugs, which made some major backup vendors hold off on supporting 5.1 until now:
http://up2v.nl/2012/11/22/is-your-vmware-vsphere-5-1-third-party-backup-reliable-or-not/
http://www.symantec.com/connect/blogs/updated-quality-wins-every-time-vsphere-51-support
http://kb.vmware.com/kb/2039931

VMware Converter Standalone 5.1
He’s not dead yet, Jim. This totally caught me off guard but it’s awesome to see the free Converter finally getting the proper update it deserved:
http://www.vmware.com/support/converter/doc/conv_sa_51_rel_notes.html

Support for virtual machine hardware version 9
Guest operating system support for Microsoft Windows 8 and Microsoft Windows Server 2012
Guest operating system support for Red Hat Enterprise Linux 6
Support for virtual and physical machine sources with GUID Partition Table (GPT) disks
Support for virtual and physical machine sources with Unified Extensible Firmware Interface (UEFI)
Support for EXT4 file system

VMware vCloud Director 5.1.2
Here comes a small vCD update too:

Rights for creating, reverting, and removing snapshots: Rights for creating, reverting, and removing snapshots have been added, allowing system administrators to configure these rights for all roles.

Allocation pool organization virtual datacenters can be elastic or non-elastic: Starting with vCloud Director 5.1.2, system administrators can configure Allocation Pool organization virtual datacenters with Single Cluster Allocation Pool (SCAP), making them non-elastic. This is a global setting that affects all Allocation Pool organization virtual datacenters. By default, Allocation Pool organization virtual datacenters have Single Cluster Allocation Pool enabled. Systems upgraded from vCloud Director 5.1 that have Allocation Pool organization virtual datacenters with virtual machines spanning multiple resource pools have Single Cluster Allocation Pool disabled by default.

vCloud Director is now supported on Red Hat Enterprise Linux 6.3

Support for Microsoft SQL Server 2012: vCloud Director now supports Microsoft SQL Server 2012 databases.

Additional guest operating system customization support: vCloud Director now supports customization of the following guest operating systems:
Microsoft Windows Server 2012

The following software which is part of the vSphere suite received updates too:
- vCenter Orchestrator 5.1.1
- vSphere Replication 5.1.1
- vSphere Data Protection 5.1.1

The vSphere PowerCLI has not been updated; Release 2 from February is still the most recent version.

There is no update for vShield/vCloud Networking and Security (apart from the vShield Endpoint driver shipped with the VMware Tools).

↧

HP Virtual Connect Firmware 4.01 released

June 17, 2013, 2:28 am

After announcing it a while ago already, HP released Virtual Connect firmware 4.01 last week to the public:
http://www.hp.com/swpublishing/MTX-312edceee05e4f13a316c9c504
The release notes can be found here.
You may also want to check out the User and Installation documentation:
HP Virtual Connect for c-Class BladeSystem Setup and Installation Guide Version 4.01 and later
HP Virtual Connect for c-Class BladeSystem Version 4.01 User Guide
HP Virtual Connect Manager Command Line Interface for c-Class BladeSystem Version 4.01 User Guide

Note that according to the most recent HP VMware Recipe document from April, the recommended VC firmware version for vSphere/ESX(i) environments is still 3.75.

Virtual Connect 4.01 is supported on the following interconnect modules:
• HP VC Flex-10/10D Module
• HP VC FlexFabric 10Gb/24-port Module
• HP VC Flex-10 10Gb Enet Module
• HP VC 4Gb FC Module (409513-B22 – End of Life announced in December, 2011)
• HP VC 8Gb 20-Port FC Module
• HP VC 8Gb 24-Port FC Module
The HP 1/10Gb VC Enet Module (399593-B22) and the HP 1/10Gb-F VC Enet Module (447047-B21) are no longer supported.

What’s new?

As far as new features are concerned, Virtual Connect firmware 4.01 delivers the following:

FCoE enhancements like FIP snooping support
QoS traffic prioritization
A “Minimum and maximum bandwidth optimization” feature that is not explained any further and requires running FlexNICs with at least the firmware provided by Service Pack for ProLiant 2013.02.00
SNMP enhancements for LLDP MIB, Bridge MIB, Interface MIB, and Link aggregation MIB (update to the latest MIB Kit)
Explicit control of setting LACP timers
Custom roles and permissions in VC
IGMP snooping enhancements for multicast traffic
And a few other minor features such as VCM CLI TAB auto-completion, searching in the VCM GUI, setting session timeouts in the VCM CLI and GUI.

Nothing really spectacular or exciting here, in my opinion.

Apart from the above feature enhancements, there are of course also a couple of bug fixes provided with this update. An excerpt of the resolved issues section from the release notes:

During a firmware upgrade to the Virtual Connect environment with a large number of defined Ethernet networks and NIC teaming failover policies configured for ALB or TLB, network and SAN connectivity interruption sometimes occurred
Loss of Ethernet connectivity on the uplink port caused SmartLink operation on the FlexNIC to result in the inability of the FlexHBA FCoE connection on the same port to log into the SAN Fabric.
The SCSI command aborted under load with the HP VC 8Gb 24-Port FC Module
HP VC 8Gb 20-port FC modules could not operate at 8Gb reliably when connecting to Cisco Nexus 5xxx series switches
The Virtual Connect Manager CLI did not allow modification of an already existing Ethernet profile connection with an index of 100 or higher
DCC (Smart Link) stayed unavailable after server power on and online speed changes failed

Some known issues with this release:

Frames that are sent and received by server blades that are larger than 1518 do not increment the user visible Ethernet counter: rfc1757_StatsOversizePkts
A GUI user with Network-only privileges is allowed to add profile connections
The GUI copy profile does not correctly handle MAC/WWNs
LUN loss might occur when VC interacts with an HP LPe1205 HBA
If a VC domain is defined to use “VC-Defined” MAC addresses and “Factory-default” WWNs, then the VC GUI reports an incorrect WWN for the FCoE connection in the server profile
While configuring a Shared Uplink Set in failover mode using the GUI, the GUI incorrectly allows uplink ports that are designated as Primary ports to be deleted, which can render the Shared Uplink Set without any Primary uplink ports

The Virtual Connect Support Utility (VCSU) which you can use to update Virtual Connect modules in a controlled, zero-downtime manner has also been updated. You need the latest version 1.8.1 to update the VC firmware to 4.01.
http://www.hp.com/swpublishing/MTX-6cfcb4e34df146ee80f0a45b29

HP recommends updating Virtual Connect Enterprise Manager (VCEM) to the latest version 7.2. But more importantly, HP also recommends updating your blade servers with SPP version 2013.02.00 before updating to VC 4.01.

↧

[Script] PowerCLI – find out the host on which VMs are running when vCenter is down

July 11, 2013, 7:52 am

Many moons ago, when I started playing with the wonderfulness that is PowerCLI, one of the first things I wrote with a particular problem in mind was a small script to quickly locate the host running our vCenter server in case anything went wrong and I lost access to vCenter directly.
So instead of trying to connect to every possible host of the cluster manually with the vSphere Client, why not just connect to all of them via PowerCLI and query them quickly?

Where’s Waldo?

This resulted in the small, simple script posted below. You can provide either a list of hosts to connect to or an alias for a cluster whose member hosts you pre-populated in the script, along with one or more search strings. Each search string is matched against the VM names, and the script outputs the list of found VMs with their current power state and, most importantly, the host running the VM. This way you can get a VM-Host mapping of not only your vCenter VM, but other VMs as well.

I remembered this script while reading a cool article on v-front.de about various other ways to keep track of which host your vCenter VM is running on.
I “polished” the old, simple code a bit but yeah, I’m still pretty horrible when it comes to scripting. Anyways, here it is in case anyone finds it useful:

PARAM(
    $Hostlist,
    $Cluster,
    $Query
)
###Adjust the cluster match list accordingly
if ((($Hostlist -eq $null) -and ($Cluster -notmatch "^(PROD|TEST|DMZ)$")) -or ($Query -eq $null) ) {
    Write "Wrong parameters/syntax. Usage: `n-hostlist [host1,host2,...] `nOR`n-cluster [PROD|DMZ|TEST] `nAND`n -query [string1,string2,...]."
    Exit
}
###Populate the cluster list with your clusters and hostnames so you don't have to enter a list of hosts manually every time with the -Hostlist parameter
switch ($Cluster) { 
    "PROD"    { $Hostlist = "prodhost01.local","prodhost02.local","prodhost03.local","prodhost04.local" }
    "DMZ"    { $Hostlist = "dmzhost01.local","dmzhost02.local" }
    "TEST"    { $Hostlist = "testhost01.local","testhost02.local" }
}
Write "Provide local login credentials for the ESXi hosts:`n$HostList`n"
$HostCreds = Get-Credential root
Write "Connecting to $Hostlist"
Connect-VIServer -Server $Hostlist -Credential $HostCreds
if ($DefaultVIServer -eq $null) {
    Write "No host connected. Exiting"
    Exit
}
foreach ($q in $Query) {
    Write "`n----------`nSearching for VMs with names containing the string `"$q`"..."
    Get-VM -Name *$q* | Sort Name | Format-Table -autosize Name, Powerstate, VMHost | Out-String
}
Disconnect-VIServer -Server * -Force -Confirm:$false
Exit

Here’s a usage example:

C:\findvm.ps1 -cluster PROD -query vcenter,sql
Provide local login credentials for the ESXi hosts:
prodhost01.local prodhost02.local prodhost03.local prodhost04.local

Connecting to prodhost01.local prodhost02.local prodhost03.local prodhost04.local

Name                           Port  User
----                           ----  ----
prodhost01.local           443   root
prodhost02.local           443   root
prodhost03.local           443   root
prodhost04.local           443   root

----------
Searching for VMs with names containing the string "vcenter"...

Name                PowerState VMHost
----                ---------- ------
SRV88_vCenter  PoweredOn prodhost02.local 
SRV99_vCenter  PoweredOn prodhost03.local 

----------
Searching for VMs with names containing the string "sql"...

Name                   PowerState VMHost
----                   ---------- ------
SRV77_SQL-VMware  PoweredOn prodhost01.local

↧

[Script] Perl – Check Point firewall logfile analysis – dropped connections statistics

July 30, 2013, 6:48 am

Here’s a simple Perl script I wrote some time ago in order to analyze Check Point firewall logs for dropped connections, outputting simple statistics of drops per source IP. It also displays the number of accepted connections per source IP.

You first need to convert the Check Point binary logs to text logfiles via fwm logexport like:
fwm logexport -n -p -i $FWDIR/log/2013-07-28_000000.log -o /var/tmp/2013-07-28.txt
Always use the -n switch btw or you can grab quite a few snickers waiting for DNS reverse resolutions in large logfiles. If you’re running this directly on a Check Point SPLAT or Gaia node, make sure you have enough space on the destination volume since the exported text logs can be quite large (use /var/tmp instead of /tmp).

Of course this script will only be able to gather statistics of firewall rules you’ve actually set to logging.

Code:

#!/usr/bin/env perl
###
#Export logs via fwm logexport:
# fwm logexport -n -p -i $FWDIR/log/2013-07-28_000000.log -o /tmp/2013-07-28.txt
#Feed these exported textfiles as arguments to this script:
# ./fwdrops.pl /tmp/2013-07-28.txt /tmp/2013-07-27.txt
###
use strict;
use warnings;

my (%sourceaccept, %sourcedrop, %indexes);
my @want = ("src", "action");

foreach my $file (@ARGV) {
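  open my $fh, '<', $file or die "Can't open file $file: $!";
  my $header = <$fh>;
  die "File $file is empty" unless defined $header;
  chomp $header; #otherwise the last column name keeps its newline and never matches
  my @fileheader = split (";", $header);
  #find the position of each wanted column in the header line
  foreach my $cur (@want) {
        $indexes{$cur} = 0;
        ++$indexes{$cur} until $indexes{$cur} > $#fileheader || $fileheader[$indexes{$cur}] eq $cur;
        die "Column '$cur' not found in $file" if $indexes{$cur} > $#fileheader;
  }
  <$fh>; #filter "Log file has been switched to..." message

  while(<$fh>) {
    chomp;
    my @vals = split (";", $_);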
    ++$sourcedrop{$vals[$indexes{"src"}]} if ($vals[$indexes{"action"}] eq "drop");
    ++$sourceaccept{$vals[$indexes{"src"}]} if ($vals[$indexes{"action"}] eq "accept")
  }
  close $fh;
}

printf ("\n\n%-20s\t%-20s\t%-20s\n", "source-IP", "Dropped Connections", "Accepted Connections");
foreach (sort { $sourcedrop{$b} <=> $sourcedrop{$a} }  keys %sourcedrop) {
  $sourceaccept{$_} = 0 unless $sourceaccept{$_}; #possible undef values for 0 accepted connections
  printf ("%-20s\t%-20d\t%-20d\n", $_, $sourcedrop{$_}, $sourceaccept{$_});
}

Example output:

source-IP           	Dropped Connections 	Accepted Connections
123.151.42.61       	5054                	0                    
188.165.95.172      	2667                	0                     
64.31.20.210        	1542                	0                   
74.63.232.92        	1536                	0                   
80.82.65.213        	1513                	0                   
122.227.228.107     	1478                	0                   
131.188.3.220       	1340                	0                   
134.76.10.46        	1334                	4                   
54.230.14.2         	1104                	0                   
50.97.107.45        	1023                	0                   
94.126.65.213       	774                 	0                              
58.221.60.179       	771                 	0                   
88.198.39.205       	768                 	0                   
66.154.119.161      	768                 	0            	
118.100.218.9       	768                 	0                   
183.248.145.108     	768                 	0                   
218.61.0.26         	768                 	0                   
180.106.43.26       	768                 	0                   
142.4.103.222       	768                 	0          
[.....]
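
For readers who prefer Python, the header-driven column lookup used above can be sketched as follows. This is only a minimal sketch and not part of the original script: it assumes the export is a semicolon-separated text file whose first line names the columns (including src and action) and whose second line is the log-switch message, as in the Perl code. The function name count_by_source is made up for illustration.

```python
from collections import Counter


def count_by_source(lines):
    """Count dropped and accepted connections per source IP.

    `lines` is any iterable of text lines from an exported log:
    a header line, one line to skip, then the log records.
    """
    rows = iter(lines)
    header = next(rows).rstrip("\n").split(";")
    src_col = header.index("src")        # raises ValueError if the column is missing
    action_col = header.index("action")
    next(rows, None)                     # skip the "Log file has been switched to..." line

    drops, accepts = Counter(), Counter()
    for line in rows:
        fields = line.rstrip("\n").split(";")
        if len(fields) <= max(src_col, action_col):
            continue                     # ignore truncated records
        if fields[action_col] == "drop":
            drops[fields[src_col]] += 1
        elif fields[action_col] == "accept":
            accepts[fields[src_col]] += 1
    return drops, accepts
```

Calling it with an open file handle returns two counters. drops.most_common() then yields the sources sorted by number of drops, and accepts[src] is simply 0 for a source without accepted connections.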

↧

[Script] Perl – Check Point firewall logfile analysis – rule usage

July 30, 2013, 8:46 am

Continuing from my previous post, here’s another quick and dirty Perl script I used some time ago to provide a basic analysis of Check Point firewall logfiles in terms of rule usage.

It kind of lost the little bit of usefulness it had with the rule base hit counter that was introduced in R75.40, but maybe someone can still make use of this horrible code. Or some better examples like this to begin with.
The script here also includes info on implicit rules, address spoofing, whacky ICMP packets or basically any stuff that isn’t logged with an actual rule name, though.

Again, this script will obviously only be able to gather statistics of firewall rules you’ve actually set to logging.

Code:

#!/usr/bin/env perl
###
#Export logs via fwm logexport:
# fwm logexport -n -p -i $FWDIR/log/2013-07-28_000000.log -o /tmp/2013-07-28.txt
#Feed these exported textfiles as arguments to this script:
# ./fwrules.pl /tmp/2013-07-28.txt /tmp/2013-07-27.txt
###

use strict;
use warnings;

my (%rulenames, %indexes);
my @want = ("rule_name", "rule", "message_info", "TCP packet out of state", "type");

foreach my $file (@ARGV) {
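  open my $fh, '<', $file or die "Can't open file $file: $!";
  my $header = <$fh>;
  die "File $file is empty" unless defined $header;
  chomp $header; #otherwise the last column name keeps its newline and never matches
  my @fileheader = split (";", $header);
  #find the position of each wanted column in the header line
  foreach my $cur (@want) {
        $indexes{$cur} = 0;
        ++$indexes{$cur} until $indexes{$cur} > $#fileheader || $fileheader[$indexes{$cur}] eq $cur;
        die "Column '$cur' not found in $file" if $indexes{$cur} > $#fileheader;
  }
  <$fh>; #filter "Log file has been switched to..." message

  while(<$fh>) {
    chomp;
    my @vals = split (";", $_);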
    if($vals[$indexes{"type"}] ne "control") {
      ++$rulenames{"$vals[$indexes{\"rule\"}] - $vals[$indexes{\"rule_name\"}]"} if $vals[$indexes{"rule_name"}];
      ++$rulenames{"**TCP packet out of state**"} if $vals[$indexes{"TCP packet out of state"}];
      ++$rulenames{"**[No Rule Name]**"} unless ($vals[$indexes{"rule_name"}] || $vals[$indexes{"message_info"}] || $vals[$indexes{"TCP packet out of state"}]);
      ++$rulenames{"**$vals[$indexes{\"message_info\"}]**"} if $vals[$indexes{"message_info"}];
    }
  }
  close $fh;
}

printf ("\n\n%-40s\t%-10s\n", "Rule Name", "Hits");
foreach (sort { $rulenames{$b} <=> $rulenames{$a} }  keys %rulenames) {
  printf ("%-40s\t%-10d\n", $_, $rulenames{$_});
}

Example output:

Rule Name                                       Hits
23 - Web Access Internal			1560537
13 - DMZ Access					310385
11 - DNS Queries				275722
52 - Access to Department B			240914
104 - Defualt Drop                              117447
52 - Access to Department A			103039
[...]
**TCP packet out of state**                     42895
[...]
**Address spoofing**                            11514
[...]
**[No Rule Name]**                              881
**ICMP error does not match an existing connection**    678
**Implied rule**                                320
**SSH version 1.x is not allowed**              27
**Invalid TCP packet - source / destination port 0. Dropped although the protection is disabled**   26
**Invalid ICMP-error header length**            2
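
The way a single log record is sorted into the buckets shown above can be sketched in Python like this. It is a rough sketch under the same assumptions as the Perl script (semicolon-separated export, column names in the first line, second line skipped), not a drop-in replacement. The names bucket_names and count_rule_hits are invented for this example, and unlike Perl, Python treats a field containing just "0" as non-empty.

```python
from collections import Counter

# Column names taken from the Perl script's @want list
WANTED = ("rule_name", "rule", "message_info", "TCP packet out of state", "type")


def bucket_names(rec):
    """Return the statistics buckets a single log record counts towards."""
    names = []
    if rec["rule_name"]:
        names.append(f"{rec['rule']} - {rec['rule_name']}")
    if rec["TCP packet out of state"]:
        names.append("**TCP packet out of state**")
    if rec["message_info"]:
        names.append(f"**{rec['message_info']}**")
    if not names:
        # neither a rule name nor any other information was logged
        names.append("**[No Rule Name]**")
    return names


def count_rule_hits(lines):
    """Count hits per bucket for an iterable of exported log lines."""
    rows = iter(lines)
    header = next(rows).rstrip("\n").split(";")
    cols = {name: header.index(name) for name in WANTED}  # ValueError if a column is missing
    next(rows, None)  # skip the "Log file has been switched to..." line
    hits = Counter()
    for line in rows:
        fields = line.rstrip("\n").split(";")
        # pad short records so every wanted column can be read
        fields += [""] * (len(header) - len(fields))
        rec = {name: fields[i] for name, i in cols.items()}
        if rec["type"] == "control":
            continue
        hits.update(bucket_names(rec))
    return hits
```

hits.most_common() then gives the buckets ordered by hit count, which corresponds to the sorted listing printed by the Perl version.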

↧

Configuring and securing local ESXi users for hardware monitoring via WBEM

September 27, 2013, 5:12 am

Besides good ol’ SNMP, the open Common Information Model (CIM) interface on an ESXi host provides a useful way of remotely monitoring the hardware health of your hosts via the Web-Based Enterprise Management (WBEM) protocol. Pretty much every major hardware management solution and agent today supports using WBEM to monitor hosts of various OSes.
Unlike SNMP (except for the painful to implement version 3), it builds on a standard HTTP(S) API, allowing secure SSL/TLS protected authentication and communication between the host and the management stations. Of course you can also use SNMP and WBEM independently at the same time too.
On ESXi, the CIM interface is implemented through the open Small Footprint CIM Broker (SFCB) service.

Seems great, right? To manage your hosts via CIM/WBEM with, for example, HP Systems Insight Manager (SIM), you just need to provide a local user on the ESXi host which SIM can use to authenticate against the host.
You can use the standard root user for example, but is that a good idea? I certainly don’t think so, even more so in environments of administrative disparity where you still have strict separation of virtualization admins and hardware admins (I agree this separation makes no sense in this day and age and causes all sorts of problems besides just this one, but this is the daily reality I’m facing).

So in the end, we’re down to creating dedicated local users on each of our ESXi hosts. Now here’s the catch:
It requires us to create the new local user and assign this user the Administrator role with all permissions on the root level of the ESXi host, just so the user can query some hardware management information through WBEM.
Bug 1: There is a local permission called “Host – CIM – CIM Interaction”, which implies that using a dedicated role for this user instead of the builtin Administrator role would work, but forget about that, it just doesn’t (related to bug 2 below).

How can I check if the WBEM interface works with a user?

Even without having access to a WBEM capable management system like HP SIM, you can easily check yourself if a user is allowed to query CIM providers by connecting to the WBEM port like this:

# curl -ik 'https://myesxihost.domain:5989' --request POST --data "" --basic --user cimuser
  Enter host password for user 'cimuser':
  HTTP/1.1 200 OK
  Content-Type: application/xml; charset="utf-8"
  Content-Length: 0
  Cache-Control: no-cache
  CIMOperation: MethodResponse

As you can see the service replies with a good 200 OK message.
Wrong passwords or insufficient permissions will receive a 401 Unauthorized response:

# curl -ik 'https://myesxihost.domain:5989' --request POST --data "" --basic --user cimuser
  Enter host password for user 'cimuser':
  HTTP/1.1 401 Unauthorized
  WWW-Authenticate: Basic realm="cimom"
  Server: sfcHttpd
  Content-Length: 0

You can also reproduce these requests from the local ESXi shell directly in case you don’t have a system with curl at hand:

# openssl s_client -connect localhost:5989
 POST / HTTP/1.1
 Host: myesxihost.domain:5989
 Authorization: Basic [username:password in base64 goes here]
 Content-Length: 0
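
If you go the openssl s_client route, you need the base64 string for the Authorization header. Here is a minimal Python sketch to generate it on any machine that has Python at hand. The function name and the password are made up for illustration; only the "Basic base64(username:password)" scheme itself is standard HTTP Basic authentication.

```python
import base64


def basic_auth_value(username: str, password: str) -> str:
    """Return the value of an HTTP 'Authorization' header for Basic auth."""
    # Basic auth is just base64("username:password").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


if __name__ == "__main__":
    # 'cimuser' is the account used in this post; the password is a placeholder.
    print("Authorization:", basic_auth_value("cimuser", "not-my-real-password"))
```

Keep in mind that base64 is an encoding, not encryption, so treat the resulting string like the password itself.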

Denied login attempts are logged to /var/log/syslog.log:
2012-12-05T11:46:06Z sfcb-CIMXML-Processor[1192236]: pam_access(sfcb:auth): access denied for user `cimuser' from `sfcb'

Access to the CIM interface via WBEM is governed through the configuration files in /etc/security/. Here we have the default /etc/security/access.conf after only creating the local cimuser account (no permissions assigned etc):

# This file is autogenerated and must not be edited.
  +:dcui:ALL
  +:root:ALL
  +:vi-admin01:ALL
  +:vpxuser:ALL
  +:vslauser:ALL
  -:vi-user01:ALL
  -:ALL:ALL

The cimuser is not included and as such treated with no rights (-:ALL:ALL). If we assign the builtin Administrator ESXi role, a new entry will be added allowing everything:
+:cimuser:ALL

Bug 2: For non-root local users, access to CIM interface is only allowed if we add permissions with the builtin Administrator role on the ESXi host. Even cloning that role and thereby retaining full permissions results in NOT being allowed to access the CIM interface. This is because any non-builtin Administrator assignment will not be reflected in /etc/security/access.conf.

So now that we know why it doesn’t work unless the user has the builtin Administrator role, we could edit the /etc/security/access.conf file manually and even restrict the user to the sfcb CIM service only, by adding the following line before the -:ALL:ALL rule:
+:cimuser:sfcb

This works well and allows us to assign really the most minimal permissions possible, but the problem is that this file is auto-generated at every reboot (running the auto backup scripts is not a solution because of that).
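
To make it clearer why the position of the line matters, here is a rough Python sketch of how access.conf-style rules are evaluated: the file is scanned top to bottom and the first matching entry decides. This is a heavily simplified model, not the real pam_access code. It only understands literal names and the ALL keyword, the function and variable names are my own, and "some-other-service" is a hypothetical origin used purely for illustration.

```python
# The default access.conf from above, as a list of lines.
DEFAULT_RULES = [
    "# This file is autogenerated and must not be edited.",
    "+:dcui:ALL",
    "+:root:ALL",
    "+:vi-admin01:ALL",
    "+:vpxuser:ALL",
    "+:vslauser:ALL",
    "-:vi-user01:ALL",
    "-:ALL:ALL",
]


def is_allowed(rules, user, origin):
    """Simplified first-match evaluation of 'permission:users:origins' rules."""
    for line in rules:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        permission, users, origins = line.split(":", 2)
        user_match = "ALL" in users.split() or user in users.split()
        origin_match = "ALL" in origins.split() or origin in origins.split()
        if user_match and origin_match:
            return permission == "+"  # the first matching rule decides
    # Assumption: nothing matched means "allow". With the trailing
    # -:ALL:ALL catch-all this branch is never reached here anyway.
    return True


# Same rules with the extra line inserted right before the catch-all.
PATCHED_RULES = DEFAULT_RULES[:-1] + ["+:cimuser:sfcb", "-:ALL:ALL"]

if __name__ == "__main__":
    print(is_allowed(DEFAULT_RULES, "cimuser", "sfcb"))                # False
    print(is_allowed(PATCHED_RULES, "cimuser", "sfcb"))                # True
    print(is_allowed(PATCHED_RULES, "cimuser", "some-other-service"))  # False
```

If the new line were appended after -:ALL:ALL instead, the catch-all would match first and the user would still be denied.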

Can we do better than that?

Yes we can, kind of…
We can work around the problem above with the following steps:

1. Add the cimuser to the root group in /etc/group on the host:
# grep root /etc/group
root:x:0:root,cimuser

Note that local groups were officially deprecated as of ESXi 5.1 and you can’t administer them with the vSphere client anymore.

2. For good measure, also change the login shell of the cimuser from /bin/sh to /sbin/nologin, or otherwise they can connect via SSH or login through the local shell:
# grep cimuser /etc/passwd
cimuser:x:501:0:Hardware Monitoring:/:/sbin/nologin

Now you might chime in and say “Hey, I can do that from the vSphere Client too!” And I have to retort: “Unfortunately, no you can’t”:

Bug 3: Since ESXi 5.1 the “grant shell access to user” checkbox in the user properties became useless. Even if you uncheck it, no actual change will be performed and it will automatically be checked again. By editing the /etc/passwd file for the user directly, the checkbox will correctly display an unchecked status though.

3. Finally, run the ESXi config backup scripts so the changes persist across reboots:
# backup.sh 0
Saving current state in /bootbank
Clock updated.
Time: 08:06:53 Date: 09/27/2013 UTC
# /sbin/auto-backup.sh

That’s it, done.

This approach survives reboots or updates and does not require ANY assigned permissions for the user on the local ESXi host. You only need to create the user beforehand. The user will not be able to connect to the host with the vSphere Client or APIs like PowerCLI, or log in through SSH or the local ESXi shell:

In PowerCLI:
Connect-VIServer myesxihost.local
 Connect-VIServer : 27.09.2013 10:23:08    Connect-VIServer        Permission to perform this operation was denied.
 At line:1 char:1
 + Connect-VIServer myesxihost.local
 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 + CategoryInfo          : NotSpecified: (:) [Connect-VIServer], NoPermission
 + FullyQualifiedErrorId : Client20_ConnectivityServiceImpl_Reconnect_Exception,VMware.VimAutomation.ViCore.Cmdlets.Commands.ConnectVIServer

Via SSH:
# ssh cimuser@myesxihost.local
 Password:
 The time and date of this login have been sent to the system logs.
VMware offers supported, powerful system administration tools.  Please
 see www.vmware.com/go/sysadmintools for details.
The ESXi Shell can be disabled by an administrative user. See the
 vSphere Security documentation for more information.
 Login disabled
 Connection to myesxihost.local closed.

However, the user will still be able to log in to the DCUI. But the scope of what he can do there is quite limited compared to what he could do with full admin rights on the vSphere host components before. OK, he can still reboot the host, which he probably can do anyway if he has ILO or physical access, and he can reconfigure the management network (another thing for VMware to fix: he can do that even with no permissions). But he can’t delete files or VMs or reconfigure anything else, and a reboot or a broken management network are obvious things that should pretty quickly raise a few eyebrows among the vSphere admins in charge.
So as long as you secure your out-of-band management network and physical access to the hosts (I hope you do so either way already), this should be ok. Please let me know with a comment if you’re aware of a halfway supportable method to also disable DCUI access.

Wrapping it up

I’m aware that fiddling with local config files on the shell isn’t the smoothest solution, but it seems VMware either does not intend to fix the actual problems or is taking their sweet time again. To my knowledge, this is currently the only way to implement a least privilege approach (which, surprise surprise, is also recommended in the official VMware Security Hardening Guides) for CIM based hardware monitoring.

Note that for all of this I already filed a ticket with VMware support almost a year ago. But nothing ever came of it.
In the end I was supposed to file a “feature request” for a bunch of bugs, which I did, but good luck with that. (Aka “Yeah, thanks for it, we’re stacking it right onto our clipboard tray, which coincidentally is located right next to the shredder.”)

However, it should also be noted that VMware provides another method to authenticate to the CIM service on ESXi hosts, namely through CIM tickets issued by the vCenter server. The method is described in this document but has a security related catch again:

A CIM client must authenticate before it can access data or perform operations on a VMware® ESXi™ host. The client can authenticate in one of the following ways.
- The client can authenticate directly with the CIMOM on the ESXi host by supplying a valid user name and password for an account that is defined on the ESXi host.
- The client can authenticate with a sessionId that the CIMOM accepts in place of the user name and password. The sessionId (called a “ticket”) can be obtained by invoking the AcquireCimServicesTicket() method on VMware vCenter™ Server.
VMware recommends using CIM ticket authentication for servers managed by vCenter. If the ESXi host is operating in lockdown mode, the CIMOM does not accept new authentication requests from CIM clients.
However, the CIMOM will continue to accept a valid ticket obtained from vCenter Server. The ticket must be obtained using the credentials of any user that has administrative privileges on vCenter Server.

In our case that would again breach separation of administrative roles.
I’m also not sure if this method is widely supported by management applications such as HP SIM, information about this is very scarce on the web. Please leave a reply if you happen to know more about it or have any other input on this whole topic.

↧

Storage performance testing: Not all IOPS are created equal

January 21, 2014, 8:49 am


Every once in a while I come across postings of astonishingly awesome IOPS numbers achieved with a relatively moderate setup. Often this is due to the fact that the benchmark used ran a “maximum throughput” IO pattern, which doesn’t say a lot about actual storage performance. This is because these kinds of patterns issue idealized, sequential IO (often with small IO sizes) to measure the maximum, theoretical throughput of the storage subsystem. Unfortunately this kind of IO pattern practically never occurs with real world applications.

An IO stream hitting a storage device:

Is Random to a certain degree
Has a certain read/write ratio
Can have variable IO sizes between 512 bytes and several MiB (though that is hardly real-world relevant, up to 128KiB is normal)
Is stuffed into queues at various levels in the storage stack with various queue depths

All these factors have a huge influence on what everyone casually and sweepingly calls “IO Operations per Second”, or IOPS. The basic rule of thumb is:
The smaller the IO size and the more sequential the workload, the higher the IOPS number will be.
It should be noted that randomness of IO has little effect on flash-based storage like SSDs, but impacts traditional spinning disks to a very large degree. Also, taking a single (spinning) disk, write IOs are not necessarily more expensive than read IOs, considering the local cache on every disk is a pure read cache (and don’t you dare to enable write-caching on the disks themselves). Of course this is where RAID penalties come in, but I’m trying to focus on IO itself regardless of physical storage properties in this article.
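
The IO size part of that rule of thumb is plain arithmetic: throughput equals IOPS times IO size, so for a fixed throughput ceiling the IOPS number goes up as the IO size goes down. A small Python sketch of that relationship follows. The function names and the 200 MiB/s ceiling are made up for illustration and are not measurements from any real system.

```python
def throughput_mib_per_s(iops: float, io_size_kib: float) -> float:
    """Throughput in MiB/s that results from a given IOPS rate and IO size."""
    return iops * io_size_kib / 1024.0


def iops_for_throughput(throughput_mib_s: float, io_size_kib: float) -> float:
    """IOPS needed to fill a given throughput with IOs of a given size."""
    return throughput_mib_s * 1024.0 / io_size_kib


if __name__ == "__main__":
    ceiling = 200.0  # hypothetical throughput limit in MiB/s
    for size_kib in (4, 32, 64, 128):
        iops = iops_for_throughput(ceiling, size_kib)
        print(f"{size_kib:>3} KiB IOs: {iops:>8.0f} IOPS at {ceiling:.0f} MiB/s")
```

This is why an IOPS figure without the IO size (and the read/write mix and randomness) next to it doesn’t tell you much.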

When vendors claim their system or disk delivers (up to, hehe) [insert arbitrary number here] IOPS, they usually refer to a workload that is completely random, mainly consists of writes and has a 4-32KiB IO block size.
If you conduct your own storage IO testing, you should use similar IO patterns if you want to compare real-world storage performance. Otherwise you may end up with unrealistically huge numbers like in the maximum throughput cases.

A great resource for comparing storage IO performance is this thread on the VMware community forums, where hundreds of results using a common IOmeter configuration are posted.
You can get the IOmeter config file here and also paste your resulting csv files there for summarizing numbers.

There is also the VMware IO analyzer virtual appliance, providing various pre-defined IO pattern configs for applications like Exchange or SQL server, but last time I tested it behaved a bit “funky”.

Last but not least, a tip if you want to analyze just what kind of IO workload your specific VM/application is issuing: vscsiStats is your friend and provides tremendous in-depth information on this.

Further recommended reading:
http://www.symantec.com/connect/articles/getting-hang-iops-v13

↧

Forefront TMG Log Export with MSDEToText.vbs messing up IPs

March 21, 2014, 7:08 am


Logging Firewall or Web Proxy traffic on a Forefront TMG/ISA node into the local SQL Express-based database (which is the default setting) has a few advantages, like being able to query past logs through the TMG console. But sometimes it’s better to have logs stored in a plain text format as well for a 3rd party tool or your own log analysis scripts.

For this purpose, Microsoft provides the MSDEToText.vbs tool to export logs from a TMG/ISA SQL database into text files.


However, the MSDEToText script is producing some weird results for my TMG environments, namely it fails to convert the source and destination IP-addresses properly:
For example, what should be exported as “192.168.1.11” ends up as “-63.-87.-254.-245”, with a negative number per octet in the text log. Notice something? Yeah, adding 255 to each value yields the correct IP (well, almost, except for the last octet which is off by 1). This happens only for IPs that don’t have an existing computer object defined in the TMG policy.

There is obviously something wrong with the logic inside the MSDEToText VB script. Being completely clueless about VBS (I can’t even remember ever seriously coding/editing something longer than two lines), I dug into the script to see what makes it go bonkers and found the following function to be responsible:

Private Function IPFromDbl(rowValue)
    Dim ipDouble         ' Double
    Dim dot              ' string
    Dim count,octet      ' integers

    ipDouble = CDbl(rowValue)
    IPFromDbl= ""
    dot = "" 
    For count = 1 To 4
        ipDouble = (Fix(ipDouble)) / 256
        octet = 256 * (ipDouble - Fix(ipDouble))
        IPFromDbl = CStr(octet) & dot & IPFromDbl
        dot = "."
    Next

End Function

I added a few lines as you can see below to cope with negative values, and it now exports the correct IPs in my text logs:

Private Function IPFromDbl(rowValue)
    Dim ipDouble         ' Double
    Dim dot              ' string
    Dim count,octet      ' integers

    ipDouble = CDbl(rowValue)
    IPFromDbl= ""
    dot = "" 
    For count = 1 To 4
        ipDouble = (Fix(ipDouble)) / 256
        octet = 256 * (ipDouble - Fix(ipDouble))
        If (octet <= 0) Then
            octet = octet + 255
            If (count = 1) Then
                octet = octet + 1 ' because otherwise, the last octet of the IP (first iteration) is always off by one
            End If
        End If
        IPFromDbl = CStr(octet) & dot & IPFromDbl
        dot = "."
    Next

End Function

There is probably a better way to fix this within the preceding lines, but I didn’t want to waste too much time on it. Feel free to share a better idea.
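
For what it’s worth, the symptom is consistent with the address arriving as a signed 32-bit integer: 192.168.1.11 is 3232235787 as an unsigned number, which reads as -1062731509 when interpreted as signed, and feeding that into the original loop gives exactly “-63.-87.-254.-245”. If that assumption holds, adding 2^32 to negative values before splitting them into octets would be the cleaner fix. Here is a minimal sketch of the idea in Python (not VBS, and not tested against a real TMG database; the function name is my own):

```python
def ip_from_db_value(value: float) -> str:
    """Convert a 32-bit IP stored as a (possibly negative) number to dotted quad."""
    n = int(value)
    if n < 0:
        n += 2 ** 32  # undo the signed 32-bit interpretation
    # Extract the four bytes, most significant first.
    octets = [(n >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


if __name__ == "__main__":
    print(ip_from_db_value(-1062731509))  # signed form of 192.168.1.11
    print(ip_from_db_value(3232235787))   # unsigned form of the same address
```

This also avoids a side effect of the per-octet workaround: the `octet <= 0` check touches octets that are exactly 0 as well, so addresses containing a 0 octet would need a closer look.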

Oh god, there is VBS code on my blog! What has the world come to?

↧

openssl heartbleed attack – The cryptocalyptic judgement day has arrived

April 8, 2014, 8:15 am


The internet is now (no really, this time it is, pinky-promise) officially broken™. Or at least a large part of it that is responsible for securely handling your shopping cart and credit card information, e-mails, passwords, router configuration or whatever flows through your tubes over encrypted SSL/TLS connections.

Since yesterday there’s been a big fuss on the internet over the so-called openssl heartbleed bug. It exploits the TLS heartbeat extension and allows an attacker to read up to 64KiB of memory managed by openssl from the target system. This memory could contain sensitive data such as passwords, session cookies, credit card information, or in one of the worst cases even the server’s x509 certificate private key. Since the attack is basically untraceable (no webserver logs or anything), it’s entirely possible that certificate private keys have been compromised en masse already without anyone noticing or ever being able to find out.
You can find some interesting background information about the discovery and disclosure in this article.

To be on the really safe side, you have to replace your certificates, revoke the old ones, change passwords and generally be PITA’d. Note that this has nothing to do with the openssl version a certificate was generated with, but is about the runtime application using openssl for its SSL/TLS traffic. Your private key can be just as compromised if you generated the certificate with Windows CA tools or gnutls and use it with an openssl-based application.

The “good news”, if you can even call them that, is that at least only the newer 1.0.1 branch version of openssl is affected. In total, openssl 1.0.0 and 0.9.8 together are still more widespread than 1.0.1, so the scope is at the very least narrowed down considerably:

What versions of the OpenSSL are affected?
Status of different versions:
    OpenSSL 1.0.1 through 1.0.1f (inclusive) are vulnerable
    OpenSSL 1.0.1g is NOT vulnerable
    OpenSSL 1.0.0 branch is NOT vulnerable
    OpenSSL 0.9.8 branch is NOT vulnerable
Bug was introduced to OpenSSL in December 2011 and has been out in the wild since OpenSSL release 1.0.1 on 14th of March 2012. OpenSSL 1.0.1g released on 7th of April 2014 fixes the bug.

Most Linux distros have released updated openssl packages already and I updated my Fedora20 and CentOS6 today. The fix for CentOS/RHEL6 is a backport to 1.0.1e, so no worries as long as your package is at least 1.0.1e-16.el6_5.7. Remember to also restart all services (postfix/sendmail/apache/nginx etc…) depending on openssl libraries after updating.
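
If you have to go through a pile of hosts, the version list quoted above is easy to turn into a quick check on the first line of the `openssl version -a` output. The following Python sketch does just that. The function name is my own, and it only knows the versions from the quoted list. Because of backports like the CentOS/RHEL one, a “1.0.1e” hit is only a hint to look at the package version, not proof that the system is still vulnerable.

```python
import re

# Letter suffixes of the 1.0.1 branch listed as vulnerable (1.0.1 through 1.0.1f).
_LISTED_VULNERABLE_SUFFIXES = ("", "a", "b", "c", "d", "e", "f")


def is_listed_vulnerable(version_output: str) -> bool:
    """Check an 'openssl version' line against the quoted list of affected versions."""
    match = re.search(r"OpenSSL\s+(\d+\.\d+\.\d+)([a-z]*)", version_output)
    if match is None:
        raise ValueError("no OpenSSL version found in: %r" % version_output)
    base, suffix = match.groups()
    return base == "1.0.1" and suffix in _LISTED_VULNERABLE_SUFFIXES


if __name__ == "__main__":
    for line in ("OpenSSL 1.0.1e 11 Feb 2013",
                 "OpenSSL 1.0.1g 7 Apr 2014",
                 "OpenSSL 1.0.0c 2 Dec 2010",
                 "OpenSSL 0.9.8y 5 Feb 2013"):
        print(line, "->", "check this one" if is_listed_vulnerable(line) else "not on the list")
```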

If we can trust these tweets, then big shots like Yahoo or ebay are already confirmed to be vulnerable. Cloudflare apparently had an edge and was informed about the vulnerability before it was publicly disclosed. Also this wouldn’t be such a huge (but still fairly big) issue if Perfect Forward Secrecy were implemented properly.
~~At the moment there doesn’t seem to be a publicly available tool like the ssllabs test to check whether a target is vulnerable.~~ This site provides a quick test. — So does this site. — SSLlabs now checks for heartbleed as well.
Here’s a perl script for your own testing pleasure supporting STARTTLS as well.
And here we have a heartbleed proof of concept Windows tool.

Also be aware that this isn’t quite a Linux-only problem. While schannel, the native Windows cryptographic API is safe against this bug, openssl is used in a huge number of native or ported Windows applications as well.

Is vSphere ESXi affected as well?

[Update 2014/04/09]
VMware published a KB article outlining which VMware products and versions are affected.
[/Update]

ESXi 5.5 seems vulnerable. I hope VMware will soon release a Security Advisory clearing things up and providing updates for this horrible issue (which isn’t their fault though).

Let’s have a look at an ESXi 5.5 GA (no U1, but doesn’t matter) host:

    # vmware -vl
    VMware ESXi 5.5.0 build-1331820
    VMware ESXi 5.5.0 GA

    # openssl version -a
    OpenSSL 1.0.1e 11 Feb 2013
    built on: Tue Feb 26 16:34:26 PST 2013

Now here’s an up-to-date ESXi 5.1 U2 and an older 5.0 still U1 host:

# vmware -vl
VMware ESXi 5.1.0 build-1612806
VMware ESXi 5.1.0 Update 2
# openssl version -a
OpenSSL 0.9.8y 5 Feb 2013
built on: Wed Mar 20 20:44:08 PDT 2013

# vmware -vl
VMware ESXi 5.0.0 build-821926
VMware ESXi 5.0.0 Update 1
~ # openssl version -a
OpenSSL 0.9.8q 2 Dec 2010
built on: Mon Mar 14 12:16:37 PDT 2011

As you can see, ESXi 5.5 runs the vulnerable openssl 1.0.1 branch. ESXi 5.1 and 5.0 on the other hand are built on the openssl 0.9.8 branch. Hence versions prior to ESXi 5.5 are unaffected.

perl check-ssl-heartbleed.pl MyESXi55host:443
...ssl received type=22 ver=0x302 ht=0x2 size=50
...ssl received type=22 ver=0x302 ht=0xb size=1020
...ssl received type=22 ver=0x302 ht=0xe size=0
...send heartbeat#1
...ssl received type=24 ver=302 size=16384
BAD! got 16384 bytes back instead of 3 (vulnerable)


perl check-ssl-heartbleed.pl MyESXi55host:5989
...ssl received type=22 ver=0x302 ht=0x2 size=54
...ssl received type=22 ver=0x302 ht=0xb size=1020
...ssl received type=22 ver=0x302 ht=0xe size=0
...send heartbeat#1
...ssl received type=24 ver=302 size=1638

With the perl script linked above I confirmed that ESXi 5.5 can be successfully attacked with the heartbleed attack. Ouch.
Adding the -s switch to print the received output, I can see the content of some XML file that must be loaded in the host’s memory.

The problem will extend to Linux-based virtual appliances by VMware (or whatever vendor for that matter) as well. I have an older vMA 5.1 virtual appliance which is unaffected, but I’m not sure which openssl version recent releases of the VCSA/vMA etc ship with:

# cat /etc/vma-release
vMA 5.1.0 BUILD-1062361
# cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2     

# openssl version -a
OpenSSL 1.0.0c 2 Dec 2010

Even the Windows-based vCenter server appears to rely on openssl to some extent, at least the Inventory Service. My 5.1 U2 vCenter seems safe though:

"C:\Program Files\VMware\Infrastructure\Inventory Service\bin\openssl.exe" version -a
OpenSSL 0.9.8y 5 Feb 2013
built on: Tue Feb 12 23:38:08 2013

The same can’t be said for vCenter 5.5, which comes with two different openssl binaries:

"C:\Program Files\VMware\CIS\openSSL\openssl.exe" version -a
OpenSSL 1.0.1e 11 Feb 2013
built on: Tue Feb 12 19:37:08 2013

"C:\Program Files\VMware\Infrastructure\Inventory Service\bin\openssl.exe" version -a
OpenSSL 0.9.8y 5 Feb 2013
built on: Tue Feb 12 23:38:08 2013

I can’t really tell what this “CIS” component is supposed to do, but I haven’t been able to reproduce the issue with any of the ports vCenter is listening on yet. I tested a few of the available heartbleed scripts against Windows-based vCenter 5.5 and 5.1 on all ports the system is listening on (including Web Client 9443, Inventory 10443, SSO 7444 etc) but they were never reported being vulnerable. I suppose this is because the actual SSL traffic is handled in the Java application’s own SSL stack instead of depending on openssl, which might only be used for certain operations such as certificate generation.
In any case, this demonstrates yet again how important it is to keep your ESXi host management network on a firewalled, non-public secure network.

Apart from the VMware vSphere components I also checked some other SSL-based applications running in our infrastructure with the scripts (this is just my own testing with the available proof of concept tools, mind you):

[Update 2014/04/14]
HP has published a support notice outlining their affected and unaffected products. In short, they acknowledge that recent versions of HP SMH, Blade OA and SUM are vulnerable.
[/Update]

HP Systems Management Homepages (SMH) on physical Windows and Linux (tested with SMH v7.3.1, v7.2.2.9, v7.2.1.3, v7.2.0.14) servers are affected. (It seems like you can fix this manually (at least on Linux) by overwriting the openssl files that ship with the SMH, see comments)
HP BladeSystem Onboard Administrator Web Interface (tested with v4.11) is affected.
HP Virtual Connect Web Interface (tested with v3.70 and v4.10) is not affected.
HP ILO interfaces (tested recent firmware versions of ILO2, ILO3 and ILO4) don’t seem vulnerable. They apparently run an older openssl version that doesn’t support the exploited TLS Heartbeat feature to begin with. However, it appears you can DoS ILO2 interfaces with the heartbleed scripts for some reason. See the comments and The HP Community thread here.
Check Point Products are unaffected.
There is a huge list of other 3rd party products and services affected. I suggest you contact your vendors if they haven’t published information yet.

↧

Squid and chunked transfer encoding shenanigans

May 8, 2014, 11:51 am


A while ago I was troubleshooting an issue with a certain client software (that I don’t even know the name of) on an internal network, that was uploading images via HTTP to a remote service on the internet. The simple HTTP POST upload requests failed for some reason with a HTTP/1.0 501 Not Implemented error from our Squid proxy server, prompting me to analyze further.
Fortunately I had a network trace from the local admins so I could easily reproduce and test the requests with curl from my Linux client.

The issue proved to be caused by the chunked Transfer-Encoding header of the POSTs and by our Squid 3.1 proxy server being unable to process requests with this Transfer-Encoding in certain cases.

I’ve seen a few mentions of issues with chunked Transfer-Encoding and Squid on the internets but many were older, contradictory or referencing Server side responses and not client side requests like in this case.

In a nutshell, Squid 3.1 is just not fully HTTP/1.1 compatible yet and only partially supports some features like chunked encoding as stated on the project’s website:

Squid-3.2 claims HTTP/1.1 support. Squid v3.1 claims HTTP/1.1 support but only in sent requests (from Squid to servers). Earlier Squid versions do not claim HTTP/1.1 support by default because they cannot fully handle Expect:100-continue, 1xx responses, and/or chunked messages.
Both Squid-3 and Squid-2 contain at least response chunked decoding. The chunked encoding portion is available from Squid-3.2 on all traffic except CONNECT requests.

The following HTTP POST request with a chunked encoding header uploading a dummy file of 65523 bytes works fine through Squid 3.1:
(Note: I’m just passing the header with curl like this. The payload is actually not properly formatted as chunked, as explained below, but it doesn’t matter for the sake of these tests.)

$ head -c65523 /dev/urandom > /tmp/data
$ curl -v 'http://example.com' -H 'Transfer-Encoding: chunked' -x 192.168.1.221:3128 -X POST --data-binary @/tmp/data
* About to connect() to proxy 192.168.1.221 port 3128 (#0)
*   Trying 192.168.1.221...
* connected
* Connected to 192.168.1.221 (192.168.1.221) port 3128 (#0)
> POST http://example.com HTTP/1.1
> User-Agent: curl/7.24.0 (i686-pc-linux-gnu) libcurl/7.24.0 GnuTLS/2.12.18 zlib/1.2.5.1 libidn/1.19
> Host: example.com
> Accept: */*
> Proxy-Connection: Keep-Alive
> Transfer-Encoding: chunked
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
> 
* Done waiting for 100-continue
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
[...]

Squid access.log:
1398345135.604    627 10.1.1.204 TCP_MISS/200 1706 POST http://example.com/ - FIRST_UP_PARENT/192.168.1.10 text/html

All good here.

Now the exact same request that carries only one additional byte payload fails with HTTP/1.0 501 Not Implemented:

$ head -c65524 /dev/urandom > /tmp/data 
$ curl -v 'http://example.com' -H 'Transfer-Encoding: chunked' -x 192.168.1.221:3128 -X POST --data-binary @/tmp/data
* About to connect() to proxy 192.168.1.221 port 3128 (#0)
*   Trying 192.168.1.221...
* connected
* Connected to 192.168.1.221 (192.168.1.221) port 3128 (#0)
> POST http://example.com HTTP/1.1
> User-Agent: curl/7.24.0 (i686-pc-linux-gnu) libcurl/7.24.0 GnuTLS/2.12.18 zlib/1.2.5.1 libidn/1.19
> Host: example.com
> Accept: */*
> Proxy-Connection: Keep-Alive
> Transfer-Encoding: chunked
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
> 
* Done waiting for 100-continue
* HTTP 1.0, assume close after body
< HTTP/1.0 501 Not Implemented
[...]
Unsupported Request Method and Protocol
Squid does not support all request methods for all access protocols. For example, you can not POST a Gopher request.
ERR_UNSUP_REQ
[...]

Squid access.log:
1398345142.273      0 10.1.1.204 NONE/501 3713 POST http://example.com/ - NONE/- text/html

This doesn’t work and clearly I’m not using Gopher here, hah.

Turns out this behavior is controlled by the chunked_request_body_max_size Squid directive, which has a default value of 64KB:

Squid configuration directive chunked_request_body_max_size:
A broken or confused HTTP/1.1 client may send a chunked HTTP request to Squid. Squid does not have full support for that feature yet. To cope with such requests, Squid buffers the entire request and then dechunks request body to create a plain HTTP/1.0 request with a known content length. The plain request is then used by the rest of Squid code as usual.
The option value specifies the maximum size of the buffer used to hold the request before the conversion. If the chunked request size exceeds the specified limit, the conversion fails, and the client receives an “unsupported request” error, as if dechunking was disabled.
Dechunking is enabled by default. To disable conversion of chunked requests, set the maximum to zero.
Request dechunking feature and this option in particular are a temporary hack. When chunking requests and responses are fully supported, there will be no need to buffer a chunked request.

Ok, according to the default value the exact failure boundary should be 65536 bytes instead of 65524 but I’m sure there is some sound explanation for this so let’s disregard this.
After increasing the default parameter by adding chunked_request_body_max_size 1024 KB to the squid.conf file and reloading Squid, the uploads went through without problems. In our case 1MB is sufficient, but you may need more depending on the size of your requests.

The parameter apparently exists for Squid 3.2 and the development stage 3.3/3.4 as well, but I’m not sure if it’s really relevant because of the initial statement linked above. Unfortunately I don’t have a Squid 3.2+ system to test with. On RHEL/CentOS 6, which our proxy servers are running, Squid is available in version 3.1 only.

On the topic of chunked Transfer-Encoding

Chunked Transfer-Encoding means that a single HTTP body (like a file) can be split into multiple parts for transmission. In that case the traditional Content-Length header indicating the size of the entire request or response body is omitted. Instead, multiple parts of data are transmitted sequentially in the body, each prefixed with its length (bytes in hex), and the sequence is terminated by a zero-size chunk. A chunked request from my case basically looks like this on the wire:

Client request:
POST http://example.com HTTP/1.1
Host: example.com
[...Some other headers...]
Transfer-Encoding: chunked
[CRLF]
8000 [CRLF]
[First chunk of the body payload, 32KB] [CRLF]
8000 [CRLF]
[Second chunk of the body payload, 32KB] [CRLF]
[...]
FF [CRLF]
[Last chunk of body payload, 255 bytes] [CRLF]
0 [CRLF]
[CRLF]

Server response:
HTTP/1.1 200 OK
[...]
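
To make the framing more concrete, here is a minimal shell sketch that builds a chunked body by hand. The function names chunk, last_chunk and chunked_body are made up for this illustration. Note that on the real wire each hex size is followed by its own CRLF before the chunk data starts; the schematic above leaves that out. The sketch also assumes plain ASCII payloads, because ${#1} counts characters, not bytes.

```sh
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Emit one chunk: size in hex, CRLF, payload, CRLF.
# Assumption: payload is ASCII, so character count equals byte count.
chunk() {
    printf '%x\r\n%s\r\n' "${#1}" "$1"
}

# Emit the terminating zero-size chunk followed by the final CRLF.
last_chunk() {
    printf '0\r\n\r\n'
}

# Emit every argument as its own chunk, then terminate the body.
chunked_body() {
    for part in "$@"
    do
        chunk "$part"
    done
    last_chunk
}

# Show the raw bytes, including the CRLF separators.
chunked_body "Hello, " "world" | od -c
```

The hex sizes in the request above follow the same rule: printf '%x' 32768 prints 8000 for a 32KB chunk.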

Why is this useful you may ask? It wasn’t obvious to me at first, but there is a very practical advantage in our everyday web browsing with chunked transfer encoding (although it isn’t really that widespread):
A web server that has to dynamically generate a web page can start sending HTML code to the browser almost immediately. It does not have to wait until the page is completely generated and it knows the total size to transfer it in one normal big “chunk” to the client. With this the client browser can already display parts of the page, fetch linked resources or execute scripts (most are blocked until the entire page is received though). This can improve the site loading speed for end users considerably, especially if there is a lengthy continuous flow of information like a huge table where data is returned from a slow database query. This technique is also known as streamed HTTP responses.

In my particular problem case of a client software uploading images I don’t really see the point of using chunked encoding though.

↧

[Script] Poor man’s vSphere network health check for standard vSwitches

May 26, 2014, 8:27 am

Among many other new features, the vSphere 5.1 distributed vSwitch brought us the Network Health Check feature. The main purpose of this feature is to ensure that all ESXi hosts attached to a particular distributed vSwitch can access all VLANs of the configured port groups and with the same MTU. This is really useful in situations where you’re dealing with many VLANs and the roles of virtualization and network admin are strongly separated.

Unfortunately, like pretty much all newer networking features, the health check is only included in the distributed vSwitch which requires vSphere Enterprise+ licenses.

There are a couple of other (cumbersome) options you have though:
If you have ESXi 5.5 you can use the pktcap-uw utility on the ESXi shell to check if your host receives frames for a specific VLAN on an uplink port.
The following example command will capture receive-side frames tagged with VLAN 100 on uplink vmnic3:
# pktcap-uw --uplink vmnic3 --dir 0 --capture UplinkRcv --vlan 100
If systems are active in this VLAN you should see a few broadcasts or multicasts right away, meaning the host is able to receive frames on this NIC for this VLAN. Repeat that for every physical vmnic uplink and VLAN.
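
Repeating the capture for every uplink and VLAN gets tedious quickly, so the command lines can be generated in a loop. This is only a rough sketch: the function name print_capture_cmds and the uplink and VLAN lists are made up for illustration, and it merely prints the commands instead of running them, so you can start each capture by hand and stop it with Ctrl+C.

```sh
#!/bin/sh
# Hypothetical helper: prints one pktcap-uw command per uplink/VLAN pair.
# $1 = space-separated list of uplinks, $2 = space-separated list of VLAN IDs.
print_capture_cmds() {
    for nic in $1
    do
        for vlan in $2
        do
            echo "pktcap-uw --uplink $nic --dir 0 --capture UplinkRcv --vlan $vlan"
        done
    done
}

# Example values only; replace them with your own uplinks and VLANs.
print_capture_cmds "vmnic2 vmnic3" "20 30 40"
```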

Another way to check connectivity is to create a vmkernel interface for testing on this VLAN and use vmkping. Since manually configuring a vmkernel interface for every VLAN seems like a huge PITA, I came up with a short ESXi shell script to automate that task.
Check out the script below. It uses a CSV-style list to configure vmkernel interfaces with certain VLAN and IP settings, pinging a specified IP on that network with the given payload size to account for MTU configuration. This should at least take care of initial network-side configuration errors when building a new infrastructure or adding new hosts.
This script was tested successfully on ESXi 5.5 and should work on ESXi 5.1 as well. I’m not entirely sure about 5.0 but that should be ok too. (Please leave a comment in case you can confirm/refute that).
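
A note on the payload size: since the pings are sent with the don't-fragment bit set (vmkping -d), the largest payload that still fits is the MTU minus 20 bytes for the IPv4 header (without options) and 8 bytes for the ICMP header. A small sketch of that arithmetic, with max_ping_payload being a name made up for this example:

```sh
#!/bin/sh
# Hypothetical helper: largest ICMP payload for a given MTU,
# assuming a 20-byte IPv4 header (no options) and an 8-byte ICMP header.
max_ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

max_ping_payload 1500   # prints 1472
max_ping_payload 9000   # prints 8972
```

So to verify jumbo frames with an MTU of 9000 end to end, a ping payload size of 8972 would be the value to use.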

Introducing the ghetto-vSwitchHealthCheck.sh:

#!/bin/sh

#First define your variables here.
vswitch=vSwitch2 #The vSwitch you want to test your VLANs on.
uplink=vmnic3 #The physical uplink you want to check. Make sure it's one of the uplinks configured for the vSwitch.
tempvmk=vmk9 #The name of the temporary vmkernel interface created for sending your test pings.
portgroup=testpg #The name of the temporary Port Group created to use for the tempvmk interface.
pingsize=56 #The payload size to use for the vmkpings. Increase it if you want to check jumbo-frame connectivity on the VLANs.
file=/tmp/nets.csv #Path to a CSV-file containing the network related infos. Each line represents a CSV-style set of the following: VLAN, IP to assign to $tempvmk, Netmask for $tempvmk, Target IP you want to ping on this network.

#Creates a temporary Port Group and vmkernel interface, also makes sure the Port Group only uses the specified vmnic $uplink as its active uplink.
esxcli network vswitch standard portgroup add --portgroup-name $portgroup --vswitch-name $vswitch
esxcli network vswitch standard portgroup policy failover set --portgroup-name $portgroup --active-uplinks $uplink --load-balancing explicit
esxcli network ip interface add --portgroup-name $portgroup --interface-name $tempvmk

#Loops through the list of networks, skipping comment lines and empty lines.
#Each remaining line is split at the commas into vlan, ip, netmask and target.
grep -v -e '^#' -e '^$' "$file" | while IFS=, read -r vlan ip netmask target
do
    echo -e "\n--------\nTrying VLAN: $vlan\tIP: $ip\tNetmask: $netmask\tTarget-IP: $target\n"
    #Assign the VLAN ID to $portgroup.
    esxcli network vswitch standard portgroup set --portgroup-name "$portgroup" --vlan-id "$vlan"
    #Assign the IP and Netmask settings to $tempvmk
    esxcli network ip interface ipv4 set -t static --interface-name "$tempvmk" --ipv4 "$ip" --netmask "$netmask"

    sleep 3
    #Ping the target IP three times via $tempvmk
    vmkping -4 -d -c 3 -s "$pingsize" -I "$tempvmk" "$target" |
    if grep -q " 0% packet loss"
    then
        echo "Successfully pinged target IP $target for VLAN $vlan on uplink $uplink."
    else
        echo "Error! Couldn't ping target IP $target for VLAN $vlan on uplink $uplink."
    fi
done

#Remove temporarily created Port Group and vmkernel interface.
esxcli network ip interface remove --interface-name $tempvmk
esxcli network vswitch standard portgroup remove --portgroup-name $portgroup --vswitch-name $vswitch

Feed the script with an input file like this:

~ # cat /tmp/nets.csv 
#vlan,vmkip,vmknetmask,target
20,10.1.20.99,255.255.255.0,10.1.20.1
30,10.1.30.99,255.255.255.0,10.1.30.1
40,10.1.40.88,255.255.255.0,10.1.40.1
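
Since a typo in this file only shows up as a failed esxcli call or ping later on, it can be worth sanity-checking the list first. The following is a minimal sketch, not part of the original script: check_nets_csv is a made-up name, and it only verifies that every non-comment line has four comma-separated fields and a numeric VLAN ID between 0 and 4095.

```sh
#!/bin/sh
# Hypothetical helper: basic sanity check for the CSV input file.
# Prints a message per bad line and returns non-zero if any line is bad.
check_nets_csv() {
    awk -F, '
        /^#/ || /^$/ { next }    # skip comment lines and empty lines
        NF != 4 {
            printf "line %d: expected 4 fields, got %d\n", NR, NF
            bad = 1
            next
        }
        $1 !~ /^[0-9]+$/ || $1 + 0 > 4095 {
            printf "line %d: invalid VLAN ID %s\n", NR, $1
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage (path taken from the script above):
# check_nets_csv /tmp/nets.csv && sh /tmp/checkvlans.sh
```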

The resulting output will look like this:

~ # sh /tmp/checkvlans.sh
--------
Trying VLAN: 20        IP: 10.1.20.99        Netmask: 255.255.255.0  Target-IP: 10.1.20.1
Successfully pinged target IP 10.1.20.1 for VLAN 20 on uplink vmnic3
--------
Trying VLAN: 30        IP: 10.1.30.99        Netmask: 255.255.255.0  Target-IP: 10.1.30.1
Successfully pinged target IP 10.1.30.1 for VLAN 30 on uplink vmnic3
--------
Trying VLAN: 40        IP: 10.1.40.88        Netmask: 255.255.255.0  Target-IP: 10.1.40.1
Error! Couldn't ping target IP 10.1.40.1 for VLAN 40 on uplink vmnic3
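
One detail of the success test in the script is easy to miss: the pattern starts with a space, so that a summary line reporting 100% packet loss does not count as a match for 0%. A small sketch of that check in isolation, where ping_ok is a made-up name and the sample summary lines are only assumed to resemble what vmkping prints:

```sh
#!/bin/sh
# Hypothetical helper: succeeds only if the ping summary on stdin
# reports exactly 0% packet loss (note the leading space in the pattern).
ping_ok() {
    grep -q ' 0% packet loss'
}

echo "3 packets transmitted, 3 packets received, 0% packet loss" | ping_ok && echo "ok"
echo "3 packets transmitted, 0 packets received, 100% packet loss" | ping_ok || echo "failed"
```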

The general logic of this script should be easily portable to PowerCLI as well, using Get-EsxCli to call esxcli network diag ping. Maybe I’ll work on that some other time.

↧