Saturday, December 12, 2015

vSphere | A VM is showing disk size of 0B

A while ago I was given the task of figuring out and cleaning up the mess of a poorly maintained vSphere infrastructure, and I encountered a very strange problem.

I was informed that the Veeam backup had failed for one of the VMs, and the error message was:

Virtual disk configuration change detected, resetting CBT failed Details: A general system error occurred: 
Creating VM snapshot
Error: A general system error occurred: 


When I opened the settings of the VM I was surprised to see that the disk sizes were listed as 0.


To add to my torture, I couldn't find any KB articles related to this problem. The one or two reports I found online were from users experiencing similar problems, but their solutions were entirely different and very drastic: a reboot of the entire ESXi host.

I couldn't allow myself to reboot the host as this was a production cluster. After several hours of researching I finally found the solution to my pain, so I decided to write this down just in case anyone else encounters a similar problem.

The cause of this symptom is rogue snapshots of the VM. I emphasize rogue because they cannot be detected by the Snapshot Manager or by the CLI commands. If you use a third-party backup solution (in my case, Veeam) and the backup process is interrupted, a snapshot can be left behind and become orphaned to the extent that it is no longer visible anywhere. To make things worse, Veeam will continue running backups until the snapshots are so deeply nested that everything reaches a breaking point. You will either end up with an unresponsive VM or, if the VM is heavily utilized, the changes will fill up the datastore and cause an even bigger mess.

So how do you solve this problem?

First things first:

Power down the VM!

Second:

If this happens, there is no guarantee that you will be able to recover the VM, so you had better have an older backup ready, or suffer. In my case the VM was just a Domain Controller, so a VM-level backup mattered little: if I screw up this VM (which is already screwed), then I can install and promote another DC and the problem will be solved. Nonetheless, I wanted to recover it.

Third:

If you ever allow something like this to happen to you (forgotten VM snapshots), then you are probably better off finding some other line of work, as the sysadmin job is not for you. This did not happen to me. I inherited this infrastructure and was tasked with fixing it.

Now power down the VM and man up!

With these symptoms, the Snapshot Manager shows no snapshots, and they are not detected from the ESXi shell either. That was the source of my pain: I suspected a rogue snapshot problem, but the Snapshot Manager was empty. I then opened an SSH session to the host and searched for snapshots using the ESXi CLI, but that came up empty as well. First, list the VMs registered on the host:

# vim-cmd vmsvc/getallvms


Take note of the Vmid in question (in my case it was 171) and try to list the snapshots using the appropriate ESXi CLI command:


# vim-cmd vmsvc/snapshot.get 171
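
On a host with many VMs it can be easier to filter the getallvms listing than to read it by eye. The following is only a minimal sketch: `vmid_for` is a hypothetical helper name, and it assumes the VM name contains no spaces, so that the Vmid is the first whitespace-separated column and the name is the second.

```shell
# Hypothetical helper: print the Vmid of the VM whose name matches exactly.
# Reads a `vim-cmd vmsvc/getallvms` listing on stdin.
# Assumption: the VM name has no spaces, so it is the second field.
vmid_for() {
    # NR > 1 skips the header row of the listing.
    awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}

# Usage on the ESXi host:
#   vim-cmd vmsvc/getallvms | vmid_for DC02
```

Multi-line annotations in the listing could confuse this filter, so double-check the result against the full output.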

If the output is empty, look in the VM's folder on its datastore. The datastores are mounted under /vmfs/volumes. The friendly datastore names ([Temp-Prod-Lun], for example) are symlinks pointing to the datastore UUIDs, which are the real directory names.

# ls -l /vmfs/volumes/
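
If you already know the friendly name, the symlink can also be resolved directly instead of being picked out of the listing. A small sketch, assuming `readlink` is available in your ESXi shell; `datastore_target` is a hypothetical helper name.

```shell
# Hypothetical helper: print the target of a datastore symlink.
# $1 = directory that holds the symlinks (/vmfs/volumes on ESXi)
# $2 = friendly datastore name
datastore_target() {
    readlink "$1/$2"
}

# Usage on the ESXi host:
#   datastore_target /vmfs/volumes Temp-Prod-Lun
```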

Take note of the ID of your datastore and cd into that folder where you will find your VM:

# cd /vmfs/volumes/51a73a5f-987279f4-8a2f-a44c112a5a4a/DC02

Let's look at what's inside the VM's folder:

# ls -l
total 302143520
-rw-------    1 root     root         31848 Dec 10 21:04 DC02-Snapshot1271.vmsn
-rw-------    1 root     root         31938 Dec 10 23:40 DC02-Snapshot1272.vmsn
-rw-------    1 root     root         31862 Dec 11 08:55 DC02-Snapshot1273.vmsn
-rw-------    1 root     root         31938 Dec 11 09:38 DC02-Snapshot1274.vmsn
-rw-------    1 root     root         31862 Dec 12 02:28 DC02-Snapshot1275.vmsn
-rw-------    1 root     root         31938 Dec 12 02:34 DC02-Snapshot1276.vmsn
-rw-------    1 root     root         31938 Dec 12 13:51 DC02-Snapshot1277.vmsn
-rw-------    1 root     root         31938 Dec 12 15:00 DC02-Snapshot1278.vmsn
-rw-------    1 root     root         31938 Dec 12 15:11 DC02-Snapshot1279.vmsn
-rw-------    1 root     root         31936 Dec 12 18:50 DC02-Snapshot1280.vmsn
-rw-r--r--    1 root     root            13 Oct  5 17:22 DC02-aux.xml
-rw-r--r--    1 root     root            73 Dec 10 14:50 DC02-b981b179.hlog
-rw-------    1 root     root     8589934592 Dec 12 20:48 DC02-b981b179.vswp
-rw-------    1 root     root     64424509440 Nov  1  2013 DC02-flat.vmdk
-rw-------    1 root     root          8684 Dec 12 20:53 DC02.nvram
-rw-------    1 root     root           471 Nov  1  2013 DC02.vmdk
-rw-r--r--    1 root     root          4796 Dec 12 18:50 DC02.vmsd
-rwxr-xr-x    1 root     root          3415 Dec 12 20:48 DC02.vmx
-rw-------    1 root     root             0 Dec 12 20:48 DC02.vmx.lck
-rw-r--r--    1 root     root          3264 Dec 10 14:50 DC02.vmxf
-rwxr-xr-x    1 root     root          3414 Dec 12 20:48 DC02.vmx~
-rw-------    1 root     root       3932672 Dec 10 23:40 DC02_1-000001-ctk.vmdk
-rw-------    1 root     root     4831965184 Dec 10 23:40 DC02_1-000001-delta.vmdk
-rw-------    1 root     root           393 Dec 10 23:39 DC02_1-000001.vmdk
-rw-------    1 root     root     7180775424 Dec 11 08:55 DC02_1-000002-delta.vmdk
-rw-------    1 root     root           331 Dec 11 08:55 DC02_1-000002.vmdk
-rw-------    1 root     root       3932672 Dec 11 09:37 DC02_1-000003-ctk.vmdk
-rw-------    1 root     root     2751590400 Dec 11 09:37 DC02_1-000003-delta.vmdk
-rw-------    1 root     root           400 Dec 11 09:35 DC02_1-000003.vmdk
-rw-------    1 root     root     9143709696 Dec 12 02:28 DC02_1-000004-delta.vmdk
-rw-------    1 root     root           331 Dec 12 02:28 DC02_1-000004.vmdk
-rw-------    1 root     root       3932672 Dec 12 02:34 DC02_1-000005-ctk.vmdk
-rw-------    1 root     root      16904192 Dec 12 02:34 DC02_1-000005-delta.vmdk
-rw-------    1 root     root           400 Dec 12 02:30 DC02_1-000005.vmdk
-rw-------    1 root     root       3932672 Dec 12 13:51 DC02_1-000006-ctk.vmdk
-rw-------    1 root     root     7281438720 Dec 12 13:51 DC02_1-000006-delta.vmdk
-rw-------    1 root     root           400 Dec 12 02:34 DC02_1-000006.vmdk
-rw-------    1 root     root       3932672 Dec 12 15:00 DC02_1-000007-ctk.vmdk
-rw-------    1 root     root     3103911936 Dec 12 15:00 DC02_1-000007-delta.vmdk
-rw-------    1 root     root           400 Dec 12 13:51 DC02_1-000007.vmdk
-rw-------    1 root     root       3932672 Dec 12 15:11 DC02_1-000008-ctk.vmdk
-rw-------    1 root     root     1359081472 Dec 12 15:11 DC02_1-000008-delta.vmdk
-rw-------    1 root     root           400 Dec 12 15:00 DC02_1-000008.vmdk
-rw-------    1 root     root       3932672 Dec 12 15:21 DC02_1-000009-ctk.vmdk
-rw-------    1 root     root      50458624 Dec 12 15:21 DC02_1-000009-delta.vmdk
-rw-------    1 root     root           400 Dec 12 15:11 DC02_1-000009.vmdk
-rw-------    1 root     root       3932672 Dec 12 20:49 DC02_1-000010-ctk.vmdk
-rw-------    1 root     root     134344704 Dec 12 20:55 DC02_1-000010-delta.vmdk
-rw-------    1 root     root           400 Dec 12 20:48 DC02_1-000010.vmdk
-rw-------    1 root     root     64424509440 Dec 10 21:04 DC02_1-flat.vmdk
-rw-------    1 root     root           522 Dec 10 21:04 DC02_1.vmdk
-rw-------    1 root     root       8192512 Dec 10 23:39 DC02_2-000001-ctk.vmdk
-rw-------    1 root     root      17035264 Dec 10 23:39 DC02_2-000001-delta.vmdk
-rw-------    1 root     root           393 Dec 10 23:39 DC02_2-000001.vmdk
-rw-------    1 root     root      17035264 Dec 11 08:55 DC02_2-000002-delta.vmdk
-rw-------    1 root     root           331 Dec 11 08:55 DC02_2-000002.vmdk
-rw-------    1 root     root       8192512 Dec 11 09:35 DC02_2-000003-ctk.vmdk
-rw-------    1 root     root      17035264 Dec 11 09:35 DC02_2-000003-delta.vmdk
-rw-------    1 root     root           400 Dec 11 09:35 DC02_2-000003.vmdk
-rw-------    1 root     root      17035264 Dec 12 02:28 DC02_2-000004-delta.vmdk
-rw-------    1 root     root           331 Dec 12 02:28 DC02_2-000004.vmdk
-rw-------    1 root     root       8192512 Dec 12 02:30 DC02_2-000005-ctk.vmdk
-rw-------    1 root     root        258048 Dec 12 02:28 DC02_2-000005-delta.vmdk
-rw-------    1 root     root           400 Dec 12 02:29 DC02_2-000005.vmdk
-rw-------    1 root     root       8192512 Dec 12 13:51 DC02_2-000006-ctk.vmdk
-rw-------    1 root     root      17035264 Dec 12 13:51 DC02_2-000006-delta.vmdk
-rw-------    1 root     root           400 Dec 12 02:36 DC02_2-000006.vmdk
-rw-------    1 root     root       8192512 Dec 12 15:00 DC02_2-000007-ctk.vmdk
-rw-------    1 root     root      17035264 Dec 12 15:00 DC02_2-000007-delta.vmdk
-rw-------    1 root     root           400 Dec 12 13:55 DC02_2-000007.vmdk
-rw-------    1 root     root       8192512 Dec 12 15:11 DC02_2-000008-ctk.vmdk
-rw-------    1 root     root      17035264 Dec 12 15:11 DC02_2-000008-delta.vmdk
-rw-------    1 root     root           400 Dec 12 15:08 DC02_2-000008.vmdk
-rw-------    1 root     root       8192512 Dec 12 15:21 DC02_2-000009-ctk.vmdk
-rw-------    1 root     root      17035264 Dec 12 15:21 DC02_2-000009-delta.vmdk
-rw-------    1 root     root           400 Dec 12 15:18 DC02_2-000009.vmdk
-rw-------    1 root     root       8192512 Dec 12 20:49 DC02_2-000010-ctk.vmdk
-rw-------    1 root     root      17035264 Dec 12 20:50 DC02_2-000010-delta.vmdk
-rw-------    1 root     root           400 Dec 12 20:50 DC02_2-000010.vmdk
-rw-------    1 root     root     134217728000 Dec 10 20:56 DC02_2-flat.vmdk
-rw-------    1 root     root           523 Dec 10 21:04 DC02_2.vmdk
-rw-r--r--    1 root     root        151794 Jul 31  2014 vmware-16.log
-rw-r--r--    1 root     root     1158567124 Jan 15  2015 vmware-17.log
-rw-r--r--    1 root     root     296222265 Sep 30 06:38 vmware-18.log
-rw-r--r--    1 root     root        383386 Sep 30 06:43 vmware-19.log
-rw-r--r--    1 root     root       6178487 Dec 10 14:50 vmware-20.log
-rw-r--r--    1 root     root        420396 Dec 12 15:21 vmware-21.log
-rw-r--r--    1 root     root        226159 Dec 12 20:54 vmware.log
-rw-------    1 root     root     135266304 Dec 12 20:48 vmx-DC02-3112284537-1.vswp


And look at that mess?!
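
To put a number on the mess, the delta files can be counted per disk rather than by scrolling through the listing. This is a minimal sketch: `count_deltas` is a hypothetical helper, and it assumes the classic `-NNNNNN-delta.vmdk` naming seen here, disk names without spaces, and that `sort` and `uniq` are available in your shell.

```shell
# Hypothetical helper: count snapshot delta extents per base disk in a VM folder.
# $1 = VM folder
# Prints "<base disk> <count>" for every disk that has at least one delta file.
count_deltas() {
    ls "$1" \
        | sed -n 's/^\(.*\)-[0-9]\{6\}-delta\.vmdk$/\1/p' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Usage from inside the VM folder:
#   count_deltas .
```

For the listing above this reports ten delta files for each of DC02_1 and DC02_2, while the Snapshot Manager claims there are none.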

No wonder the VM is having trouble working. Now how do we rectify something like this?

First things first. We have to be sure that the VM is actually running on snapshots.

Let's take a look at the .vmx file and see which .vmdk files the disks point to:

# cat DC02.vmx
.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "10"
nvram = "DC02.nvram"
pciBridge0.present = "TRUE"
svga.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
hpet0.present = "TRUE"
displayName = "DC02"
extendedConfigFile = "DC02.vmxf"
virtualHW.productCompatibility = "hosted"
floppy0.present = "FALSE"
svga.vramSize = "8388608"
numvcpus = "4"
memSize = "8192"
sched.cpu.units = "mhz"
sched.cpu.affinity = "all"
sched.mem.affinity = "all"
powerType.powerOff = "soft"
powerType.suspend = "hard"
powerType.reset = "soft"
scsi0.virtualDev = "lsisas1068"
scsi0.present = "TRUE"
ethernet0.virtualDev = "e1000e"
ethernet0.networkName = "VM Network"
ethernet0.addressType = "vpx"
ethernet0.generatedAddress = "00:50:56:a9:7a:c5"
ethernet0.present = "TRUE"
vmci.filter.enable = "TRUE"
guestOS = "windows8srv-64"
disk.EnableUUID = "TRUE"
toolScripts.afterPowerOn = "TRUE"
toolScripts.afterResume = "TRUE"
toolScripts.beforeSuspend = "TRUE"
toolScripts.beforePowerOff = "TRUE"
uuid.bios = "42 29 f9 80 7c 94 46 68-f0 91 d2 29 24 c9 fa f8"
vc.uuid = "50 29 de a0 04 5e b0 6f-5f 8d f2 cf c4 58 11 76"
sched.cpu.min = "0"
sched.cpu.shares = "normal"
sched.mem.min = "0"
sched.mem.minSize = "0"
sched.mem.shares = "normal"
sched.swap.derivedName = "/vmfs/volumes/51a73a5f-987279f4-8a2f-a44c112a5a4a/DC02/DC02-b981b179.vswp"
replay.supported = "FALSE"
replay.filename = ""
pciBridge0.pciSlotNumber = "17"
pciBridge4.pciSlotNumber = "21"
pciBridge5.pciSlotNumber = "22"
pciBridge6.pciSlotNumber = "23"
pciBridge7.pciSlotNumber = "24"
scsi0.pciSlotNumber = "160"
ethernet0.pciSlotNumber = "192"
vmci0.pciSlotNumber = "32"
scsi0.sasWWID = "50 05 05 60 7c 94 46 60"
vmci0.id = "617216760"
vm.genid = "1647351642771223837"
vm.genidX = "8596475237440002371"
vmotion.checkpointFBSize = "8388608"
cleanShutdown = "TRUE"
softPowerOff = "TRUE"
toolsInstallManager.lastInstallError = "0"
tools.remindInstall = "TRUE"
toolsInstallManager.updateCounter = "12"
tools.syncTime = "FALSE"
unity.wasCapable = "TRUE"
scsi0:0.fileName = "DC02_1-000010.vmdk"
scsi0:0.present = "true"
scsi0:0.redo = ""
scsi0:0.deviceType = "scsi-hardDisk"
sched.scsi0:0.shares = "normal"
sched.scsi0:0.throughputCap = "off"
scsi0:1.fileName = "DC02_2-000010.vmdk"
scsi0:1.present = "true"
scsi0:1.redo = ""
scsi0:1.deviceType = "scsi-hardDisk"
sched.scsi0:1.shares = "normal"
sched.scsi0:1.throughputCap = "off"
sata0.pciSlotNumber = "33"
sata0.present = "TRUE"
sata0:0.startConnected = "FALSE"
sata0:0.allowGuestConnectionControl = "TRUE"
sata0:0.deviceType = "cdrom-image"
sata0:0.fileName = "/vmfs/volumes/f614ea11-a1d75394/SW_DVD5_Office_Professional_Plus_2010w_SP1_64Bit_English_CORE_MLF_X17-76756.ISO"
sata0:0.present = "TRUE"
vmotion.checkpointSVGASize = "11534336"
migrate.hostlog = "./DC02-b981b179.hlog"
config.readOnly = "FALSE"
uuid.location = "56 4d 32 56 c4 ff 22 4c-3e 0b 0d 2a bc 68 0c 1b"
scsi0:2.deviceType = "scsi-hardDisk"
scsi0:3.deviceType = "scsi-hardDisk"
SCSI0:0.ctkEnabled = "TRUE"
SCSI0:1.ctkEnabled = "TRUE"
ctkEnabled = "TRUE"


We can see that the disks point to the delta disks (the -00000N.vmdk files) that are created when a snapshot is taken:

scsi0:0.fileName = "DC02_1-000010.vmdk"
scsi0:1.fileName = "DC02_2-000010.vmdk"

That means that the VM is running on snapshots. However, you cannot commit these snapshots via the CLI because the ESXi host is not aware of them.
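
Rather than reading the whole .vmx, the disk entries that point at a snapshot can be filtered out directly. A minimal sketch, assuming snapshot descriptors follow the `-NNNNNN.vmdk` naming shown above; `snapshot_disks` is a hypothetical helper name.

```shell
# Hypothetical helper: list the disk entries in a .vmx that point at a
# snapshot descriptor (name ending in -NNNNNN.vmdk) instead of a base disk.
# $1 = path to the .vmx file
# Exit status is non-zero when no disk is running on a snapshot.
snapshot_disks() {
    grep -E '^(scsi|sata|ide|nvme)[0-9]+:[0-9]+\.fileName = ".*-[0-9]{6}\.vmdk"' "$1"
}

# Usage from inside the VM folder:
#   snapshot_disks DC02.vmx
```

An empty result with a non-zero exit status would mean every disk points at its base .vmdk.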

I was banging my head for a while and then it occurred to me.

You fix this by removing the VM from the inventory and then registering it again. If you have another ESXi host to spare, then register the VM on that host. While registering, the host will scan the VM and see the snapshots.
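
From the ESXi shell, removing and re-registering a VM maps to `vim-cmd vmsvc/unregister` and `vim-cmd solo/registervm`. Treat the following as a sketch rather than a transcript of what I ran: `reregister_plan` is a hypothetical helper that only prints the commands for review and executes nothing itself, which seems the safer choice on a production host.

```shell
# Hypothetical helper: print (not run) the commands to re-register a VM.
# $1 = current Vmid, $2 = absolute path to the VM's .vmx file
reregister_plan() {
    case "$1" in
        ''|*[!0-9]*) echo "Vmid must be a number" >&2; return 1 ;;
    esac
    case "$2" in
        /*.vmx) ;;
        *) echo "expected an absolute path to a .vmx file" >&2; return 1 ;;
    esac
    # Check the power state first: the VM should be powered off.
    echo "vim-cmd vmsvc/power.getstate $1"
    echo "vim-cmd vmsvc/unregister $1"
    echo "vim-cmd solo/registervm \"$2\""
}

# Usage:
#   reregister_plan 171 /vmfs/volumes/51a73a5f-987279f4-8a2f-a44c112a5a4a/DC02/DC02.vmx
```

Registering assigns a new Vmid, so run vim-cmd vmsvc/getallvms again before querying the snapshots with vim-cmd vmsvc/snapshot.get.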

After that you are safe to commit the snapshots.

I don't have screenshots of all this to show you. In my case the VM had some 10+ nested snapshots, which I safely committed from the vSphere Web Client. If you have too many snapshots, then you have to commit them via the CLI using the appropriate commands.
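
Before committing, it may be worth checking that every link in each disk's snapshot chain is present. Each snapshot descriptor (the small -00000N.vmdk text file) names its parent in a `parentFileNameHint` line, so the chain can be followed from the disk named in the .vmx back to the base disk. This is a minimal sketch, assuming all descriptors live in the VM folder; `walk_chain` is a hypothetical helper name.

```shell
# Hypothetical helper: follow parentFileNameHint from a leaf descriptor
# back to the base disk.
# $1 = VM folder, $2 = descriptor named in the .vmx (e.g. DC02_1-000010.vmdk)
# Prints one descriptor per line; fails if a link in the chain is missing.
walk_chain() {
    dir=$1
    disk=$2
    depth=0
    while [ -n "$disk" ]; do
        if [ ! -f "$dir/$disk" ]; then
            echo "missing descriptor: $disk" >&2
            return 1
        fi
        echo "$disk"
        depth=$((depth + 1))
        # Guard against descriptors that point at each other in a loop.
        if [ "$depth" -gt 64 ]; then
            echo "chain too long, giving up" >&2
            return 1
        fi
        # The base disk has no parentFileNameHint, which ends the loop.
        disk=$(sed -n 's/^parentFileNameHint="\([^"]*\)".*$/\1/p' "$dir/$disk" | head -n 1)
        # Assumption: parents sit in the same folder, so drop any path prefix.
        disk=${disk##*/}
    done
}

# Usage from inside the VM folder:
#   walk_chain . DC02_1-000010.vmdk
```

Once the chain checks out and the host can see the snapshots, vim-cmd vmsvc/snapshot.removeall followed by the Vmid should commit every snapshot of the VM from the CLI.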

Articles used while troubleshooting:




If you cannot remove the VM from the inventory then use this.

This poor fella had a similar problem:










8 comments:

  1. Just FYI, for me same issue and symptoms but simply choosing "Consolidate" in the Snapshot menu for that VM fixed the issue without doing anything more. Just something for others to try first.

    Replies
    1. I am glad that someone took the time to see this post. The thing is that this also applies when one is unable to use consolidate. In the setup that I encountered they were unable to consolidate the VMs.

    2. The same problem happened to me, though not because of a failed backup. I put a host into maintenance mode and it started moving the VMs to another host, but at some point it got stuck trying to move the last VM (VM01). I tried to vMotion VM01 manually, and since even that would not work, I powered the VM off and moved it, but then it would no longer start and gave various errors. After a thorough check, I found that the disks were 0 GB. I also tried "Consolidate" and it did not work. Removing the VM and registering it again, even on the same host, solved the problem.
      Great, Tom!!!

    3. I am happy that this post helped someone else as well.

  2. Dear Tom, thank you for your great post. This happened to me, and I found your post while searching the internet. I will try this procedure today. Great post.
    Thank you for sharing.

  3. I had the same issue as you describe. I removed the VM from the inventory and registered it again; after that my disk showed the original size, but I was unable to consolidate the disk. So I took a snapshot and tried to commit it, and after that the disk size shows as 0 MB again.

  4. I was going to remove it from inventory but I turned it off and it showed the proper disk size.
    I was able to increase it and turn it back on, no issues.

  5. We had the same issue. We noted down the path, the vmdk file name, and the SCSI controller information. Then, from Edit Settings on the VM, we removed the disk from the VM (without deleting it), added it back, and the issue was fixed.
