Channel: VMware Communities: Message List

Re: Why Failed to clone disk: There is not enough space on the file system for the selected operation (13).

I found the bug.
Problem recap:

I have a Windows Server 2003 machine with WSFU that I have been using for this since 2008.  HOWEVER, I ran into a problem after “upgrading” from vSphere 4.0 to 5.1.  While both Windows and vSphere show plenty of disk space, GhettoVCB/vmkfstools now throws an error saying it ran out of disk space.

 

Experiment:
- I decided to isolate the problem by backing up a single VM to a new NFS VHD.  I made a new 250GB VHD and, to try something different, made it EagerZeroedThick (I usually use thin).  I initially left compression off.  The VM is 80GB provisioned, and 57GB thin.
- I set the rotation to 2, which means the volume needs room for 2 + 1 copies during a backup, because the oldest backup is only dropped after the new one finishes.
- I ran many rotations and there were no problems.  I watched the disk space and it would go down to 9GB during backup and bounce back after the backup completed.  All worked as expected.
- I turned on compression for the volume in Windows, and waited for it to finish.  When it finished, I had a ton of free space, and the total and free space numbers in Windows and the vSphere client agreed.
- Next I cranked the number of copies to 4 in GhettoVCB.  The first backup went fine as expected, and left me with lots of free space due to the compression.  The next backup, which would have made the 4th clone, failed with an insufficient disk space error.  Thus, it failed at the same point as if the volume had no compression.  (Changing from thin to EagerZeroedThick made no difference.)
- I then deleted two of the backups, so I was back down to 2 again, which would leave room for a third, and ran the backup.  It failed.  Huh!  Even with the 4th failed backup still there, the volume shows 107GB free.
- Next, I deleted the failed backup, went into the second backup directory, made a sub-directory named junk, and copied the backup files into it.  This takes vmkfstools out of the equation for the copy.  During the copy, I received the error:

"Windows - Delayed Write Failed

Windows was unable to save all of the data for the file <whatever>  The data has been lost.  This error may be caused by a failure of your computer hardware or network connections."

Wha...t!  Windows and vSphere show tons of space, and even if there were no compression at all, there should have been room.  However, this is consistent with what has been happening to people.  When it gets to this point, you can delete files to make space and you still can't do a backup.  Does something change in the VHD so that you can't get the space back?
- OK, well I've tried chkdsk in the past, and that didn't fix it so I'll try decompressing the disk and see if that fixes it.  Yep!  Sooo...it appears Windows is the rat!  But why just since I went to 5.1?  I suppose it could be that vSphere corrupts the NFS somehow, but for an error like this, that's not very likely.  NFS is just a protocol to interface to the host operating system.  It's time to see what Windows has to say for itself.  After some Googling, I came up with this:
Error message and events are logged in the System log when you try to compress a large file on an NTFS volume in Window… 
The status of the problem is:
"Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section"

- Applies to:
Windows 2000 through Server 2008.  However, the article was last updated in 2007, so I have no confidence that it doesn't also apply to Microsoft's current operating systems.
- Based on the percent complete at which it fails, their definition of large appears to be somewhere around 50GB.  I have other VMs with a provisioned size of 80GB, but their thin size is in the 36GB range, and they don't cause any problems.  A thin VHD combined with NTFS compression is a thing of beauty; however, there is a major fly in the ointment.  The only theory I have as to why it didn't happen with 4.0 is that with the new server, I have room for more copies.  I may not have the exact combination that causes things to fail, but I do know who the rat is, and it's Windows.  It's nice to have this bit of misery solved for myself and others using this approach.  I don't know if ZFS can do any better, and I don't like the way it sucks up memory like a whirlpool.  The newest Windows might work, with its new compression and de-dup features, but I have yet to prove it.
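
As a rough sanity check on the numbers in the experiment, here is a minimal Python sketch of the space arithmetic.  It assumes each clone is accounted at the full 80GB provisioned size, which the observed dip to about 9GB free suggests but the experiment does not prove.  The function names are made up for illustration.

```python
# Sketch only: assumes every backup copy costs the full provisioned size.
VOLUME_GB = 250        # size of the backup VHD from the experiment
PROVISIONED_GB = 80    # provisioned size of the VM being backed up


def free_after(volume_gb, copy_gb, copies):
    """Free space left once `copies` clones of `copy_gb` each are on the volume."""
    return volume_gb - copy_gb * copies


def max_copies_that_fit(volume_gb, copy_gb):
    """Largest number of whole clones the volume can hold."""
    return int(volume_gb // copy_gb)


# Rotation 2 means 2 + 1 copies exist at the peak of a backup.
print(free_after(VOLUME_GB, PROVISIONED_GB, 2 + 1))
# Number of uncompressed clones that fit; one more than this runs out of room.
print(max_copies_that_fit(VOLUME_GB, PROVISIONED_GB))
```

Under that assumption, three copies leave about 10GB, close to the 9GB seen during the rotation-2 runs, and a 4th copy does not fit.  That matches the 4th clone failing at the same point it would on an uncompressed volume.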

 

Other observations:

-  Concerning backup speeds, I get vastly different speeds when running with NTFS compression off vs. on.  Across a gigabit Ethernet connection I get 70MB/second without compression (pretty much wire speed), and about 26MB/second with compression on.  The CPU monitor shows that the compression is using only one thread/vCPU.
-  One must watch the backup logs carefully for errors.  After a backup fails, GhettoVCB still deletes the oldest backup and retains the failed one.  This means that if left unattended, it will rotate out all of your good backups.
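
Since failures are easy to miss, a small script can do the log watching.  The Python sketch below is only a starting point: it assumes failure lines contain a word like "error" or "failed" (as in the "Failed to clone disk" message in the subject), which is a guess and not the documented GhettoVCB log format.  The function names and the log path argument are hypothetical.

```python
import re
import sys

# Assumption: a failure line contains one of these words. Adjust to the real log.
FAILURE_PATTERN = re.compile(r"\b(error|failed|failure)\b", re.IGNORECASE)


def find_failures(lines):
    """Return (line_number, text) for every line that looks like a failure."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if FAILURE_PATTERN.search(line)
    ]


def main(path):
    """Print suspicious lines from the log at `path`; return 1 if any were found."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        hits = find_failures(handle)
    for number, text in hits:
        print(f"{path}:{number}: {text}")
    return 1 if hits else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Run it with the log file as the only argument after each backup.  A non-zero exit code can be used to raise an alert before the next rotation drops another good backup.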

