Monthly Archive for May, 2014

Bad cables can masquerade as other errors

This is just a reminder:  A bad cable can masquerade as other errors.

In this case I was building a new server system – new motherboard, new drives, etc., keeping only the splendid case.  I had one particular new 4TB HDD that would drop out of a Windows Storage Spaces pool (“Retired”, as if it had done some hard work and was now taking a well-earned rest)—sometimes within 10 minutes of booting, sometimes not for an hour.

The event log before one of these dropouts would show a few bad commands, followed by a “bad block” error.

A S.M.A.R.T. diagnostic utility showed excessive errors in a couple of categories, none of them bad blocks.  One such category was “command timeouts”, which was a clue, but I didn’t know how to interpret it.
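
(Aside: on Windows 8.1/Server 2012 R2 you can pull similar per-drive reliability counters, and the relevant System-log entries, straight from PowerShell.  A rough sketch, not a recipe; many drives report only a subset of these counters:)

    # Per-drive reliability counters (error totals, temperature, power-on hours, ...)
    Get-PhysicalDisk | Get-StorageReliabilityCounter |
        Select-Object DeviceId, ReadErrorsTotal, WriteErrorsTotal, Temperature, PowerOnHours

    # Recent error-level events from the "disk" source in the System log
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'disk'; Level = 2 } -MaxEvents 20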

Anyway, it was a brand-new disk.  And I buy high-capacity HDDs all the time (I have nearly 50), yet I’ve never had one die of infant mortality.  So I tried moving the disk to a different port on the same brand-new motherboard controller (bad socket?), then moving it to an add-in card controller of the same type, Marvell (bad motherboard chip?), then moving it to a different port on a different motherboard controller (Intel chipset this time, bad driver?).  It failed each time!  So I ordered a new drive (same-day delivery!) and tried it…and it failed in exactly the same way!

Wracked my brain…and finally…swapped cables with an adjacent hard drive in the same enclosure.  Now the other drive failed!

So it was the cable.  Replaced the cable and all works fine.  I’ve now copied 8TB of data onto the new Storage Spaces array with no issues.

I’ve never had a bad (internal case) cable before either.  And these were new cables.

Update 2014-09-09:  And bad network cables can make your PC connect at 100Mbps instead of 1Gbps!  This just happened to a colleague at work.
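
(A quick way to spot this on Windows 8/Server 2012 and later is to check the negotiated link speed; a one-liner:)

    # Shows what each adapter actually negotiated (e.g. "100 Mbps" instead of "1 Gbps")
    Get-NetAdapter | Select-Object Name, Status, LinkSpeed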

ReFS on Windows 8.1/Server 2012 R2 and “ERROR 665” “The requested operation could not be completed due to a file system limitation”

ReFS on Windows Server 2012 R2/Windows 8.1 newly allows named streams (aka alternate data streams, ADS), but only up to a limit of 128KB per stream.  If you copy a file with a named stream over this limit to an ReFS volume, you will get ERROR 665 (0x00000299): The requested operation could not be completed due to a file system limitation.

I discovered this copying 2.3M+ files from backups to a new ReFS volume on a system running Server 2012 R2.  All but 5 files copied without error.  The five that failed, with error 665, were all IE “favorites” (i.e., dinky files with extension “.url” formatted like an INI file).  There was nothing funky-looking in their names (like odd Unicode characters, not that that should have mattered) or file-path length (not that that should matter either, for ReFS).  It took me a while to figure out—as of this writing there are no useful Google hits for this error number or message together with the string “ReFS”—and I had also believed that ReFS didn’t support named streams at all.  (But that limitation was lifted in Windows 8.1.)

Anyway, it turns out IE puts favicons in named streams, and some of them are over 128KB in length!  In my case, 5 out of thousands.
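
If you want to find the culprits up front, you can scan the source tree for oversized named streams before copying.  A rough sketch (PowerShell 3.0 or later; the path is a placeholder):

    # List named streams larger than ReFS's 128KB per-stream limit, skipping the main :$DATA stream
    Get-ChildItem -Path 'D:\Backups' -Recurse -File | ForEach-Object {
        Get-Item -LiteralPath $_.FullName -Stream * -ErrorAction SilentlyContinue |
            Where-Object { $_.Stream -ne ':$DATA' -and $_.Length -gt 128KB }
    } | Select-Object FileName, Stream, Length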

Since Windows’ CopyFileEx and similar APIs copy named streams transparently, the error message you receive from applications will include the file name but not the stream name.

So this is one way to get a mysterious Error 665 “The requested operation could not be completed due to a file system limitation” when copying files to, or restoring a backup onto, an ReFS file system.

(P.S.: Microsoft TechNet documentation on named streams on ReFS in Windows Server 2012 R2/Windows 8.1.)

Windows Server Storage Spaces: striped and mirrored with tiering requires 4 SSDs

I am building a new server and moving to Windows Storage Spaces with tiering (a nice Windows Server 2012 R2 feature).  The documentation is unclear (to me) and various web pages—in the nature of tutorials on how to set up tiering—said that you needed the same number of SSDs as “columns” in your virtual disk configuration.  Other documentation/web pages referred to “columns” as the stripe set size (for example, here) and even the PowerShell cmdlet argument names lined up with that.

But it turns out that for tiering you need columns × data copies SSDs.  So if you want a RAID 10 (though of course Microsoft doesn’t call it that), where you’re striped (columns = 2) and mirrored (number of data copies = 2), you need 4 SSDs, not 2.

Oh and by the way, when I was unable to create this kind of virtual disk (via PowerShell; you can’t do it at all via the New Virtual Disk wizard), the error message was somewhat unhelpful:  It certainly told me I didn’t have enough physical disks to complete the operation, but neglected to tell me which tier (SSD or HDD) was the source of the problem!

(So … I immediately ordered another 2 SSDs and, of necessity, an add-in SATA controller card (because I had already run out of motherboard SATA ports). I think I’m just going to use a tip I found somewhere on the web (don’t remember where or I’d provide a link) and just velcro my 4 SSDs together and lay them on the bottom of the case.)

(Also, FYI, I’m using 4x 64GB SSDs and 4x 4TB HDDs, and the ReFS file system.  Without going to the trouble of measuring performance I’m just going to go ahead and specify a write-back cache size of 20GB, overruling the default 1GB, because I’m going to be copying a lot of large VHDs around and I’d rather have the copy complete quickly and then trickle onto the HDD in its own good time than wait for it.  So I hope this works.)
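
Here is a rough sketch of the PowerShell shape of all this, with placeholder pool/disk names and tier sizes.  The point is that with NumberOfColumns = 2 and NumberOfDataCopies = 2 the SSD tier needs 2 × 2 = 4 SSDs, and likewise the HDD tier needs 4 HDDs:

    # Define the two tiers in an existing pool (placeholder name 'Pool1')
    $ssd = New-StorageTier -StoragePoolFriendlyName 'Pool1' -FriendlyName 'SSDTier' -MediaType SSD
    $hdd = New-StorageTier -StoragePoolFriendlyName 'Pool1' -FriendlyName 'HDDTier' -MediaType HDD

    # Striped (2 columns) and mirrored (2 data copies) tiered virtual disk with a 20GB write-back cache
    New-VirtualDisk -StoragePoolFriendlyName 'Pool1' -FriendlyName 'TieredMirror' `
        -StorageTiers $ssd, $hdd -StorageTierSizes 100GB, 7TB `
        -ResiliencySettingName Mirror -NumberOfDataCopies 2 -NumberOfColumns 2 `
        -ProvisioningType Fixed -WriteCacheSize 20GB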

Update: I did get this working.  And, as I finally configured it, performance is fine: great read performance and good write performance.  I suppose I could get better performance (5%? 15%?) from a hardware RAID controller, but I don’t need the last bit of oomph and I don’t want to be tied to a particular hardware RAID manufacturer.  So I’m happy with the way Windows Storage Spaces is working here.  However – n.b.: Write performance totally sucked (*) until I figured out that I needed to set the virtual disk stripe size to match the expected file system cluster size.  Thus, with 2 columns, I set the interleave to 32KB on the virtual disk I created to hold an ReFS file system (which always has a 64KB cluster size) and to 16KB on the virtual disk holding an NTFS file system formatted with a 32KB cluster size.

(*) Factor of 8 to 10!!
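
In other words, interleave × number of columns should equal the file system cluster size.  Roughly (placeholder names and drive letters): add -Interleave 32KB to the New-VirtualDisk call for the virtual disk that will hold ReFS, -Interleave 16KB for the one that will hold NTFS, and format to match:

    # ReFS always uses 64KB clusters: 32KB interleave x 2 columns = 64KB
    #   New-VirtualDisk ... -NumberOfColumns 2 -Interleave 32KB ...
    Format-Volume -DriveLetter E -FileSystem ReFS -NewFileSystemLabel 'Archive'

    # NTFS formatted with 32KB clusters: 16KB interleave x 2 columns = 32KB
    #   New-VirtualDisk ... -NumberOfColumns 2 -Interleave 16KB ...
    Format-Volume -DriveLetter F -FileSystem NTFS -AllocationUnitSize 32KB -NewFileSystemLabel 'Data'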