### A TestNG Listener can register itself with TestNG

The TestNG documentation lists 5 ways of registering a TestNG listener.  Here’s a sixth that’s sometimes useful:

If your listener is specific to a test suite (it’s not really general purpose), or if you don’t mind dropping the source into each test suite, you can make your listener itself be a test class.  This relies on the fact that the @Listeners annotation makes the named listener(s) available to the entire suite, not just the one annotated class:

• Put your listener class in the same place as the rest of the test classes in your suite (or name it in your testng.xml, or whatever).
• Put an @Listeners annotation in front of the class, referring to itself.
• Add a method annotated with @Test inside the listener class.

Now the listener class will be identified as a test class, its @Listeners annotation will be respected, and thus it will itself be registered as a listener.

Note that your test method in the listener class can have @Test(enabled=false) (so it doesn’t pollute your results) and it still works!
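A minimal sketch of what that looks like (the class and method names are mine; I’m also assuming a recent TestNG where ITestListener’s methods have default implementations, so you only override the ones you care about):

```java
import org.testng.ITestListener;
import org.testng.ITestResult;
import org.testng.annotations.Listeners;
import org.testng.annotations.Test;

// The @Listeners annotation points at the class it decorates, so dropping
// this one file in with the suite's test classes registers the listener
// for the entire suite.
@Listeners(SelfRegisteringListener.class)
public class SelfRegisteringListener implements ITestListener {

    @Override
    public void onTestFailure(ITestResult result) {
        System.err.println("FAILED: " + result.getName());
    }

    // A disabled test method is enough to make TestNG treat this class as
    // a test class (so the annotation above is honored) without putting a
    // phantom result in your reports.
    @Test(enabled = false)
    public void makeThisATestClass() { }
}
```

Drop that one file in with the rest of your suite’s test classes and the listener is live for the whole suite.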

### WinSvr 2012 R2 hanging due to event 129 from vsmraid: solved (for me)

I’ve been plagued by a problem where, after running for 3-4 days (sometimes a longer interval, sometimes shorter), the performance of my Windows Server 2012 R2 system would just tank until rebooted.  The event log (System) would fill with event 129 warnings from the driver vsmraid.

Lots of ineffective ideas and proposed solutions are on the web, so I’ll point you to what worked for me: set “AHCI Link Power Management – HIPM/DIPM” to “Active”, which disables AHCI link power management.

The problem is apparently that some devices, e.g., certain SSDs, don’t respond properly (or at all?) to Link Power Management commands yet the Intel RAID drivers (or firmware?) apparently insist on sending them LPM commands.

To solve this you first change the registry so that the power settings applet shows the AHCI Link Power Management options, then you set the option to “Active”, which disables it (it means: let the device/link stay active and don’t try to send link power commands to it).  If that works, you win; if not, more drastic surgery is required: you set the registry to disable Link Power Management (aka “LPM”) for all devices entirely.  I needed to do that.

Go to this excellent post by Sebastian Foss and follow steps 1 and 2.  Reboot and await results.  If that doesn’t solve your problem then follow step 3, which did it for me. (I didn’t do step 4.)

Here’s some more information: A question with discussion on TechNet, a tutorial with screenshots on how to enable the AHCI LPM power options in the Power Applet, and a SuperUser (StackExchange) discussion of it.  Also an excellent post from the NT Debugging blog explaining storage timeouts and event 129.  It’s only off in one key point: When he sums up, saying “I have never seen software cause an Event ID 129 error.”  Obviously, this post from 2011 predates this Intel LPM problem.

Hasn’t happened for two weeks now, so I’m declaring success.

P.S., here’s the information from Sebastian Foss’ post (linked above) just in case that post disappears:

I had several system freezes in Windows 10 Technical Preview (build 9926 – but I also had those freezes on earlier builds) on my Macbook Air 2013.
System Event-Log shows a warning for ID 129, storahci, Reset to device, \Device\RaidPort0, was issued.

Seems to be some problem related to the SATA-Controller and the SSD (In my case Apple/Samsung SM0128F)

I was able to fix the problem by editing several registry entries:

1. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\PowerSettings\0012ee47-9041-4b5d-9b77-535fba8b1442\0b2d69d7-a2a1-449c-9680-f91c70521c60 and change the “Attributes” key value from 1 (default; hidden) to 2 (exposed). [This will expose “AHCI Link Power Management – HIPM/DIPM” under Hard Disk power settings]

2. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\PowerSettings\0012ee47-9041-4b5d-9b77-535fba8b1442\dab60367-53fe-4fbc-825e-521d069d2456 and change the “Attributes” key value from 1 (default; hidden) to 2 (exposed). [This will expose “AHCI Link Power Management – Adaptive” under Hard Disk power settings]

Now you can edit AHCI Link Power Management options in your power profiles. You can either set them to “active” – or, as in my case, set them to HIPM (host-initiated; DIPM would be a device-initiated SATA bus power-down).
Those settings control the behavior of the SATA bus power state – they do not power down the device.

3. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\storahci\Parameters\Device
Set NOLPM to * – those keys contain several hardware IDs (vendor and device) for storage devices. Setting NOLPM to * disables LPM control messages to any storage device.

4. I also set SingleIO to * – never had any freezes or storahci warnings again.

I hope this helps those who have also been looking for a solution for a long time.

BR – Sebastian Foss
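For convenience, here’s a .reg rendering of those steps that you could merge (a sketch reconstructed from the paths quoted above; note that the post doesn’t say what registry type NOLPM is — I’ve assumed REG_MULTI_SZ, written as hex(7) — so verify against your own registry before merging, and skip the step 3 section unless steps 1 and 2 weren’t enough):

```reg
Windows Registry Editor Version 5.00

; Step 1: expose "AHCI Link Power Management - HIPM/DIPM" in the power applet
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\PowerSettings\0012ee47-9041-4b5d-9b77-535fba8b1442\0b2d69d7-a2a1-449c-9680-f91c70521c60]
"Attributes"=dword:00000002

; Step 2: expose "AHCI Link Power Management - Adaptive"
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\PowerSettings\0012ee47-9041-4b5d-9b77-535fba8b1442\dab60367-53fe-4fbc-825e-521d069d2456]
"Attributes"=dword:00000002

; Step 3 (only if the above isn't enough): disable LPM for all devices.
; "*" as an assumed REG_MULTI_SZ, encoded as hex(7).
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\storahci\Parameters\Device]
"NOLPM"=hex(7):2a,00,00,00,00,00
```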

### Breaking a Windows command line into separate arguments, respecting quotes and backslashes

I went on a side track recently and discovered the strangely intricate world of breaking a Windows command line into arguments.  That is, how do you do Windows command line lexing?  (By established convention, command line parsing refers to interpreting arguments as options to programs: interpreting flags, collecting file names, handling missing required arguments, etc.)

TL;DR: I wrote a fully tested C# library to do this and it is on github and nuget for your public domain amusement.  (It’s also on symbolsource.org, for your source debugging needs, but I can’t get it to work in my VS2013 environment … let me know if it works for you.)

Most of the time there’s no need to worry about breaking up a command line into arguments.  Your C/C++ program gets them pre-lexed as arguments to main(): the well-known argv and argc, handled by your compiler’s runtime. And your C# program gets a string[] args array, handled by the .NET assembly launcher. And for most occasions, that’s sufficient.

But maybe it isn’t. For example, I was trying to use Clang’s libclang to process some C++ source code. An excellent resource if you want your C++ lexed, parsed, and indexed. But to get it going you’ve got to pass compiler command line arguments to the function which parses a translation unit. Those arguments must include all the include directories, preprocessor symbol definitions, and everything else that you’d ordinarily pass to your compiler (in clang’s case, these are normally gcc’s options). A lot of the time these are buried in makefile macros or even more difficult to reach locations—like inside of Visual Studio’s project files.

For my purposes I wanted to grab them from MSBuild logfiles so I could get the actual command lines as seen by Visual C++. And that meant I needed to lex a command line into arguments.

So that turns out to be intricate, as I said above. The key issue is caused by an…unfortunate design choice?…mistake?…that dates back to MS-DOS/PC-DOS 2.0: the use of the backslash as the directory separator character in a path string. Since in C and C-derived languages (and many other languages) the backslash is used as an escape character in a double-quoted string literal, and since paths containing backslashes are often passed as arguments to programs, and since those paths are frequently in double-quoted arguments (to protect blanks and other special characters), there’s a conflict that leads to confusing interactions between quoted arguments, escaped characters, and path strings.

In this article on MSDN, Parsing C++ Command Line Arguments, Microsoft describes the rules: note the special cases for even or odd sets of backslashes immediately followed by a double quote character, versus a set of backslashes not so followed. But it’s more complex than that. There is a special rule for backslashes at the end of the string. There is special handling of the first (“zeroth”) argument on the command line: The executable path. The rules changed slightly in 2008. And some programs don’t use Visual C++’s runtime to lex arguments, they use the Windows API CommandLineToArgvW to do it—and wouldn’t you know, it handles things slightly differently.
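To make those basic rules concrete, here’s a sketch of the core backslash/quote state machine — in Java rather than C#, with my own names, and deliberately ignoring the zeroth-argument rule and the post-2008 tweaks:

```java
import java.util.ArrayList;
import java.util.List;

public class CmdLineLexer {

    // Splits a command line into arguments using the classic MSVCRT rules:
    //  * 2n backslashes before a '"'   -> n backslashes; the quote toggles
    //    in-quotes mode
    //  * 2n+1 backslashes before a '"' -> n backslashes plus a literal '"'
    //  * backslashes not before a '"'  -> taken literally
    public static List<String> lex(String line) {
        List<String> args = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        boolean haveArg = false;
        int i = 0;
        final int n = line.length();
        while (i < n) {
            char c = line.charAt(i);
            if (c == '\\') {
                int slashes = 0;
                while (i < n && line.charAt(i) == '\\') { slashes++; i++; }
                if (i < n && line.charAt(i) == '"') {
                    // Backslashes followed by a quote: emit half of them.
                    for (int k = 0; k < slashes / 2; k++) cur.append('\\');
                    if (slashes % 2 == 1) { cur.append('"'); i++; } // escaped quote
                    // Even count: the quote is a real delimiter; the next
                    // loop iteration toggles inQuotes.
                } else {
                    for (int k = 0; k < slashes; k++) cur.append('\\');
                }
                haveArg = true;
            } else if (c == '"') {
                inQuotes = !inQuotes;
                haveArg = true; // "" on its own still yields an (empty) argument
                i++;
            } else if (!inQuotes && (c == ' ' || c == '\t')) {
                if (haveArg) {
                    args.add(cur.toString());
                    cur.setLength(0);
                    haveArg = false;
                }
                i++;
            } else {
                cur.append(c);
                haveArg = true;
                i++;
            }
        }
        if (haveArg) args.add(cur.toString());
        return args;
    }
}
```

Even this simplified version shows why the area is a minefield: the meaning of a backslash depends on what follows it, arbitrarily far ahead.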

I ended up writing a C# library that lexed arguments, letting you choose between the Visual C++ way of doing it or the CommandLineToArgvW way of doing it. There are also routines for “requoting” arguments properly so that you can form them back up into a command line. (I haven’t done globbing yet, but that’s coming.) I’ve put it on github (with a public domain license, so party on) and it’s on nuget as well. Bug reports, discussion, and praise are all cheerfully accepted at the github site (or as comments here).
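The “requoting” direction can be sketched the same way (again an illustrative Java translation of the standard algorithm, not the library’s actual API): double any run of backslashes that precedes a quote or the closing quote, and escape the quote itself.

```java
public class ArgQuoter {

    // Quotes a single argument so that the MSVCRT/CommandLineToArgvW
    // lexing rules will reconstruct it exactly. Illustrative only.
    public static String quote(String arg) {
        // No whitespace, quotes, or backslashes: safe to pass through as-is.
        if (!arg.isEmpty()
                && arg.chars().noneMatch(c -> c == ' ' || c == '\t'
                                           || c == '"' || c == '\\')) {
            return arg;
        }
        StringBuilder sb = new StringBuilder("\"");
        int backslashes = 0;
        for (int i = 0; i < arg.length(); i++) {
            char c = arg.charAt(i);
            if (c == '\\') {
                backslashes++;
            } else if (c == '"') {
                // Double the pending backslashes, then escape the quote.
                appendBackslashes(sb, backslashes * 2 + 1);
                backslashes = 0;
                sb.append('"');
            } else {
                appendBackslashes(sb, backslashes);
                backslashes = 0;
                sb.append(c);
            }
        }
        // Trailing backslashes must be doubled so they don't escape the
        // closing quote we're about to add.
        appendBackslashes(sb, backslashes * 2);
        return sb.append('"').toString();
    }

    private static void appendBackslashes(StringBuilder sb, int n) {
        for (int k = 0; k < n; k++) sb.append('\\');
    }
}
```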

Naturally, I didn’t figure out the crafty little details myself. I relied on reports written by a bunch of people who got there first. Here are links to that work, which were quite useful to me:

### ReFS disk scrubbing doesn’t play nice with other work on the same disks

[Update: found the way to schedule ReFS disk scrubbing, see the end of the post.]

ReFS has great data integrity features, especially when running it on top of a Windows Storage Spaces resilient volume (e.g., when mirrored).  You can set it to do full file content integrity, which it does by keeping checksums of everything that’s written and then periodically scrubbing the disk and comparing the actual contents read to the expected checksum.  If one mirror reads bad data and the other mirror is correct then ReFS will fix up the bad copy.  This is all great stuff!  Except when it isn’t, of course … (why can’t I have my cake and eat it too?)

Today I was experiencing extremely sucky disk performance and couldn’t figure out why.  (If you must know, μTorrent kept reporting “disk overloaded” even when download/upload speeds were fairly low.)  I remembered that in the past I would have a day or two where I’d have extremely sucky disk performance but it would go away.  This time, it was annoying enough that I wanted to find out what was wrong.

TL;DR version: Periodic disk scrubbing had kicked in and was running full throttle on the disk.

I investigated this way:  First I looked at the Resource Monitor for disk usage.  It showed that a volume I wasn’t using was having continuous high traffic.  In fact, it showed that System (PID 4) was reading a 70 GB tar file in a directory of 400 GB of tar files that I never ever touch.  (It is an enormous repository of Java/C++ sources I acquired for a project I started and haven’t actually worked on in a long time.)  I then checked Windows Defender:  it wasn’t scanning, and MsMpEng.exe wasn’t using any CPU either.  I don’t know what sparked my thought process, but I finally googled for “ReFS disk integrity” and found a suggestion to check the event log Microsoft/Windows/DataIntegrityScan (under Applications and Services Logs).

Sure enough, it showed a scan of my ReFS volume had commenced in the early afternoon and was still going at 9 PM.  Looking back in the log just a short way, I found the last such scan ran 3 weeks ago and took 40 hours to complete!  (It’s a 4.3 TB volume, striped as well as mirrored; I typically get sustained read speeds of ~170–180 MB/s, and Resource Monitor was showing System reading this tar file at around 110 MB/s.)  (I also discovered, in the logs, events showing that if the scrub is interrupted by rebooting it continues after boot.  I did reboot a couple of times today to fix an issue with my Logitech mouse device driver.  (Don’t ask.)  I don’t know if it restarts the scan or continues from where it got interrupted; I presume the latter.)

To be perfectly precise here, my problem may be that I have two volumes running on the same underlying Windows Storage pool, that is, on the same disks:  the ReFS volume and an NTFS volume.  My μTorrent traffic is directed at the NTFS volume (so I can use smaller 4 KB disk clusters, which play better with μTorrent).  It is possible that the scrubber would behave better if the ReFS volume were the only user of the underlying Storage Pool.  (But if that’s the issue, it is rather lame for an otherwise very well implemented feature.)

I can’t find any documentation or blog posts anywhere on the net that explains how to either schedule these scrubs or cause it to throttle itself.

Update: The ReFS disk scrubber runs on a task schedule – see the Task Scheduler under Task Scheduler Library/Microsoft/Windows/Data Integrity Scan. Change the schedule to one that won’t impact you, or disable it altogether … but remember to run it manually before you get into trouble! I haven’t found a way to throttle it so that it can run slowly and steadily without impacting other work on the box.

And after I did this I still had problems with unresponsiveness … and much more often! I tracked that down to “Regular Maintenance”. I can’t tell everything that goes on during “Regular Maintenance”, but at least part of it is the defragger, which, on a large volume, is terribly slow. I had to go to Task Scheduler Library/Microsoft/Windows/TaskScheduler and disable “Idle Maintenance”—because even though I configured it to stop when the computer was no longer idle, it just kept going and going and going. Also I changed the schedule on “Regular Maintenance” so it happens a lot less often (like, every other weekend). And finally, I disabled “Maintenance Configurator”, because if you let that run it automagically resets your changes to the other maintenance tasks. (I forget where I read about that necessary fix; I wish I remembered so I could thank the guy.) I wish I knew if there was any “maintenance” I’ve turned off that I’ll miss later …

### Visual C++: Letting C++ programmers get away with new murder without warning

Consider Visual C++.

This always correct code emits a completely useless warning:

This never ever correct code doesn’t emit any warning at all:

Visual C++ is a totally awesome piece of technology.  So why doesn’t it work the way I want it to?

### Failed EFI boot with 0xC000000F and missing winload.efi, running native

Did you just get a Windows boot failure

• on an EFI boot machine
• missing file \Windows\System32\winload.efi
• error code 0xC000000F
• when you are running native (boot from VHD, VHDX)
• and just deleted a differencing disk
• but did not first delete the BCD entry that referred to the differencing disk?

If so … boot from a Windows setup USB stick/DVD/whatever, and use BCDEDIT to delete the boot entry that still refers to the differencing disk. Then you’re good to go.

Apparently with a sufficiently bad entry in the BCD store you get a nasty catastrophic failure and don’t even get a choice to boot from one of the other installed operating systems. But don’t succumb to a heart attack. Correct it by booting from a different device (setting the BIOS boot order if necessary) and deleting the bad entry in the BCD.

(By the way, this SuperUser (StackExchange) page would have been a real help fixing more “normal” EFI boot problems if I hadn’t borked my machine in a particularly stupid way.)

### Two nearly perfect keyboards: Das Ultimate and WASD CODE

For many years I’ve missed the second best typing experience I’ve ever enjoyed:  The Northgate Omnikey Ultra, pictured here in all its glory:

Oh what a wonderful tactile feel!  My fingers just flew over the full size keys, hitting every key correctly!  And what a wonderful clacky sound it made!  Unfortunately, they had old style PC/AT keyboard connectors, which were inconvenient, and were quite large, and weighed about 50 pounds each, and I stupidly got rid of all 4 of mine some time ago.

Anyway, I now have that same experience again:  I’m the proud owner of both a DAS Model S Ultimate, and a WASD CODE keyboard.  They are equally wonderful in the tactile feel department, and have interesting aesthetic differences.

The DAS Model S Ultimate is gorgeous: it has a beautiful case, seemingly black-lacquered, with nice geeky blank keycaps.  I got the “blue” clicky switches, and boy is it loud!  Feels and sounds great, but you can’t use it in an apartment or your neighbors will call the police.  In fact, my wife told me either it went or I went.  I retrofitted it with WASD red O-rings and that made it much quieter but didn’t hurt the feel.  It has other features too (see the link above), including 2 USB ports on the side—useful for plugging in your Logitech “unifying receiver” for your M570 trackball, and a USB stick as well.  Excellent solid construction: you wouldn’t damage it if you ran over it with a tank.

The CODE keyboard feels great, looks very good, and has wonderful backlighting: the keycaps are black with translucent white letters that glow.  The backlight has adjustable levels.  When I bought it I thought the backlighting was a gimmick but I use it every night … and I really like the look of walking into my darkened office and having the keyboard glowing at me.  I got the clicky (but not really loud) green switches (and they come with sound dampening O-rings installed) so I get a terrific tactile feel but it isn’t terribly loud.  I got the 87-key version without the numeric keypad—which I don’t use—and so it is more compact than the DAS Ultimate, and that is convenient.  Plenty of other features too like Dvorak and Colemak alternate layouts built in, and the very nice ability to totally disable caps lock (turn it into another ctrl key).  Excellent solid construction like the DAS keyboard.

I am in typing bliss!  Wouldn’t trade these in for anything, in fact, I’m looking to get another one or two to take on the job.

But no product is perfect, so here are some issues.

First, a minor flaw in the CODE keyboard: A sort-of-nice feature is that instead of a permanently attached cable hanging off it, they use a detachable standard Micro USB cable which plugs in under the keyboard.   There are cable routing channels that you can use to route the cable out the back left/center/right, or out the left or right sides of the case.  But if you use the back center channel then any tension on the cable will pull it right out.  So use one of the other four.

And now, an extremely serious flaw that affects both the otherwise wonderful DAS keyboard and the CODE keyboard:  In a major design fail, the function keys F1-F12 are at the top row of the keyboard instead of the left side of the keyboard where God put them.  See them on the left of that picture of the Northgate keyboard above?  Well, admittedly, those guys went overboard, duplicating the function keys on the top as well as the left.  Truly, you only need the ones at the left.  Because then you can easily and accurately touch type the function keys, and do it without horrible stretching.  The original IBM PC keyboards were like that and I’ve never understood why IBM and everyone else abandoned that.  DAS, WASD, are you listening?  Put the function keys on the left and you will have the second best keyboard ever created!

(After all this praise: second best?  And that’s the second time I’ve said it.  Well, yes, the ultimate typing experience is to be found only on IBM Selectrics, specifically, the Selectric Model II.  Those were just beautiful. We’ll never see the likes of them again…)

### A bad cable can masquerade as other errors

This is just a reminder:  A bad cable can masquerade as other errors.

In this case I was building a new server system – new motherboard, new drives, etc. – keeping only the splendid case.  I had one particular new 4 TB HDD that would drop out of a Windows Storage Spaces pool (“Retired”, as if it had done some hard work and was now taking a well-earned rest)—sometimes within 10 minutes after booting, sometimes it would take an hour.

The event log before one of these drop outs would show a few bad commands, followed by a “bad block” error.

A S.M.A.R.T. diagnostic utility showed excessive errors in a couple of categories, none of them bad blocks.  One such category was “command timeouts” which was a clue, but I didn’t know how to interpret it.

Anyway, it was a brand new disk.  And I buy high-capacity HDDs all the time (I have nearly 50) yet I’ve never had any die of infant mortality.  So I tried moving the disk to a different port on the same brand-new motherboard controller (bad socket?), moving it to an add-in card controller (of the same type, Marvell) (bad motherboard chip?), moving it to a different port on a different motherboard controller (Intel chipset this time, bad driver?).  Failed each time!  So I ordered a new drive (same-day delivery!) and tried it…and it failed too in the same way!

Wracked my brain…and finally…swapped cables with an adjacent hard drive in the same enclosure.  Now the other drive failed!

So it was the cable.  Replaced the cable and all works fine.  I’ve copied 8 TB of data onto the new Storage Spaces array with no issues now.

I’ve never had a bad (internal case) cable before either.  And these were new cables.

Update 2014-09-09:  And bad network cables can make your PC connect at 100Mbps instead of 1Gbps!  This just happened to a colleague at work.