ReFS on Windows 8.1/Server 2012 R2 and “ERROR 665” “The requested operation could not be completed due to a file system limitation”

ReFS on Windows Server 2012 R2/Windows 8.1 newly allows named streams (aka alternate data streams, ADS) but only up to a limit of 128 KB.  If you copy a file with a named stream over this size limit to an ReFS volume, you will get ERROR 665 (0x00000299): The requested operation could not be completed due to a file system limitation.

I discovered this copying 2.3M+ files from backups to a new ReFS volume on a system running Server 2012 R2.  All but 5 files copied without error.  The five which failed, with error 665, were all IE “favorites” (i.e., dinky files with extension “.url” formatted like an INI file).  Nothing funky-looking in their names (like odd Unicode characters, not that that should have mattered) or filepath length (not that that should matter either, for ReFS).  Took me a while to figure it out: as of this writing there are no useful Google hits for this error number or message with the string “ReFS”, and I also believed that ReFS didn’t support named streams.  (But that limitation was lifted in Windows 8.1.)

Anyway, it turns out IE puts favicons in named streams and some of them are over 128 KB in length!  In my case, 5 out of thousands.

Since Windows’ CopyFileEx and similar APIs copy named streams transparently, the error message you receive from applications will have the file name, but not a stream name.

So this is one way to get a mysterious Error 665 “The requested operation could not be completed due to a file system limitation” when copying files to/restoring from backup to an ReFS file system.

(P.S.: Microsoft TechNet documentation on named streams on ReFS in Windows Server 2012 R2/Windows 8.1.)

Windows Server Storage Spaces: striped and mirrored with tiering requires 4 SSDs

I am building a new server and moving to Windows Storage Spaces with tiering (a nice Windows Server 2012 R2 feature).  The documentation is unclear (to me) and various web pages—in the nature of tutorials on how to set up tiering—said that you needed the same number of SSDs as “columns” in your virtual disk configuration.  Other documentation/web pages referred to “columns” as the stripe set size (for example, here) and even the PowerShell cmdlet argument names lined up with that.

But it turns out that for tiering you need (columns × number of data copies) SSDs.  So if you want a RAID 10 (though of course Microsoft doesn’t call it that) where you’re striped (columns = 2) and mirrored (number of data copies = 2) you need 4 SSDs, not 2.

Oh and by the way, when I was unable to create this kind of virtual disk (via PowerShell, you can’t do it anyhow via the New Virtual Disk wizard) the error message was somewhat unhelpful:  It certainly told me I didn’t have enough physical disks to complete the operation, but forgot to tell me which tier (SSD or HDD) was the source of the problem!

(So … I immediately ordered another 2 SSDs and, of necessity, an add-in SATA controller card (because I had already run out of motherboard SATA ports). I think I’m just going to use a tip I found somewhere on the web (don’t remember where or I’d provide a link) and just velcro my 4 SSDs together and lay them on the bottom of the case.)

(Also, FYI, I’m using 4x 64 GB SSDs and 4x 4 TB HDDs, and the ReFS file system.  Without going to the trouble of measuring performance I’m just going to go ahead and specify a write-back cache size of 20 GB, overruling the default 1 GB, because I’m going to be copying a lot of large VHDs around and I’d rather have the copy complete quickly and then trickle onto the HDD in its own good time, than wait for it.  So I hope this works.)

Update: I did get this working.  And, as I finally configured it, performance is fine: great read performance and good write performance.  I suppose I could get better performance (5%? 15%?) from a hardware RAID controller but I don’t need the last bit of oomph and I don’t want to be tied to a particular hardware RAID manufacturer.  So I’m happy with the way Windows Storage Spaces is working here.  However, n.b.: Write performance totally sucked (*) until I figured out that I needed to set the virtual disk stripe size (interleave × number of columns) to match the expected file system cluster size.  Thus, with 2 columns, I set the interleave to 32 KB on the virtual disk I created to hold an ReFS file system (which always has a 64 KB cluster size) and to 16 KB on the virtual disk I created to hold an NTFS file system I created with a 32 KB cluster size.

(*) Factor of 8 to 10!!
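
The arithmetic behind those two rules can be sketched in a few lines of Java. This is only a reading of the numbers in this post, not an official Storage Spaces formula, and the names TierMath, disksPerTier and interleaveBytes are made up for illustration.

```java
final class TierMath {
    /** Disks one tier needs: one disk per column, for each data copy. */
    static int disksPerTier(int columns, int dataCopies) {
        return columns * dataCopies;
    }

    /** Interleave (in bytes) chosen so that one full stripe equals one cluster. */
    static int interleaveBytes(int clusterBytes, int columns) {
        if (columns <= 0 || clusterBytes % columns != 0) {
            throw new IllegalArgumentException(
                "cluster size must be a positive multiple of the column count");
        }
        return clusterBytes / columns;
    }

    public static void main(String[] args) {
        // Striped (2 columns) and mirrored (2 data copies): 4 SSDs.
        System.out.println(disksPerTier(2, 2));
        // ReFS volume, 64 KB clusters, 2 columns: 32 KB interleave.
        System.out.println(interleaveBytes(64 * 1024, 2));
        // NTFS volume formatted with 32 KB clusters, 2 columns: 16 KB interleave.
        System.out.println(interleaveBytes(32 * 1024, 2));
    }
}
```

The resulting values are what would be passed as the column count, data copy count and interleave when creating the virtual disk.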

Biggest mistake in C#: That strings can be null

I really like C#.  It could be better by adding lots of my favorite things … but as it stands it is very useable, very expressive, very readable.  And it has only one major mistake (IMO):  Strings (variables, parameters, fields, etc.) can be null.

Oh my, how many coding errors have been made by forgetting strings could be null?  How many crashes have users suffered?  Oh well.

Anyway, here’s a brief proposal on how to correct the problem.  It isn’t carefully thought through … just off the cuff as it were.  But:

Let there be a unary operator that, when applied to a typed, possibly null value (something that isn’t dynamic), acts like this:  if the value is not null then the operator returns that value unchanged; if the value is null then the operator returns the result of calling the no-parameter constructor for the type of the value.  (Where the type of the value is whatever the compiler thinks it is using standard type inference, where it’s an error if the type doesn’t have a no-parameter constructor, etc.)

(For the sake of argument, assume the operator symbol is a postfixed exclamation point.)

Then you could easily (single character!) coerce any null value to a default constructed value of the proper type.  It would be easy to insert the operator at return statements, or after a method call where you weren’t sure if a null value might be returned, or on the use of a parameter.
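
To make the proposed semantics concrete, here is a rough sketch written in Java (which has the same null-string problem). Java has no such operator, so a hypothetical helper method, orNew, stands in for the postfix operator, and the caller passes the no-parameter constructor explicitly instead of having the compiler infer it.

```java
import java.util.function.Supplier;

final class NullToDefault {
    /**
     * Stand-in for the proposed operator: returns the value unchanged when it
     * is non-null, otherwise the result of the supplied no-parameter constructor.
     */
    static <T> T orNew(T value, Supplier<? extends T> noArgConstructor) {
        return value != null ? value : noArgConstructor.get();
    }

    public static void main(String[] args) {
        String missing = null;
        // A null string is coerced to a default-constructed (empty) string.
        System.out.println(orNew(missing, String::new).length()); // prints 0
        // A non-null value passes through untouched.
        System.out.println(orNew("abc", String::new));            // prints abc
    }
}
```

With the real operator, each of those calls would shrink to a single trailing character on the expression.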

And then, the next step is to allow that operator to be used in three more places: After the declared return type of a method, after the declared parameter type of any method parameter, and after the declared type of a property.  (It would work with generic parameter and return types too, if the generic type had the new() constraint.)  This annotation would mean that the compiler would automatically apply the operator at each return statement, and on each annotated parameter on method entry.  The annotation would also be a simple and easily understood way to communicate to the programmer the guarantee that the method never returned null and that, inside the method, the parameters would never be null.

And, with those annotations in place, if you went ahead and modified the IL to incorporate the annotations (rather than just having it as a C# compiler implemented feature) then the JITter could perform flow analysis (of whatever complexity) and probably eliminate a bunch of explicit invocations of the operator.

The final step:  Annotate the .NET framework with the operator where appropriate, which would be practically everywhere.

Well … your input?  Good idea, I’ve neglected a major flaw, or what?

 

Breadth-First Traversal with Alternating Directions at Each Level

Yesterday I discussed the interview coding problem I blew: Code, in Java, a breadth-first traversal of a binary tree, alternating directions at each level.

The key observation is obvious: At each level you process nodes in the reverse order of the previous level.  (Duh! That’s the problem statement.)  To turn it around: At each level you must preserve for later processing the child nodes in reverse order of the nodes you’re processing at this level.  A data structure which lets you pull off nodes in the reverse of the order in which you saved them is a stack, one of the simplest data structures of all.
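
A minimal sketch of that idea, using two stacks: one holds the level being processed, the other collects children for the next level. The Node class and the ZigZag and traverse names are assumptions made for this example and are not necessarily the code from the full post.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ZigZag {
    /** Bare-bones binary tree node (hypothetical, for illustration only). */
    static final class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    /** Visits level 0 left-to-right, level 1 right-to-left, and so on. */
    static List<Integer> traverse(Node root) {
        List<Integer> visited = new ArrayList<>();
        if (root == null) return visited;

        Deque<Node> current = new ArrayDeque<>(); // level being processed
        Deque<Node> next = new ArrayDeque<>();    // children saved for the next level
        boolean leftToRight = true;
        current.push(root);

        while (!current.isEmpty()) {
            Node node = current.pop();
            visited.add(node.value);

            // Push children in this level's direction; popping them later
            // yields the opposite direction for the next level.
            Node first = leftToRight ? node.left : node.right;
            Node second = leftToRight ? node.right : node.left;
            if (first != null) next.push(first);
            if (second != null) next.push(second);

            if (current.isEmpty()) { // level done: swap stacks, flip direction
                Deque<Node> tmp = current;
                current = next;
                next = tmp;
                leftToRight = !leftToRight;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Node tree = new Node(1,
            new Node(2, new Node(4, null, null), new Node(5, null, null)),
            new Node(3, new Node(6, null, null), new Node(7, null, null)));
        System.out.println(traverse(tree)); // [1, 3, 2, 4, 5, 6, 7]
    }
}
```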

Two coding interview questions done badly, for different reasons

In my recent job search I was asked very few coding questions.  Not sure why not—I was interviewing for individual contributor roles as per usual.  Anyway, two coding questions stand out for basic errors I made. One was at the whiteboard, one was a “homework” problem.

The whiteboard problem had a very simple statement: Code, in Java, a breadth-first search, but switch the traversal direction at each level.  A nice twist on a typical problem!  I did it by inserting a sentinel into the standard queue of pending nodes.  Using a Java Queue<Object> I inserted an (arbitrary) Integer as the flag, but it would have been just as reasonable to use a Queue<Node> and insert the tree root as sentinel.  Remember to check for termination: if you’ve just dequeued the sentinel and the queue size is zero, you’re done.  Very short and sweet, but wrong! It hit me like a ton of bricks as soon as I drove off!  It doesn’t traverse alternate levels in different directions, but only the nodes of alternate levels in different directions.  Bungled!  The error, my fault, was that I didn’t fully work things out, with an example, before I started coding.  Had I, I would immediately have caught my error.  In my defense, the interviewer and I were under very tight time constraints: it was a screening interview, it started late (scheduling mishap at their end), and we did a bunch of design questions first.  On the other hand, he seemed very happy with the solution: he complimented it and said I would be invited back for a full loop (though I bailed on that for unrelated reasons).  (Anyway, the correct solution, which immediately followed my shocking realization, is here.)

Now, what is it that you should never do with an interview homework problem?

Review of “An Introduction to Modern Mathematical Computing with Mathematica” (Borwein, Skerritt)

“An Introduction to Modern Mathematical Computing with Mathematica” by Borwein and Skerritt, aims to show how you can improve your understanding of mathematics by experimenting in a Computer Algebra System (CAS). It uses Mathematica (there’s a Maple version available). The book picks three areas of (lower division college) math to work with – Elementary number theory (including tastes of the Fibonacci sequence, perfect numbers and amicable pairs, continued fractions, and the sieve of Eratosthenes), Calculus (with tastes of limits, differentiation, integration, differential equations, and surfaces and volumes of rotation), and linear algebra – and uses a tutorial style to demonstrate how to use a CAS to perform experiments, to compute things in multiple ways to enhance your understanding of the relationships between different mathematical ideas, and to discover edge cases where theorems – or your intuitions – might not work.

Along the way it teaches the minimum you need to know about Mathematica to make progress. The goal isn’t to teach Mathematica as a programming language but as an adjunct to mathematical study. What is covered includes basic syntax, defining functions, conditionals and loops, and some Mathematica-specific features (not common in mainstream programming languages) like pattern matching. Different kinds of basic plotting functions are also described and used. At all times the reader is encouraged to look for more information in the Mathematica documentation – and given search terms to help with that.

The section on elementary number theory is where the Mathematica syntax and semantics are introduced, including loops, functions, and numerical versus symbolic evaluation. The calculus section emphasizes the Mathematica plotting functions. Finally, the linear algebra section adds matrices and more on symbolic evaluation (and could really have used some plotting examples: plotting (or better, animation) would have been very effective to show what eigenvalues and eigenvectors are and how they really work).

Each section ends with a number of (simple) exercises to help ensure that you’ve learned both the math and the Mathematica covered in the section. And, each section also has a set of “Further Explorations” which are related mathematical questions that are intended to be used as a launching point to learn something using the techniques explained in the book (though the further explorations in the linear algebra section are rather slim).

There’s a final section on “visualization” and geometry: A few more Mathematica features related to plotting, and a suggestion that you can use other software tools – besides a CAS – for the same purposes: to visualize mathematics and gain understanding. They use the dynamic geometry program Cinderella as the example software here, providing two simple and short investigations in plane geometry. (I appreciated this as Cinderella happens to be my dynamic geometry program of choice – having excellent built-in support for non-Euclidean geometry, also affine and projective geometry, also simple physics simulations, and is fully programmable besides.)

This book really meets its objective of showing the reader how to explore mathematics using a computer algebra system, while at the same time giving a brief introduction to a specific CAS, Mathematica. It is well written. I highly recommend it as an introduction to this great modern way of learning mathematics. To further your practice in both Mathematica itself and in using Mathematica to explore mathematics, I also highly recommend Mathematica in Action, 3rd edition, by Stan Wagon. It has much more mathematics, and much more Mathematica.

(I do have to point out that the book has a lot of typos. In fact, I should stop here. The typos (largely in Mathematica code) are easily fixed by the reader and don’t detract from the contents of the book. That’s why I give this book 5 stars even though it has typos. So you can stop reading here unless you’d like to help me work off my pique.

When I say this book has typos: I mean, it has plenty of typos. I marked 50 in the first 100 pages then gave up. Most are trivial errors in Mathematica code, e.g., missing commas or misplaced braces, that are easily seen by the reader or discovered as soon as you type the code into Mathematica, and in these cases the fixes are obvious so there’s no problem. In some places the typo isn’t so obvious but you can tell from context what’s wrong. Elsewhere there are copy-and-paste errors where two Mathematica examples, supposedly different ways to do something, are in fact identical. Two of the plots, if you actually execute their code, produce slightly different results from the (correct, intended) picture in the book. Several times an equation being discussed has different variables in the text (e.g., “a”) than in the code sample (e.g., “x”), or even in two consecutive sentences in the text! In a couple of places in the text, the variable is in one place in a sentence the Greek letter θ and in another place in the same sentence the word “theta”.

In one remarkable typo (page 101) the discussion in the book, having previously encouraged the use of the online Inverse Symbolic Calculator, wants to make the point that you shouldn’t rely on the results from the ISC: you use them as a suggestion, and validate them some other way. This is a good point, but the example it gives is totally bollixed up. The authors show a calculation of a particular integral giving a specific 10-digit numerical value. Then they look it up using the ISC and get a suggestion for a formula for this value that matches to 10 digits. Then they say that if you actually calculate the integral beyond 10 digits it becomes obvious that the value of the integral is not the value of the formula. The problem is: The numerical value they give for the integral has its last two digits transposed. And THAT transposed value is the one they looked up in the ISC (which I determined by plugging the formula into Mathematica and getting its numeric value to 10 digits). So no wonder it didn’t match! The point of the paragraph is still valid; the example was broken, because of a bad copy of the value from Mathematica into the text!

Why am I harping on these typos so much when I say they don’t really affect the content of the book, which is still excellent? Pique! It is hard to write a book, I know. So hard, that even though I’d like to sometime write a technical book of my own I haven’t done it yet and doubt I ever will. I have a lot of respect for book authors for writing books. One of the authors of this book, Borwein, has written 10 books (and hundreds of papers). What is so difficult for me to do is apparently so easy for him that he can just write ‘em and publish ‘em and get on to the next thing, without even worrying about the details. Couldn’t he have found an undergraduate who would have proofread the book, typing the expressions into Mathematica, for $100, or even just brownie points? Where’s the pride? See: it is just my pique getting scratched in these last 4 paragraphs. Don’t let it stop you from learning from this very good book.)

‘typename’ is not always a substitute for ‘class’ in a template parameter

You know the C++ rule that says that the keyword ‘typename’ can be used interchangeably with ‘class’ in a template parameter?  Not always!  Not for the keyword ‘class’ that appears after the template parameter of a template template parameter!

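This is a syntax error (a declaration along these lines; at least through C++14, since C++17 later relaxed the rule):

template <template <typename> typename T> struct X;

You must write it like this:

template <template <typename> class T> struct X;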

Why is that?  Jonathan Caves explains here that the rule that ‘typename’ can be used interchangeably with ‘class’ is a semantic rule, but the use of ‘class’ in the template template parameter is a syntax rule, and thus takes precedence.

Oversight? Intentional?

By the way, Visual C++ 2012 gives a really unhelpful pair of error messages for this, one of which includes an internal grammar rule name, and that is really non-optimal for a compiler error message.

When is a newline required to separate command line arguments? When it is in your eclipse.ini!

This cost me two full days—very embarrassing in front of my new colleagues.  I set up a new Eclipse installation (I am an Eclipse newbie) following their detailed instructions, and tried to load their very large project.  Eclipse kept crashing, after 10 to 30 seconds, with OutOfMemory PermGen errors. The solution, as any one of 100 blog posts will tell you, is to add “-XX:MaxPermSize=512m” to your eclipse.ini. Which I did, to no effect. I tried different values for MaxPermSize, multiple different versions of Eclipse, made sure I was running the proper Sun JVM, etc. etc. etc.  Finally hooked up jconsole and discovered that I was getting these OutOfMemory PermGen errors when PermGen was nowhere near filling my expanded space limit.

Now, note:  ps showed that the vm was started with the right command line parameter, -XX:MaxPermSize=512m. Eclipse’s own error dialog showed it after a crash. jconsole showed it. But it was being ignored!?!

The solution was this: I had, in the interest of making eclipse.ini easier to read, reformatted it so instead of saying, e.g.,

--launcher.defaultAction
openFile
-vmargs
-Dosgi.requiredJavaVersion=1.5
-XX:MaxPermSize=512m
-Xms40m
-Xmx512m

it was instead:

--launcher.defaultAction openFile
-vmargs -Dosgi.requiredJavaVersion=1.5 ⏎
     -XX:MaxPermSize=512m ⏎
     -Xms40m -Xmx512m

And that was the problem!  Putting the parameters to the command line arguments on the same line caused them not to be recognized!

So: don’t do that!
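
A rough model of why the reformatted file fails, assuming (as the Eclipse documentation describes it) that the launcher treats every line of eclipse.ini as exactly one argument and never splits a line on spaces. IniArgs and parse are hypothetical names, and the real launcher’s handling of blank lines and surrounding whitespace may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class IniArgs {
    /** Models "one line = one argument": a line is never split on spaces. */
    static List<String> parse(String iniText) {
        List<String> args = new ArrayList<>();
        for (String line : iniText.split("\\R")) {
            String arg = line.trim(); // assumption: surrounding whitespace is dropped
            if (!arg.isEmpty()) {
                args.add(arg);
            }
        }
        return args;
    }

    public static void main(String[] unused) {
        // One option per line: three separate arguments.
        System.out.println(parse("-vmargs\n-XX:MaxPermSize=512m\n-Xms40m"));
        // Same options on one line: a single argument that is not "-vmargs",
        // so nothing in it is recognized as the start of the VM arguments.
        System.out.println(parse("-vmargs -XX:MaxPermSize=512m -Xms40m"));
    }
}
```

Under this model the first call yields three arguments and the second yields just one, which matches the symptom: the text is all there, but not as the separate options the launcher looks for.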