Monthly Archive for June, 2012

Does enumerating files with FSCTL_ENUM_USN_DATA ever miss any files?

Short answer: No and yes.

I’ve been playing with enumerating all the files on a volume using the FSCTL_ENUM_USN_DATA IOCTL, which reads the MFT.

It is supposed to be very fast, especially compared with other methods such as FindFirst/FindNext.  (There’s a great forum posting on it here, “Reading MFT”, which also contains links to posted source code here, and several posts by the same author here, all of which together make a great starting point.)

However, a question quickly comes up when you try this out:  Does enumerating files through the MFT this way miss any?  If you try a different traversal technique, e.g., using the .NET Framework APIs Directory.GetDirectories() and Directory.GetFiles(), you’ll get a different number of results.

In fact, on my system (running Windows Server 2008 R2) the MFT enumeration found 1419 directories and files that the .NET API traversal didn’t list, and the .NET API traversal found an astonishing (to me) 19199 directories and files that the MFT enumeration didn’t.  What’s going on with all those “missed” files?  Did I have a bug in my MFT enumeration code?

No.  The answer is: Hard links.  (And symbolic links.)

When scanning the MFT with FSCTL_ENUM_USN_DATA you see each directory and file once and only once, no matter how many directory entries point to it.  For example, on my system, traversing C: with the .NET APIs returns 6 files named “write.exe”, but the MFT enumeration has only 2.

In fact, running the command “fsutil hardlink list c:\Windows\write.exe” shows that that single file has four names.

(The other two instances of “write.exe” are a single separate file that has two links to it.)
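
The same effect can be seen from the directory-traversal side. What follows is a minimal Python sketch, not the code behind the numbers above: it walks a tree and groups every path by file identity, so a file with several hard links shows up as one entry with several names. It assumes os.lstat reports a usable (st_dev, st_ino) pair, which on NTFS should correspond to the volume serial number and file index. The function names group_names_by_file and multiply_linked are made up for illustration.

```python
import os
from collections import defaultdict


def group_names_by_file(root):
    """Map each file's identity to every path under root that names it."""
    names = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # unreadable entry; skipped in this sketch
            # Assumption: (st_dev, st_ino) identifies the underlying file.
            names[(st.st_dev, st.st_ino)].append(path)
    return names


def multiply_linked(root):
    """Return only the files that have more than one name under root."""
    return {
        identity: sorted(paths)
        for identity, paths in group_names_by_file(root).items()
        if len(paths) > 1
    }
```

The number of keys returned by group_names_by_file is the "one entry per file" count that an MFT-style enumeration would give for the same tree, while the total number of paths is what a name-based traversal reports.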

I had no idea that a standard installation used so many hard links. In fact, it even seems that some application installers create multiple hard links to the same file (e.g., MiKTeX).

And that explains nearly all of the files “missed” by the MFT enumeration.

Depending on why you’re enumerating directories and files on a volume, this may or may not be an issue for you.  More likely than not it is, and you may need to resolve it by also traversing the directory tree with FindFirst/FindNext or the .NET APIs and reconciling the two collections.  (Given a filename, you can use the FindFirstFileNameW/FindNextFileNameW functions to get all the names of a given file, i.e., the names of all the hard links to it. But it may be expensive to call these on every file just to find the ones that have multiple links.  Reconciling with the other kind of traversal may be the better bet; I have yet to measure this.)
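
The reconciliation step itself is simple once both collections are in hand. Here is a hedged Python sketch, assuming each traversal has already been reduced to a list of full path strings. The helper name reconcile is hypothetical, and the sample paths in the usage note are illustrative only. Paths are compared case-insensitively, since that is how NTFS names normally behave.

```python
import ntpath


def reconcile(mft_paths, walk_paths):
    """Split two path collections into the entries unique to each side."""

    def key(path):
        # Normalise separators and case so the same name compares equal.
        return ntpath.normcase(ntpath.normpath(path))

    mft = {key(p): p for p in mft_paths}
    walk = {key(p): p for p in walk_paths}
    only_in_mft = sorted(mft[k] for k in mft.keys() - walk.keys())
    only_in_walk = sorted(walk[k] for k in walk.keys() - mft.keys())
    return only_in_mft, only_in_walk
```

For example, reconcile([r"C:\Windows\write.exe"], [r"c:\windows\write.exe", r"C:\Windows\System32\write.exe"]) leaves the first list empty and puts only the System32 name in the second, which is the kind of entry an extra hard link would produce.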

On the other hand, MFT enumeration does find files that the .NET traversal does not. A large number of files under Windows\System32 are returned by the MFT enumeration but not by the .NET traversal.  I’m not sure why; it doesn’t appear to be security related.  The .NET enumeration won’t descend past reparse points like “Documents and Settings”, but MFT enumeration won’t descend into directory mount points. (FindFirst/FindNext will go through mount points; I haven’t tried .NET enumeration on that yet.) The MFT enumeration also returns the directory “System Volume Information” and some files under it.  And it also returns directories and files related to the Transactional Resource Manager, namely the directory “$RmMetadata” and its contents.

(The latter is the cause of some minor coding confusion:  The MFT metadata files, e.g., $Bitmap and $Quota, are not returned by the MFT enumeration, and neither is the directory $Extend, which is the parent of $RmMetadata.  So when you’re assembling path names from the entries returned in your MFT enumeration you’ll have to account for the fact that $RmMetadata’s parent isn’t going to be in your collection of directories.)
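
One way to handle that missing parent when assembling paths is sketched below in Python. It assumes the enumeration has been reduced to a dictionary mapping each record's file reference number to a (name, parent reference number) pair, corresponding to the FileReferenceNumber, FileName and ParentFileReferenceNumber fields of a USN record. The function name build_paths, the "<missing parent>" placeholder, and the reference numbers in the usage note are all made up for illustration, and the caller is assumed to know the root directory's reference number.

```python
def build_paths(records, root_frn, drive="C:", orphan="<missing parent>"):
    """Build a full path for every record, tolerating absent parents.

    records maps frn -> (name, parent_frn). A parent that is not in
    records (e.g. $Extend) is replaced by a placeholder directory.
    """
    cache = {root_frn: drive}

    def resolve(frn):
        chain, seen = [], set()
        # Climb towards the root until we hit a resolved or unknown parent.
        # The seen set only guards against a malformed, cyclic input.
        while frn not in cache and frn in records and frn not in seen:
            seen.add(frn)
            name, parent = records[frn]
            chain.append((frn, name))
            frn = parent
        base = cache.get(frn, drive + "\\" + orphan)
        # Unwind, caching each directory so later lookups are cheap.
        for child, name in reversed(chain):
            base = base + "\\" + name
            cache[child] = base
        return base

    return {frn: resolve(frn) for frn in records}
```

With records such as {10: ("Windows", 5), 11: ("write.exe", 10), 20: ("$RmMetadata", 99)} and a root of 5, entry 11 resolves to C:\Windows\write.exe, while entry 20 lands under the placeholder because its parent 99 was never enumerated.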