Hash Windows files against a known-good set

May 3, 2012

Let’s say you want to hash Windows files against a known-good set of hashes.

Here’s how to do it!

Required Tools

You’ll need md5deep to compute the hashes and nsrllookup to query them. You’ll also need a server to query against. Luckily, Kyrus has provided an nsrlserver (beta), known as the Kyrus NSRL Lookup Service!


What’s nsrlquery?

nsrlquery is an umbrella project that’s home to two distinct subprojects: nsrlsvr, which provides a server that yields NSRL RDS information on request, and nsrllookup, a simple command-line application that queries the server. The server is Unix-only, but the client runs just fine on Windows.

But wait, what’s the NSRL RDS and why is it important? Glad you asked!

The National Institute of Standards and Technology (NIST) hosts the National Software Reference Library (NSRL). This is a set of millions of applications, libraries, common configuration files and every other thing imaginable that gets stored on a hard drive. As part of the NSRL, they’ve also published SHA-1 and MD5 hashes of everything in it. This list of hashes is called the Reference Data Set (RDS).

Many digital investigations are plagued by a needle-in-a-haystack problem: out of terabytes of data, the investigator may only be interested in a small fraction. One of the most important tasks in digital forensics is winnowing out what might be wheat from what is overwhelmingly likely to be chaff. Many forensics tools, such as md5deep, can vet the hashes they create against a known-good list, but these tools are often ill-suited to make use of the RDS, which is well over a gigabyte. Loading a gigabyte of data every time one wishes to use md5deep is just not practical; a more pragmatic approach was needed.
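To make the winnowing idea concrete, here is a minimal Python sketch. It is not how nsrllookup or md5deep work internally; it simply keeps a known-good list in an in-memory set and splits hashed files into hits and misses. The function names and sample paths are made up for illustration. It is also exactly the approach that stops being practical once the known-good list is the full RDS, which is why a lookup server is attractive.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def winnow(entries, known_good):
    """Split (digest, path) pairs into (known, unknown) lists.

    known_good is a set of lowercase hex digests held in memory.
    """
    known, unknown = [], []
    for digest, path in entries:
        bucket = known if digest.lower() in known_good else unknown
        bucket.append((digest, path))
    return known, unknown


if __name__ == "__main__":
    # Hypothetical data: pretend the first file is on the known-good list.
    known_good = {md5_hex(b"stock notepad bytes")}
    entries = [
        (md5_hex(b"stock notepad bytes"), "C:/Windows/notepad.exe"),
        (md5_hex(b"something odd"), "C:/Users/bob/odd.exe"),
    ]
    known, unknown = winnow(entries, known_good)
    # Only the misses deserve a closer look.
    for digest, path in unknown:
        print(digest, path)
```

Digests are lowercased before the lookup so that upper-case and lower-case hex strings compare equal.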

How do I install it?

nsrllookup is just a ./configure && make && make install dance, like any other well-mannered application. For the Windows binaries it’s even easier: just drop the executable somewhere on your PATH and start having fun.

nsrlsvr requires a little more work. Read the included INSTALL file carefully.

How do I use it?

Once the server is built, starting it is as simple as launching it from the command line. Alternatively, since it’s a well-behaved Unix daemon, it can be easily integrated into your particular Unix’s daemon management system (launchctl, /etc/init.d, etc.).

Using the lookup tool is as simple as:

$ md5deep -r /path/to/mounted/disk | nsrllookup

It will print a list of all files that miss the NSRL RDS. You may invert the behavior (only listing hits) with the -k flag. Alternatively, if you need to generate both hits and misses in a single pass, use both the -K and -U flags:

$ md5deep -r /path/to/mounted/disk | nsrllookup -K KNOWN -U UNKNOWN

Once it finishes, the file KNOWN will contain hits (hashes known to the NSRL RDS) and the file UNKNOWN will contain misses (hashes unknown).
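Once you have the two files, a quick tally tells you how much chaff was removed. The Python sketch below rests on an assumption: that each line in KNOWN and UNKNOWN begins with a 32-character hex MD5 digest, optionally followed by whitespace and more text. The exact output layout isn't documented here, so check a few lines of your own files first. The helper names are hypothetical.

```python
import re

# Assumption: a result line starts with a 32-character hex MD5 digest.
_MD5_LINE = re.compile(r"^([0-9a-fA-F]{32})\b\s*(.*)$")


def parse_hash_lines(lines):
    """Yield (digest, rest) for lines that start with an MD5 digest."""
    for line in lines:
        match = _MD5_LINE.match(line.strip())
        if match:
            yield match.group(1).lower(), match.group(2)


def summarize(known_lines, unknown_lines):
    """Count distinct digests in each list and the share that were hits."""
    known = {digest for digest, _ in parse_hash_lines(known_lines)}
    unknown = {digest for digest, _ in parse_hash_lines(unknown_lines)}
    total = len(known) + len(unknown)
    ratio = len(known) / total if total else 0.0
    return {"known": len(known), "unknown": len(unknown), "known_ratio": ratio}


if __name__ == "__main__":
    # KNOWN and UNKNOWN are the file names used in the command above.
    with open("KNOWN") as known_file, open("UNKNOWN") as unknown_file:
        print(summarize(known_file, unknown_file))
```

Lines that don't look like an MD5 entry are skipped silently, so blank lines or stray messages won't skew the counts.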

Why should I trust it?

Right now you probably shouldn’t trust it — at least not without doing your own checks on its operation in order to ensure that it’s working correctly enough for you!

Although these tools are in use by real people doing real investigative tasks, that’s a pretty lousy reason to trust a piece of software. Real trust comes from having a codebase that’s small enough to read, well-written enough to be clear, and documented enough to accurately guide you through the code as you make your own decision of whether it’s trustworthy.

nsrlsvr is in the neighborhood of a thousand lines of well-written C++ code. It defines a grand total of one custom object which amounts to maybe twenty lines. Everything else is written in a very C-like dialect of C++ for ease of auditing, although it makes heavy use of C++’s superior memory management facilities, built-in data structures, and file I/O. Read it. I don’t think you’ll be disappointed!

nsrllookup is slightly smaller, but still in the neighborhood of a thousand lines of well-written C++ code. Like the server, the code is readable. Read it, and make your own decision about whether to trust it.


Further Details – Using the Kyrus Server

NIST on the Case

NIST is kind enough to distribute the National Software Reference Library (NSRL). A collection of hashes of known software (usually provided by vendors themselves), it contains over 78 million hashes, 21 million of which are unique. It is arguably the best resource almost no one uses because 78 million is enough hashes to choke almost any forensics tool you’re using. Go ahead and try it. We’ll wait . . .

One Step Beyond

Here’s how to use the beta Kyrus NSRL Lookup Service (NSRLookup), which is based on the hard work of Rob Hansen at Red Jack, who coded nsrlquery. You can download a Windows binary or the *nix source code at SourceForge. Once it’s installed, you can use the output of Jesse Kornblum’s md5deep to query the server. The output can be piped directly:

C:\> md5deep -r * | nsrllookup -s nsrl.kyr.us

. . . or a saved file can be used:

C:\> md5deep -r * > known.txt
C:\> nsrllookup -s nsrl.kyr.us < known.txt

By default the server responds with the hashes of unknown files. You can get the hashes of known files by adding the -k flag, like this:

C:\> nsrllookup -s nsrl.kyr.us -k < known.txt

To see a help screen and a list of all command line options, use the -h flag:

C:\> nsrllookup -h
nsrllookup for Windows version 1.0.6-1
nsrllookup [-hvukx] [-U FILE] [-K FILE] [-s SERVER] [-p PORT]
-h: display this help message
-v: display version information
-u: show only unknown hashes (default)
-k: show only known hashes
-U FILE: write unknown hashes to FILE
-K FILE: write known hashes to FILE
-s SERVER: connect to a specified nsrlquery server
-p PORT: connect on a specified port
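If you’d rather drive the lookup from a script, the options above map neatly onto an argument list. The following Python sketch is one possible wrapper, not part of nsrllookup itself. It uses only the flags shown in the help text (-s, -p, -k, -K, -U) and assumes nsrllookup is on your PATH. The function names are hypothetical.

```python
import subprocess


def build_nsrllookup_args(server=None, port=None, known=False,
                          known_file=None, unknown_file=None):
    """Build an nsrllookup command line from the flags in the help text."""
    args = ["nsrllookup"]
    if server:
        args += ["-s", server]
    if port is not None:
        args += ["-p", str(port)]
    if known:
        args.append("-k")  # show only known hashes
    if known_file:
        args += ["-K", known_file]
    if unknown_file:
        args += ["-U", unknown_file]
    return args


def run_lookup(hash_text, **options):
    """Feed md5deep output to nsrllookup on stdin and return its stdout.

    Assumes the nsrllookup executable is on PATH; raises
    subprocess.CalledProcessError if it exits with an error.
    """
    result = subprocess.run(build_nsrllookup_args(**options), input=hash_text,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_lookup(open("known.txt").read(), server="nsrl.kyr.us", known=True) would mirror the -k command shown earlier.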

 More Examples

Courtesy of Kyrus