Rethinking findstr with F# and Powershell


As a software engineer, I frequently search my projects’ source trees for code snippets, comb through a collection of log files for a particular message, or do some other kind of text search.  The traditional Windows utility for such things is findstr, but alas, it leaves much to be desired.  Among other things, the regex support is limited and nonstandard, and it’s a pain to parse and process the output in batch script.

Windows Powershell makes things easier with the cmdlet Select-String, but this approach is usually not as speedy as trusty findstr.

The natural solution?  Implement our own utility in F#!  In this blog we will develop a Windows Powershell cmdlet in F# which offers both the features and the performance that we need, and we will do it in fewer than 200 lines of code.

These are the parameters we will be able to specify:

  • The pattern to search for
  • A file name or extension filter to control which files are considered in the search
  • Whether to search recursively from the current directory
  • What file encoding to use
  • Whether to do a case-sensitive or case-insensitive search
  • Whether to do a verbatim plain-text search, rather than a regex search

And these are the pieces of data we will be able to process from the output:

  • The file path
  • The line number
  • The full line from the file
  • The substring which matched
  • Regex match groups

Some basic usage samples we will be enabling:

If you aren’t familiar with Powershell, it’s the next-generation (current-generation, really) command-line shell and scripting framework for automating Windows tasks and IT workflows.  Powershell has shipped in-the-box as a default Windows component since Windows 7, and has since attracted a large following of scripters, administrators, and developers.  Powershell v3 is built into Windows 8 and Windows Server 2012.

The primary type of utility in Powershell is not an executable, but a cmdlet.  Powershell is built entirely on .NET, and every scriptable entity is a full-fledged .NET object.  Users have full access to types, properties, methods, events, etc., all from script.  Cmdlets, too, are implemented as .NET classes, inheriting from System.Management.Automation.Cmdlet (or PSCmdlet).

Following standard Powershell naming conventions, we will call our cmdlet “Search-File.”  Cmdlet parameters are implemented as public properties on the cmdlet class and tagged with the [<Parameter>] attribute.  Our cmdlet class thus looks like this (taking advantage of the new F# 3.0 syntax for auto-properties):

Although this tool will be invoked from the command line, we don’t need to do any argument parsing ourselves.  All argument processing will be handled by the Powershell runtime.  When ProcessRecord is called, we can assume all applicable [<Parameter>] properties have been set according to user input.
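As a rough sketch of that shape (not the actual Search-File source), a cmdlet class with a few [<Parameter>] auto-properties might look something like the following.  The property names, their defaults, and the placeholder ProcessRecord body are assumptions made for illustration, and the snippet assumes it is compiled in a library project that references System.Management.Automation.

```fsharp
open System.Management.Automation

// Sketch only: parameter names and defaults are illustrative assumptions.
[<Cmdlet(VerbsCommon.Search, "File")>]
type SearchFileCmdlet() =
    inherit PSCmdlet()

    /// Text or regex to look for; bound from the first positional argument.
    [<Parameter(Mandatory = true, Position = 0)>]
    member val Pattern : string = null with get, set

    /// File name or extension filter, e.g. "*.fs".
    [<Parameter>]
    member val Include : string = "*" with get, set

    /// Switch parameters need no value on the command line: -Recurse turns it on.
    [<Parameter>]
    member val Recurse : SwitchParameter = SwitchParameter(false) with get, set

    /// By the time this runs, Powershell has already filled in the properties above.
    override this.ProcessRecord() =
        // Placeholder body: a real cmdlet would hand off to the search logic here
        // and write each result object to the pipeline.
        let scope = if this.Recurse.IsPresent then "recursively" else "in this directory"
        this.WriteObject(sprintf "Would search %s files %s for '%s'" this.Include scope this.Pattern)
```

The point to notice is that there is no argument-parsing code anywhere: the attributes are the whole command-line contract.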

The cmdlet class is just a simple Powershell interop wrapper around the real workhorse, the FileSearcher class, which we can write in somewhat more idiomatic F# code.  For the objects which are actually returned by the cmdlet, we define the LineMatch class, which exposes all of the properties we are interested in.
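To make the idea concrete, here is a minimal sketch of what a LineMatch type and the per-line matching step could look like.  The record fields mirror the output properties listed earlier, but the exact names, the helper functions buildRegex and searchLines, and the choice of a record rather than a class are assumptions, not the original implementation.

```fsharp
open System.Text.RegularExpressions

/// One matching line. Field names are illustrative.
type LineMatch =
    { Path : string
      LineNumber : int
      Line : string
      Match : string
      Groups : string list }

/// Build a regex from the user's options.
/// A verbatim plain-text search is handled by escaping the pattern.
let buildRegex (pattern : string) (caseSensitive : bool) (simpleMatch : bool) =
    let text = if simpleMatch then Regex.Escape(pattern) else pattern
    let options =
        if caseSensitive then RegexOptions.Compiled
        else RegexOptions.Compiled ||| RegexOptions.IgnoreCase
    Regex(text, options)

/// Lazily yield a LineMatch for every line the regex matches.
/// Line numbers are 1-based.
let searchLines (regex : Regex) (path : string) (lines : seq<string>) =
    lines
    |> Seq.mapi (fun i line -> (i + 1, line, regex.Match(line)))
    |> Seq.filter (fun (_, _, m) -> m.Success)
    |> Seq.map (fun (n, line, m) ->
        { Path = path
          LineNumber = n
          Line = line
          Match = m.Value
          Groups = [ for g in Seq.cast<Group> m.Groups -> g.Value ] })

// Usage: a case-insensitive, plain-text search over two in-memory lines.
let sampleHits =
    searchLines (buildRegex "FAILWITH" false true) "sample.fs"
        [ "let x = 1"; "if x < 0 then failwith \"negative\"" ]
    |> List.ofSeq
```

Because searchLines takes any seq<string>, it is easy to try out in F# Interactive on a literal list before pointing it at real files.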

To meet our performance goals, we parallelize the entire workflow with F# async and Tasks.  The enumeration of files will be done in an async block, and each discovered file will then be processed in its own dedicated Task.  As matching lines of text are found, LineMatch objects are constructed and dumped into a BlockingCollection, which handles synchronization for us.  The elements of the BlockingCollection are meanwhile streamed back to the user on the calling thread, so results appear as fast as they are found.
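The workflow just described can be sketched roughly as follows.  This is a simplified illustration under a few assumptions: searchTree and its searchFile argument are hypothetical names, the search always recurses, and error handling is limited to re-raising failures once the results have been drained.

```fsharp
open System
open System.Collections.Concurrent
open System.IO
open System.Threading.Tasks

/// Search every file under `root` in parallel and stream results to the caller.
/// `searchFile` is a hypothetical stand-in for the per-file matching logic.
let searchTree (root : string) (filter : string) (searchFile : string -> seq<'T>) : seq<'T> =
    seq {
        use results = new BlockingCollection<'T>()

        // Producer: enumerate files in the background and start one Task per
        // file as soon as its path is discovered.
        let producer =
            Async.StartAsTask(async {
                try
                    let tasks =
                        Directory.EnumerateFiles(root, filter, SearchOption.AllDirectories)
                        |> Seq.map (fun path ->
                            Task.Factory.StartNew(Action(fun () ->
                                for item in searchFile path do
                                    results.Add(item))))
                        |> Seq.toArray
                    Task.WaitAll(tasks)
                finally
                    // Always signal completion, or the consumer would block forever.
                    results.CompleteAdding()
            })

        // Consumer: runs on the calling thread and yields items as they arrive.
        yield! results.GetConsumingEnumerable()

        // Surface any exception raised during enumeration or by a file task.
        producer.Wait()
    }
```

Note that results come back in whatever order the tasks produce them, so matches from different files may be interleaved.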

Here’s the code:

The F# code is concise and very readable.  Language features such as active patterns, sequence expressions, and matching keep the code fairly tidy compared to equivalent C#.  And of course, we can test the different bits in F# Interactive as we code.
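For readers who have not met active patterns before, here is a small standalone illustration of the style.  It is not taken from the Search-File source; the RegexMatch pattern and the describe function are hypothetical names invented for this example.

```fsharp
open System.Text.RegularExpressions

/// Partial active pattern: succeeds with the list of captured group values
/// (group 0 first) when `input` matches `pattern`.
let (|RegexMatch|_|) (pattern : string) (input : string) =
    let m = Regex.Match(input, pattern)
    if m.Success then Some [ for g in Seq.cast<Group> m.Groups -> g.Value ]
    else None

/// Classify a line of source code using ordinary pattern matching.
let describe (line : string) =
    match line with
    | RegexMatch @"^\s*//" _ -> "comment"
    | RegexMatch @"failwith\s+""(.*)""" [ _; message ] -> sprintf "failure: %s" message
    | _ -> "other"
```

The regex plumbing lives in one place, and the match expression reads almost like a table of cases.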

Developers already familiar with F# might be curious why Tasks were used for processing files, rather than F# async expressions.  We needed to fire off file processing jobs immediately as each file path was enumerated, rather than waiting until all files were enumerated, since this could potentially take a long time.  The Task Parallel Library provides great APIs for exactly this kind of parallel processing.  The F# async API, on the other hand, provides an experience geared more toward enabling asynchronous processing, especially the usage of Begin/End .NET APIs.  The difference between parallel and asynchronous is subtle, but significant.  Suffice it to say, Tasks were simply the better tool for the job in this case.  Some good discussion on exactly this topic can be found here.

Compiling these classes into FSUtils.dll, we can now consume them from the Powershell command line or from a script by loading the assembly with the Import-Module cmdlet.

The output from our “failwith” sample search is a collection of LineMatch objects which look something like this:

The output can be condensed to one line per match by either piping to Format-Table, or by defining a format file which specifies exactly how to display objects of the LineMatch type.

Comparing performance against standard Powershell cmdlets, traditional findstr, and GNU grep, we do very well.  Search-File was a bit faster than findstr and grep when searching the F# team source tree for both plaintext and regex patterns.  All three handily beat Select-String.


Where we really see Search-File pull ahead is when chewing through a larger set of files, in this case the C# source tree for another project.  The benefits of parallelization become more pronounced in this case.



There we have it!  In under 200 lines of F#, a highly usable and very speedy file search utility.  I look forward to combining the strengths of F# and Powershell again soon.

Download the full source code here.

Lincoln Atkinson

Visual Studio F# Test Team

PS: The code provided here will compile against .NET 4+.  Powershell v3 supports .NET 4 by default, but Powershell v2 (e.g. on a Windows 7 machine) runs on the older .NET 2.0 runtime, so in order to import this module there you will first need to take some manual steps.



