We get this question fairly frequently when it comes to slow network connections.
The performance of directory listings (especially on a laggy network) is limited by the .NET APIs we call to retrieve the directory information. There are two limitations to the current set of APIs:
Forced Retrieval of Attributes
When we do a directory listing, we show the standard attributes of the file or directory: Mode, LastWriteTime, Length, and Name. The core Windows API is highly optimized for this basic scenario, and returns these attributes by default along with the rest of the file information. However, the .NET Framework doesn’t take advantage of this data, and instead goes back to the network location to ask for all of the file attributes. This chatty behaviour adds a handful of network round trips for each file or directory, making the directory listing many times slower: hundreds or thousands of times slower in many cases. The Framework team addressed this as part of .NET 4.0, and you’ll see the benefits of this new feature as soon as we are able to adopt it.
Even without the benefit of the new .NET API, though, version two brings a huge improvement in wildcarded directory listings (both local and remote).
As background, PowerShell wildcards are different from straight cmd.exe wildcards. For example, PowerShell wildcards do not match the 8.3 short file name, while the native filesystem filtering (exposed by cmd.exe wildcards) does. PowerShell’s wildcards support character ranges, while the native filesystem filtering does not. Because of this, PowerShell wildcard processing happens AFTER we’ve retrieved all of the files.
This comes at a cost, however. Native filesystem filtering (as exposed by the -Filter parameter) is MUCH faster, as its processing is wired into the Windows file system.
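To make the character-range point concrete, here is a minimal sketch using the -like operator, which understands the same wildcard syntax as PowerShell path wildcards. The file names are made up for illustration.

```powershell
# Hypothetical file names, used only to exercise the wildcard syntax.
$names = 'report1.txt', 'report2.txt', 'reportA.txt', 'summary.txt'

# [0-9] is a character range: exactly one digit in that position.
# Native filesystem filters only understand * and ?, so they cannot express this.
$numbered = $names | Where-Object { $_ -like 'report[0-9].txt' }

$numbered   # report1.txt, report2.txt
```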
In version two, we did a lot of work to relieve this tension. When you provide a PowerShell wildcard, we convert as much of it as possible to a native filesystem filter, and then apply our wildcarding logic to the much smaller set of results. You’ve probably noticed this most in tab completion, but it also makes a huge difference in regular wildcarded directory listings, especially remote ones. Since the native filtering is processed by the remote file system, we don’t pay the performance penalty of accessing attributes of files that you ultimately don’t care about. In version one, you can work around the issue by specifying the -Filter parameter directly. If this still doesn’t provide the speed you need, you can call “cmd.exe /c dir”.
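The two-stage idea can be approximated by hand, which is also roughly what the version one workaround looks like. The sketch below is only an illustration of the approach, not PowerShell’s internal implementation: it hands the simple part of the pattern to -Filter and applies the character range afterwards with Where-Object. The scratch directory and file names are invented for the example.

```powershell
# Build a small scratch directory so the example is self-contained.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("listing-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
'alpha.txt', 'bravo.txt', 'delta.txt', 'alpha.log' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $dir $_) | Out-Null
}

# Stage 1: native filter, evaluated by the file system (or the remote server).
# Stage 2: PowerShell wildcard with a character range, applied to the survivors.
$names = Get-ChildItem -Path $dir -Filter '*.txt' |
    Where-Object { $_.Name -like '[a-c]*.txt' } |
    ForEach-Object { $_.Name } |
    Sort-Object

$names   # alpha.txt, bravo.txt

# Clean up the scratch directory.
Remove-Item -Path $dir -Recurse -Force
```

On a slow share, the gain comes from stage 1: only the entries that pass the native filter ever have their attributes fetched.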
Lack of Enumeration API
This issue shows up in directory listings that contain many files. The DirectoryInfo.GetFiles() method returns an array. When creating that result list, the .NET Framework does many re-allocations (and copies) of that array, causing an exponential performance degradation.
This, too, has been resolved in the .NET 4.0 updates, by offering an API that lets you enumerate through the directory results one at a time, rather than retrieving them all at once. If you are running into these limitations, you can again apply a wildcarding approach. If this still doesn’t provide the speed you need, you can call “cmd.exe /c <command>”.
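For a feel of the difference, here is a hedged sketch that calls the .NET methods directly. It assumes a PowerShell host running on .NET 4.0 or later, and it assumes that Directory.EnumerateFiles() is the kind of enumeration API meant here; the post itself does not name the method. The scratch directory and file names are invented.

```powershell
# Scratch directory with 50 empty files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("enum-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
1..50 | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $dir "file$_.txt") | Out-Null
}

# GetFiles() builds the complete array before returning anything.
$all = [System.IO.Directory]::GetFiles($dir)

# EnumerateFiles() returns a lazy sequence that can be consumed one entry
# at a time, so a caller that only wants a few entries can stop early.
$firstThree = [System.IO.Directory]::EnumerateFiles($dir) | Select-Object -First 3

"{0} files via GetFiles, {1} taken via EnumerateFiles" -f $all.Count, @($firstThree).Count

# Clean up the scratch directory.
Remove-Item -Path $dir -Recurse -Force
```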
Why Don’t We Fix It?
Since cmd.exe isn’t impacted by these issues, why don’t we just do the same thing and call into the core Windows APIs directly? The reason is twofold:
- A core tenet of PowerShell is providing access to the REAL underlying .NET objects. If we implemented the semantics ourselves, we’d have to return new types of objects – something like PSFileInfo and PSDirectoryInfo. V1 scripts (or downstream cmdlets) that expect the REAL underlying .NET objects would fail to work. While we could add a new switch (-Raw?), users would still have to change their scripts to support it. In that case, they might as well use the existing cmd /c workaround.
- This issue is ultimately transient. While it’s annoying to drag out over a few years, it will ultimately come and go without users having to change their behaviour. One day, you’ll install a build and the issues will just magically be gone.
Again, thanks for your continuing feedback. That’s what ultimately helped us discover the issue and make sure the right people knew about it.
Lee Holmes [MSFT]
Windows PowerShell Development
Microsoft Corporation