Hey Scripting Guy! I am not sure why you are so excited about Windows PowerShell. It is much slower than VBScript. I am not talking about a little slow; I am talking about orders of magnitude slower. Sure it has some nice cmdlets, but the performance is sucking fumes.
— LM
Hello LM,
It is no secret that Windows PowerShell 1.0 was a bit slow. But then VBScript was slow too when compared to C++. Windows PowerShell is designed as an easy-to-use administrative tool, and should not be used to write scripts that control life-support systems for space shuttles. Windows PowerShell 2.0 has undergone a complete performance review and is much faster loading, shutting down, and running scripts. In some cases, certain types of code constructions are orders of magnitude faster. But regardless of the language or version of the language, if the code is not written to take advantage of the language features, the performance will be slow.
A common mistake is to use Windows PowerShell as if it were just another scripting language. One of the most powerful features of Windows PowerShell is the pipeline, and if you do not take advantage of it, you are setting yourself up for disappointing results.
The Get-ModifiedFiles.ps1 script is used to count the number of files that have been modified in a folder within a specified period of time. The Param keyword is used to create two command-line parameters. The first parameter is the path parameter that specifies the folder to search. The second parameter is the days parameter, which is used to create the starting date for counting modified files. The Param section is seen here:
Param(
    $path = "C:\data",
    $days = 30
) #end param
The starting date needs to be a datetime object. The Get-Date cmdlet creates an instance of a datetime object, which exposes the AddDays method. Passing a negative number of days to the AddDays method produces a point in time in the past. By default, the script creates a datetime object that is 30 days in the past. This is seen here:
$dteModified= (Get-Date).AddDays(-$days)
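The date arithmetic can be checked in isolation. The following is a minimal sketch, assuming a fixed reference date so the result is repeatable; the Get-CutoffDate function name and its Now parameter are hypothetical and are not part of the original script.

```powershell
# Hypothetical helper that wraps the AddDays call used in the script.
function Get-CutoffDate {
    param(
        [int]$Days = 30,
        [datetime]$Now = (Get-Date)
    )
    # A negative argument to AddDays moves the date into the past.
    $Now.AddDays(-$Days)
}

# A fixed reference date (assumption) keeps the example repeatable.
$reference = [datetime]'2010-03-31'
$cutoff = Get-CutoffDate -Days 30 -Now $reference

# Format with the invariant culture so the output does not depend on regional settings.
$cutoff.ToString('yyyy-MM-dd', [System.Globalization.CultureInfo]::InvariantCulture)   # 2010-03-01
```

Calling Get-CutoffDate without the Now parameter uses the current date and time, which matches what the script does.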
The Get-ChildItem cmdlet is used to obtain a collection of all the files and folders in the path that is specified by the $path variable. The recurse switch parameter tells the Get-ChildItem cmdlet to burrow down into all the subfolders. This collection of files and folders is stored in the $files variable. This is seen here:
$files = Get-ChildItem -path $path -recurse
To walk through the collection of files and folders, the foreach statement is used. The variable $file is used as the enumerator, which keeps track of the current position in the collection. The collection of files and folders is stored in the $files variable. Inside the foreach loop, the if statement is used to evaluate the datetime object that is retrieved from the LastWriteTime property of the file object. If the value stored in the LastWriteTime property is greater than or equal to the datetime value stored in the $dteModified variable, the value of the $changedFiles variable is incremented by one. This is seen here:
Foreach($file in $files)
{
    if($file.LastWriteTime -ge $dteModified)
    { $changedFiles ++ }
}
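One detail is worth checking when you adapt this loop: Get-ChildItem returns folders as well as files, so the counter includes both. The sketch below assumes you want to see the two counts side by side and uses the PSIsContainer property to tell folders from files. The temporary folder and the $demo and $changedItems names are illustrative only.

```powershell
# Build a small throwaway folder so the sketch can run anywhere (assumption).
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$sub = Join-Path $demo 'sub'
New-Item -ItemType Directory -Path $demo | Out-Null
New-Item -ItemType Directory -Path $sub | Out-Null
Set-Content -Path (Join-Path $demo 'a.txt') -Value 'one'
Set-Content -Path (Join-Path $sub 'b.txt') -Value 'two'

$dteModified = (Get-Date).AddDays(-30)
$changedItems = 0   # files and folders
$changedFiles = 0   # files only

foreach ($file in Get-ChildItem -Path $demo -Recurse) {
    if ($file.LastWriteTime -ge $dteModified) {
        $changedItems++
        # PSIsContainer is $true for folders and $false for files.
        if (-not $file.PSIsContainer) { $changedFiles++ }
    }
}

"Items: $changedItems, files only: $changedFiles"

# Clean up the throwaway folder.
Remove-Item -Path $demo -Recurse -Force
```

With one subfolder and two files, the loop sees three items, of which two are files.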
The last step in the Get-ModifiedFiles.ps1 script is to display a message that tells the user how many modified files were found. This is the command that displays the message:
"The $path has $changedFiles modified files since $dteModified"
The complete Get-ModifiedFiles.ps1 script is seen here.
Get-ModifiedFiles.ps1
Param(
    $path = "D:\",
    $days = 30
) #end param
$dteModified= (Get-Date).AddDays(-$days)
$files = Get-ChildItem -path $path -recurse
Foreach($file in $files)
{
    if($file.LastWriteTime -ge $dteModified)
    { $changedFiles ++ }
}
"The $path has $changedFiles modified files since $dteModified"
When the Get-ModifiedFiles.ps1 script is run, it takes a little time to return. This is understandable as the D: drive on my computer consumes around 60 GB of disk space and contains nearly 30,000 files and 4,000 folders. It does not seem to be horrible performance considering what it is actually doing.
The Get-ModifiedFiles.ps1 script can be changed to take advantage of the Windows PowerShell pipeline. The Param statement and the creation of the datetime object contained in the $dteModified variable are exactly the same. The first change comes when the results of the Get-ChildItem cmdlet are pipelined to the next command instead of being stored in the $files variable. This results in two improvements. The first is that the subsequent sections of the script are able to begin work almost immediately. When the results of the Get-ChildItem cmdlet are stored in a variable, the entire collection of 30,000 files and 4,000 folders must be enumerated before any additional processing can begin. The second is memory use: because the variable is stored in memory, it is conceivable that the computer could run out of memory before it finished enumerating all the files and folders on an extremely large drive. The change to the pipeline is seen here:
Get-ChildItem -path $path -recurse |
Instead of using the foreach statement, the Get-ModifiedFilesUsePipeline.ps1 script uses the ForEach-Object cmdlet. The ForEach-Object cmdlet is designed to accept pipelined input and is more flexible than the foreach language statement. The default parameter for the ForEach-Object cmdlet is the process parameter. As each object comes through the pipeline, the $_ automatic variable is used to reference it. Here the $_ automatic variable plays the same role as the $file variable in the Get-ModifiedFiles.ps1 script. The if statement is exactly the same in the Get-ModifiedFilesUsePipeline.ps1 script, except that it uses $_ instead of $file. The ForEach-Object section of the Get-ModifiedFilesUsePipeline.ps1 script is seen here:
ForEach-Object {
    if($_.LastWriteTime -ge $dteModified)
    { $changedFiles ++ }
}
The user message is the same as in the Get-ModifiedFiles.ps1 script. The completed Get-ModifiedFilesUsePipeline.ps1 script is seen here.
Get-ModifiedFilesUsePipeline.ps1
Param(
    $path = "D:\",
    $days = 30
) #end param
$dteModified= (Get-Date).AddDays(-$days)
Get-ChildItem -path $path -recurse |
ForEach-Object {
    if($_.LastWriteTime -ge $dteModified)
    { $changedFiles ++ }
}
"The $path has $changedFiles modified files since $dteModified"
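The difference between collecting into a variable and streaming through the pipeline can be made visible with a small experiment. This is only a sketch: the Get-SlowItem function and the $log list are hypothetical stand-ins for Get-ChildItem and the counting logic, and they exist purely to record the order in which things happen.

```powershell
# A list that records the order of events.
$log = New-Object System.Collections.ArrayList

# Hypothetical producer that notes when each item is emitted.
function Get-SlowItem {
    foreach ($i in 1..3) {
        [void]$log.Add("produce $i")
        $i   # emit the item to the pipeline
    }
}

# Variable first: every item is produced before any item is consumed.
$items = Get-SlowItem
foreach ($item in $items) { [void]$log.Add("consume $item") }

[void]$log.Add('---')

# Pipeline: each item is consumed as soon as it is produced.
Get-SlowItem | ForEach-Object { [void]$log.Add("consume $_") }

$log
```

Above the separator, the log lists all three produce entries before the first consume entry. Below it, produce and consume entries alternate, which is why the pipeline version of the script can start comparing dates while Get-ChildItem is still walking the drive.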
When the Get-ModifiedFilesUsePipeline.ps1 script is run, it seems a little faster, but it may be hard to tell. Was the modification to the script worth the trouble? To see whether a change to a script improves its performance, you can use the Measure-Command cmdlet. You will want to first measure the performance of the original script, and then measure the performance of the revised script. To measure the performance of the original script, you supply the path to the Get-ModifiedFiles.ps1 script to the Expression parameter of the Measure-Command cmdlet. This is seen here:
PS C:\fso> Measure-Command -Expression { C:\fso\Get-ModifiedFiles.ps1 }
The Measure-Command cmdlet returns an instance of the System.TimeSpan .NET Framework class. A System.TimeSpan object represents the difference between two System.DateTime values. It has a number of properties that report days, hours, minutes, seconds, and milliseconds, and each of these reports only its own component of the time span. In the following image, you see that the Get-ModifiedFiles.ps1 script ran for 26 seconds and 141 milliseconds. The System.TimeSpan object also reports the whole time span in total units, that is, the same time span expressed in each of the five units. For example, the Get-ModifiedFiles.ps1 run time of 26 seconds and 141 milliseconds translates into 26.1411386 total seconds or 0.435685643333333 total minutes. When expressed in milliseconds, this value is 26141.1386. This is seen here:
The double display of time breakdown into days, hours, minutes, seconds, and milliseconds can be confusing to people who are not used to working with the System.TimeSpan .NET Framework class. In general you can probably examine only the TotalSeconds property when testing your scripts.
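The two sets of properties are easier to tell apart on a known value. The sketch below rebuilds the 26.1411386 second run time quoted above from ticks, which is an assumption made only so the numbers match the article, and then shows that Measure-Command returns the same kind of object. Start-Sleep is a placeholder for a real script.

```powershell
# Rebuild the quoted run time from ticks (1 tick = 100 nanoseconds).
$span = [TimeSpan]::FromTicks(261411386)

$span.Seconds             # 26, the seconds component only
$span.Milliseconds        # 141, the milliseconds component only
$span.TotalSeconds        # the whole span expressed in seconds
$span.TotalMilliseconds   # the whole span expressed in milliseconds

# Measure-Command returns the same type of object.
# Start-Sleep is only a stand-in for the script being tested.
$elapsed = Measure-Command -Expression { Start-Sleep -Milliseconds 200 }
$elapsed.GetType().FullName   # System.TimeSpan
$elapsed.TotalSeconds
```

Storing the result in a variable, as with $elapsed here, lets you read TotalSeconds directly instead of scanning the full property list.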
Now it is time to see if the use of the pipeline makes any difference in the performance of the script. To measure the performance of the Get-ModifiedFilesUsePipeline.ps1 script, the path to the Get-ModifiedFilesUsePipeline.ps1 script is passed to the expression parameter of the Measure-Command cmdlet. This results in the command line seen here:
PS C:\fso> Measure-Command -Expression { C:\fso\Get-ModifiedFilesUsePipeline.ps1 }
Once the command has run, the TimeSpan object seen in the following image is displayed.
As seen in the previous image, the Get-ModifiedFilesUsePipeline.ps1 script completed in 8.0739805 total seconds. When compared to the original 26.1411386 total seconds, we see a nearly 70 percent reduction in run time, or expressed another way, the Get-ModifiedFilesUsePipeline.ps1 script is roughly 3.2 times faster than the original Get-ModifiedFiles.ps1 script. This is a significant performance improvement no matter how you express it.
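The comparison can be reproduced from the two TotalSeconds values. This is a small sketch of the arithmetic only; the variable names are illustrative.

```powershell
# TotalSeconds values reported for the two scripts.
$original = 26.1411386
$pipeline = 8.0739805

# How many times faster the pipeline version ran.
$speedup = $original / $pipeline

# Percentage of the original run time that was saved.
$reduction = ($original - $pipeline) / $original * 100

'{0:N2} times faster, {1:N1} percent less run time' -f $speedup, $reduction
```

The speedup works out to a little over 3.2, and the reduction to a little over 69 percent.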
There are further changes that can be made to the Get-ModifiedFilesUsePipeline.ps1 script. This is a more radical modification to the script because it requires removing the ForEach-Object cmdlet and the if statement. This is the section of code that is ripped out:
ForEach-Object {
    if($_.LastWriteTime -ge $dteModified)
    { $changedFiles ++ }
}
By removing the ForEach-Object cmdlet and the if statement, you can get rid of the $changedFiles ++ statement and take advantage of the fact that Windows PowerShell automatically returns objects from cmdlets. The single Where-Object cmdlet should be faster than the more convoluted combination of ForEach-Object and the if statement, but you will only know whether the modification is effective when you test the script with the Measure-Command cmdlet. By using a single Where-Object cmdlet, you arrive at this filter:
where-object { $_.LastWriteTime -ge $dteModified }
The result of the pipeline operation is stored in the $changedFiles variable, which has a count property associated with it. Directly reading the count property should be faster than incrementing the $changedFiles variable as was done in the Get-ModifiedFilesUsePipeline.ps1 script. The entire Get-ModifiedFilesUsePipeline2.ps1 script is seen here.
Get-ModifiedFilesUsePipeline2.ps1
Param(
    $path = "D:\",
    $days = 30
) #end param
$changedFiles = $null
$dteModified= (Get-Date).AddDays(-$days)
$changedFiles = Get-ChildItem -path $path -recurse |
    where-object { $_.LastWriteTime -ge $dteModified }
"The $path has $($changedFiles.count) modified files since $dteModified"
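One caveat about reading the count property: in Windows PowerShell 2.0, a pipeline that returns exactly one object stores that object itself in the variable rather than an array, and a single file object has no count property. A minimal sketch of one way around this, assuming you want a reliable number for zero, one, or many matches, is to wrap the pipeline in the @() array subexpression operator. The temporary folder and variable names are illustrative.

```powershell
# Throwaway folder containing exactly one file (assumption for the demo).
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $demo | Out-Null
Set-Content -Path (Join-Path $demo 'only.txt') -Value 'one'

$dteModified = (Get-Date).AddDays(-30)
$future = (Get-Date).AddDays(1)

# @() always yields an array, so Count is defined even for a single match.
$changedFiles = @(Get-ChildItem -Path $demo -Recurse |
    Where-Object { $_.LastWriteTime -ge $dteModified })

# Nothing has been modified after tomorrow, so this array is empty.
$noFiles = @(Get-ChildItem -Path $demo -Recurse |
    Where-Object { $_.LastWriteTime -ge $future })

"One match: $($changedFiles.Count), no matches: $($noFiles.Count)"

# Clean up the throwaway folder.
Remove-Item -Path $demo -Recurse -Force
```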
When the Get-ModifiedFilesUsePipeline2.ps1 script is run, the script completes in 8.4052029 seconds. Compared with the 8.0739805 seconds of the Get-ModifiedFilesUsePipeline.ps1 script, this is about 4 percent slower. In this particular example, the modification to the script did not improve performance. The TimeSpan object that is created by running the Get-ModifiedFilesUsePipeline2.ps1 script is seen here:
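When two timings are this close, a single run of each script may not settle the question, because individual measurements can vary from run to run (for example, with file system caching). One option is to average several runs. The sketch below is an assumption about how you might do that and not something the original scripts include. The Measure-Average function name is hypothetical, and Start-Sleep is a placeholder for the script being tested.

```powershell
# Hypothetical helper: run a script block several times and average TotalSeconds.
function Measure-Average {
    param(
        [scriptblock]$ScriptBlock,
        [int]$Count = 5
    )
    $total = 0.0
    foreach ($i in 1..$Count) {
        $total += (Measure-Command -Expression $ScriptBlock).TotalSeconds
    }
    # Emit the mean run time in seconds.
    $total / $Count
}

# Start-Sleep stands in for a call to the script you want to time.
$average = Measure-Average -ScriptBlock { Start-Sleep -Milliseconds 50 } -Count 3
"Average: $average seconds"
```

To compare two versions of a script, pass each one in its own script block and compare the two averages.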
LM, we come to the end of another Hey, Scripting Guy! article. We hope you will take a look at the Measure-Command cmdlet to see if changes you make to a script help or hurt things. The key point we have seen today is to remember the pipeline. Simple changes can often make dramatic improvements in the performance of a script.
Join us tomorrow as we continue talking about testing scripts. If you want to keep up to date on what is happening on the Script Center, follow us on Twitter. If you use Facebook, consider joining the Scripting Guys group. In fact, there are two groups. The first is a normal group where we post information about upcoming articles on the Script Center. The other is a fan group. Join them both and show your support for the Scripting Guys. Don’t forget about the Official Scripting Guys Forum where you can hang out with other scripters. See you tomorrow, until then peace!
Ed Wilson and Craig Liebendorfer, Scripting Guys