Using the PowerShell Tokenizer to Find Commands in Text

ScriptingGuy1

Summary: Microsoft Scripting Guy, Ed Wilson, shows how to use the Windows PowerShell tokenizer to parse text and identify commands.

Microsoft Scripting Guy, Ed Wilson, here. There are still five days for you to submit your entries for the 2011 Scripting Games. Events 1 through 5 are in the bag, but you can still upload scripts for events 6 through 10. Refer to the Events list for the exact deadlines on each of those…and good luck.

This is part two of a four-part series of blogs that I am writing about expanding the aliases in a script into their full cmdlet names. The first blog appeared yesterday, and it examined creating a hash table to store all of the current aliases and their associated definitions. You should refer to that blog for details about the creation and operation of that data structure.

Last year I wrote a blog about playing around with the Windows PowerShell tokenizer. You should refer to that blog for additional information about using the Windows PowerShell tokenizer to parse scripts because the tokenizer is not documented very well. On MSDN, you can find the PSParser .NET Framework class, but the information is essentially the same information you would obtain from the Get-Member cmdlet; and therefore, it is not especially eloquent.

The first thing I need to do is modify the last line of the code that I used to gather all the aliases and place them and the definitions into a hash table. The reason is that it displays the content in the end portion of the code, and I do not need the extra clutter on my screen. I left the end portion empty and commented out the remaining code here so it would be obvious where I made the change. In reality, I would remove the end section because I am not using it. This change is shown here.

-end {}#$a}

One of the things that is a bit strange about the tokenizer is that the errors get passed by reference. The [ref] type accelerator needs an existing variable to point to, so the next thing I do is initialize a variable named $errors and set it equal to $null. This prevents an error such as the one shown in the following image.

Image of command output
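Here is a minimal sketch of that pattern. The sample line and the $tokens variable name are only for illustration; they are not part of the script in this blog.

```powershell
# [ref] needs an existing variable to point to, so create $errors first.
$errors = $null

# Illustrative input; any line of Windows PowerShell code works here.
$sample = 'gps | fl *'

# Tokenize returns the tokens and fills $errors with any parse errors it finds.
$tokens = [System.Management.Automation.PSParser]::Tokenize($sample, [ref]$errors)

# For text that parses cleanly, the error collection is empty.
$errors.Count
```

After the call, $errors holds a collection rather than $null, so it is safe to check its Count property before working with the tokens.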

The $b variable contains a line of Windows PowerShell code that consists of two Windows PowerShell aliases. The first alias, gps, is an alias for the Get-Process cmdlet. The second alias, fl, is an alias for the Format-List cmdlet. This is the code that will be "fixed" by removing the aliases and replacing them with the complete cmdlet names.

The key to the Remove-AliasFromCommand.ps1 script is the use of the tokenizer to return all of the commands from the line of code. The tokenizer will parse out commands, operators, command arguments, and other parts that make up a Windows PowerShell script. The output from parsing a simple command is shown in the following image.

Image of command output
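If you want to reproduce that kind of output yourself, something like the following sketch should do it. The choice of properties to display is mine; Start and Length are included because they show where each token sits in the original text.

```powershell
$errors = $null
$line = 'gps | fl *'

# Keep the tokens in a variable so they can be inspected more than once.
$tokens = [System.Management.Automation.PSParser]::Tokenize($line, [ref]$errors)

# Show what each token contains, what kind of token it is, and where it sits.
$tokens | Select-Object Content, Type, Start, Length | Format-Table -AutoSize
```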

As you can see in the previous image, the type property identifies the type of Windows PowerShell code that is in the content property. By using the Where-Object cmdlet to keep only the tokens whose type is command, I return a listing of all the commands in the script. This portion of the script is shown here.

[system.management.automation.psparser]::Tokenize($b,[ref]$errors) |
Where-Object { $_.type -eq "command" } |
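As a standalone sketch, the same filter can be used to collect just the command names. Comparing against the PSTokenType enumeration, as done here, is one option; comparing against the string "command" works as well because Windows PowerShell converts the string for you. The $commands variable is an illustrative name.

```powershell
$errors = $null
$line = 'gps | fl *'

# Keep only the Command tokens and emit their text.
$commands = [System.Management.Automation.PSParser]::Tokenize($line, [ref]$errors) |
    Where-Object { $_.Type -eq [System.Management.Automation.PSTokenType]::Command } |
    ForEach-Object { $_.Content }

$commands   # gps and fl
```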

All that was the hard part; the commands are passed down the pipeline to the Foreach-Object cmdlet, where each command is checked against the hash table of aliases. If the command matches one of the aliases, it is replaced with the alias definition. As shown earlier, the command is contained in the content property from the tokenizer. In yesterday's blog, I talked about addressing individual items in the hash table. The if portion of the command therefore is shown here.

if($a.($_.content))
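The lookup in that if statement can be tried on its own. This sketch uses a small stand-in hash table with two illustrative entries instead of the full table built from Get-Alias, and the $name, $found, and $kind variables are names I made up for the example.

```powershell
# Stand-in for the alias hash table (illustrative entries only).
$a = @{ gps = 'Get-Process'; fl = 'Format-List' }

$name = 'gps'
$a.($name)               # key supplied as a parenthesized expression, as in the script
$a[$name]                # indexer syntax returns the same value
$found = $a.ContainsKey($name)   # explicit membership test

# A name that is not in the table returns $null, which if() treats as false.
$kind = if ($a.('Get-Process')) { 'alias' } else { 'not an alias' }
$kind
```

Because a missing key simply returns $null, commands that are already full cmdlet names fall through the if statement untouched.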

I use the replace operator to replace the alias with its definition from the hash table. This Scripting Wife article is an excellent primer to working with the replace operator. The last thing I do in this script is display the contents of the $b variable. This portion of the script is shown here.

{ $b = $b -replace $_.content, $a.($_.content) } }

   $b
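One thing to keep in mind is that the replace operator treats its first argument as a regular expression. That is fine for gps and fl, but an alias such as ? is a regular expression metacharacter. The following sketch, which uses an illustrative one-entry table and sample line, escapes the alias before using it as a pattern. It does not address an alias that also appears inside a longer word.

```powershell
# Illustrative one-entry alias table and sample line.
$a = @{ '?' = 'Where-Object' }
$line = 'gps | ? { $_.Handles -gt 500 }'
$alias = '?'

# [regex]::Escape turns the alias into a pattern that matches it literally.
$fixed = $line -replace [regex]::Escape($alias), $a.$alias
$fixed
```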

The complete script is shown here.

Remove-AliasFromCommand.ps1

Get-Alias |
 Select-Object name, definition |
 Foreach-object -begin {$a = @{} } `
                -process { $a.add($_.name,$_.definition)} `
                -end {}#$a}

$errors = $null
$b = 'gps | fl *'

[system.management.automation.psparser]::Tokenize($b,[ref]$errors) |
Where-Object { $_.type -eq "command" } |
ForEach-Object {
   if($a.($_.content)) { $b = $b -replace $_.content, $a.($_.content) } }
   $b

The output from the previous script is shown in the following image.

Image of command output
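As a variation, and not the approach used in the script above, the Start and Length properties of each token can be used to swap in the definition at the exact position of the command. This avoids changing text that merely contains the alias, such as the ls in *.xls. The function name Expand-AliasInLine, its parameters, and the small alias table are all hypothetical and exist only for this sketch.

```powershell
function Expand-AliasInLine {   # hypothetical helper name
    param(
        [string]$Line,
        [hashtable]$AliasTable
    )

    $errors = $null
    $tokens = [System.Management.Automation.PSParser]::Tokenize($Line, [ref]$errors)

    # Work from the last command token to the first so earlier offsets stay valid.
    $commands = @($tokens |
        Where-Object { $_.Type -eq 'Command' } |
        Sort-Object Start -Descending)

    foreach ($token in $commands) {
        $definition = $AliasTable[$token.Content]
        if ($definition) {
            # Cut out the alias at its position and insert the full name there.
            $Line = $Line.Remove($token.Start, $token.Length).Insert($token.Start, $definition)
        }
    }
    $Line
}

# Illustrative alias table; the real script builds this from Get-Alias.
$a = @{ gps = 'Get-Process'; fl = 'Format-List'; ls = 'Get-ChildItem' }
$expanded = Expand-AliasInLine -Line 'ls *.xls | fl *' -AliasTable $a
$expanded
```

With the sample table, only the two command tokens are rewritten, and the file specification is left alone.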

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy
