December 11th, 2006

$OutputEncoding to the rescue

PowerShell Team

You might have noticed that “findstr” does not work properly with non-English text in PowerShell.

For example:

Let’s create a text file with some Chinese characters in it.

PS C:\> ${c:\test.txt}="中文"

Now try to use findstr to find one of the Chinese characters. It does not find anything.

PS C:\> Get-Content test.txt | findstr /c:

The same command works in Cmd.exe.

PS C:\> cmd /c "findstr /c: test.txt"

中文

 

What went wrong? When we pipe output from PowerShell cmdlets into native applications, PowerShell encodes that output according to the $OutputEncoding variable, which is set to ASCII by default. ASCII cannot represent Chinese characters, so each one is replaced with "?" before findstr ever sees it. We can fix the aforementioned scenario by setting $OutputEncoding to [Console]::OutputEncoding.
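
To make the difference concrete, here is a minimal sketch that encodes the same two characters both ways and prints the resulting bytes. The variable names are made up for illustration, and code page 936 (GB2312) is used explicitly only because it is the console encoding shown in this post; on another machine [Console]::OutputEncoding may be something else.

```powershell
# Build the two-character string from its code points (U+4E2D, U+6587) so the
# example does not depend on how this script file itself is saved.
$text = -join ([char]0x4E2D, [char]0x6587)

# Encode the same text with ASCII and with GB2312 (code page 936).
$asciiBytes  = [System.Text.Encoding]::ASCII.GetBytes($text)
$gb2312Bytes = [System.Text.Encoding]::GetEncoding(936).GetBytes($text)

# Format each byte array as space-separated hex for easy comparison.
$asciiHex  = ($asciiBytes  | ForEach-Object { $_.ToString('X2') }) -join ' '
$gb2312Hex = ($gb2312Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '

"ASCII : $asciiHex"    # 3F 3F        (two '?' characters)
"GB2312: $gb2312Hex"   # D6 D0 CE C4  (two bytes per character)
```

With ASCII, the original characters are already gone by the time the bytes leave PowerShell, so no native tool downstream could match them.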

PS C:\> $OutputEncoding

IsSingleByte      : True
BodyName          : us-ascii
EncodingName      : US-ASCII
HeaderName        : us-ascii
WebName           : us-ascii
WindowsCodePage   : 1252
IsBrowserDisplay  : False
IsBrowserSave     : False
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 20127

PS C:\> $OutputEncoding = [Console]::OutputEncoding

PS C:\> $OutputEncoding

BodyName          : gb2312
EncodingName      : 简体中文(GB2312)
HeaderName        : gb2312
WebName           : gb2312
WindowsCodePage   : 936
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : True
CodePage          : 936

PS C:\> Get-Content test.txt | findstr /c:

中文

 

Voila! Now findstr works!
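
If you would rather not change $OutputEncoding for the whole session, one option is to confine the change to a function scope. The sketch below is only an illustration: Invoke-WithOutputEncoding is a hypothetical helper name, not a built-in cmdlet, and it assumes the function is defined directly in your session or script (not inside a module), so that the script block sees the function's local $OutputEncoding.

```powershell
# Hypothetical helper (not part of PowerShell): runs a script block with a
# temporary $OutputEncoding. Assigning the variable inside the function creates
# a function-local copy, so the caller's value is untouched after it returns.
function Invoke-WithOutputEncoding {
    param(
        [Parameter(Mandatory)] [System.Text.Encoding] $Encoding,
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock
    )

    $OutputEncoding = $Encoding   # local to this function's scope
    & $ScriptBlock                # child scope; sees the local value above
}

# Usage: the script block observes the temporary encoding...
Invoke-WithOutputEncoding -Encoding ([System.Text.Encoding]::Unicode) -ScriptBlock {
    "Inside : $($OutputEncoding.WebName)"
}

# ...while the session-wide value is unchanged afterwards.
"Outside: $($OutputEncoding.WebName)"
```

To apply this to the example in this post, you would pass [Console]::OutputEncoding as the encoding and put the Get-Content test.txt | findstr pipeline inside the script block.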

 

Wei Wu [MSFT]

 

 

 

POSTSCRIPT: The reason we convert to ASCII when piping to existing executables is that most commands today do not process Unicode correctly. Some do, but most don't.

 

Jeffrey Snover
