$OutputEncoding to the rescue

PowerShell Team

You might have noticed that “findstr” does not work properly with non-English text in PowerShell.

For example:

Let’s create a text file with some Chinese characters in it.

PS C:\> ${c:\test.txt}="中文"

Now try to use findstr to find one of the Chinese characters. It does not find anything.

PS C:\> Get-Content test.txt | findstr /c:

The same command works in Cmd.exe.

PS C:\> cmd /c "findstr /c: test.txt"

中文

What went wrong? When we pipe output from PowerShell cmdlets into a native application, PowerShell encodes that output using the $OutputEncoding variable, which is set to ASCII by default. ASCII cannot represent the Chinese characters, so findstr never receives them. We can fix the scenario above by setting $OutputEncoding to [Console]::OutputEncoding.
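
To see why the match fails, here is a minimal sketch of what an ASCII encoder does to the two characters from test.txt. It assumes the ASCII default described above and builds the string from code points (U+4E2D and U+6587) so the result does not depend on how the script file itself is saved. The variable names are only illustrative.

```powershell
# Build the two-character string from code points (U+4E2D, U+6587)
# so the script file's own encoding cannot affect the demonstration.
$text = -join ([char]0x4E2D, [char]0x6587)

$ascii = [System.Text.Encoding]::ASCII

# ASCII has no representation for these characters, so the encoder's
# replacement fallback substitutes '?' (byte value 63) for each one.
$bytes = $ascii.GetBytes($text)

# Decode the bytes again to see the text a native command would receive.
$roundTripped = $ascii.GetString($bytes)

"Bytes sent: $($bytes -join ' ')"
"Text received: $roundTripped"
```

Each character becomes a question mark, which is why a search for the original character finds nothing.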

PS C:\> $OutputEncoding

IsSingleByte      : True

BodyName          : us-ascii

EncodingName      : US-ASCII

HeaderName        : us-ascii

WebName           : us-ascii

WindowsCodePage   : 1252

IsBrowserDisplay  : False

IsBrowserSave     : False

IsMailNewsDisplay : True

IsMailNewsSave    : True

EncoderFallback   : System.Text.EncoderReplacementFallback

DecoderFallback   : System.Text.DecoderReplacementFallback

IsReadOnly        : True

CodePage          : 20127

PS C:\> $OutputEncoding = [Console]::OutputEncoding

PS C:\> $OutputEncoding

BodyName          : gb2312

EncodingName      : 简体中文(GB2312)

HeaderName        : gb2312

WebName           : gb2312

WindowsCodePage   : 936

IsBrowserDisplay  : True

IsBrowserSave     : True

IsMailNewsDisplay : True

IsMailNewsSave    : True

IsSingleByte      : False

EncoderFallback   : System.Text.InternalEncoderBestFitFallback

DecoderFallback   : System.Text.InternalDecoderBestFitFallback

IsReadOnly        : True

CodePage          : 936

PS C:\> Get-Content test.txt | findstr /c:

中文

Voila! Now findstr works!
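
Assigning $OutputEncoding at the prompt changes it for the rest of the session. If you only want the change for one pipeline, one possible approach (a sketch, not something the steps above require) is to save the old value and restore it afterwards. The names $savedEncoding and $activeName are hypothetical, and the native command is left as a comment for you to fill in.

```powershell
# Remember the current setting so it can be put back afterwards.
$savedEncoding = $OutputEncoding

try {
    # Match the encoding that the console and console tools expect.
    $OutputEncoding = [Console]::OutputEncoding

    # Pipe to the native command here, for example a
    # Get-Content ... | findstr ... pipeline like the one above.
    $activeName = $OutputEncoding.WebName
}
finally {
    # Restore the previous value even if the pipeline throws.
    $OutputEncoding = $savedEncoding
}
```

The try/finally block keeps the temporary setting from leaking into later commands if the pipeline fails partway through.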

Wei Wu [MSFT]

POSTSCRIPT: The reason we convert to ASCII when piping to existing executables is that most commands today do not process Unicode correctly. Some do, most don’t.

Jeffrey Snover
