You might have noticed that “findstr” does not work properly with non-English text in PowerShell.
For example:
Let’s create a text file with some Chinese characters in it.
PS C:\> ${c:\test.txt}="中文"
Now try to use findstr to find one of the Chinese characters. It does not find anything.
PS C:\> Get-Content test.txt | findstr /c:中
The same command works in Cmd.exe.
PS C:\> cmd /c "findstr /c:中 test.txt"
中文
What went wrong? When we pipe output from PowerShell cmdlets into a native application, PowerShell encodes that output using the $OutputEncoding variable, which is set to ASCII by default. ASCII cannot represent Chinese characters, so findstr never receives them. We can fix the scenario above by setting $OutputEncoding to [Console]::OutputEncoding.
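To see what the ASCII conversion does to the text, here is a minimal sketch that encodes the same two characters by hand. It only imitates the conversion with the .NET encoding classes and does not call findstr. The variable names are illustrative, and UTF-8 stands in for "an encoding that can represent Chinese", which is an assumption made for the example, not the console code page used in this post.

```powershell
# Build "中文" from code points so the example does not depend on
# how the script file itself is encoded.
$text = [string][char]0x4E2D + [char]0x6587

# US-ASCII has no mapping for these characters, so the encoder's
# replacement fallback substitutes "?" (byte 0x3F) for each one.
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$asciiText  = [System.Text.Encoding]::ASCII.GetString($asciiBytes)

# An encoding that covers Chinese round-trips the text unchanged.
# UTF-8 is used here purely as a stand-in.
$utf8     = [System.Text.Encoding]::UTF8
$utf8Text = $utf8.GetString($utf8.GetBytes($text))

"ASCII round trip: $asciiText"
"UTF-8 round trip: $utf8Text"
```

After the ASCII step the original characters are gone, which is why a native command reading the pipe has nothing to match against.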
PS C:\> $OutputEncoding
IsSingleByte : True
BodyName : us-ascii
EncodingName : US-ASCII
HeaderName : us-ascii
WebName : us-ascii
WindowsCodePage : 1252
IsBrowserDisplay : False
IsBrowserSave : False
IsMailNewsDisplay : True
IsMailNewsSave : True
EncoderFallback : System.Text.EncoderReplacementFallback
DecoderFallback : System.Text.DecoderReplacementFallback
IsReadOnly : True
CodePage : 20127
PS C:\> $OutputEncoding = [Console]::OutputEncoding
PS C:\> $OutputEncoding
BodyName : gb2312
EncodingName : 简体中文(GB2312)
HeaderName : gb2312
WebName : gb2312
WindowsCodePage : 936
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
IsSingleByte : False
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : True
CodePage : 936
PS C:\> Get-Content test.txt | findstr /c:中
中文
Voila! Now findstr works!
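If you would rather not change $OutputEncoding for the whole session, one option is to change it only inside a function. The sketch below assumes that an assignment inside a function creates a function-local $OutputEncoding, and that pipelines started from that scope pick it up. Invoke-WithConsoleEncoding is a hypothetical name made up for this example, not a built-in command.

```powershell
# Hypothetical helper: runs a script block with $OutputEncoding set to
# the console's encoding, leaving the caller's $OutputEncoding untouched.
function Invoke-WithConsoleEncoding {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock] $ScriptBlock
    )

    # This assignment creates a variable local to the function, so the
    # session-wide value is not modified.
    $OutputEncoding = [Console]::OutputEncoding

    # The script block runs in a child scope and sees the local value.
    & $ScriptBlock
}
```

It could then be used like this: Invoke-WithConsoleEncoding { Get-Content test.txt | findstr /c:中 }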
Wei Wu [MSFT]
POSTSCRIPT: The reason we convert to ASCII when piping to existing executables is that most commands today do not process Unicode correctly. Some do, most don’t.
Jeffrey Snover