Keep your eye on the code page: C# edition (warning about DllImport)

Raymond Chen

Raymond

Often, we receive problem reports from customers who failed to
keep their eye on the code page.

Does the SH­Get­File­Info function
support files with non-ASCII characters in their names?
We find that the function either fails outright or
returns question marks when asked to provide information
for files with non-ASCII characters in their name.

using System;
using System.Runtime.InteropServices;
class Program
{
 static void Main(string[] args)
 {
  string fileName = "BgṍRồ.txt";
  Console.WriteLine("File exists? {0}", System.IO.File.Exists(fileName));
  // assumes extensions are hidden
  string expected = "BgṍRồ";
  Test(fileName, SHGFI_DISPLAYNAME, expected);
  Test(fileName, SHGFI_DISPLAYNAME | SHGFI_USEFILEATTRIBUTES, expected);
 }
 static void Test(string fileName, uint flags, string expected)
 {
  var actual = GetNameViaSHGFI(fileName, flags);
  Console.WriteLine("{0} == {1} ? {2}", actual, expected, actual == expected);
 }
 static string GetNameViaSHGFI(string fileName, uint flags)
 {
  SHFILEINFO sfi = new SHFILEINFO();
  if (SHGetFileInfo(fileName, 0, ref sfi, Marshal.SizeOf(sfi),
                    flags) != IntPtr.Zero) {
   return sfi.szDisplayName;
  } else {
   return null;
  }
 }
 [StructLayout(LayoutKind.Sequential)]
 struct SHFILEINFO {
  public IntPtr hIcon;
  public int iIcon;
  public uint dwAttributes;
  [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
  public string szDisplayName;
  [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 80)]
  public string szTypeName;
 }
 const uint SHGFI_USEFILEATTRIBUTES = 0x10;
 const uint SHGFI_DISPLAYNAME = 0x0200;
 [DllImport("shell32.dll")]
 static extern IntPtr SHGetFileInfo(
    string path, uint fileAttributes, ref SHFILEINFO info, int cbSize,
    uint flags);
}
// Output:
// File exists? True
//  == Bg?R? ? False
// Bg?R? == Bg?R? ? False

If we ask for the display name, the function fails
even though the file does exist.
If we also pass the
SHGFI_USE­FILE­ATTRIBUTES flag
to force the system to act as if the file existed,
then it returns the file name but with question marks
where the non-ASCII characters should be.

The SH­Get­File­Info function
supports non-ASCII characters just fine,
provided you call the version that supports non-ASCII characters!

The customer here fell into the trap of not keeping their eye on the code
page.
It goes back to an unfortunate choice of defaults in the
System.Runtime.Interop­Services namespace:
At the time the CLR was originally being developed,
Windows operating systems derived from Windows 95 were still
in common use,
so the CLR folks decided to default to
Char­Set.Ansi.
This made sense back in the day, since it meant that your program
ran the same on Windows 98 as it did in Windows NT.
In the passage of time, the Windows 95 series
of operating systems became obsolete,
so the need to be compatible with it gradually disappeared.
But too late.
The rules were already set, and the default of
Char­Set.Ansi could not be changed.

The solution is to specify
Char­Set.Unicode explicitly in the
Struct­Layout
and
Dll­Import attributes.

FxCop catches this error, flagging it as
Specify­Marshaling­For­PInvoke­String­Arguments.
The error explanation talks about the security risks of
unmapped characters,
which is all well and good,
but it is looking too much at the specific issue and not
so much at the big picture.
As a result, people may ignore the issue because it is flagged
as a complicated security issue, and they will think,
“Eh, this is just my unit test, I’m not concerned about security here.”
However, the big picture is

This is almost certainly an oversight on your part.
You didn’t really mean to disable Unicode support here.

Change the lines

 [StructLayout(LayoutKind.Sequential)]
 [DllImport("shell32.dll")]

to

 [StructLayout(LayoutKind.Sequential, CharSet=CharSet.Unicode)]
 [DllImport("shell32.dll", CharSet=CharSet.Unicode)]

and re-run the program. This time, it prints

File exists? True
Bg?R? == Bg?R? ? True
Bg?R? == Bg?R? ? True

Note that you have to do the string comparison in the program
because the console itself has a troubled history with Unicode.
At this point, I will simply cue a

Michael Kaplan

rant and link to

an article explaining how to ask nicely
.

Raymond Chen
Raymond Chen

Follow Raymond   

0 comments

Comments are closed.