June 22nd, 2015

Format Specifiers Checking

Yuriy Solodkyy
Software Developer

By popular request, in Visual Studio 2015 RTM, we’ve implemented the checking of arguments given to printf/scanf and their variations in the C standard library. You can try the examples from this post in our online compiler.

Summary

Here is a list of all the formatting warnings that were introduced:

State | Level | Number | Text
On | W1 | C4473 | ‘<function>’ : not enough arguments passed for format string
On | W3 | C4474 | ‘<function>’ : too many arguments passed for format string
On | W3 | C4475 | ‘<function>’ : length modifier ‘<length>’ cannot be used with type field character ‘<conversion-specifier>’ in format specifier
On | W3 | C4476 | ‘<function>’ : unknown type field character ‘<conversion-specifier>’ in format specifier
On | W1 | C4477 | ‘<function>’ : format string ‘<format-string>’ requires an argument of type ‘<type>’, but variadic argument <position> has type ‘<type>’
On | W1 | C4478 | ‘<function>’ : positional and non-positional placeholders cannot be mixed in the same format string
OFF | W4 | C4774 | ‘<function>’ : format string expected in argument <position> is not a string literal
On | W3 | C4775 | nonstandard extension used in format string ‘<format-string>’ of function ‘<function>’
On | W1 | C4776 | ‘%<conversion-specifier>’ is not allowed in the format string of function ‘<function>’
OFF | W4 | C4777 | ‘<function>’ : format string ‘<format-string>’ requires an argument of type ‘<type>’, but variadic argument <position> has type ‘<type>’
On | W3 | C4778 | ‘<function>’ : unterminated format string ‘<format-string>’

Example 1

Consider the following snippet taken from real code:

wchar_t const * str = ...; // Some string to parse
char buf[10];
wchar_t wbf;
swscanf_s(str, L"%10c %1C", buf, sizeof(buf), &wbf);

Compiling it with cl.exe and no additional flags (the default warning level is 1) will give you 3 warnings:

warning C4477: ‘swscanf_s’ : format string ‘%10c’ requires an argument of type ‘wchar_t *’, but variadic argument 1 has type ‘char *’
note: this argument is used by a conversion specifier
note: consider using ‘%hc’ in the format string
note: consider using ‘%Tc’ in the format string
note: consider defining _CRT_STDIO_ISO_WIDE_SPECIFIERS if C99 standard semantics is required

The first warning indicates a mismatch between the type of the expected and the actual argument in the context of swscanf_s. Note that the same actual argument might be valid for that format specifier if you had called a different function (for example, sscanf_s), which is why we include the name of the function in these newly introduced warning messages. If the given conversion specifier would match the actual argument with different length modifiers, then we will list those combinations as suggestions. Note that following these suggestions may not always be the right thing to do, because the conversion specifier or the type of the argument may have to be changed. We will not suggest other conversion specifiers because changing the conversion specifier itself will usually lead to semantic changes, which requires insight into the logic of the program.
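As a rough illustration of why an explicit length modifier removes this kind of ambiguity, the sketch below uses the standard swscanf with the l modifier, which requests a wide (wchar_t) destination regardless of whether the ISO or the Microsoft interpretation of plain %c/%s is in effect. It is not a fix for the snippet above: the helper name parse_word_and_char and its signature are made up for this example, and it parses a word rather than a fixed-width character block.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical helper (not from the original post): reads one whitespace-delimited
// word of at most 9 wide characters, followed by a single wide character.
bool parse_word_and_char(const wchar_t* input, wchar_t (&word)[10], wchar_t& last) {
    assert(input != nullptr);
    // "%9ls": at most 9 wide characters are stored, leaving room for the
    //         terminating L'\0' in word[10].
    // "%lc":  the 'l' modifier makes the expected argument a wchar_t*,
    //         so the destination type is spelled out in the format string.
    return std::swscanf(input, L"%9ls %lc", word, &last) == 2;
}
```

Calling parse_word_and_char(L"hello X", word, last) should leave L"hello" in word and L'X' in last. Note that MSVC may report its usual deprecation warning for swscanf; that is a separate diagnostic from the format checks discussed here.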

We chose to refer to the positions of arguments as relative to the beginning of the variadic arguments instead of relative to all the arguments. We wanted this numbering to be consistent with the numbering used by positional arguments in _p functions, and we found this scheme easier to work with because variadic arguments tend to immediately follow the format string (except in the case of _l functions).

warning C4477: ‘swscanf_s’ : format string ‘%1C’ requires an argument of type ‘char *’, but variadic argument 3 has type ‘wchar_t *’
note: this argument is used by a conversion specifier
note: consider using ‘%lC’ in the format string
note: consider using ‘%llC’ in the format string
note: consider using ‘%LC’ in the format string
note: consider using ‘%wC’ in the format string

If you use /Za during compilation to disable Microsoft extensions, you will notice that the notes suggesting non-standard format specifiers no longer appear, and that you get one additional warning:

warning C4775: nonstandard extension used in format string ‘%1C’ of function ‘swscanf_s’
note: the combination of length modifier ‘’ with type field character ‘C’ is non standard

If you compile for x64, where size_t is defined to be typedef unsigned __int64 size_t; you will also get:

warning C4477: ‘swscanf_s’ : format string ‘%10c’ requires an argument of type ‘int’, but variadic argument 2 has type ‘size_t’
note: this argument is used as a buffer size

The problem here is that the buffer size is expected to be of type int, which has 4 bytes on x64, but the actual argument has type size_t, which occupies 8 bytes on x64. In some cases, it is possible to get this warning on x86 as well, but you will need to enable warning C4777, which is off by default. To do so, compile with /w14777 or /Wall. Unfortunately, C4777 will not help you in this specific case, because the value will have to be declared with type std::size_t for us to detect the potential issue, while here the type of the expression sizeof(buf) is of unnamed type, which is also the underlying type of size_t.

Some format specifiers can consume up to 3 arguments from the stack, depending on the function. Because of this, it might be confusing at first to encounter a warning that indicates that a format specifier %s expects an argument of type int. To reduce confusion, we added notes that indicate what such arguments are used for in a given context (see the “note: this argument is used ...” lines in the warnings above and below). These notes explain whether the argument is used by the conversion specifier itself, as a required buffer size, or as a width or precision field.

warning C4473: ‘swscanf_s’ : not enough arguments passed for format string
note: placeholders and their parameters expect 4 variadic arguments, but 3 were provided
note: the missing variadic argument 4 is required by format string ‘%1C’
note: this argument is used as a buffer size

Missing variadic arguments can be as big of a security concern as incorrect types, because they may cause your program to read garbage from the stack. To reflect the severity of these issues, we created C4473 as a level 1 warning.

Note that in earlier previews of Visual Studio 2015, C4473 was known as C4317, while C4474 used to be C4422 and C4776 was C4426. Please make sure to update your pragmas or build scripts if you were suppressing or disabling these warnings using their old numbers.

Example 2

Consider another simple example we found in “real world” code:

const char* path = "PATH=%WindowsSdkDir%bin\\%_ARCH% ;%PATH%";
printf_s(path);

Compiling this with default flags wouldn’t result in any diagnostic messages, giving you a false sense of security, but you can clearly see that there are problems here. Unfortunately, always giving a warning when a format string is not a string literal turned out to generate too many warnings on valid use cases (e.g. localization), so as a compromise we decided to provide this warning as off by default. To enable it, you need to pass /w14774 or /Wall, after which you will get:

warning C4774: ‘printf_s’ : format string expected in argument 1 is not a string literal
note: e.g. instead of printf(name); use printf(“%s”, name); because format specifiers in ‘name’ may pose a security issue
note: consider using constexpr specifier for named string literals

We recommend enabling this warning at least occasionally to detect all the places where format checking is not actually happening due to non-literal format strings. The important message here is the second note, which suggests that you use constexpr instead of const. Doing so allows us to evaluate ‘path’ at compile time and thus perform format checking at the point of its use:

warning C4476 : ‘printf_s’ : unknown type field character ‘W’ in format specifier
warning C4476: ‘printf_s’ : unknown type field character ‘b’ in format specifier
warning C4476: ‘printf_s’ : unknown type field character ‘_’ in format specifier
warning C4476: ‘printf_s’ : unknown type field character ‘;’ in format specifier
warning C4476: ‘printf_s’ : unknown type field character ‘P’ in format specifier
warning C4778: ‘printf_s’ : unterminated format string ‘%’

The solution is simply to use %% instead of %, but, surprisingly, we’ve seen far too many occurrences of this bug in real code. The problem can be even more subtle once you realize that ‘ ‘ (space) is a valid printf flag (see the 4th warning about ‘;’ above).
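A minimal sketch of the corrected string is shown below. It assumes the standard snprintf instead of printf_s so that the result can be inspected; the names kPathFormat and render_path_line are made up for this example. Every literal percent sign is doubled, and the named format string is declared constexpr, as the compiler note suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// constexpr (rather than plain const) keeps the named format string visible
// to compile-time format checking. Every literal '%' is escaped as "%%".
constexpr const char* kPathFormat = "PATH=%%WindowsSdkDir%%bin\\%%_ARCH%% ;%%PATH%%";

// Hypothetical helper (not from the original post): formats the line into a string.
std::string render_path_line() {
    char buffer[64];
    const int written = std::snprintf(buffer, sizeof(buffer), kPathFormat);
    // The format is a fixed literal without conversions, so it must fit.
    assert(written >= 0 && static_cast<std::size_t>(written) < sizeof(buffer));
    return std::string(buffer, static_cast<std::size_t>(written));
}
```

With the escapes in place, the formatted text contains single percent signs again, including the one after the space in front of ‘;’.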

Example 3

Consider another example:

struct HTMLElement
{
    const char* tag;
    virtual std::string content() const = 0;
};

int n;
HTMLElement* elem = ...;
_tprintf_p(_T("<%hhs>%hhs%n</%1$hhs>"), elem->tag, elem->content().c_str(), &n);

This snippet tries to print the content of an HTML element while at the same time counting how many characters were printed before the closing tag. The closing tag is printed via a positional format specifier (“%1$hhs”) that refers to the first argument (elem->tag). And since we are in a _t function, we try to ensure it treats “%s” as a narrow string by writing “hh”. What we get is the following:

warning C4475 : ‘_printf_p’ : length modifier ‘hh’ cannot be used with type field character ‘s’ in format specifier
warning C4776: ‘%n’ is not allowed in the format string of function ‘_printf_p’
warning C4478: ‘_printf_p’ : positional and non-positional placeholders cannot be mixed in the same format string

The first warning says that “hh” is not a valid length modifier for “%s”. The second warning tells you that “%n” is disallowed in this function. The last warning reminds you that you are not allowed to mix positional and non-positional arguments.
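One possible way to restructure the call so that none of the three warnings applies is sketched below. This is not the fix applied to the original code: it swaps in the standard narrow snprintf (where %s always means a narrow string), passes the tag twice instead of using a positional placeholder, and takes the character count from snprintf's return value instead of %n. The helper name render_element and its signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not from the original code): builds "<tag>content</tag>"
// and reports how many characters precede the closing tag.
std::string render_element(const char* tag, const char* content, int& chars_before_close) {
    assert(tag != nullptr && content != nullptr);

    // With a null buffer and size 0, snprintf only reports the required length.
    const int open_len = std::snprintf(nullptr, 0, "<%s>%s", tag, content);
    const int close_len = std::snprintf(nullptr, 0, "</%s>", tag);
    if (open_len < 0 || close_len < 0) {
        chars_before_close = 0;
        return std::string();
    }

    const std::size_t open = static_cast<std::size_t>(open_len);
    const std::size_t close = static_cast<std::size_t>(close_len);
    std::vector<char> buffer(open + close + 1);  // +1 for the terminating '\0'

    // Only non-positional placeholders, and no %n: the count is the return value.
    std::snprintf(buffer.data(), buffer.size(), "<%s>%s", tag, content);
    std::snprintf(buffer.data() + open, buffer.size() - open, "</%s>", tag);

    chars_before_close = open_len;
    return std::string(buffer.data(), open + close);
}
```

For a tag "p" with content "hi", this yields "<p>hi</p>" and reports 5 characters before the closing tag.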

Example 4

When we tested these warnings, we noticed that a lot of developers were using the following code to print the value of a pointer:

const char* ptr = ...; // Some pointer
printf("%08X", ptr);

We are not sure why this pattern was preferred over the standard %p format specifier, but it was so common that we felt the need to elaborate on it. Compiling this code as is on x86 would give you the following warning:

warning C4477: ‘printf’ : format string ‘%08X’ requires an argument of type ‘unsigned int’, but variadic argument 1 has type ‘const char *’

Compiling it on x64 additionally produces:

warning C4313: ‘printf’: ‘%X’ in format string conflicts with argument 1 of type ‘const char *’

C4313 is an existing warning that was designed to detect integer/pointer size mismatches. We can get rid of the C4477 warning on x86 by converting the pointer to an integral type of the same size as pointers:

printf("%08X", reinterpret_cast<intptr_t>(ptr));

This does not work on x64, because there intptr_t is 8 bytes, while unsigned int is 4.

warning C4477: ‘printf’ : format string ‘%08X’ requires an argument of type ‘unsigned int’, but variadic argument 1 has type ‘intptr_t’
note: consider using ‘%IX’ in the format string

Casting this way also results in printing a truncated value e.g. 9515CED0 instead of the 000000E29515CED0 produced by %p on x64.

To get a warning about such potential truncations on x86, you have to explicitly enable the off-by-default warning C4777 (e.g. by passing /w14777 on command line):

warning C4777 : ‘printf’ : format string ‘%08X’ requires an argument of type ‘unsigned int’, but variadic argument 1 has type ‘intptr_t’
note: the sizes of types ‘intptr_t’ and ‘unsigned int’ might differ on other platforms
note: consider using ‘%IX’ in the format string

The text of the warning C4777 is exactly the same as C4477, but it is given in noisier contexts where the expected and actual types are related on the target platform. For example, int vs. long or double vs. long double on many architectures targeted by Microsoft have the same set of values, while technically being different built-in types. We found in our testing that the number of such mismatches was very high compared to the number of more serious mismatches, with a ratio of about 10 to 1. So, we decided to distinguish the two cases and have the noisier case be off by default.

Following the note’s suggestion to use the “I” length modifier:

printf("%08IX\n", reinterpret_cast<intptr_t>(ptr));

produces the correct value “E29515CED0” for the above pointer, but this output is not prepended by zeros to reflect the greater number of bits that pointers on x64 have. To alleviate this, we must also pass the width of the field to print:

const size_t MACH_PTR_SIZE = sizeof(void*);
printf("%0*IX\n", 2*MACH_PTR_SIZE, reinterpret_cast<intptr_t>(ptr));

Surprisingly, this gives us another warning (the noisier C4777 on x86 and the stricter C4477 on x64):

warning C4777: ‘printf’ : format string ‘%0*IX’ requires an argument of type ‘int’, but variadic argument 1 has type ‘size_t’
note: this argument is used as a field width
note: the sizes of types ‘size_t’ and ‘int’ might differ on other platforms

which indicates that the field width has to be of type int, not size_t. Making the following modification:

printf("%0*IX == %p\n", int(2*MACH_PTR_SIZE), reinterpret_cast<intptr_t>(ptr), ptr);

finally gets rid of all the warnings and prints the pointer in the same way as using %p (in our implementation) both on x86 and x64. If you prefer a standard length modifier, you can always go with the combination of %tX and ptrdiff_t instead of %IX and intptr_t.
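Another standard option, not covered in the discussion above, is the PRIXPTR macro from <cinttypes> paired with uintptr_t, which selects a matching length modifier for the platform. The sketch below assumes that approach and writes into a string so the result can be checked; format_pointer_hex is a hypothetical helper name. As before, the field width is passed as an int.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the original post): formats a pointer as
// zero-padded upper-case hex, two digits per byte of the pointer size.
std::string format_pointer_hex(const void* ptr) {
    // The '*' field width must be an int, not a size_t.
    const int width = static_cast<int>(2 * sizeof(void*));
    char buffer[2 * sizeof(void*) + 1];

    // PRIXPTR expands to the conversion that matches uintptr_t on this platform.
    const int written = std::snprintf(buffer, sizeof(buffer), "%0*" PRIXPTR,
                                      width, reinterpret_cast<std::uintptr_t>(ptr));
    assert(written == width);  // a uintptr_t never needs more hex digits than this
    return std::string(buffer, static_cast<std::size_t>(written));
}
```

Using the unsigned uintptr_t here also avoids pairing a signed type with the X conversion, which expects an unsigned argument.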

What Next?

Currently, the checking of format specifiers is only done for a predefined set of CRT functions and is not available for user-defined functions that would also benefit from similar checks. If there is enough interest, we will consider extending these warnings to work on such user-defined functions. We would also like to hear about other bugs in the printf/scanf family of functions that you would like the compiler to detect. Feel free to email me (yuriysol from Microsoft) or comment below and provide any feedback you can think of. Thank you!


1 comment


  • Jake Stine

    I liked this feature so much, as was distraught to discover it only works on MSVC CRT built-ins. So I made a horrible macro to allow our projects to apply it to our logger output and our std::string formatting helpers. It works by including a fully-skipped invocation of snprintf() and re-pasting all parameters into that snprintf() call. This does not result in duplicate parameter evaluation, thanks to the snprintf() being unreachable via 0&& clause:

    #if...
