<format> in Visual Studio 2019 version 16.10


Charlie

C++20 adds a new text formatting facility to the standard library, designed primarily to replace snprintf and friends with a fast and type-safe interface. The standardized library is based on the existing {fmt} library, so users of that library will feel at home.

Before diving into how std::format works I want to thank Victor Zverovich, Elnar Dakeshov, Casey Carter, and miscco, all of whom made substantial contributions to this feature, and were the reason why we could complete it so quickly.

Overview

To start using <format>, you need Visual Studio 2019 version 16.10 or later, and you need to compile with /std:c++latest. You can get the latest Visual Studio preview here.

The simplest and most common way to use <format> is to call:

template<class... Args>
string format(string_view fmt, const Args&... args);

fmt is the format string and args are the values you’d like to format. The format string consists of some text interspersed with replacement fields delimited by curly braces. For example, "Format arguments: {} {}!" is a format string for formatting two arguments. Each replacement field corresponds to the next argument passed, so std::format("Format arguments: {} {}!", 2, 1) produces the string "Format arguments: 2 1!".

Format strings can also contain numbered replacement fields, for example "Format arguments {1} {0}!". Each number refers to the argument at that position, counting from zero. Numbered and unnumbered (automatic) replacement fields cannot be mixed in the same format string.

There are all sorts of modifiers you can use to change the way a particular parameter is formatted. These are called “format specifiers” and are specified in the replacement field like so: std::format("{:<specifiers>}", <arg>). Let’s look at an example that has one of everything.

std::format("{:🐱^+#12.4La}", 4.f);

This returns the string “🐱+1.0000p+2🐱” (printing this string to the console on Windows can be a bit difficult). Here is what each component of the format specification tells std::format to do:

  • “🐱^” is the “fill and align” part: center the output and pad it with cat emojis.
  • “+” requests a sign character no matter what. The default, “-”, prints a sign only for negative numbers; a space prints a minus sign for negative numbers and a space otherwise.
  • “#” selects the “alternate form”. For floating-point values, the alternate form makes format always insert a decimal point.
  • “12.4” sets a width of 12 and a precision of 4: format uses the fill and alignment settings to make the output at least 12 characters wide, and prints the float with a precision of 4 digits.
  • “L” makes format use locale-specific formatting for things like decimal separators.
  • “a” prints the value in hexfloat format.

More detailed information about the possible format specifications can be found at cppreference.

For width and precision specifiers you may reference a format argument instead of using a literal value like so:

std::format("{0:{1}.{2}}", 4.2f, 4, 5);

This results in a width of 4 and a precision of 5. The rules for mixing automatic and manual indexing (don’t do it) still apply, but you can use automatic indexing to reference width and precision as in:

std::format("{:{}.{}}", 4.2f, 4, 5);

The assignment of automatic indices is performed left to right, so the above two examples are equivalent.

Performance

In general, std::format performance should be in the same ballpark as fmt::format and snprintf if you compile your code with the /utf-8 option. If you don’t use /utf-8, performance can be significantly degraded because we need to retrieve your system locale to correctly parse the format string. While we’re working to improve performance for this case in a future release, we recommend using /utf-8 for the best experience.

Unicode

std::format doesn’t do any transcoding between different text encodings; however, it is aware of the “execution character set” and uses it to interpret the format string. The versions of std::format taking a wide (wchar_t) format string always interpret it as UTF-16. The versions taking a narrow (char) format string interpret it as UTF-8 if we detect the /utf-8 (or /execution-charset:utf-8) option; otherwise, we interpret the format string as being encoded in the active system code page. This means that if you compile your code with a non-UTF-8 execution character set, it may not run correctly on systems with a different system code page setting. There’s also a significant performance cost to determining the system code page, so for best performance we recommend compiling with /utf-8. We’re working to improve the performance of format with non-UTF-8 execution character sets in future releases.

Unicode also comes into play when dealing with width and precision specifications for strings. When we interpret the format string as UTF-8 or UTF-16, we compute the “estimated width” of a string using a rough estimate of the display width of each code point. If we’re interpreting the format string in a non-Unicode encoding, we simply estimate the width as the number of code units (not code points) in the string. In a future release we’ll add grapheme clustering to the width computation for Unicode encodings.
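
To make the difference between code units and estimated width concrete, here is a rough sketch of a UTF-8 width estimator. It is hypothetical: estimate_width and is_wide are not part of <format>, the table is only an illustrative subset of the wide ranges C++20 suggests, MSVC’s real implementation differs, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true for an illustrative subset of code point ranges
// that are typically displayed two columns wide (CJK, Hangul, some emoji).
constexpr bool is_wide(char32_t cp) {
    return (cp >= 0x1100 && cp <= 0x115F) || (cp >= 0x2E80 && cp <= 0x303E) ||
           (cp >= 0x3040 && cp <= 0xA4CF) || (cp >= 0xAC00 && cp <= 0xD7A3) ||
           (cp >= 0xF900 && cp <= 0xFAFF) || (cp >= 0xFF00 && cp <= 0xFF60) ||
           (cp >= 0x1F300 && cp <= 0x1F64F) || (cp >= 0x1F900 && cp <= 0x1F9FF) ||
           (cp >= 0x20000 && cp <= 0x3FFFD);
}

// Hypothetical helper: sums the estimated widths of the code points in a
// UTF-8 string (no validation, no grapheme clustering).
std::size_t estimate_width(std::string_view utf8) {
    std::size_t width = 0;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const auto lead = static_cast<unsigned char>(utf8[i]);
        // The lead byte encodes the sequence length: 0xxxxxxx, 110xxxxx,
        // 1110xxxx or 11110xxx.
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        char32_t cp = len == 1 ? lead : len == 2 ? (lead & 0x1F) : len == 3 ? (lead & 0x0F) : (lead & 0x07);
        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        width += is_wide(cp) ? 2 : 1;
        i += len;
    }
    return width;
}
```

For example, the cat emoji is four UTF-8 code units but is estimated as two columns wide, while "é" is two code units and one column.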

Locales

While we always parse the format string according to the rules above, the locale used for things like decimal and digit-group separators can be customized. By default, no locale is used. If you use the L specifier, locale-specific formatting may be applied. The locale used is the current global locale, as returned by a default-constructed std::locale; however, each formatting function has an overload that takes your own std::locale object to override that behavior.

Future work

Over the next few Visual Studio releases we’ll be improving the performance of std::format and fixing bugs. Additionally, C++23 will likely add compile-time checking of format string literals, and we may implement that before 2023. For code that you want to work well in C++23, don’t rely on catching std::format_error exceptions thrown for invalid format strings! C++23 will also make a small change to the definitions of std::vformat_to and std::format_to that reduces code size but can be observable, so for forward compatibility make sure any custom formatters work with all output iterators. More information on these changes can be found in P2216R3. C++23 may also bring additional functionality like std::print and better ways to handle Unicode text.

Differences from {fmt} (not exhaustive)

For those familiar with {fmt}, a quick list of differences from the standardized version of the library:

  • Named arguments are not supported.
  • None of the miscellaneous formatting functions like fmt::print or fmt::printf are supported.
  • Format strings are not checked at compile time.
  • There is no support for automatically formatting types that have a std::ostream& operator<<(std::ostream&, const T&) overload.
  • The behavior of some format specifiers is slightly different (for example, the default alignment for void*, and allowing sign specifiers for unsigned types).

Give us feedback

Try out format in your own code, and file any bugs on our GitHub issue tracker.

Posted in C++

  • Андрей Попандопуло

    Setting the /utf-8 compiler option alone doesn’t help. The source file, as well as the include files it references, must be saved in UTF-8 encoding if they contain anything other than ANSI characters. That’s very inconvenient, since the VS C/C++ text editor doesn’t save source files in UTF-8 by default, as some other VS editors do. Using UTF-8 all over Windows (as in Linux) looks like the best relief from this headache.

    • Charlie Barto (Microsoft employee)

      Well, the `/utf-8` option sets both the source and execution character sets, so if your files are actually in another encoding, you get mojibake. You can set just the execution charset (and not the source charset) to UTF-8 and still get the speed improvement in format. Unfortunately, using UTF-8 “everywhere” isn’t supported in Windows: the UTF-8 locale is still experimental, and a lot of text passes through the system as wchar_t arrays.

  • Igor Shigaev

    I cry! 🙂 I cry laughing at how this feature is done! Is it really NEW??? Is this what people waste time on? Look how it’s done in C#. “Look and despair!” (c) Shelley 🙂

    var s = $"Hello, {user}, you owe {credit:0.###} money!";

    Note how elegantly you can specify the formatting too (credit).

    • Me Gusta

      It is always scary when you see this kind of comment. It shows a major lack of understanding on quite a few counts.
      But remember, there is always a trade off between simplicity/elegance and capabilities.

    • Charlie Barto (Microsoft employee)

      It’s new in C++. It’s also a pure library facility in C++, and it’s hard to replicate what C# does as a library without macros of some kind (if the `$` operator were a macro, it would have to be slightly unhygienic in order to actually look up the “user” and “credit” symbols in the calling context).

      I expect C++ will get “interpolation” style format strings once we have reflection and code generation, maybe in C++26 or something.

      On the bright side, making it possible to write interpolation style macros as a pure library facility has some real upsides: for example, SQL libraries can have their own versions that put the identifiers right into parameters, avoiding any kind of injection.

  • Vash The Stampede

    One of the missing elements is examples. Implementing a formatter for custom data types differs from how it’s done in fmtlib.

    I hope this serves as an example. If there is a better solution, please enlighten me.

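    #include <format>
    #include <source_location>
    #include <string>
    
    template <> struct std::formatter<std::source_location> : std::formatter<std::string>
    {
    	// const, so the specialization also works where the formatter is used as a const object
    	template<typename FC> auto format(const std::source_location& sl, FC& ctx) const
    	{
    		auto s {std::format("file:{} line:{} column:{} function:{}", sl.file_name(), sl.line(), sl.column(), sl.function_name())};
    		// Delegate to the base std::string formatter (the blog stripped the <std::string> part)
    		return std::formatter<std::string>::format(s, ctx);
    	}
    };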
    
      • Vash The Stampede

        Looking forward to it.
        Another issue is whether the library should include formatters for types such as std::atomic_ulong, std::atomic_int, and std::atomic<T> where T is integral.
        I find that I have to duplicate the same code, because the formatter does not accept that std::atomic_ulong is the same as std::atomic<T> with T=unsigned long, so I have to express both in each library in my codebase.

      • Alex Syniakov

        Nope, just a simple specialization in the header:

        template<>
        struct std::formatter<glm::dvec3> : std::formatter<std::string> {
            // Templated on the context so it works with any output iterator
            template<typename FormatContext>
            auto format(const glm::dvec3& t, FormatContext& ctx) const {
                auto s{ std::format("{} {} {}", t[0], t[1], t[2]) };
                return std::formatter<std::string>::format(s, ctx);
            }
        };
        • Charlie Barto (Microsoft employee)

          (sorry about the code formatting, the blog platform eats templates!)

          Does just writing that custom formatter slow down compile times more than pulling in format and glm and then formatting with std::format directly (as in the formatter’s format method)?

          That _does_ pull in quite a bit of code, but the specialization itself shouldn’t be much worse than just using format by itself. glm may _also_ pull in quite a lot of code. It may be possible to reduce the amount of code pulled in, but ultimately any TU that needs to format glm vectors probably wants to use them as well.

          On the STL side the largest headers format uses are string and locale. We could avoid pulling in locale in a future version by separately compiling the internal “backend” format functions, but we didn’t want to do that in the initial release because it can present migration challenges.

          It is technically possible to separately compile the formatter implementation, but it’s undefined behavior, and it will break in the future if you do that. It also is unlikely to actually help much in this case.

  • Dwayne Robinson

    >we need to retrieve your system locale
    Do you really though? 😢 Code pages were already a deprecated legacy concept 14 years ago when I started working on DirectWrite in favor of Unicode. Can we finally ditch the headache and make `std::format` the line in the sand where we *default* to Unicode, and only *if* you need something custom, then opt into using code pages?

    p.s. It’s awesome that format is in – been looking forward to it for a while.

    • Charlie Barto (Microsoft employee)

      The problem is that lots of users still compile code using other execution charsets (GB18030 and Shift-JIS, well, actually CP932). Additionally, most build systems don’t add `/utf-8` by default, meaning that only supporting UTF-8 would break a _ton_ of people.

      Maybe we should try and emit a warning if you use format with a non-utf execution charset.

    • Me Gusta

      Essentially, it comes down to different character types. In C++20, UTF-8 characters are stored in the char8_t character type, and u8string_view is std::basic_string_view<char8_t>. So really, for this to be correct it would have to be:

      std::u8string format(std::u8string_view fmt, ...);

      Anyway, the big reason why it doesn’t exist is that it wasn’t defined in P0645R10. Proper Unicode support in C++ is pretty weak in general.

      • Charlie Barto (Microsoft employee)

        Another reason is that std::format actually doesn’t require the data formatted to be in any encoding at all. A u8string_view overload would be nice to tell format that the format string is utf-8, even if your execution character set isn’t.

        There’s also the question of whether that overload should return std::u8string or std::string.

        There’s a general feeling that u8string isn’t really fit for usage as it stands now.

        • Alois Kraus

          A few days ago I was using C++ again and thought it would be good to get more speed than from C# by using the latest and greatest C++ features, such as std::format. After profiling, I found myriads of GetLocale calls, which makes me cringe because I do not change my locale every millisecond. If a new feature is added to C++, it should be fast by default. Correctness is of course always a concern, but I do not think it should require querying the current locale setting over and over. Windows is also guilty of this. Did you ever wonder why parsing the Windows EventLog is so slow? https://aloiskraus.wordpress.com/2020/07/20/ms-performance-hud-analyze-eventlog-reading-performance-in-realtime/ There, too, the current locale is retrieved on every format call. If correctness is a concern, I would add a locale change callback notification mechanism that allows a safe atomic change without jumping through hoops in the registry or other OS calls.

          • Charlie Barto (Microsoft employee)

            Pass /utf-8 to the compiler. If you do that, we no longer need to get your locale. In the future, we may use some new compiler front-end features to improve the speed on other common encodings (in particular 1252 and GB18030, although GB18030 will always be a bit slower because of the need to go character by character to find control characters).

  • Christian Riesch

    Are there any plans to implement compile-time checking of format strings?

    fmt deliberately does not warn when there are more arguments than format specifiers, see here. I would like to see such a warning from the STL, since I’ve been bitten by this in the past.
    Example:

    auto s = std::format("a = {}, b = []", a, b); // typo, missing braces
    • Charlie Barto (Microsoft employee)

      There are plans. Although the C++20 version of format does not have compile-time checking, it’s being added for C++23, and we may implement it in C++20 mode too.

      I’m not sure it will warn for that case, since that case doesn’t actually throw an error today. It’s a good idea though.
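
As a rough illustration of the extra-argument check Christian describes, a hypothetical count_fields helper (not part of <format>, and not what the STL or {fmt} does) can count replacement fields, including nested width and precision fields, since those also consume arguments under automatic indexing. It treats every field the same, so the comparison only makes sense for format strings that use automatic indexing.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts replacement fields, skipping the escaped
// braces "{{" and "}}" outside of fields. Nested fields such as the
// dynamic width in "{:{}}" are counted as well.
constexpr std::size_t count_fields(std::string_view fmt) {
    std::size_t count = 0;
    int depth = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const char c = fmt[i];
        if (depth == 0) {
            if ((c == '{' || c == '}') && i + 1 < fmt.size() && fmt[i + 1] == c) {
                ++i;  // escaped "{{" or "}}": literal text, not a field
            } else if (c == '{') {
                ++count;  // start of a top-level replacement field
                depth = 1;
            }
        } else if (c == '{') {
            ++count;  // nested field, e.g. dynamic width or precision
            ++depth;
        } else if (c == '}') {
            --depth;
        }
    }
    return count;
}

// Hypothetical check: true when every argument has a matching field.
template <class... Args>
constexpr bool uses_all_args(std::string_view fmt, const Args&...) {
    return count_fields(fmt) == sizeof...(Args);
}
```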