C# 11 Preview Updates – Raw string literals, UTF-8 and more!

Kathleen Dollard

Features for C# 11 are coming along nicely! You can check these features out by downloading Visual Studio 17.2 Preview 3, or .NET 7 Preview 3 for other editors. You can find more about C# 11 features that appeared earlier in “What’s new in C# 11” and “Early peek at C# 11 features”, and you can follow the progress of C# 11 on the Feature Status page. You can find out about other .NET 7 Preview 3 features in this .NET Blog post and more about Visual Studio 17.2 in the release notes.

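We evolve C# to improve your development productivity, the resiliency of your applications in production, performance and support for new features. The C# team works on both the performance of your application in production, and how the compiler performance affects your development. Features in this post include:

  • Raw string literals
  • UTF-8 string literals
  • Checked user-defined operators
  • Auto-default structs
  • Pattern matching with spans
  • Use a cached delegate for method group conversion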

This post also explains why we removed parameter null-checking from C# 11 and are adding a warning for lowercase type names.

Raw string literals

If you work with string literals that contain quotes or embedded language strings like JSON, XML, HTML, SQL, Regex and others, raw string literals may be your favorite feature of C# 11. Previously, if you copied a literal string with quotes into a C# literal, the string ended at the first double quote, with compiler errors until you escaped each one. Similarly, if you copied text with curly braces into an interpolated string literal, each curly bracket was interpreted as the beginning of a nested code expression unless you escaped it, generally by doubling the curly bracket.

Raw string literals have no escaping. For example, a backslash is output as a backslash, and \t is output as the backslash and a t, not as the tab character.

Raw string literals start and end with at least three double quotes ("""..."""). Within these double quotes, single " are considered content and included in the string. Any run of double quotes shorter than the run that opened the raw string literal is treated as content. So, in the common case of three double quotes opening the raw string literal, two double quotes appearing together would just be content. If you need to output a sequence of three or more double quotes, just open and close the raw string literal with at least one more quote than that sequence.
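
As a small sketch of the quoting rules above (the file path and sentence are made-up sample data, and a C# 11 compiler is assumed), the following shows backslashes and quotes passing through untouched, and a four-quote delimiter allowing a run of three quotes as content:

```csharp
using System;

// Three quotes open and close the literal. The backslashes and the
// single double quotes inside are plain content; nothing is escaped.
string path = """C:\temp\new "draft".txt""";

// Four quotes as delimiters, so a run of three quotes can appear as content.
string note = """"A raw string starts with """ in the common case."""";

Console.WriteLine(path);
Console.WriteLine(note);
```

Note that the content of a single-line raw string literal cannot begin or end with a double quote directly against the delimiter; the multi-line form avoids that limitation.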

Raw string literals can be interpolated by preceding them with a $. The number of $ that prefixes the string is the number of curly brackets that are required to indicate a nested code expression. This means that a $ behaves like the existing string interpolation – a single set of curly brackets indicates nested code. If a raw string literal is prefixed with $$, a single curly bracket is treated as content and it takes two curly brackets to indicate nested code. Just like with quotes, you can add more $ to allow more curly brackets to be treated as content. For example:

[Image: JSON example of raw string literal]
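
A minimal sketch of the $$ form, assuming a C# 11 compiler; the variable names and values are illustrative and not taken from the original image:

```csharp
using System;

string name = "Ada";   // illustrative sample data
int age = 36;          // illustrative sample data

// With $$, a single brace is content and {{...}} marks an interpolation hole.
string json = $$"""
    {
        "name": "{{name}}",
        "age": {{age}}
    }
    """;

Console.WriteLine(json);
```

Because the JSON braces are plain content here, the text can be pasted in and out of the literal without doubling any of them.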

Raw string literals also have new behavior around automatically determining indentation of the content based on leading whitespace. To learn more about this and to see more examples on this feature, check out the docs article Raw String Literals.
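
The indentation behavior can be sketched as follows (sample XML invented for illustration): the whitespace in front of the closing quotes is removed from every content line, and any indentation beyond that is kept.

```csharp
using System;

// The closing """ is indented four spaces, so four spaces are stripped
// from the start of each content line. The extra two spaces in front of
// <item> are beyond that margin and stay in the string.
string xml = """
    <root>
      <item id="1" />
    </root>
    """;

Console.WriteLine(xml);
```

The line breaks right after the opening quotes and right before the closing quotes are not part of the string.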

This feature will make it much easier to work with literals that contain certain characters. You can now copy code into or from a literal string without being hindered by adding or removing escape sequences.

Special thanks to jnm2 (Joseph Musser) for his work on the design and implementation of raw string literals.

UTF-8 String Literals

UTF-8 is used in many scenarios, particularly in web scenarios. Prior to C# 11, programmers had to either translate UTF-8 into hexadecimal – which led to verbose, unreadable, error-prone code – or encode string literals at runtime.
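
For context, the two pre-C# 11 approaches described above look roughly like this (a sketch using only long-standing .NET APIs):

```csharp
using System;
using System.Linq;
using System.Text;

// Option 1: spell out the bytes by hand. No runtime encoding cost,
// but hard to read and easy to get wrong.
byte[] manual = new byte[] { 0x68, 0x65, 0x6c, 0x6c, 0x6f };

// Option 2: encode at runtime. Readable, but the encoding work
// happens each time this line executes.
byte[] encoded = Encoding.UTF8.GetBytes("hello");

Console.WriteLine(manual.SequenceEqual(encoded)); // prints "True"
```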

C# 11 allows converting string literals containing only UTF-8 characters to their byte representation. This is done at compile-time, so the bytes are ready to use without additional runtime cost. So you can write code like:

byte[] array = "hello";                     // new byte[] { 0x68, 0x65, 0x6c, 0x6c, 0x6f }
Span<byte> span = "dog";                    // new byte[] { 0x64, 0x6f, 0x67 }
ReadOnlySpan<byte> readOnlySpan = "cat";    // new byte[] { 0x63, 0x61, 0x74 }

There are ongoing discussions about details such as whether a type suffix is required and what natural type that would imply. If you expect to use UTF-8 string literals, we would really like your feedback and you can see the UTF-8 String Literal proposal and the links contained in it for more information.

This feature brings a welcome simplification to everyone currently building byte arrays to represent UTF-8. If you are doing this, you will probably want to convert your code to use it after C# 11 releases. If you are not using UTF-8 string literals, you can ignore this feature. For ASP.NET users, your response is encoded to UTF-8 from strings automatically, so you can ignore this feature.

Checked user-defined operators

One of the major motivations for the static abstract members in interfaces feature of C# 11 is the ability to support generic math. .NET developers can write algorithms that rely on interfaces that include static abstract members as the generic constraint. One such interface is INumber<TSelf>, which provides access to APIs such as Max, Min, Parse, and even operators such as +, -, *, and /, as well as user-defined conversions.
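
A hedged sketch of such an algorithm follows. It assumes the generic math API shape that shipped in .NET 7 (INumber<T> in the System.Numerics namespace); the preview builds discussed in this post may differ. The Sum helper is hypothetical and exists only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

Console.WriteLine(Sum(new[] { 1, 2, 3 }));    // T is int
Console.WriteLine(Sum(new[] { 1.5, 2.5 }));   // T is double

// Hypothetical helper: sums any numeric type via the INumber<T> constraint.
static T Sum<T>(IEnumerable<T> values) where T : INumber<T>
{
    T total = T.Zero;            // static abstract member on the interface
    foreach (T value in values)
    {
        total += value;          // operator supplied through the interface
    }
    return total;
}
```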

User-defined operators respect the arithmetic overflow and underflow checking context of the calling code, controlled via the <CheckForOverflowUnderflow> project property or the checked/unchecked regions and operators. Check out the language reference for more about checked and unchecked behavior for arithmetic operators. Prior to C# 11, a user-defined operator was unaware of the context in which it was used.

C# 11 adds the ability to declare certain operators as checked, identified with the checked modifier. Operators that do not have this modifier will be unchecked when paired with a checked operator. The compiler will select the right operator to use based on the context of the calling code. The operators that can support checked versions are the ++, -- and - unary operators and the +, -, *, and / binary operators.
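
As a sketch of the declaration syntax, assuming a C# 11 compiler: Meters is a hypothetical type invented for this example, and the bodies use explicit checked/unchecked expressions so the behavior does not depend on project settings.

```csharp
using System;

var max = new Meters(int.MaxValue);
var one = new Meters(1);

// Unchecked context: the compiler picks the regular operator.
Meters wrapped = unchecked(max + one);

// Checked context: the compiler picks the operator marked `checked`.
bool threw = false;
try
{
    _ = checked(max + one);
}
catch (OverflowException)
{
    threw = true;
}

Console.WriteLine(wrapped.Value == int.MinValue);
Console.WriteLine(threw);

// Hypothetical type, for illustration only.
public readonly struct Meters
{
    public int Value { get; }
    public Meters(int value) => Value = value;

    // Regular operator: wraps around on overflow.
    public static Meters operator +(Meters left, Meters right)
        => new Meters(unchecked(left.Value + right.Value));

    // Checked operator: throws OverflowException on overflow.
    public static Meters operator checked +(Meters left, Meters right)
        => new Meters(checked(left.Value + right.Value));
}
```

A checked operator can only be declared alongside its regular counterpart, which is why both versions appear in the type.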

The distinction between checked and unchecked is the context in which they are used. There is no requirement that checked operators throw if the bounds of the type are exceeded or that unchecked operators not throw, but this is the behavior users expect. For example, for integer types MAX_VALUE+1 is MIN_VALUE in the unchecked context and throws an exception in the checked context. Some types, such as floating point numbers, do not overflow and therefore do not need separate checked and unchecked operators.
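
For the built-in types, the behavior described above can be observed directly. This sketch uses variables rather than constants, since the compiler rejects constant expressions that overflow:

```csharp
using System;

int max = int.MaxValue;

// Unchecked context: the addition wraps around to int.MinValue.
int wrapped = unchecked(max + 1);

// Checked context: the same addition throws OverflowException.
bool threw = false;
try
{
    _ = checked(max + 1);
}
catch (OverflowException)
{
    threw = true;
}

// Floating point does not overflow this way: the result is infinity,
// with or without a checked context.
double big = double.MaxValue;
double doubled = checked(big * 2);

Console.WriteLine(wrapped == int.MinValue);
Console.WriteLine(threw);
Console.WriteLine(double.IsPositiveInfinity(doubled));
```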

This feature is important to developers creating user-defined operators that operate on types where arithmetic overflow is a valid concept. It will allow new user-defined operators to respect the context in which the operator is used. We anticipate that only a small number of developers will use this feature directly, but the impact of their implementations will make the entire ecosystem more reliable and predictable.

Auto-default structs

Note: This feature is planned for 17.3, not 17.2. It was mistakenly included in this post. I am not removing it to avoid confusion about our intention regarding this feature. Look for it in a future preview!

In C# 10 and earlier, you needed to initialize all fields of a struct by initializing fields and auto-properties or setting them in the constructors. This can be awkward, particularly with the expected introduction of the field keyword and semi-auto properties in a later C# 11 preview. If you did not set these values, you received a compiler error. If we have sufficient information to provide the error, perhaps we should just set these values to default for you!

Starting with this preview, the compiler does exactly that. It initializes any fields and auto-properties that are not set based on definite assignment rules, and assigns the default value to them. If you do not want this behavior, there is a warning you can turn on.
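
A small sketch of what this means in practice, assuming a compiler that includes the feature; Point is a hypothetical struct used only for illustration:

```csharp
using System;

var p = new Point(5);
Console.WriteLine($"{p.X}, {p.Y}");   // prints "5, 0"

// Hypothetical struct, for illustration only.
public struct Point
{
    public int X;
    public int Y;

    public Point(int x)
    {
        X = x;
        // Y is never assigned here. C# 10 reported a compiler error for
        // this constructor; with auto-default structs the compiler
        // assigns default(int), which is 0, to Y.
    }
}
```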

This feature simplifies initialization for anyone using structs that include explicit constructors. This is likely to feel like the way structs with initializers always should have worked, and so you may take advantage of this feature without even thinking about it. If you are explicitly initializing fields to their default value in response to the previous compiler errors, you can remove that code.

Pattern matching with spans

Starting with this preview, you can pattern match a Span<char> or a ReadOnlySpan<char> with a string literal. This code now works:

static bool IsABC(Span<char> s)
{
    return s switch { 
        "ABC" => true, 
        _ => false };
}

The input type must be statically known to be a Span<char> or a ReadOnlySpan<char>. Also, the compiler reports an error if you match a Span<char> or a ReadOnlySpan<char> to a null constant.
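
One place this can help is matching a slice of a larger string without first allocating a substring. The sketch below assumes a C# 11 compiler; the request line and the Classify helper are invented for illustration.

```csharp
using System;

string request = "GET /index.html";   // illustrative sample data

// Slice out the verb without allocating a new string.
ReadOnlySpan<char> verb = request.AsSpan(0, request.IndexOf(' '));

string kind = Classify(verb);
Console.WriteLine(kind);   // prints "read"

// Hypothetical helper: the parameter type is statically ReadOnlySpan<char>,
// so string literal patterns are allowed in the switch expression.
static string Classify(ReadOnlySpan<char> method) => method switch
{
    "GET" => "read",
    "POST" => "write",
    _ => "unknown",
};
```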

This feature will allow Span<char> or ReadOnlySpan<char> to participate as patterns in switch statements and switch expressions for matching string literals. If you are not using Span<char> and ReadOnlySpan<char>, you can ignore this feature.

Special thanks to YairHalberstadt for implementing this feature.

Use a cached delegate for method group conversion

This feature will improve runtime performance by caching static method groups, rather than creating fresh delegate instances. This is to improve your application’s performance, particularly for ASP.NET. You will get the benefit of this feature with no effort on your part.
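
To give a feel for the idea, here is a hand-written approximation of delegate caching. This is an assumption-laden sketch, not the code the compiler actually emits; MathOps and DelegateCache are hypothetical names.

```csharp
#nullable enable
using System;

Func<int, int> first = DelegateCache.Square;
Func<int, int> second = DelegateCache.Square;

// Both variables refer to the same delegate instance.
Console.WriteLine(object.ReferenceEquals(first, second));   // prints "True"
Console.WriteLine(first(4));                                // prints "16"

// Hypothetical static method whose method group gets converted to a delegate.
static class MathOps
{
    public static int Square(int x) => x * x;
}

// Hand-written cache: the delegate is created on first use and reused after.
static class DelegateCache
{
    private static Func<int, int>? s_square;

    public static Func<int, int> Square =>
        s_square ??= new Func<int, int>(MathOps.Square);
}
```

With the feature in place, writing the method group conversion directly (for example, assigning MathOps.Square to a Func<int, int>) is intended to get similar reuse without any of this boilerplate.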

Special thanks to pawchen for implementing this feature.

Remove parameter null-checking from C# 11

We previewed parameter null-checking as early as possible because we anticipated feedback. This feature allows !! on the end of a parameter name to provide parameter null checking before the method begins execution. We included this feature early in C# 11 to maximize feedback, which we gathered from GitHub comments, MVPs, social media, a conference audience, individual conversations with users, and the C# design team’s ongoing reflection. We received a wide range of feedback on this feature, and we appreciate all of it.

The feedback and the wide range of insight we gained from this feedback led us to reconsider this as a C# 11 feature. We do not have sufficient confidence that this is the right feature design for C# and are removing it from C# 11. We may return to this area again at a later date.

While there are several valid ways to do a null check on a single line, if you are using .NET 6 we recommend using the ArgumentNullException.ThrowIfNull method:

public static void M(string myString)
{
    ArgumentNullException.ThrowIfNull(myString);
    // rest of the method body
}

One of the benefits of using the ThrowIfNull method is it uses CallerArgumentExpression to include the parameter name in the exception message automatically:

System.ArgumentNullException: 'Value cannot be null. (Parameter 'myString')'
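
As one example of the other single-line approaches mentioned above, a throw expression with the null-coalescing operator works on older target frameworks as well. This is a sketch; Greeter and its methods are hypothetical names.

```csharp
#nullable enable
using System;

string? viaHelper = ParamNameFrom(() => Greeter.GreetWithHelper(null!));
string? viaThrowExpression = ParamNameFrom(() => Greeter.GreetWithThrowExpression(null!));

Console.WriteLine(viaHelper);            // prints "name"
Console.WriteLine(viaThrowExpression);   // prints "name"

// Runs the action and returns the parameter name reported by the exception.
static string? ParamNameFrom(Action action)
{
    try
    {
        action();
        return null;
    }
    catch (ArgumentNullException ex)
    {
        return ex.ParamName;
    }
}

// Hypothetical class, for illustration only.
static class Greeter
{
    // .NET 6 and later: the parameter name is captured automatically.
    public static string GreetWithHelper(string name)
    {
        ArgumentNullException.ThrowIfNull(name);
        return $"Hello, {name}";
    }

    // Earlier frameworks: throw expression, with nameof keeping the name in sync.
    public static string GreetWithThrowExpression(string name)
    {
        _ = name ?? throw new ArgumentNullException(nameof(name));
        return $"Hello, {name}";
    }
}
```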

Warning wave: Warnings on lowercase type names

C# 11 introduces Warning Wave 7, which includes a warning for any type that is declared with all lowercase ASCII characters. Avoiding such names has been a common stylistic rule in the C# ecosystem for years. We are making it a warning because C# needs to occasionally introduce new keywords in order to evolve. These keywords will be lowercase and may conflict with your type’s name, if it is lowercase. We introduced this warning so you can avoid a possible future breaking change.

You can find out more about this change at Warning on lowercase type names in C# 11. Warning waves allow new warnings in C# in a manner that allows you to delay adoption if the warning causes issues you cannot currently resolve.

This warning is expected to affect very few people. But if you encounter it, we recommend updating your type name, or prefixing usages of it with @, such as @lower.
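
A brief sketch of the two situations: the type name lower comes from the example above, and the rest is illustrative.

```csharp
using System;

// Usages prefixed with @ are always treated as identifiers, never keywords.
@lower instance = new @lower();

// The @ prefix is not part of the name itself.
Console.WriteLine(instance.GetType().Name);   // prints "lower"

// An all-lowercase ASCII type name like this one is what the new warning
// flags. Renaming it (for example to Lower) is the other recommended fix.
class lower { }
```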

Closing

Please download Visual Studio 17.2 Preview 3 or .NET 7 Preview 3, try out the new features, and tell us what you think in the Discussions section of the CSharpLang repo.

41 comments

  • JesperTreetop

    “and t is output as the backslash and a t, not as the tab character.”
    Something seems to be devouring backslashes again. This is getting problematic for articles where backslashes carry so much information.

    Raw string literals are looking very promising, UTF-8 literals look useful and cached method groups delegates close a hole many people didn’t know was there. I agree with holding !! null checking back, it’s too much of a sui generis feature.

    • Thomas Levesque

      They don’t want to put the ! on the type, for reasons explained in the link. Also, this isn’t a very good example, since char isn’t nullable in the first place 😉

      • Fabiao Milando

        char wasn’t a good example. I actually meant string.

        What I’ve seen, and this is what they don’t want, is this (I agree, it’s not nice):

        void Method(string text!!)

        Instead my suggestion is this:

        void Method(string! text)

        This would make sense to me and it’s nice and clean.

        string? text = null; //nullable reference type
        string! text = "sample data"; //non-nullable reference type

        Let’s wait and see how they will solve it.

  • John King

    UTF-8 String Literals

    I think it’s not a good thing for P/Invoke users; after all, C++’s char* is ascii. I think it’s good to have a compile-time string-to-byte conversion, but it’s better to have a prefix to indicate the encoding, for example:
    span<byte> x = u8"nice utf8 byte" ;
    span<byte> x = u16"nice utf-16 byte" ;
    span<byte> y = a"nice ascii byte" ;// default should be ascii

    • Joey .

      Typically C++ does not have any defined encoding for char*, but I guess by far the most prevalent usage these days (on Linux and macOS) is that they are UTF-8 by virtue of the default system encoding being UTF-8. So yes, technically it will be wrong for some applications, but so is UTF-16 or ASCII (and both arguably in more situations).

      But perhaps being explicit would be a good idea (although I guess adding even more string literal prefixes may make it confusing over time when you get to things like u16$@"foo", I’m not sure that approach scales well with the possible string literal modifiers C# already has) by having to specify an encoding:

      Span<byte> utf8Span = "UTF-8".AsSpan(Encoding.UTF8);
      

      and so on.

    • Danstur

      “after all , C++’s char* is ascii”
      That’s not true. C++ as usual doesn’t care and even for win32 the encoding is usually the OEM encoding (yeah despite the A suffix) and in other situations ANSI.
      But if you’re interacting with a win32 library you really, really should be using utf16 anyhow so this doesn’t matter.

      Where it does matter is for high performance or cross platform libraries where UTF8 is the obvious choice of encoding.

    • Andrew Witte

      Totally agree with this.
      Although the default should just be a “string”.
      Also for C# it should probably be more like:

      byte[] utf8Value = u8"Hello こんにちは";
      byte[] utf16Value  = u16"Hello こんにちは";
      byte[] asciValue  = a"Hello こんにちは";// compiler error if value contains unsupported characters
      string defaultValue = "Hello こんにちは";
  • Paulo Pinto

    Thankfully there was some sense regarding parameter null-checking.

    Non-Nullable References might lack adoption, especially since many of us are stuck in .NET Framework land, but turning C# into !! everywhere isn’t really the option, as proven by horrid Kotlin and Swift code plagued with such constructs.

    As for the rest, looking forward to the remaining C# 11 features, thanks for the update.

  • Amichai Mantinband

    Raw string literals is such an awesome feature, thanks for adding it! Great to see once again that the community can influence the direction of the language. Seems like a good place to plug the library I released which uses CallerArgumentExpression & some other tricks I hope will be useful to some: https://github.com/mantinband/throw

    • JesperTreetop

      This would still bifurcate string types into string and Utf8String – better than Span<byte> which could theoretically be random garbage or a JPEG or whatever the user sent in over the network, but still bifurcated. Making string have UTF-8 innards behind a configuration switch would still bifurcate things but in a much more insidious, hard-to-detect way (or it would need to convert to UTF-16 on-the-fly if string keeps its current public API and contract, including transparently performing a lot of allocations, which is pretty brutal for something that’s used all over the place). At least if you have two different types, it is documented to be different.

      • Kathleen Dollard (Microsoft employee)

        Today, people are using byte, and it is a mess. There is no UTF-8 string type on the horizon, so we are proposing moving ahead with a conversion. Details are still in process, but will be in this week’s LDM notes.

        • Simon Felix

          That’s great! This stop-gap obviously only helps with some of the pain points, but it’s a start.

          But solving the memory overhead requires a different approach. One proposed different solution (UTF-8 encoded, byte-backed strings) would solve the discussed problem AND the memory overhead. I fear that this stop-gap solution will be the permanent solution and I’ll have to deal with Span<byte> instead of string forever.

      • Simon Felix

        The discussion in the linked issue landed on the second proposal: Instead of creating a separate Utf8String type, the backing buffer should be UTF-8 encoded bytes. For a regular .NET dev, the internals of string are invisible. The pain points are: Native interop/P-Invoke (very relevant for Win32 UI apps), and no longer constant-time char access.

        Let’s talk about char indexing first: Indexing UTF-16 codepoints in a byte-backed string will no longer be constant time operation. However indexing chars is error-prone in some scripts in general because of the codepoints vs. chars issue, and should probably be discouraged anyways. We should change the spec to return UTF-8 codepoints instead (big no-no, I know), to make it more obvious that it’ll only work correctly for latin-1 characters.

        The interop problem is more difficult to solve. But I’d be happy to pay that performance & allocation tax in many of my applications (Console apps and Web APIs). I suspect it would be a net win to allocate multiple times for every native interop call, but to halve the multi-gigabyte heap size in data science scenarios.

  • Ed Andersen

    Hi Kathleen

    Can you confirm that Auto-default Structs are in VS 17.2 Preview 3 as the article describes? I think they are in VS 17.3 Preview 1 (as the roadmap indicates they have been merged into) but this isn’t publicly available yet?

    Thanks.

    • Kathleen Dollard (Microsoft employee)

      Well shoot. Your imagination about how many versions I need to keep straight is about right, and I blew it on this.

      I added a note to the post. I fear if I remove that paragraph folks might wonder whether we pulled the feature.

      Thanks for finding this!!

  • Dia de Tedio

    So, this is not directly mentioned as a feature, abstract interface members are in this release?

    And, a question, why not to propose interfaces with requirement-only side of contracts instead of this?
    So, to be more clear, what I mean by interfaces with a requirement-only side of contracts are interfaces that can be defined but need not be implemented by types; instead, the compiler will just ensure that the specified type matches the implementation and generate appropriate code when calling methods and using properties, for static and instance members. This way C# can remove boxing for interfaces on types too, so, if abstract interfaces were designed principally to implement math operators, this makes far more sense, as it will be used principally with value types. The result of an implementation like that will be like:

    public requirements interface IAdd
    {
        static IAdd operator +(IAdd left, IAdd right);
    }
    ...
    public class ConsumeIAdd
    {
        public T SumThreeValues<T>(T a, T b, T c) where T : IAdd
        {
            return a + b + c;
        }
    }

    The code above should work simply without needing to make a type inherit from IAdd, just like a type constraint that will be applied.

    This solves the problem of needing to update all types with math interfaces in order to use math operators in methods, and opens the possibility of removing boxing, which can be a problem in some scenarios. This idea can solve some of Linq’s problems too if it is implemented in certain ways, like making an IReqEnumerable that will be used instead of IEnumerable, thus removing the requirement to create boxes and garbage when iterating over collections.

  • Matthew Grochowalski

    I would rather see u8 as a prefix (rather than a suffix like the linked proposal has) to match C++ (and hopefully C in the future).

      • Matthew Grochowalski

        C++ has a 30+ year history of type suffixes yet for some reason decided to use a prefix anyway. Not that I know if that reason was a good one or not :-). Certainly prevents it from conflicting with user defined literals.

        Have all the type suffixes in C# been for numeric literals so far? The only string modifiers (although not type modifiers) I know of are prefixes: @ and $.

    • Karl von Laudermann

      Another reason to use u8 as a suffix (besides existing C# consistency) is so that it doesn’t clash with a @ or $ prefix for verbatim and interpolated strings.