September 26th, 2023

C11 Threads in Visual Studio 2022 version 17.8 Preview 2

Charlie Barto
Software Engineer 2

Back in Visual Studio 2022 version 17.5 Microsoft Visual C gained preliminary support for C11 atomics. We are happy to announce that support for the other major concurrency feature of C11, threads, is available in Visual Studio version 17.8 Preview 2. This should make it easier to port cross-platform C applications to Windows, without having to drag along a threading compatibility layer.

Unlike C11 atomics, C11 threads do not share an ABI with C++’s <thread> facilities, but C++ programs can include the C11 threads header and call the functions just like any C program. Both are implemented in terms of the primitives provided by Windows, so their usage can be mixed in the same program and on the same thread. The implementations are distinct, however; for example, you can’t use the C11 mutexes with C++ condition variables.

C11 contains support for threads and a variety of related concurrency primitives including mutexes, condition variables, and thread specific storage. All of these are implemented in Visual Studio version 17.8 Preview 2.

Threads

Threads are created with thrd_create, to which you pass a pointer to the desired entry point and a user data pointer (which may be null), along with a pointer to a thrd_t structure to fill in. Once you have a thrd_t created with thrd_create you can call functions to compare it to another thrd_t, join it, or detach it. Functions are also provided to sleep or yield the current thread.

#include <threads.h>

int thread_entry(void* data) {
    return 0;
}

int main(void) {
    thrd_t thread;
    int result = thrd_create(&thread, thread_entry, NULL);
    if(result != thrd_success) {
        // handle error
    }
    result = thrd_join(thread, NULL);
    if(result != thrd_success) {
        // handle error
    }
    return 0;
}
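The remaining thread functions mentioned above can be sketched in the same style. This is a minimal illustration, not part of the original sample: the names doubling_worker and run_doubling_worker are hypothetical. It passes an int through the user data pointer, reads the entry point's return value back through thrd_join, and touches thrd_sleep, thrd_yield, and thrd_equal along the way.

```c
#include <threads.h>
#include <time.h>

// Hypothetical worker: doubles the int passed through the user data pointer.
static int doubling_worker(void* data) {
    int input = *(int*)data;
    // Sleep for roughly 10 ms, then yield once, to show both calls.
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    thrd_sleep(&pause, NULL);
    thrd_yield();
    return input * 2; // reported to the joiner through thrd_join
}

// Returns the worker's result, or -1 on failure.
int run_doubling_worker(int input) {
    thrd_t worker;
    if (thrd_create(&worker, doubling_worker, &input) != thrd_success) {
        return -1;
    }
    // thrd_equal returns nonzero when both values refer to the same thread.
    // The worker is never the calling thread, so this is expected to be 0.
    int same_thread = thrd_equal(worker, thrd_current());

    int result = 0;
    // input stays alive until the join completes, so the pointer is valid.
    if (thrd_join(worker, &result) != thrd_success) {
        return -1;
    }
    return same_thread ? -1 : result;
}
```

Calling run_doubling_worker(21) from main should yield 42, since the worker's return value is what thrd_join stores in result.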

 

A key difference between our implementation and C11 threads implementations based on pthreads is that threads cannot detach themselves using thrd_current() and thrd_detach(). This stems from a fundamental difference in how threads work on Windows and on Unix descendants: implementing the typical behavior would require a shared data structure that tracks thread handles.

On Unix derivatives, the integer thread ID is the handle to the thread, and detaching just sets a flag causing the thread to be cleaned up immediately when it finishes. This makes detached threads somewhat dangerous to use on Unix derivatives, since after a detached thread exits any other references to that thread ID will be dangling and could later refer to a different thread altogether. On Windows, the handle to a thread is a Win32 HANDLE and is reference counted. The thread is cleaned up when the last handle is closed. There is no way to close all handles to a thread except by keeping track of them and closing each one.

We could implement the Unix/pthreads behavior by keeping a shared mapping of thread-id to handle, populated by thrd_create. If you need this functionality then you can implement something like this yourself, but we don’t provide it by default because it would incur a cost even if it’s not used. Better workarounds may also be available, such as passing a pointer to the thrd_t populated by thrd_create via the user data pointer to the created thread.
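The last workaround could look roughly like the following. This is a sketch under stated assumptions rather than a recommended pattern: the struct self_detach_args and both function names are hypothetical, and the creator holds a mutex across thrd_create so that the new thread cannot read the thrd_t before it has been filled in.

```c
#include <threads.h>

// Hypothetical bundle handed to the new thread through the user data pointer.
struct self_detach_args {
    mtx_t lock;
    cnd_t done;
    thrd_t self;       // filled in by thrd_create
    int finished;      // set by the thread once it has detached itself
    int detach_result; // what thrd_detach returned inside the thread
};

static int self_detaching_entry(void* data) {
    struct self_detach_args* args = data;
    // The creator holds the lock across thrd_create, so once this thread
    // owns the lock, args->self is guaranteed to be populated.
    mtx_lock(&args->lock);
    args->detach_result = thrd_detach(args->self);
    args->finished = 1;
    cnd_signal(&args->done);
    mtx_unlock(&args->lock); // args is not touched after this point
    return 0;
}

// Returns the thrd_detach result seen by the thread, or thrd_error.
int run_self_detaching_thread(void) {
    struct self_detach_args args;
    args.finished = 0;
    args.detach_result = thrd_error;
    if (mtx_init(&args.lock, mtx_plain) != thrd_success) {
        return thrd_error;
    }
    if (cnd_init(&args.done) != thrd_success) {
        mtx_destroy(&args.lock);
        return thrd_error;
    }

    int result = thrd_error;
    mtx_lock(&args.lock);
    if (thrd_create(&args.self, self_detaching_entry, &args) == thrd_success) {
        // cnd_wait releases the lock, which lets the new thread proceed.
        while (!args.finished) {
            cnd_wait(&args.done, &args.lock);
        }
        result = args.detach_result;
    }
    mtx_unlock(&args.lock);
    cnd_destroy(&args.done);
    mtx_destroy(&args.lock);
    return result;
}
```

The creator only waits here so that the sketch can report the result and so that args outlives its last use; a real program would more likely give the bundle static or heap storage and let the detached thread run on.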

Mutexes

Mutexes are provided through the mtx_t structure and associated functions. Mutexes can be either plain, recursive, timed, or a combination of these properties. All kinds of mutexes are manipulated with the same functions (the type is dynamic).

#include <threads.h> 
 
static mtx_t mtx; // see below 
 
int main(void) { 
    if(mtx_lock(&mtx) != thrd_success) { 
        return 1; 
    } 
    // do some stuff protected by the mutex 
 
    // no need to check the result of a valid unlock call 
    mtx_unlock(&mtx); 
 
    // no need to call mtx_destroy 
} 

 

Our mutexes are always implemented on top of Slim Reader Writer Locks and are 32 bytes each on x64 (our C++ std::mutex is 80 bytes). They consist of an 8 byte tag (this is much more than needed, but provides some room for future expansion), an SRWLock, a Win32 CONDITION_VARIABLE, and a 32-bit owner and lock count. The owner and lock count are always maintained, even when the mutex is not recursive. If you attempt to recursively lock a non-recursive mutex, or unlock a mutex you do not own, then abort() is called. Structurally valid calls to mtx_unlock always succeed, and it is safe to ignore the return value of mtx_unlock in our implementation.

In our implementation you need not call mtx_init; a zeroed mtx_t is a valid plain mutex. Mutexes also don’t require any cleanup and calls to mtx_destroy are optional. This means you can safely use mutexes as static variables and similar.
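A zeroed mtx_t only covers the plain case, so recursive or timed mutexes still go through mtx_init. The sketch below is an illustration with hypothetical function names: it locks a recursive mutex twice from one thread, then shows mtx_timedlock giving up on a mutex that another thread holds. It calls mtx_init and mtx_destroy throughout so that it also works on implementations that require them.

```c
#include <threads.h>
#include <time.h>

// Locks a recursive mutex twice from the same thread; returns 0 on success.
int demo_recursive_mutex(void) {
    mtx_t m;
    if (mtx_init(&m, mtx_plain | mtx_recursive) != thrd_success) {
        return 1;
    }
    int ok = mtx_lock(&m) == thrd_success;
    if (ok) {
        // A second lock by the owner is only valid on a recursive mutex.
        int nested = mtx_lock(&m) == thrd_success;
        if (nested) {
            mtx_unlock(&m);
        }
        mtx_unlock(&m);
        ok = nested;
    }
    mtx_destroy(&m);
    return ok ? 0 : 1;
}

// Tries to take the mutex with a deadline roughly 50 ms in the future.
static int try_timed_lock(void* data) {
    mtx_t* m = data;
    struct timespec deadline;
    timespec_get(&deadline, TIME_UTC); // mtx_timedlock takes a TIME_UTC time
    deadline.tv_nsec += 50L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int result = mtx_timedlock(m, &deadline);
    if (result == thrd_success) {
        mtx_unlock(m);
    }
    return result;
}

// Holds a timed mutex while a second thread tries to acquire it.
// Returns what mtx_timedlock reported in that thread, or thrd_error.
int demo_timed_mutex(void) {
    mtx_t m;
    if (mtx_init(&m, mtx_timed) != thrd_success) {
        return thrd_error;
    }
    int result = thrd_error;
    if (mtx_lock(&m) == thrd_success) {
        thrd_t contender;
        if (thrd_create(&contender, try_timed_lock, &m) == thrd_success) {
            thrd_join(contender, &result); // we still hold the lock here
        }
        mtx_unlock(&m);
    }
    mtx_destroy(&m);
    return result;
}
```

Because the creator keeps the mutex locked until after the join, the contender's mtx_timedlock is expected to return thrd_timedout.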

Condition Variables

Condition variables are provided through the cnd_t structure and associated functions. This structure is 8 bytes and stores just a Win32 CONDITION_VARIABLE. You can wait on the condition variable with cnd_wait or cnd_timedwait, and you can wake one waiting thread with cnd_signal or all waiting threads with cnd_broadcast. Spurious wakeups are allowed, so waits should always re-check their condition in a loop.

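#include <threads.h>

static mtx_t mtx;
static cnd_t cnd;
static int condition; // set by another thread (not shown) while holding mtx

int main(void) {
    if(mtx_lock(&mtx) != thrd_success) {
        return 1;
    }
    // loop to guard against spurious wakeups; this blocks until another
    // thread sets condition and calls cnd_signal or cnd_broadcast
    while(condition == 0) {
        if(cnd_wait(&cnd, &mtx) != thrd_success) {
            return 1;
        }
    }
    mtx_unlock(&mtx);
    return 0;
}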

Similarly to mutexes, zeroed condition variables are valid and you can omit calls to cnd_init and cnd_destroy.
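For a complete round trip, the signalling side is needed as well. The following sketch pairs a waiter with a producer thread; struct mailbox and the function names are hypothetical. It initializes the mutex and condition variable explicitly so that it does not depend on zero-initialization being valid.

```c
#include <threads.h>

// Hypothetical one-slot mailbox shared between two threads.
struct mailbox {
    mtx_t lock;
    cnd_t ready;
    int has_value;
    int value;
};

static int producer(void* data) {
    struct mailbox* box = data;
    mtx_lock(&box->lock);
    box->value = 42;
    box->has_value = 1;
    cnd_signal(&box->ready); // wake the single waiter
    mtx_unlock(&box->lock);
    return 0;
}

// Starts a producer and waits for its value; returns -1 on failure.
int receive_from_producer(void) {
    struct mailbox box;
    box.has_value = 0;
    box.value = -1;
    if (mtx_init(&box.lock, mtx_plain) != thrd_success) {
        return -1;
    }
    if (cnd_init(&box.ready) != thrd_success) {
        mtx_destroy(&box.lock);
        return -1;
    }

    int received = -1;
    thrd_t worker;
    if (thrd_create(&worker, producer, &box) == thrd_success) {
        mtx_lock(&box.lock);
        // Re-check the predicate on every wakeup, since spurious wakeups
        // are allowed and the producer may also have run already.
        while (!box.has_value) {
            cnd_wait(&box.ready, &box.lock);
        }
        received = box.value;
        mtx_unlock(&box.lock);
        thrd_join(worker, NULL);
    }
    cnd_destroy(&box.ready);
    mtx_destroy(&box.lock);
    return received;
}
```

Checking has_value under the lock means the waiter behaves correctly whether the producer signals before or after the wait begins.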

Thread Specific Storage

Thread specific storage is provided via the _Thread_local (thread_local in C23) keyword, or via the tss_ family of functions. _Thread_local works just like __declspec(thread) (see docs) and the tss_ functions work similarly, but not identically, to the Fls* or Tls* family of functions.

#include <threads.h> 
#include <stdlib.h> 
void dtor(void* dat) { 
    // not called in this program 
    abort(); 
} 
 
static tss_t t; 
 
int main(void) { 
    if(tss_create(&t, dtor) != thrd_success) { 
        return 1; 
    } 
    if(tss_set(t, (void*)42) != thrd_success) { 
        return 1; 
    } 
    if(tss_get(t) != (void*)42) { 
        return 1; 
    } 
    return 0; 
} 
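The keyword form can be sketched in a similar way. In this illustration (names hypothetical), each thread gets its own copy of a _Thread_local variable, so changes made by a worker thread are not visible to the thread that created it.

```c
#include <threads.h>

// Every thread sees its own copy, initialized to 0.
static _Thread_local int per_thread_counter = 0;

static int bump_counter(void* data) {
    (void)data;
    for (int i = 0; i < 3; ++i) {
        ++per_thread_counter; // touches only this thread's copy
    }
    return per_thread_counter;
}

// Returns 0 if the worker and the caller saw independent copies.
int demo_thread_local(void) {
    per_thread_counter = 100; // the calling thread's copy
    thrd_t worker;
    int worker_value = -1;
    if (thrd_create(&worker, bump_counter, NULL) != thrd_success) {
        return -1;
    }
    if (thrd_join(worker, &worker_value) != thrd_success) {
        return -1;
    }
    // The worker started from 0 and counted to 3; our copy is unchanged.
    return (worker_value == 3 && per_thread_counter == 100) ? 0 : 1;
}
```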

The C11 TSS facilities support destructors, which are run when threads exit and are passed the value of the associated TSS key, if it is non-null. The macro TSS_DTOR_ITERATIONS specifies how many times we’ll check for more destructors to run in the case that a destructor calls tss_set. Currently it’s set to 1; if this is a problem for you, let us know. Destructors are run from either DllMain or a TLS callback (if you use the static runtime), and are not run on process teardown. This is an important difference from FLS destructors, which are run on process teardown and are run before any DllMain routines or TLS callbacks.
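To see a destructor actually run, the value has to be set on a thread that later exits. The sketch below uses hypothetical names and assumes that a thread's destructors have finished by the time thrd_join on that thread returns. A worker stores a heap allocation in the slot and leaves it to the destructor to free.

```c
#include <threads.h>
#include <stdlib.h>

static tss_t slot;
static int destructor_calls; // only read after the worker has been joined

// Runs at thread exit for each thread whose slot value is non-null.
static void free_slot_value(void* value) {
    free(value);
    ++destructor_calls;
}

static int use_slot(void* data) {
    (void)data;
    int* value = malloc(sizeof *value);
    if (value == NULL) {
        return 1;
    }
    *value = 7;
    if (tss_set(slot, value) != thrd_success) {
        free(value);
        return 1;
    }
    // No explicit free here: the destructor owns the value from now on.
    return *(int*)tss_get(slot) == 7 ? 0 : 1;
}

// Returns how many times the destructor ran, or -1 on failure.
int demo_tss_destructor(void) {
    destructor_calls = 0;
    if (tss_create(&slot, free_slot_value) != thrd_success) {
        return -1;
    }
    thrd_t worker;
    int worker_result = 1;
    int ok = thrd_create(&worker, use_slot, NULL) == thrd_success
          && thrd_join(worker, &worker_result) == thrd_success;
    tss_delete(slot); // does not run destructors itself
    return (ok && worker_result == 0) ? destructor_calls : -1;
}
```

The calling thread never sets the slot, so only the worker's exit should trigger the destructor, and the function is expected to return 1.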

TSS limits and performance characteristics

When using the explicit tss_ functions there is a limit of 1024 TSS indices per process. These are not the same indices used for the Fls* functions, the Tls* functions, or _Thread_local “implicit” TLS variables.

If you use any <threads.h> functions (not just the TSS functions) and you use the static runtime, then you will use at least one implicit TLS index (the kind used for _Thread_local), even if you don’t otherwise use implicit TLS. This is because we need to enable TLS callbacks, which causes the loader to allocate such an index. If this is a problem (for example because of the loader gymnastics that are required to dynamically load such modules), let us know, or just use the dynamic runtime. If you use the tss_ functions, you will additionally use one dynamic TLS index (the same kind used by TlsAlloc). You will only use one, no matter how many tss_t objects you create.

Threads will only spend time processing TSS destructors at thread exit if a TSS index with an associated destructor was ever set on that thread. When you create the first tss_t, a table of destructors is allocated, and when you use tss_set for the first time on a particular thread, a per-thread table is allocated. Memory usage scales with the number of threads that use the C11 TSS functionality, not the total number of threads in the process. The destructor table is 8KiB (4KiB on 32-bit platforms) and each per-thread table is 8209 bytes (4105 bytes on 32-bit platforms). These performance and memory characteristics may change in the future.
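As a rough worked example of those figures, the helper below adds one destructor table to one per-thread table for each thread that has called tss_set. The function name is hypothetical, it assumes at least one tss_t has been created, and the constants are simply the sizes quoted above, which may change.

```c
#include <stddef.h>

// Estimates TSS bookkeeping memory from the sizes quoted in this post.
// threads_using_tss counts only threads that have called tss_set.
size_t estimate_tss_bytes(size_t threads_using_tss, int is_64_bit) {
    // One shared destructor table: 8 KiB on 64-bit, 4 KiB on 32-bit.
    size_t destructor_table = is_64_bit ? 8192u : 4096u;
    // One table per participating thread: 8209 or 4105 bytes.
    size_t per_thread_table = is_64_bit ? 8209u : 4105u;
    return destructor_table + threads_using_tss * per_thread_table;
}
```

With 16 participating threads on a 64-bit platform this comes to 8192 + 16 × 8209 = 139536 bytes, regardless of how many other threads the process has.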

New Runtime Components

Because <threads.h> is a new feature and we want the implementation to be able to change and improve over time, it’s shipped as a new satellite DLL of vcruntime: vcruntime140_threads.dll and vcruntime140_threadsd.dll. If you use the dynamic version of the Visual C++ runtime (/MD or /MDd), and you use the new threads facilities, then you need to either redistribute this file with your app, or redistribute a Visual C++ runtime redist that is new enough to contain these files. If you don’t touch the C11 threads functionality then your app won’t depend on anything in this DLL and it will not be loaded at all.

Send us your feedback!

Try out C11 threads in the latest Visual Studio preview and share your thoughts with us in the comments below, on Developer Community, on Twitter (@VisualC), or via email at visualcpp@microsoft.com.

 

5 comments

  • Costantino GRANA

    Unfortunately I have to confirm what was reported by Jeremie St-Amand. There is no threads.h file in the whole Visual Studio or Windows SDK folder structure. The DLLs are there, but no headers.

  • Falco Girgis

    I TOTALLY missed this and have been eagerly awaiting this announcement. I’ve been using the C11 atomics since day 1.

    I’m a developer that totally dropped MSVC for years due to lack of C support and have only started using it again within the last year. I’ve been extremely happy with how much support has been added. I truly hope the new C23 stuff is also on the agenda!

  • Jeremie St-Amand

    Is the header part of a preview Windows SDK? I’m on VS 17.8.0 Preview 2.0 and Windows 11 SDK 10.0.22621.0 but can’t seem to find the header.

  • Erin Catto

    I’ve been using C11 atomics for the next version of Box2D. They work well and I’m pleased Microsoft is adding more support for C concurrency. I’m also hoping to see C23 language features become available soon.

    When do you expect `experimental:c11atomics` to no longer be required?

    • Charlie Barto (Microsoft employee, Author)

      At the very least experimental:c11atomics will be required until we have fully implemented the locking atomics. We’ll stop defining __STDC_NO_ATOMICS__ at that time as well. I know that needing the flag and us still defining __STDC_NO_ATOMICS__ makes things harder to deal with for apps that don’t ever use locking atomics, but we didn’t want this standard feature testing macro to mean something special and different on MSVC.