November 7th, 2024

Why do I observe reads from a memory-mapped file when writing large blocks?

A customer had created a memory-mapped file and wanted to set large chunks of the memory to zero. For concreteness, consider this function, which takes a pointer to the mapped view (map) and a collection of blocks, each represented as a file offset and length. Its job is to set each block to zero.

#include <cstdint>
#include <cstring>
#include <initializer_list>
#include <utility>

void ZeroOutBlocks(uint8_t* map,
    std::initializer_list<
        std::pair<uint32_t, uint32_t>> blocks)
{
    // Each block is a (file offset, length) pair; zero that
    // range of the mapped view.
    for (auto&& block : blocks) {
        memset(map + block.first, 0, block.second);
    }
}

The customer found that even though this function is performing only write operations, a performance trace showed that the system was nevertheless reading from the file.

Memory-mapped files work by trapping the first access to each page. When the access occurs, the entire page is read from the disk, mapped into memory, and then the memory access operation is permitted to proceed.

The system does this regardless of whether the access was a read or a write. After all, that one written byte has to be merged with the existing content of the page.
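
The article doesn't show how the mapping was created, but the behavior is the same for any demand-paged view. As a minimal sketch, assuming the customary Win32 calls (the helper name and the minimal error handling are illustrative), a writable view of an entire file might be obtained like this; note that mapping the view reads nothing from the file up front.

#include <windows.h>
#include <cstdint>

// Hypothetical helper: map an entire (non-empty) file for read/write access.
// The caller eventually releases the view with UnmapViewOfFile(map).
uint8_t* MapEntireFileForWriting(HANDLE file)
{
    // Maximum size 0/0 means "use the current size of the file".
    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READWRITE,
                                        0, 0, nullptr);
    if (!mapping) return nullptr;

    // MapViewOfFile only reserves address space; every page of the view
    // starts out not-present, and the first access to each page (read or
    // write) takes a page fault that pulls the page in from the file.
    auto map = static_cast<uint8_t*>(
        MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0));
    CloseHandle(mapping); // the view keeps the mapping object alive
    return map;
}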

“But in my case, the offsets and lengths are all page multiples, so there is no need to read the entire page into memory.”

Well, you know that, but the CPU doesn’t.

The CPU sees the first write to the page and traps to the kernel. The operating system doesn’t go to the trouble of analyzing the code surrounding the fault and realizing that this code sequence:

@@: vmovntdq ymmword ptr [rcx],ymm0
    vmovntdq ymmword ptr [rcx+20h],ymm0
    vmovntdq ymmword ptr [rcx+40h],ymm0
    vmovntdq ymmword ptr [rcx+60h],ymm0
    vmovntdq ymmword ptr [rcx+80h],ymm0
    vmovntdq ymmword ptr [rcx+0A0h],ymm0
    vmovntdq ymmword ptr [rcx+0C0h],ymm0
    vmovntdq ymmword ptr [rcx+0E0h],ymm0
    add     rcx,100h
    sub     r8,100h
    cmp     r8,100h
    jae     @B

is a memset loop that writes r8 ÷ 256 × 8 copies of the ymm0 register to consecutive bytes of memory.
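
For readers who prefer not to decode the disassembly, here is a rough C++ restatement of the loop (illustrative only; the real code uses non-temporal 32-byte stores, which plain memcpy does not express):

#include <cstddef>
#include <cstdint>
#include <cstring>

// Rough equivalent of the loop above: rcx is the destination pointer,
// r8 the remaining byte count, and ymm0 a 32-byte fill pattern.
void MemsetLoopSketch(uint8_t* rcx, size_t r8, const uint8_t ymm0[32])
{
    do {
        for (int i = 0; i < 8; i++) {        // eight 32-byte stores
            memcpy(rcx + i * 32, ymm0, 32);  // vmovntdq [rcx+i*20h], ymm0
        }
        rcx += 0x100;                        // add  rcx, 100h
        r8  -= 0x100;                        // sub  r8, 100h
    } while (r8 >= 0x100);                   // cmp  r8, 100h / jae @B
}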

But suppose the operating system had code to recognize the top 50 most common implementations of memset. And suppose that it saw that the memset was going to write an entire page of zeroes. In that case, could it avoid the useless read?

I guess the operating system could do that. It would have to realize that this was a full-page memset and perform the same memset on the newly-mapped page before making the memory visible to the process (in case other threads read from the page before the faulting thread finishes the loop).

Still, that’s a lot of memset detection, since it would have to run at every write page fault. I suspect the write page faults that are due to memset of more than one page represent too small a fraction of total write page faults to be worth the trouble.

But who knows. My intuition on this has been wrong before.

What you can do instead is write zeroes to the memory-mapped file the old-fashioned way: with WriteFile. Since Windows NT unifies memory-mapped files with the file cache, these writes will be coherent with the memory mapping. If you want to get fancy, you can use overlapped writes and wait for them all to complete.

You need only set aside one page of zeroes for this trick. For regions up to a page in size, you can issue a WriteFile from your special zero-buffer. Regions larger than a page can be broken up into page-sized chunks, or you can use WriteFileGather to write that page to consecutive pages in the file.

Update: Commenter Nir Lichtman pointed out that memory-mapped files and I/O are not necessarily coherent, so the WriteFile trick doesn't work.
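
For concreteness, here is a minimal sketch of the WriteFile approach described above, with the caveat from the update that such writes are not guaranteed to be coherent with an existing mapped view. The function name and the 4096-byte page size are assumptions, and the handle is assumed to have been opened for synchronous writing.

#include <windows.h>
#include <algorithm>
#include <cstdint>
#include <initializer_list>
#include <utility>

void ZeroOutBlocksViaWriteFile(HANDLE file,
    std::initializer_list<
        std::pair<uint32_t, uint32_t>> blocks)
{
    // One page of zeroes suffices; larger regions are written in
    // page-sized chunks from the same buffer. 4096 is an assumed page size.
    static const uint8_t zeroes[4096] = {};

    for (auto&& block : blocks) {
        uint32_t offset = block.first;
        uint32_t remaining = block.second;
        while (remaining > 0) {
            DWORD chunk = std::min<uint32_t>(remaining, sizeof(zeroes));
            LARGE_INTEGER pos;
            pos.QuadPart = offset;
            if (!SetFilePointerEx(file, pos, nullptr, FILE_BEGIN)) return;
            DWORD written = 0;
            if (!WriteFile(file, zeroes, chunk, &written, nullptr) ||
                written != chunk) return;
            offset += chunk;
            remaining -= chunk;
        }
    }
}

A fancier variant could issue these as overlapped writes and wait for all of them to complete, as suggested above.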


Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

6 comments

  • Nir Lichtman 2 weeks ago · Edited

    Interesting! Small question about the usage of WriteFile in this context, the manual page about CreateFileMapping states that "A mapped file and a file that is accessed by using the input and output (I/O) functions (ReadFile and WriteFile) are not necessarily coherent." but in the article you state that "Windows NT unifies memory-mapped files with the file cache, these writes will be coherent with the memory mapping" which is in contradiction, is the current information...

    • Alex Cohn 1 week ago

      But it is probably OK to stop mapping the file, fill the necessary blocks with zeros with WriteFile(), and map the file again. The question is, how much overhead does such a sequence add, and how does that compare to the time spent on useless reads?

    • Neil Rashbrook 2 weeks ago

      “not necessarily” probably means that there may be cases where you can get it to work for now but they might not work later so you shouldn’t try.

      • Nir Lichtman 1 week ago

        Thanks Raymond. BTW, my dad, Moshe Lichtman, Windows 95 Plug&Play, says hi.

  • Neil Rashbrook 2 weeks ago · Edited

    What would happen if you tried to use FSCTL_SET_ZERO_DATA to free those blocks of the file?

    Edit: It won’t necessarily work, so you shouldn’t try it.