A customer was doing a little performance analysis and found an oddity: A single non-extending write request at the application layer was turning into two write requests at the I/O layer, both at the same offset:
Op | File | Offset | Length | Flags | Priority | Status |
---|---|---|---|---|---|---|
IRP_MJ_WRITE | test.txt | 69,632 | 61,440 | Non-cached, Write Through | Normal | SUCCESS |
IRP_MJ_WRITE | test.txt | 69,632 | 61,440 | Non-cached, Write Through, Paging, Synchronous Paging | Normal | SUCCESS |
Friend-of-the-blog Malcolm Smith observed that the first write is non-cached. One possibility is that the first write is a flush of previously-dirty data due to a cached write or a writable memory-mapped view. The system then follows up with the second write, which is triggered by the application-level write.
However, if nobody else is writing to the file at the time the test is being run, then that scenario is ruled out.
Another possibility is that the file is compressed. In that case, the application-level write goes into the system cache, and then is flushed. This looks like two write operations from the file system’s point of view, which is what the log is watching. But really, only one write is issued to the physical drive.
The customer confirmed that they are writing to a compressed file.
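If you want to verify the same thing on your own system, the compression state is exposed as a file attribute. Here's a minimal sketch in C; the path is a placeholder for whatever file you're profiling.

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Placeholder path: substitute the file you are profiling.
    DWORD attrs = GetFileAttributesW(L"C:\\temp\\test.txt");

    if (attrs == INVALID_FILE_ATTRIBUTES) {
        printf("GetFileAttributes failed: %lu\n", GetLastError());
        return 1;
    }

    // FILE_ATTRIBUTE_COMPRESSED is set when NTFS stores the file compressed.
    printf("%s\n", (attrs & FILE_ATTRIBUTE_COMPRESSED)
                       ? "File is compressed."
                       : "File is not compressed.");
    return 0;
}
```

The same information shows up in the file's Properties dialog and in the output of the `compact` command.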
Malcolm explained that NTFS compression is rather expensive.
The idea behind NTFS compression is that the file is broken up into 64KB chunks, with each chunk compressed separately¹ and managed independently.
This means that a simple write operation that isn’t a full chunk explodes into a sequence of operations (sketched in code after the list):
- Read the enclosing chunk
- Decompress the enclosing chunk
- Update the uncompressed chunk to incorporate the newly-written data
- Compress the modified chunk
- Find space on the disk for the modified chunk
- Write the modified chunk to disk
- Release the space that the old chunk occupied
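To make the cost concrete, here is a conceptual sketch of that sequence in C. The helper functions (`read_compressed_chunk`, `compress_chunk`, and so on) are hypothetical stand-ins for file-system internals, not NTFS code; the point is only that a few hundred bytes of application data drag a full 64KB chunk through the whole cycle.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE (64 * 1024)   /* NTFS compression unit */

/* Hypothetical stand-ins for file-system internals; bodies are stubs. */
static void   read_compressed_chunk(unsigned char *dst) { memset(dst, 0, CHUNK_SIZE); }
static void   decompress_chunk(unsigned char *buf)      { (void)buf; }
static size_t compress_chunk(unsigned char *buf)        { (void)buf; return CHUNK_SIZE / 2; }
static void   allocate_and_write(const unsigned char *buf, size_t n) { (void)buf; (void)n; }
static void   release_old_clusters(void)                {}

/* What a small application-level write inside one chunk turns into. */
static void write_to_compressed_file(size_t offset, const void *data, size_t len)
{
    unsigned char chunk[CHUNK_SIZE];

    read_compressed_chunk(chunk);            /* read the enclosing chunk           */
    decompress_chunk(chunk);                 /* decompress it                      */
    memcpy(chunk + offset % CHUNK_SIZE,      /* splice in the newly-written data   */
           data, len);
    size_t packed = compress_chunk(chunk);   /* recompress the modified chunk      */
    allocate_and_write(chunk, packed);       /* find space on disk and write it    */
    release_old_clusters();                  /* free the space the old chunk used  */
}

int main(void)
{
    unsigned char data[512] = { 0 };
    write_to_compressed_file(70000, data, sizeof data);
    printf("A %zu-byte write cost a full %d-byte chunk cycle.\n",
           sizeof data, CHUNK_SIZE);
    return 0;
}
```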
One consequence of this is that compressed files are pathologically fragmented. The location of each chunk is unlikely to be correlated with the location of any other chunk in the file, especially after a bunch of updating write operations have occurred. Every compressed chunk winds up stored in a random location on the disk.
Furthermore, all this activity entails a lot of updates to the NTFS metadata, which is not just additional work but also creates additional synchronization bottlenecks. In particular, a write to a compressed file cannot overlap with another write or read of that file, since the write holds the metadata lock. For a non-compressed file, non-extending writes can coexist with reads and other non-extending writes, since none of those operations update file location metadata; they’re just writing to the sectors that hold the data.
NTFS compression can be used to reduce disk space requirements, but it is not well-suited to data that is constantly being modified. And if you’re studying performance issues, compressed files are going to show up as a bottleneck.
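If profiling does point at a compressed file, one escape hatch is to decompress just that file. Here's a minimal sketch, assuming the file can be opened for read/write access; the path is again a placeholder. The same thing can be done from the command line with `compact /u`, or by clearing the attribute in the file's Properties dialog.

```c
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    // Placeholder path: the file identified as a hot spot.
    HANDLE h = CreateFileW(L"C:\\temp\\test.txt",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    // COMPRESSION_FORMAT_NONE decompresses the file in place;
    // COMPRESSION_FORMAT_DEFAULT would turn compression back on.
    USHORT format = COMPRESSION_FORMAT_NONE;
    DWORD bytes = 0;
    BOOL ok = DeviceIoControl(h, FSCTL_SET_COMPRESSION,
                              &format, sizeof(format),
                              NULL, 0, &bytes, NULL);
    if (ok) printf("Compression removed.\n");
    else    printf("FSCTL_SET_COMPRESSION failed: %lu\n", GetLastError());

    CloseHandle(h);
    return 0;
}
```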
The customer thanked Malcolm for his assistance, and noted that they were doing their performance analysis on their development system, not a production system, and that explains the unexpected presence of file compression.
Bonus reading: The Alpha AXP, epilogue: A correction about file system compression on the Alpha AXP.
¹ Or at least, you hope that the chunk can be compressed. If you’re unlucky, the chunks won’t compress, and you went to all this extra effort and got nothing for it.
NTFS compression has also been partially broken since at least Server 2022 (build 20384). Copy an incompressible file (mkv, 7z) into a directory with the `compressed` attribute, and it will consume twice its size in disk space, without this even being reported in the “Size on disk” row of the Properties dialog. Yes yes, it’s been added to Feedback Hub, and I’ve also spoken about it with a kernel engineer.
It's normal for already-compressed data to compress into an even larger amount of data. You'll always have the compression header overhead, even if it's as trivial as an "uncompressed chunk" flag. If there's no way to mark data as uncompressed, then you'll need the entire compression header: Huffman tables, everything required to run the decompression algorithm.
That's the irony of file system compression, unless you design your data for it (like the PS5, where games store mostly uncompressed textures...)
It’s normal to detect this pathological case, and just store the original data along with the header, flagged as such. All compression formats do this.
NTFS did this too, from Windows 2000 until about Windows 10 1607 (LTSC 2016).