When would CopyFile succeed but produce a file filled with zeroes?

Raymond Chen


A customer reported that they very rarely encounter cases where they used the Copy­File function to copy a file from a network location to the local system, the function reports success, but when they go to look at the file, it’s filled with zeroes instead of actual file data. They were wondering what could cause this situation. They suspected that the computer may have rebooted and wanted to know whether file contents are flushed to disk under those conditions.

When the Copy­File function returns success, it does not mean that the data has reached physical media. It means that the data was written to the cache. The customer didn’t specify whether the reboot was a crash or an organized restart. If the system crashed, then any unwritten data in the cache is lost. On the other hand, if the reboots through the normal shutdown-and-reboot process, then the cache will be written out as part of the shutdown.

The customer wondered whether passing the COPY_FILE_NO_BUFFERING would cover the crash scenario.

A member of the file system team explained that the COPY_FILE_NO_BUFFERING flag tells the system not to keep data in memory, but rather send it straight to the device. This flag was recommended for large files (which don’t fit in RAM anyway), but it’s a bad idea for small files, since every file will have to wait for the device to respond before the system can move on to the next file. You’ll have to experiment to find the breakeven point for your specific data set and device.

Note, however, that the COPY_FILE_NO_BUFFERING doesn’t solve the problem.

For example, the system might crash while the file copy is still in progress. The Copy­File function creates a 4GB file (say), but manages to copy only 1GB of data into it before crashing. The other 3GB was never copied even though the file claims to be 4GB in size.

Another possibility is that all the file data makes it to the device, but the metadata does not get written to the device before the crash. The Copy­File returned success, but all of the bookkeeping didn’t make it to the device.

Even if you call Flush­File­Buffers, the system could crash before Flush­File­Buffers returns.

One possible way to address these problems is to copy the file to a temporary name, flush the file buffers, and then rename the file to its final name. The downside of this is that it forces synchronous writes to the device, which slows down your overall workflow, so it’s not a cheap algorithm to use.

But let’s step back. Is there a way to avoid slowing down the common case just because there’s a rare problem case?¹

A higher-level solution may be in order: The next time your program runs, you can detect that it did not shut down properly. When that happens, hash the file contents and compare it to the expected value. This moves the expensive operation to the rare case, allowing the file copy to proceed at normal speed.

¹ We’re assuming that system crashes are rare. If they’re not rare, then I think you have bigger problems.

Raymond Chen
Raymond Chen

Follow Raymond   

George Gonzalez 2019-05-17 08:50:13
Long ago, but after all the unreliable hard disks in the IBM At's, we were all getting a bit complacent about checking for errors after every printf(), and I wondered how important it was to check for file write errors.  Somewhere in MSDN I found a page that actually enumerated all the possible ways a write API could fail.  Memory is fuzzy but I think they listed about NINETEEN different and unusual ways to fail.   So I resolved to start the new year with checking after every IO operation.   
Dimitrios Kalemis 2019-05-17 21:20:22
You write: " On the other hand, if the reboots through the normal shutdown-and-reboot process, then the cache will be written out as part of the shutdown." I have a question about that. Is what you wrote always true? If during the normal shutdown process (for some reason, like the hardware being too slow at the sector level access) the cache takes too long to be written out, then Windows will kill the cache-writting-out-process (in order for the shutdown to continue). If such a thing happens, then even the normal shutdown process is not a guarantee. What is your opinion?
cheong00 2019-05-19 18:58:26
Actually the "cure" is to use externally powered storage, so even when there's unexpected shutdown, the cached data on storage device would still be written to disk. If you have unstable power, then you probably want to know that UPS is a thing.
Martin Mitáš 2019-06-05 03:57:55
I fully agree system crashes can be (from app POV) in most cases treated as events that simply never occur. It isn't responsibility of any user space app to guarantee stability of the system as a whole. But writing some data to external/unplugguble device or a remote filesystem living somewhere on a network are not in that category. So the question how the app can know whether it successfully got the data to the destination is still relevant even when we take system crashes out of the equation. Is there a reasonable way how to reliably detect when such a problem happens (1) without knowing any details about the target filesystem and (2) without any slowdown due the synchronized output?