When would CopyFile succeed but produce a file filled with zeroes?

Raymond Chen


A customer reported that they very rarely encounter cases where they use the Copy­File function to copy a file from a network location to the local system, the function reports success, but when they go to look at the file, it’s filled with zeroes instead of actual file data. They were wondering what could cause this situation. They suspected that the computer may have rebooted and wanted to know whether file contents are flushed to disk under those conditions.

When the Copy­File function returns success, it does not mean that the data has reached physical media. It means that the data was written to the cache. The customer didn’t specify whether the reboot was a crash or an organized restart. If the system crashed, then any unwritten data in the cache is lost. On the other hand, if the system reboots through the normal shutdown-and-reboot process, then the cache will be written out as part of the shutdown.

The customer wondered whether passing the COPY_FILE_NO_BUFFERING flag would cover the crash scenario.

A member of the file system team explained that the COPY_FILE_NO_BUFFERING flag tells the system not to keep data in memory, but rather send it straight to the device. This flag was recommended for large files (which don’t fit in RAM anyway), but it’s a bad idea for small files, since every file will have to wait for the device to respond before the system can move on to the next file. You’ll have to experiment to find the breakeven point for your specific data set and device.

Note, however, that the COPY_FILE_NO_BUFFERING flag doesn’t solve the problem.

For example, the system might crash while the file copy is still in progress. The Copy­File function creates a 4GB file (say), but manages to copy only 1GB of data into it before crashing. The other 3GB was never copied even though the file claims to be 4GB in size.
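For illustration, here’s a small sketch of what such a partially copied file looks like. (Python, since the effect is easy to reproduce portably; the function name is made up for this demonstration.) The file reports its full size, but the tail that was never written reads back as zeroes.

```python
import os
import tempfile

def simulate_partial_copy(claimed_size: int, copied: bytes) -> bytes:
    """Create a file whose length is claimed_size but which only ever
    received len(copied) bytes of real data, then return its contents.
    A rough stand-in for a copy interrupted by a crash."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.truncate(claimed_size)  # file now claims its final size
            f.seek(0)
            f.write(copied)           # only part of the data ever arrives
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

contents = simulate_partial_copy(16, b"data")
# len(contents) == 16, but everything past b"data" reads as zeroes.
print(len(contents), contents)
```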

Another possibility is that all the file data makes it to the device, but the metadata does not get written to the device before the crash. Copy­File returned success, but not all of the bookkeeping made it to the device.

Even if you call Flush­File­Buffers, the system could crash before Flush­File­Buffers returns.

One possible way to address these problems is to copy the file to a temporary name, flush the file buffers, and then rename the file to its final name. The downside of this is that it forces synchronous writes to the device, which slows down your overall workflow, so it’s not a cheap algorithm to use.
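As a sketch of that copy-flush-rename pattern, here is a Python version using portable calls as stand-ins for the Win32 ones (CopyFile, Flush­File­Buffers, and a rename to the final name). The function name careful_copy is ours, purely for illustration.

```python
import os
import shutil
import tempfile

def careful_copy(src: str, dst: str) -> None:
    """Copy src to a temporary name in dst's directory, force the data
    to the device, then atomically rename to the final name.
    Sketch only: os.fsync stands in for FlushFileBuffers, and
    os.replace stands in for the final rename step."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp = tempfile.mkstemp(dir=dst_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())  # synchronous write: the slow part
        os.replace(tmp, dst)        # rename to the final name
    except BaseException:
        os.remove(tmp)              # don't leave the temp file behind
        raise
```

Because the final name appears only after the data has been flushed, a crash leaves you with either the old state or a complete new file, never a zero-filled one, at the cost of waiting on the device.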

But let’s step back. Is there a way to avoid slowing down the common case just because there’s a rare problem case?¹

A higher-level solution may be in order: The next time your program runs, you can detect that it did not shut down properly. When that happens, hash the file contents and compare it to the expected value. This moves the expensive operation to the rare case, allowing the file copy to proceed at normal speed.
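A sketch of that verification pass, assuming SHA-256 as the hash (the choice of hash is ours; any strong checksum recorded at copy time would do):

```python
import hashlib

def file_sha256(path: str, chunk: int = 1 << 20) -> str:
    """Hash a file's contents in chunks for the after-crash
    verification pass. Illustrative sketch, not a prescribed API."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

# On the next run, if the previous session didn't shut down cleanly:
#     if file_sha256(copied_path) != expected_hash:
#         recopy the file
```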

¹ We’re assuming that system crashes are rare. If they’re not rare, then I think you have bigger problems.


2 Comments
George Gonzalez 2019-05-17 08:50:13
Long ago, but after all the unreliable hard disks in the IBM ATs, we were all getting a bit complacent about checking for errors after every printf(), and I wondered how important it was to check for file write errors. Somewhere in MSDN I found a page that actually enumerated all the possible ways a write API could fail. Memory is fuzzy, but I think they listed about NINETEEN different and unusual ways to fail. So I resolved to start the new year with checking after every IO operation.
Dimitrios Kalemis 2019-05-17 21:20:22
You write: "On the other hand, if the system reboots through the normal shutdown-and-reboot process, then the cache will be written out as part of the shutdown." I have a question about that. Is what you wrote always true? If during the normal shutdown process (for some reason, like the hardware being too slow at sector-level access) the cache takes too long to be written out, then Windows will kill the cache writing-out process (in order for the shutdown to continue). If such a thing happens, then even the normal shutdown process is not a guarantee. What is your opinion?