A customer was using ReadDirectoryChangesW in the hopes of receiving a notification when a file was copied. They found that when a file was copied, they received a FILE_, but only once an hour. And they also got that notification even for operations unrelated to file copying.
Recall that ReadDirectoryChangesW and FindFirstChangeNotification are for detecting changes to information that would appear in a directory listing. Your program can perform a FindFirstFile/FindNextFile to cache a directory listing, and then use ReadDirectoryChangesW or FindFirstChangeNotification to be notified that the directory listing has changed, and you have to invalidate your cache.
But there are a lot of operations that don’t affect a directory listing.
For example, a program could open a file in the directory with last access time updates suppressed. (Or the volume might have last access time updates suppressed globally.) There is no change to the directory listing, so no event is signaled.
Functions like ReadDirectoryChangesW and FindFirstChangeNotification operate at the file system level, so the fundamental operations they see are things like “read” and “write”. They don’t know why somebody is reading or writing. All they know is that it’s happening.
If you are a video rental store, you can see that somebody rented a documentary about pigs. But you don’t know why they rented that movie. Maybe they’re doing a school report. Maybe they’re trying to make illegal copies of pig movies. Or maybe they simply like pigs.
If you are the file system, you see that somebody opened a file for reading and read the entire contents. Maybe they are loading the file into Notepad so they can edit it. Or maybe they are copying the file. You don’t know. Related: If you let people read a file, then they can copy it.
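To make that concrete, here is a toy model of what the file system gets to see. It is only a sketch: everything happens in memory, it does not call any Windows API, and the names `TracedFile`, `copy_file`, and `load_into_editor` are made up for illustration. The point is that two programs with different intentions produce the same trace on the source file.

```python
import io


class TracedFile:
    """Hypothetical stand-in for a source file: logs each operation performed on it."""

    def __init__(self, data, log):
        self._stream = io.BytesIO(data)
        self._log = log
        self._log.append("open for read")

    def read(self, size):
        chunk = self._stream.read(size)
        self._log.append(f"read {len(chunk)} bytes")
        return chunk

    def close(self):
        self._log.append("close")
        self._stream.close()


def read_all(source, chunk_size=4):
    """Read the source to the end in fixed-size chunks, close it, and return the bytes."""
    parts = []
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    source.close()
    return b"".join(parts)


def copy_file(data, log):
    """Intent: copy. Reads the whole source, then writes the bytes somewhere else."""
    destination = io.BytesIO()
    destination.write(read_all(TracedFile(data, log)))
    return destination.getvalue()


def load_into_editor(data, log):
    """Intent: edit. Reads the whole source into an editable in-memory buffer."""
    return bytearray(read_all(TracedFile(data, log)))


copy_log, edit_log = [], []
copy_file(b"pig facts", copy_log)
load_into_editor(b"pig facts", edit_log)

# The source file saw exactly the same operations in both cases.
print(copy_log == edit_log)  # True
```

Nothing in the log for the source file says what the reader plans to do with the bytes afterward.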
In theory, you could check, when a file is closed, whether all the write operations collectively combine to form file contents that match a collective set of read operations from another file. Or you could hash the file to see if it matches the hash of any other file.¹ But these extra steps would get expensive very quickly.
Indeed, we found during user research that a common way for users to copy files is to load them into an application, and then use Save As to save a copy somewhere else. In many cases, this “copy” is not byte-for-byte identical to the original, although it is functionally identical. (For example, it might have a different value for Total editing time.) Therefore, detecting copying by comparing file hashes is not always successful.²
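Here is a small sketch of how hash comparison behaves in these cases. The document format (a metadata line followed by the body) and the file names are invented for the example, and SHA-256 is just one reasonable choice of hash. It is not how any particular product detects copies.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of the file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_file(directory, name, data):
    """Create a file containing the given bytes and return its path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


def run_demo():
    """Return (exact copy matches, Save As copy matches, two empty files match)."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical format: one metadata line, then the document body.
        original = write_file(d, "report.doc", b"editing-time=10\nQuarterly results")
        exact_copy = write_file(d, "copy.doc", b"editing-time=10\nQuarterly results")
        # Same body, but the application updated a metadata field on save.
        save_as = write_file(d, "saved.doc", b"editing-time=11\nQuarterly results")
        # Two unrelated files that happen to be empty.
        empty_a = write_file(d, "a.tmp", b"")
        empty_b = write_file(d, "b.tmp", b"")
        return (
            file_digest(original) == file_digest(exact_copy),
            file_digest(original) == file_digest(save_as),
            file_digest(empty_a) == file_digest(empty_b),
        )


if __name__ == "__main__":
    print(run_demo())  # (True, False, True)
```

The byte-for-byte copy matches, the Save As copy is missed because one metadata field changed, and the two unrelated empty files are reported as identical.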
If your goal is to detect files being “copied” (however you choose to define it), you’ll have to operate at another level. For example, you could use various data classification technologies to attach security labels to files and let the data classification software do the work of preventing files from crossing security levels. These technologies usually work best in conjunction with programs that have been updated to understand and enforce these data classification labels. (My guess is that they also use heuristics to detect and classify usage by legacy programs.)
¹ It would also generate false positives for files that are identical merely by coincidence. For example, every empty file would be flagged as a copy of every other empty file.
Windows 2000 Server had a feature called Single Instance Store which looked for identical files, but it operated only when the system was idle. It didn’t run during the copy operation. This feature was subsequently deprecated in favor of Data Deduplication, which looks for identical files as well as identical blocks of files. Again, Data Deduplication runs during system idle time. It doesn’t run during the copy operation. The duplicate is detected only after the fact. (Note the terminology: It is a “duplicate” file, not a “copy”. Two files could be identical without one being a copy of the other.)
² And besides, even if the load-and-save method produces byte-for-byte identical files, somebody who wanted to avoid detection would just make a meaningless change to the document before saving it.