If you can use GUIDs to reference files, why not use them to remember “recently used” files so they can survive renames and moves?

Raymond Chen


You can ask for a GUID identifier for a file, and use that GUID to access the file later. You can even recover a (perhaps not the) file name from the GUID.

David Trapp wishes programs would use GUIDs to reference files so that references to recently used files can survive renames and moves.

Be careful what you wish for.

It is a common pattern to save a file by performing two steps.

  • Create a temporary file with the new contents.
  • Rename the original file to a *.bak or some other name.
  • Rename the temporary file to the original name.
  • (optional) Delete the *.bak file.

Programs use this multi-step process so that the old copy of the file remains intact until the new file has been saved successfully. Once that’s done, they swap the new file into place.
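As a rough illustration, the save steps listed above might look like this in Python. The function name `save_via_rename` and the `.bak`/`.tmp` suffixes are made up for this sketch; real programs vary in the details.

```python
import os
import tempfile


def save_via_rename(path, data, keep_backup=False):
    """Save `data` (bytes) to `path` using the temp-file-and-rename pattern.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    backup = path + ".bak"

    # Step 1: write the new contents to a temporary file in the same
    # directory, so the later renames stay on one volume.
    fd, temp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except BaseException:
        os.unlink(temp)  # the original is untouched if writing fails
        raise

    # Step 2: rename the original out of the way (if there is one).
    had_original = os.path.exists(path)
    if had_original:
        os.replace(path, backup)

    # Step 3: rename the temporary file to the original name.
    os.replace(temp, path)

    # Step 4 (optional): delete the backup.
    if had_original and not keep_backup:
        os.unlink(backup)
```

Note that the file that ends up under the original name is the one created in step 1, not the file that was there before.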

Unfortunately, this messes up your GUID-based accounting system.

If you tracked the file by its GUID, then here’s what you see:

  • Create a temporary file, which gets a new GUID.
  • Rename the original file. It retains its GUID but has a new name.
  • Rename the temporary file. It retains its GUID but has a new name.

The GUID that you remembered does not refer to the new file; it refers to the old file. Even worse, if the program took the optional step of deleting the renamed original, you now have a GUID that refers to a deleted file, which means that when you try to open it, the operation will fail.
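You can watch this happen with a small experiment. The sketch below does not use the GUID object identifier discussed here; as a stand-in, it assumes the `(st_dev, st_ino)` pair reported by Python's `os.stat` behaves the same way, in that it stays with a file across renames. The file names and the `trace_save` helper are hypothetical.

```python
import os
import tempfile


def file_id(path):
    # Stand-in identifier (an assumption for this sketch): the device and
    # inode / file index pair, which follows a file across renames.
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def trace_save(directory):
    original = os.path.join(directory, "report.txt")
    temp = os.path.join(directory, "report.tmp")
    backup = os.path.join(directory, "report.bak")

    with open(original, "w") as f:
        f.write("old contents")
    remembered = file_id(original)  # what a "recently used" list would store

    with open(temp, "w") as f:      # new file, new identifier
        f.write("new contents")
    os.replace(original, backup)    # original keeps its identifier
    os.replace(temp, original)      # temporary file keeps its identifier

    return {
        "remembered": remembered,
        "backup": file_id(backup),
        "current": file_id(original),
    }


if __name__ == "__main__":
    ids = trace_save(tempfile.mkdtemp())
    print("remembered id follows the backup:", ids["remembered"] == ids["backup"])
    print("remembered id matches current file:", ids["remembered"] == ids["current"])
```

The remembered identifier ends up attached to the `.bak` file, and the file now sitting under the original name carries a different one.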

Programs can avoid this problem by using the ReplaceFile function to promote the temporary file. The ReplaceFile function preserves the file identifier, among other things.
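For a program written in Python, one way to reach ReplaceFile is through `ctypes`. This is only a sketch: the wrapper name `replace_file` is hypothetical, it passes no replace flags, and it raises on anything other than Windows because the underlying `ReplaceFileW` call exists only there.

```python
import ctypes
import sys


def replace_file(original, replacement, backup=None):
    """Promote `replacement` over `original` using Win32 ReplaceFileW.

    Hypothetical wrapper for illustration; Windows only.
    """
    if sys.platform != "win32":
        raise OSError("ReplaceFileW is only available on Windows")

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    fn = kernel32.ReplaceFileW
    fn.argtypes = [
        ctypes.c_wchar_p,  # lpReplacedFileName: the file being replaced
        ctypes.c_wchar_p,  # lpReplacementFileName: the temporary file
        ctypes.c_wchar_p,  # lpBackupFileName: optional, may be NULL
        ctypes.c_uint32,   # dwReplaceFlags: 0 in this sketch
        ctypes.c_void_p,   # lpExclude: reserved, must be NULL
        ctypes.c_void_p,   # lpReserved: reserved, must be NULL
    ]
    fn.restype = ctypes.c_int  # BOOL: nonzero on success

    if not fn(original, replacement, backup, 0, None, None):
        raise ctypes.WinError(ctypes.get_last_error())
```

A save routine would write the new contents to a temporary file as before, then call `replace_file(path, temp_path)` in place of the two renames.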

In practice, use of the ReplaceFile function is not as widespread as you probably would like, so using only GUIDs to track files will technically track the file, but may not track the file you intend, because people still think of the file name, not the GUID, as the identifier for a file.


11 Comments
David Walker 2019-06-12 08:01:21
I don't think he's proposing using the GUID as a file name. Here's how this would be implemented:
  • Create a temporary file, which gets a new GUID (stored internally somewhere in the file)
  • Rename the original file
  • Rename the new file to have the original file name
  • Save the GUID from the original file
  • Delete the original file
  • Reassign the GUID in the new file (which has the original file name) to have the original GUID stored internally
What's so hard about that? (It sounds similar to what ReplaceFile does, but this could be implemented even without using ReplaceFile.)
Henke37
Henrik Andersson 2019-06-12 08:55:59
Today Raymond shows that he can math by offering a two step procedure with four entries. And in a more serious tone, this seems like a thing that the usual hack for this exact situation should be accounting for. In addition to migrating the timestamp, why not the guid too? Oh well, it's probably too late to change, some program would probably segfault if this changed.
Ji Luo 2019-06-12 16:52:56
If I recall correctly (which might not be the case), a previous version of the documentation of `ReplaceFile` function says that the object identifier is not preserved... Aha, there must be a bug in the documentation, it says "The replacement file assumes the name of the replaced file and its identity" and "... also preserves the following attributes of the original file: ... Object identifier" and "The resulting file has the same file ID as the replacement file". The replacement file is the temporary file (new file), and the replaced file is the original file. The last sentence contradicts the first two. Also, IIRC, the recommended way to implement MRU file list is to store a bunch of `.lnk`s, which already give you ID-based tracking (on NTFS volumes and if the policy allows).
Joshua Schaeffer 2019-06-12 19:35:42
The GUID should just be a secret backup in case the filename isn't found. Every time you open the file, you also scan the latest GUID value for that filename. This keeps the user in charge while still getting the full benefit from the GUID. I never treat the GUID as a primary key for a user-managed file, in fact I would use the GUID to enforce the integrity of an internal application file AGAINST the user. The GUID index seems magical; is it expensive to manage?