If you can use GUIDs to reference files, why not use them to remember “recently used” files so they can survive renames and moves?
You can ask for a GUID identifier for a file, and use that GUID to access the file later. You can even recover a (perhaps not the) file name from the GUID.
David Trapp wishes programs would use GUIDs to reference files so that references to recently used files can survive renames and moves.
Be careful what you wish for.
It is a common pattern to save a file by performing two steps.
- Create a temporary file with the new contents.
- Rename the original file to a `*.bak` or some other name.
- Rename the temporary file to the original name.
- (optional) Delete the renamed original.
Programs use this multi-step process so that the old copy of the file remains intact until the new file has been saved successfully. Once that’s done, they swap the new file into place.
Unfortunately, this messes up your GUID-based accounting system.
If you tracked the file by its GUID, then here’s what you see:
- Create a temporary file, which gets a new GUID.
- Rename the original file. It retains its GUID but has a new name.
- Rename the temporary file. It retains its GUID but has a new name.
The GUID that you remembered does not refer to the new file; it refers to the old file. Even worse, if the program took the optional step of deleting the renamed original, you now have a GUID that refers to a deleted file, which means that when you try to open it, the operation will fail.
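The sequence above can be reproduced in a small sketch. This is only an analogy under stated assumptions: it uses the `(st_dev, st_ino)` pair from `os.stat` as a stand-in for the file's GUID, and the helper names `file_id`, `save_with_rename`, and `demo` are made up for illustration.

```python
import os
import tempfile


def file_id(path):
    """Stand-in for a file identifier: the (device, inode) pair from os.stat."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def save_with_rename(path, new_contents):
    """Save using the create-temp / rename-to-.bak / rename-temp pattern."""
    temp_path = path + ".tmp"
    backup_path = path + ".bak"
    with open(temp_path, "w") as f:   # step 1: new file, new identifier
        f.write(new_contents)
    os.rename(path, backup_path)      # step 2: original keeps its identifier
    os.rename(temp_path, path)        # step 3: temp file takes the original name
    return backup_path


def demo():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "report.txt")
        with open(path, "w") as f:
            f.write("version 1")
        remembered = file_id(path)    # what an ID-based MRU list would store
        backup_path = save_with_rename(path, "version 2")
        return {
            "name_still_matches": file_id(path) == remembered,
            "backup_matches": file_id(backup_path) == remembered,
        }
```

After the save, the remembered identifier matches the backup file and no longer matches the file that carries the original name, which is the mismatch described above.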
Programs can avoid this problem by using the `ReplaceFile` function to promote the temporary file. The `ReplaceFile` function preserves the file identifier, among other things.
In practice, use of the `ReplaceFile` function is not as widespread as you probably would like, so using only GUIDs to track files will technically track the file, but may not track the file you intend, because people still think of the file name as the identifier for a file, not its GUID.
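For a program that does want to promote its temporary file with `ReplaceFile`, the call might look roughly like the sketch below. It is written in Python with `ctypes` purely for illustration. The helper names `promote_temp_file` and `demo` are hypothetical, and the non-Windows branch is an assumption added so the sketch runs elsewhere: it falls back to `os.replace`, which does not preserve the identifier.

```python
import ctypes
import os
import sys
import tempfile


def promote_temp_file(target_path, temp_path):
    """Swap temp_path into place as target_path.

    On Windows, call ReplaceFileW so the target keeps its identity.
    Elsewhere (or if the target does not exist yet), fall back to
    os.replace, which does NOT preserve the identifier.
    """
    if sys.platform == "win32" and os.path.exists(target_path):
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.ReplaceFileW.argtypes = [
            wintypes.LPCWSTR,  # lpReplacedFileName
            wintypes.LPCWSTR,  # lpReplacementFileName
            wintypes.LPCWSTR,  # lpBackupFileName (optional, None here)
            wintypes.DWORD,    # dwReplaceFlags
            wintypes.LPVOID,   # lpExclude (reserved)
            wintypes.LPVOID,   # lpReserved
        ]
        kernel32.ReplaceFileW.restype = wintypes.BOOL
        ok = kernel32.ReplaceFileW(target_path, temp_path, None, 0, None, None)
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())
    else:
        os.replace(temp_path, target_path)


def demo():
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "report.txt")
        temp = os.path.join(folder, "report.txt.tmp")
        with open(target, "w") as f:
            f.write("version 1")
        with open(temp, "w") as f:
            f.write("version 2")
        promote_temp_file(target, temp)
        with open(target) as f:
            return f.read(), os.path.exists(temp)
```

Either branch leaves the new contents under the original name with the temporary file gone. Only the Windows branch is expected to keep the original identifier.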
I don’t think he’s proposing using the GUID as a file name. Here’s how this would be implemented:
* Create a temporary file, which gets a new GUID (stored internally somewhere in the file)
* Rename the original file
* Rename the new file to have the original file name
* Save the GUID from the original file
* Delete the original file
* Reassign the GUID in the new file (which has the original file name) to have the original GUID stored internally
What’s so hard about that? (It sounds similar to what ReplaceFile does, but this could be implemented even without using ReplaceFile.)
The concept of file IDs might not exist on the underlying file system, and different FSes might have different notions of IDs, which makes it difficult to present such an interface at the level of Win32 (they’re at a lower level than Win32). `ReplaceFile` (as well as other Win32 file APIs) abstracts this away.
My wish was more of hypothetical nature – it’s clear that in the current state of the ecosystem this won’t really work, but if an ID based concept had been more prominent from the start and filenames and paths had been just a user-facing display name/organizational structure – meaning that applications would just not work properly if they did that rename-delete thing without somehow preserving the ID – then possibly things could have ended up nicer. This would also have made hard links much more useful – I also wish for a simple painless way to organize my files and folders in structures that can have multiple dimensions without choosing one primary one, meaning that I could have several folder trees depending on which way I need to look at them but have the same files exist in multiple “folders”. (Yes, shortcuts would sorta work, but only if the windows shell is your primary point of interaction with the file system and not the command line, and it would then still be inconvenient for situations like using a “save as” dialog.) But I’m just dreaming here. I’m aware that systems, especially spanning a long time and many companies and other factors, evolve in their own way, and it’s always easy to say “today I’d design it from scratch in this and that superior way”. However, as Joshua Schaeffer said below, a nice solution that would work even now (and without depending on all other applications to do things “right”) would be to save filename and GUID in the LRU list and try to access the file via filename first, and if it fails then also via GUID (and then update the filename). Which is what shortcuts already do (as Ji Luo correctly pointed out on Twitter), but it’d be great if your average program would be doing the same, as I hardly use shortcuts in my workflow, I’m most of the time working with specific applications directly.
It always annoys me to hear a statement like “the concept of file IDs might not exist on the underlying file system”. Sure, the concept of file IDs might not exist on the underlying file system. But then again, it might! And if that concept exists on whatever file system you are using, then SUPPORT the implementation. If that concept does not exist on the underlying file system, then you can’t do this, and you can give a reasonable error message. But don’t program to the lowest common denominator. Many of us DO use the NTFS file system almost exclusively. (This sounds like “If Jimmy can’t have a new toy, then Susie can’t have one either”!)
Today Raymond shows that he can math by offering a two step procedure with four entries.
And in a more serious tone, this seems like a thing that the usual hack for this exact situation should be accounting for. In addition to migrating the timestamp, why not the guid too? Oh well, it’s probably too late to change, some program would probably segfault if this changed.
That’s what I was saying (although less elegantly). Sure, after renaming a file with all of the steps, then “migrate” the GUID too. If the underlying file system supports GUIDs. 🙂
If I recall correctly (which might not be the case), a previous version of the documentation of `ReplaceFile` function says that the object identifier is not preserved… Aha, there must be a bug in the documentation, it says “The replacement file assumes the name of the replaced file and its identity” and “… also preserves the following attributes of the original file: … Object identifier” and “The resulting file has the same file ID as the replacement file“. The replacement file is the temporary file (new file), and the replaced file is the original file. The last sentence contradicts the first two.
Also, IIRC, the recommended way to implement an MRU file list is to store a bunch of `.lnk`s, which already give you ID-based tracking (on NTFS volumes and if the policy allows).
Regarding the recommended way to store a LRU list: I didn’t realize that – I only knew that this is what Windows uses in the global “last used documents” list (which I also don’t use simply because most of the time programs I use don’t write anything there and even if they do, the list is too short and ends up containing a bunch of TXT files I opened in the shell for example).
It sounds a bit cumbersome. Is there something like “an in-memory LNK-style thing that you can very easily create from a file path or handle and store it however you like and pass it back into an API” too? Because I feel like one of the reasons that no program that I know of seems to handle their LRU list that way is that it just feels unnecessarily complex to juggle LNK files in folders, especially when you would like to use one file (say, JSON) – or the registry even – to store your relevant state information…
Of course, use the `IPersistStream` interface of `CLSID_ShellLink`, as demonstrated in a previous entry. I believe Jump Lists are stored that way, too. But perhaps storing files isn’t that bad. You could store the MRU list in your AppData folder. This avoids cluttering user-visible folders as well as the search index.
The GUID should just be a secret backup in case the filename isn’t found. Every time you open the file, you also scan the latest GUID value for that filename. This keeps the user in charge while still getting the full benefit from the GUID. I never treat the GUID as a primary key for a user-managed file, in fact I would use the GUID to enforce the integrity of an internal application file AGAINST the user. The GUID index seems magical; is it expensive to manage?
Yes, this sounds like it would work even in the current state of affairs!
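The name-first, identifier-as-fallback scheme discussed in the last few comments could be sketched like this. It is a rough model, not a real implementation: the `(st_dev, st_ino)` pair from `os.stat` stands in for the GUID, `find_by_id` is a hypothetical lookup that merely scans one folder (a real program would ask the file system to open by identifier instead), and all names here are invented for illustration.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class RecentFile:
    path: str
    ident: tuple  # (st_dev, st_ino), standing in for the file's GUID


def file_id(path):
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def remember(path):
    """Record both the name and the identifier in the recent-files entry."""
    return RecentFile(path=path, ident=file_id(path))


def find_by_id(folder, ident):
    """Hypothetical lookup: scan one folder for a file with this identifier."""
    for entry in os.scandir(folder):
        if entry.is_file() and file_id(entry.path) == ident:
            return entry.path
    return None


def resolve(entry, search_folder):
    """Return a usable path for the entry, or None if the file is gone."""
    if os.path.exists(entry.path):
        # The name wins; refresh the backup identifier for that name.
        entry.ident = file_id(entry.path)
        return entry.path
    found = find_by_id(search_folder, entry.ident)
    if found is not None:
        # The name failed; the identifier rescued it, so update the name.
        entry.path = found
    return found


def demo():
    with tempfile.TemporaryDirectory() as folder:
        old = os.path.join(folder, "draft.txt")
        new = os.path.join(folder, "final.txt")
        with open(old, "w") as f:
            f.write("hello")
        entry = remember(old)
        os.rename(old, new)  # the user renames the file behind our back
        resolved = resolve(entry, folder)
        return os.path.basename(resolved), os.path.basename(entry.path)
```

Because the name is tried first, a program that saves with the rename-and-swap pattern still resolves to the file the user expects, and the stored identifier is refreshed on the next open.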