{"id":8813,"date":"2011-12-26T07:00:00","date_gmt":"2011-12-26T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2011\/12\/26\/why-is-the-file-size-reported-incorrectly-for-files-that-are-still-being-written-to\/"},"modified":"2011-12-26T07:00:00","modified_gmt":"2011-12-26T07:00:00","slug":"why-is-the-file-size-reported-incorrectly-for-files-that-are-still-being-written-to","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20111226-00\/?p=8813","title":{"rendered":"Why is the file size reported incorrectly for files that are still being written to?"},"content":{"rendered":"<p>\nThe shell team often gets questions like these from customers:\n<\/p>\n<blockquote CLASS=\"q\"><p>\nAttached please find a sample program which continuously\nwrites data to a file.\nIf you open the folder containing the file in Explorer, you can see\nthat the file size is reported as zero.\nEven manually refreshing the Explorer window does not update the file size.\nEven the <code>dir<\/code> command shows the file size as zero.\nOn the other hand, calling <code>Get&shy;File&shy;Size<\/code> reports\nthe correct file size.\nIf I close the file handle, then Explorer and the <code>dir<\/code>\ncommand both report the correct file size.\nWe can observe this behavior on Windows Server 2008 R2,\nbut on Windows Server 2003, the file sizes are updated in both Explorer\nand <code>dir<\/code>.\nCan anybody explain what is happening?\n<\/p><\/blockquote>\n<blockquote CLASS=\"q\"><p>\nWe have observed that Windows gives the wrong file size for\nfiles being written.\nWe have a log file that our service writes to,\nand we like to monitor the size of the file by watching\nit in Explorer, but the file size always reports as zero.\nEven the <code>dir<\/code> command reports the file size as zero.\nOnly when we stop the service does the log file size get reported\ncorrectly.\nHow can we get the file size reported 
properly?\n<\/p><\/blockquote>\n<blockquote CLASS=\"q\"><p>\nWe have a program that generates a large number of files in the current\ndirectory.\nWhen we view the directory in Explorer, we can watch the files as they\nare generated, but the file size of the last file is always reported as zero.\nWhy is that?\n<\/p><\/blockquote>\n<p>\nNote that this is not even a shell issue.\nIt&#8217;s a file system issue,\nas evidenced by the fact that a <code>dir<\/code> command exhibits\nthe same behavior.\n<\/p>\n<p>\nBack in the days of FAT, all the file metadata was stored in the\ndirectory entry.\n<\/p>\n<p>\nThe designers of NTFS had to decide where to store their metadata.\nIf they chose to do things the UNIX way,\nthe directory entry would just be a name\nand a reference to the file metadata (known in UNIX-land as an <i>inode<\/i>).\nThe problem with this approach is that every directory listing would require\nseeking all over the disk\nto collect the metadata to report for each file.\nThis would have made NTFS slower than FAT at listing the contents\nof a directory, a rather embarrassing situation.\n<\/p>\n<p>\nOkay, so some nonzero amount of metadata needs to go into the\ndirectory entry.\nBut NTFS supports hard links, which complicates matters\nsince a file with multiple hard links has multiple directory entries.\nIf the directory entries disagree, who&#8217;s to say which one is right?\nOne way out would be to try very hard to keep all the directory entries\nin sync and to make the <code>chkdsk<\/code> program arbitrarily choose\none of the directory entries as the &#8220;correct&#8221; one in case a conflict\nis discovered.\nBut this also means that if a file has a thousand hard links, then\nchanging the file size would entail updating a thousand directory entries.\n<\/p>\n<p>\nThat&#8217;s where the NTFS folks decided to draw the line.\n<\/p>\n<p>\nIn NTFS, file system metadata is a property not of the directory\nentry but rather of the <i>file<\/i>,\nwith some of 
the metadata replicated into the directory entry as a\ntweak to improve directory enumeration performance.\nFunctions like\n<code>Find&shy;First&shy;File<\/code> report the directory entry,\nand by putting there the metadata that FAT users were accustomed to getting\n&#8220;for free&#8221;, the NTFS designers could avoid being slower\nthan FAT for directory listings.\nThe directory-enumeration functions report the last-updated metadata,\nwhich may not correspond to the actual metadata if the directory entry\nis stale.\n<\/p>\n<p>\nThe next question is where and how often this metadata replication is done;\nin other words, how stale is this data allowed to get?\nTo avoid having to update a potentially unbounded number of\ndirectory entries each time a file&#8217;s metadata changes, the NTFS folks\ndecided that the replication would be performed only from the file into\n<i>the directory entry that was used to open the file<\/i>.\nThis means that if a file has a thousand hard links,\na change to the file size would be reflected in the directory entry\nthat was used to open the file, but the other 999 directory entries\nwould contain stale data.\n<\/p>\n<p>\nAs for how often, the answer is a little more complicated.\nStarting in Windows Vista (and its corresponding Windows Server version\nwhich I don&#8217;t know but I&#8217;m sure you can look up,\nand by &#8220;you&#8221; I mean &#8220;Yuhong Bao&#8221;),\nthe NTFS file system performs this courtesy replication when the\nlast handle to a file object is closed.\nEarlier versions of NTFS replicated the data while the file\nwas open whenever the cache was flushed, which meant that it happened\nevery so often according to an unpredictable schedule.\nThe result of this change is that the directory entry now gets updated\nless frequently, and therefore the last-updated file size is more\nout-of-date than it already was.\n<\/p>\n<p>\nNote that even with the old behavior, the file size was still\nout of date (albeit not as out of date as it is 
now),\nso any correctly-written program already had to accept the possibility\nthat the actual file size differs from the size reported by\n<code>Find&shy;First&shy;File<\/code>.\nThe change to suppress the &#8220;bonus courtesy updates&#8221; was made for\nperformance reasons.\nObviously, updating the directory entries results in additional I\/O\n(and forces a disk head seek),\nso it&#8217;s an expensive operation for relatively little benefit.\n<\/p>\n<p>\nIf you really need the actual file size right now, you can do what\nthe first customer did and call <code>Get&shy;File&shy;Size<\/code>.\nThat function operates on the actual file and not on the directory entry,\nso it gets the real information and not the shadow copy.\nMind you, if the file is being continuously written-to,\nthen the value you get is already wrong the moment you receive it.\n<\/p>\n<p>\nWhy doesn&#8217;t Explorer do the\n<code>Get&shy;File&shy;Size<\/code> thing when it enumerates the contents\nof a directory so it always reports the accurate file size?\nWell, for one thing, it would be kind of presumptuous of Explorer to\nsecond-guess the file system.\n&#8220;Oh, gosh, maybe the file system is lying to me.\nLet me go and verify this information via a slower alternate mechanism.&#8221;\nNow you&#8217;ve created this environment of distrust.\nWhy stop there?\nWhy not also verify file contents?\n&#8220;Okay, I read the first byte of the file and it returned 0x42, but I&#8217;m\nnot so sure the file system isn&#8217;t trying to trick me, so after reading\nthat byte, I will open the volume in raw mode, traverse the file system\ndata structures, and find the first byte of the file myself,\nand if it isn&#8217;t 0x42, then somebody&#8217;s gonna have some explaining to do!&#8221;\nIf the file system wants to lie to us,\nthen <i>let the file system lie to us<\/i>.\n<\/p>\n<p>\nAll this verification takes\nan operation that could be done in\n2 + <i>N<\/i>\/500 I\/O operations\nand slows it down to\n2 
+ <i>N<\/i>\/500 + 3<i>N<\/i> operations.\nAnd you&#8217;ve reintroduced all the disk seeking\nthat all the work was intended to avoid!\n(And if this is being done over the network,\nyou can definitely feel a 1500&times; slowdown.)\nCongratulations, you made NTFS slower than FAT.\nI hope you&#8217;re satisfied now.\n<\/p>\n<p>\nIf you were paying close attention, you&#8217;d have noticed that I wrote\nthat the information is propagated into the directory when the last handle\nto the <i>file object<\/i> is closed.\nIf you call <code>Create&shy;File<\/code> twice on the same file,\nthat creates two file objects which refer to the same underlying file.\nYou can therefore trigger the update of the directory entry from another\nprogram by simply opening the file and then closing it.\n<\/p>\n<pre>\nvoid UpdateFileDirectoryEntry(__in PCWSTR pszFileName)\n{\n    HANDLE h = CreateFileW(\n        pszFileName,\n        0,                  \/\/ don't require any access at all\n        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,\n        NULL,               \/\/ lpSecurityAttributes\n        OPEN_EXISTING,\n        0,                  \/\/ dwFlagsAndAttributes\n        NULL);              \/\/ hTemplateFile\n    if (h != INVALID_HANDLE_VALUE) {\n        CloseHandle(h);\n    }\n}\n<\/pre>\n<p>\nYou can even trigger the update from the program itself.\nYou might call a function like this every so often\nfrom the program generating the output file:\n<\/p>\n<pre>\nvoid UpdateFileDirectoryEntry(__in HANDLE hFile)\n{\n    HANDLE h = ReOpenFile(\n        hFile,\n        0,                  \/\/ don't require any access at all\n        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,\n        0);                 \/\/ dwFlags\n    if (h != INVALID_HANDLE_VALUE) {\n        CloseHandle(h);\n    }\n}\n<\/pre>\n<p>\nIf you want to update all file directory entries (rather than a specific\none), you can build the loop yourself:\n<\/p>\n<pre>\n\/\/ functions ProcessOneName 
and EnumerateAllNames\n\/\/ <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2011\/07\/20\/10188033.aspx\">incorporated by reference<\/a>.\nvoid UpdateAllFileDirectoryEntries(__in PCWSTR pszFileName)\n{\n    EnumerateAllNames(pszFileName, UpdateFileDirectoryEntry);\n}\n<\/pre>\n<p>\nArmed with this information, you can now give a fuller explanation of\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2011\/08\/12\/10195186.aspx#10195204\">\nwhy <code>Read&shy;Directory&shy;ChangesW<\/code> does not\nreport changes to a file until the handle is closed<\/a>.\n(And why it&#8217;s not a bug in\n<code>Read&shy;Directory&shy;ChangesW<\/code>.)\n<\/p>\n<p>\n<b>Bonus chatter<\/b>:\nMind you, the file system could expose a flag to\na <code>Find&shy;First&shy;File<\/code>-like function that\nmeans &#8220;Accuracy is more important than performance;\nreturn data that is as up-to-date as possible.&#8221;\nThe NTFS folks tell me that implementing such a flag wouldn&#8217;t be\nall that hard.\nThe real question is whether anybody would bother to use it.\n(If not, then it&#8217;s a bunch of work for no benefit.)\n<\/p>\n<p>\n<b>Bonus puzzle<\/b>:\nA customer observed that whether the\nfile size in the directory entry was being updated\nwhile the file was being written depended on what\ndirectory the file was created in.\nCome up with a possible explanation for this observation.<\/p>\n<p>\n<b>Bonus reading<\/b>:\n<\/p>\n<ul>\n<li>\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/ntdebugging\/archive\/2008\/07\/03\/ntfs-misreports-free-space.aspx\">\nNTFS misreports free space?<\/a><\/p>\n<li>\n<a HREF=\"http:\/\/blogs.technet.com\/b\/askcore\/archive\/2009\/10\/16\/the-four-stages-of-ntfs-file-growth.aspx\">\nThe four stages of NTFS file growth<\/a>.\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The shell team often gets questions like these from customers: Attached please find a sample program which continuously writes data to a file. 
If you open the folder containing the file in Explorer, you can see that the file size is reported as zero. Even manually refreshing the Explorer window does not update the file [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[104],"class_list":["post-8813","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-tipssupport"],"acf":[],"blog_post_summary":"<p>The shell team often gets questions like these from customers: Attached please find a sample program which continuously writes data to a file. If you open the folder containing the file in Explorer, you can see that the file size is reported as zero. Even manually refreshing the Explorer window does not update the file [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/8813","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=8813"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/8813\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=8813"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-js
on\/wp\/v2\/categories?post=8813"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=8813"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}