{"id":6563,"date":"2012-09-19T07:00:00","date_gmt":"2012-09-19T14:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2012\/09\/19\/does-the-copyfile-function-verify-that-the-data-reached-its-final-destination-successfully\/"},"modified":"2012-09-19T07:00:00","modified_gmt":"2012-09-19T14:00:00","slug":"does-the-copyfile-function-verify-that-the-data-reached-its-final-destination-successfully","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20120919-00\/?p=6563","title":{"rendered":"Does the CopyFile function verify that the data reached its final destination successfully?"},"content":{"rendered":"<p>\nA customer had a question about data integrity via file copying.\n<\/p>\n<blockquote CLASS=\"q\"><p>\nI am using the\n<a HREF=\"http:\/\/msdn.microsoft.com\/en-us\/library\/c6cfw35a.aspx\">\n<code>File.Copy<\/code><\/a> to copy files from one server to another.\nIf the call succeeds, am I guaranteed that the data was copied\nsuccessfully?\nDoes the <code>File.Copy<\/code> method internally perform a file checksum\nor something like that to ensure that the data was written correctly?\n<\/p><\/blockquote>\n<p>\nThe\n<code>File.Copy<\/code> method uses the Win32\n<code>Copy&shy;File<\/code> function internally,\nso let&#8217;s look at <code>Copy&shy;File<\/code>.\n<\/p>\n<p>\n<code>Copy&shy;File<\/code> just issues <code>Read&shy;File<\/code> calls\nfrom the source file and <code>Write&shy;File<\/code> calls to the\ndestination file.\n(Note: Simplification for purposes of discussion.)\nIt&#8217;s not clear what you are hoping to checksum.\nIf you want <code>Copy&shy;File<\/code> to checksum the bytes when\nthey return from <code>Read&shy;File<\/code>, and checksum the bytes\nas they are passed to\n<code>Write&shy;File<\/code>, and then compare them at the end of\nthe operation, then that tells you nothing, since they are\nthe same bytes in the same memory.\n<\/p>\n<pre>\nwhile (...) 
{\n ReadFile(sourceFile, buffer, bufferSize);\n readChecksum.checksum(buffer, bufferSize);\n writeChecksum.checksum(buffer, bufferSize);\n WriteFile(destinationFile, buffer, bufferSize);\n}\n<\/pre>\n<p>\nThe <code>read&shy;Checksum<\/code> and\n<code>write&shy;Checksum<\/code> are identical because they\noperate on the same bytes.\n(In fact, the compiler might even optimize the code by\nmerging the calculations together.)\nThe only way something could go awry is if you have flaky\nmemory chips that change memory values spontaneously.\n<\/p>\n<p>\nMaybe the question was whether <code>Copy&shy;File<\/code> goes\nback and reads the file it just wrote out to calculate\nthe checksum.\nBut that&#8217;s not possible in general, because you might not\nhave read access on the destination file.\nI guess you could have it do a checksum if the destination were\nreadable, and skip it if not, but then that results in a bunch\nof weird behavior:\n<\/p>\n<ul>\n<li>It generates\n    spurious security audits when it tries to read from the destination\n    and gets <code>ERROR_ACCESS_DENIED<\/code>.<\/li>\n<li>It means that <code>Copy&shy;File<\/code> sometimes does a checksum\n    and sometimes doesn&#8217;t, which removes the value of any checksum\n    work since you&#8217;re never sure if it actually happened.<\/li>\n<li>It doubles the network traffic for a file copy operation,\n    leading to weird workarounds from network administrators like\n    &#8220;Deny read access on files in order to speed up file copies.&#8221;<\/li>\n<\/ul>\n<p>\nEven if you get past those issues, you have an even bigger problem:\nHow do you know that reading the file back will really tell you\nwhether the file was physically copied successfully?\nIf you just read the data back, it may end up being read out of the\ndisk cache, in which case you&#8217;re not actually verifying physical media.\nYou&#8217;re just comparing cached data to cached data.\n<\/p>\n<p>\nBut if you open the file with caching 
disabled, this has the side\neffect of purging the cache for that file, which means that the\nsystem has thrown away a bunch of data that could have been useful.\n(For example, if another process starts reading the file at the same\ntime.)\nAnd, of course, you&#8217;re forcing access to the physical media, which is slowing\ndown I\/O for everybody else.\n<\/p>\n<p>\nBut wait, there&#8217;s also the problem of caching controllers.\nEven when you tell the hard drive, &#8220;Now read this data from the physical\nmedia,&#8221;\nit may decide to\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2010\/09\/09\/10059575.aspx\">\nreturn the data from an onboard cache instead<\/a>.\nYou would have to issue a &#8220;No really, flush the data and read it back&#8221;\ncommand to the controller to ensure that it&#8217;s really reading from\nphysical media.\n<\/p>\n<p>\nAnd even if you verify that, there&#8217;s no guarantee that the moment you\ndeclare &#8220;The file was copied successfully!&#8221; the drive platter won&#8217;t\nspontaneously develop a bad sector and corrupt the data you just\ndeclared victory over.\n<\/p>\n<p>\nThis is one of those &#8220;How far do you really want to go?&#8221; type of questions.\nYou can re-read and re-validate as much as you want at copy time,\nand you\n<i>still<\/i> won&#8217;t know that the file data is valid when you finally\nget around to using it.\n<\/p>\n<p>\nSometimes,\nyou&#8217;re better off just trusting the system\nto have done what it says it did.\n<\/p>\n<p>\nIf you really want to do some sort of copy verification,\nyou&#8217;d be better off saving the checksum somewhere and having\nthe ultimate consumer of the data validate the checksum\nand raise an integrity error if it discovers corruption.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It just writes 
data.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-6563","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>It just writes data.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/6563","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=6563"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/6563\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=6563"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=6563"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=6563"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}