January 8th, 2008

Taxes: Files larger than 4GB

Nowadays, a hard drive less than 20 gigabytes is laughably small, but it used to be that the capacity of a hard drive was measured in megabytes, not gigabytes. Today, video files and databases can run to multiple gigabytes in size, and your programs need be prepared for them. This means that you need to use 64-bit file offsets such as those used by the function SetFilePointerEx (or SetFilePointer if you’re willing to fight with the somewhat roundabout way it deals with the high 32 bits of the offset). It also means that you need to pay attention to the nFileSizeHigh of the WIN32_FIND_DATA structure. For example, if your program rejects files smaller than a minimum size, and I give you a file that is exactly four gigabytes, and you check only the nFileSizeLow, then you will think that the file is too small even though it is actually enormously huge.

More indirectly, it means that you can’t blindly map an entire file into memory. Many programs simplify file parsing by mapping the entire file into memory and then walking around the file using pointers. This breaks down on 32-bit machines once the file gets to be more than about a gigabyte and a half in size, since the odds of finding that much contiguous free address space is pretty low. You’ll have to map it in pieces or use some other parsing method entirely.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

0 comments

Discussion are closed.

Feedback