The nNumberOfBytesToRead parameter to ReadFile is a 32-bit unsigned integer, which limits the number of bytes that can be read at once to 4GB. What if you need to read more than 4GB?
The ReadFile function cannot read more than 4GB of data at a time. At the time the function was originally written, all Win32 platforms were 32-bit, so reading more than 4GB of data into memory was impossible because the address space didn’t have room for a buffer that large.
When Windows was expanded from 32-bit to 64-bit, the byte count was not expanded. I don’t know the reason for certain, but it was probably a combination of (1) not wanting to change the ABI more than necessary, so that it would be easier to port 32-bit device drivers to 64-bit, and (2) having no practical demand for reading that much data in a single call.
You can work around the problem by writing a helper function that breaks the large read into chunks of less than 4GB each.
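A bare-bones sketch of what such a helper might look like (illustrative only: synchronous I/O, no OVERLAPPED support, and minimal error handling):
<code>
// Sketch of a chunked-read helper. Reads up to byteCount bytes by issuing
// ReadFile calls of at most 2GB each. Returns FALSE on the first failure
// and reports the total number of bytes actually read.
#include <windows.h>

BOOL ReadFileInChunks(HANDLE file, void* buffer, unsigned long long byteCount,
                      unsigned long long* bytesRead)
{
    const DWORD maxChunk = 0x80000000; // 2GB per ReadFile call
    BYTE* p = static_cast<BYTE*>(buffer);
    *bytesRead = 0;
    while (byteCount > 0) {
        DWORD chunk = byteCount < maxChunk
                          ? static_cast<DWORD>(byteCount) : maxChunk;
        DWORD chunkRead = 0;
        if (!ReadFile(file, p, chunk, &chunkRead, nullptr)) {
            return FALSE;
        }
        *bytesRead += chunkRead;
        if (chunkRead < chunk) break; // hit end of file early
        p += chunkRead;
        byteCount -= chunkRead;
    }
    return TRUE;
}
</code>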
But reading 4GB of data into memory seems awfully unusual. Do you really need all of it in memory at once? Maybe you can just read the parts you need as you need them. Or you can use a memory-mapped file to make this on-demand reading transparent. (Though at a cost of having to deal with in-page exceptions if the read cannot be satisfied.)
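For example, here is a rough sketch of the memory-mapped approach (error handling omitted, and the function name is just illustrative):
<code>
// Sketch: map a large file read-only and touch only the parts you need.
// On 64-bit Windows, a view of a >4GB file fits comfortably in the address
// space; pages are read from disk on demand as you access them. A real
// program must be prepared for EXCEPTION_IN_PAGE_ERROR when touching the view.
#include <windows.h>

const void* MapWholeFileForReading(HANDLE file, HANDLE* mapping)
{
    *mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!*mapping) return nullptr;

    // Zero offset and zero size map a view of the entire file.
    return MapViewOfFile(*mapping, FILE_MAP_READ, 0, 0, 0);
}

// Clean up with UnmapViewOfFile(view) and CloseHandle(*mapping) when done.
</code>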
I came up with one and only one use case for this: reading an entire 4.7GB image into RAM (with locked pages) in advance of a DVD burn, for a 0% chance of a coaster. And reading the whole image into RAM wouldn’t be necessary if there were a ramdisk driver available from Microsoft.
Only 4.7 GB ISO? Try 6.66 GB for en-us_windows_11_consumer_editions_version_24h2_updated_july_2025_x64_dvd_a1f0681d.iso.
Also, what if you are processing a medical imaging study file which contains, say, 720 frames of 16-bit raw data taken at 0.5-degree steps around the patient with an X-ray camera, and doing backprojection on it to turn it into a 3D view you can slice? Sure, you can process it in blocks, but even M.2 SSD loads are 10x slower than RAM, so you want it all in RAM.
That's just one example, and I am sure there are many other uses for loading all data you can into RAM before crunching numbers...
@Raymond Or Microsoft could just write a wrapper around ReadFile (if they can't be bothered or if it isn't possible to implement a full 64-bit read) and name it ReadFile64 or ReadFileEx2 or something so everyone who needs it doesn't have to reinvent the wheel?
Perhaps something like this could work?
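(Sketch only, untested; the ReadFile64 name is just a placeholder.)
<code>
// Possible ReadFile64 wrapper: loops over ReadFile in sub-4GB chunks.
// Fails if any chunk read fails or comes up short, and deliberately does
// not report how many bytes were read.
#include <windows.h>

BOOL ReadFile64(HANDLE file, void* buffer, ULONGLONG bytesToRead)
{
    const DWORD maxChunk = 0xFFFF0000; // just under 4GB per ReadFile call
    BYTE* p = static_cast<BYTE*>(buffer);
    while (bytesToRead > 0) {
        DWORD chunk = bytesToRead < maxChunk
                          ? static_cast<DWORD>(bytesToRead) : maxChunk;
        DWORD read = 0;
        if (!ReadFile(file, p, chunk, &read, nullptr) || read != chunk) {
            return FALSE; // a chunk read failed (or hit end of file early)
        }
        p += chunk;
        bytesToRead -= chunk;
    }
    return TRUE;
}
</code>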
Note that it reads large files in chunks and fails if a chunk read fails (and it also doesn't return how many bytes have been read in either case, left as an exercise for the readers).
@Igor Levicki, it sounds to me like you have a solution looking for a problem. You believe you need to load multi-GB files into memory for some reason and you feel this is trivial to do and therefore the OS should provide it. I still don't see a single use case you mentioned where it makes sense. Video ram? That is what video cards are for and they support streaming (or whatever) data into buffers. You don't need to load all the data from a file into memory. That is, to me, an old school thought process. Compressed data? We...
The whole concept of an API is a "solution looking for a problem." You create a complete and wholesome API and see what great apps and solutions people build. That's how Windows became what it is today.
But there is a clear reason for ReadFile(Ex)(2) to take SIZE_T instead of DWORD:
Modern 64-bit C++ software uses std::size_t, which is 64 bits, for all size parameters. The software is built in many layers. The last layer, which may be a third-party library or code you might not even own, passes the size argument to ReadFile. And look at GitHub: NOBODY checks for overflow when passing a std::size_t...
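A hypothetical illustration of the kind of silent truncation I mean (LoadBlob is a made-up example):
<code>
// Hypothetical example: the std::size_t argument is silently truncated to
// the DWORD parameter of ReadFile, so the call reads far less data than the
// caller asked for, with no error and often no compiler warning.
#include <windows.h>
#include <cstddef>

void LoadBlob(HANDLE file, void* buffer, std::size_t size)
{
    DWORD read = 0;
    // If size is 0x120000000 (4.5GB), the cast truncates it to
    // 0x20000000 (512MB), and the caller never notices.
    ReadFile(file, buffer, static_cast<DWORD>(size), &read, nullptr);
}
</code>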
Identify a reasonable use case where reading a large file like this actually makes sense? Bear in mind that other options like memory mapped files already solve this problem, are more efficient, and don’t require the machine to have at least 4 GB of available memory anyway. Outside of a very specific type of server app, I cannot think of a single case where this would make sense. It seems overly wasteful of resources.
The fact that something can be done doesn’t mean it should be done.
@Antonio Rodríguez
Sorry, but you are wrong. Just because it is not *strictly* required doesn't mean it wouldn't make life much easier. Also, you are grossly underestimating sizes (even in small sections we are already way past hundreds of MB, and getting over 4GB is pretty easy too) and how much of the video may need to be accessed.
(Also, it would allow better resource planning for the system.)
BTW: Why are you talking about frames, when the use case I mentioned is whole-video processing (be it encoding, denoising, ...)?
Also, why is everybody assuming reads and writes are blocking and on the GUI thread?
(Not...
@Igor Levicki
> 4K video frame is 3840×2160 pixels, if we are talking about simple 8-bit RGBA that’s already 31.64 MB for a single frame
Yes, that math is right. And that is why nobody works with uncompressed video: a typical 90-minute movie in uncompressed 4K at 24 FPS would be 4.1 TB (or 16 TB for 8K video, again as you point out). Bigger than many SSDs, let alone the RAM needed to load the whole file at once as you are asking.
The access speeds you are giving are maybe half of what I said, right. But those...
@Igor If you are reading compressed data to RAM, it makes much more sense to read it in smaller chunks and decompress chunk-by-chunk.
And in general, operating systems do not provide every possible helper routine someone might need. If it can be readily done in user space, you can publish a 3rd party library for anything that is too complex to just reuse as a code snippet. The classic Windows API in particular does not have many helpers, and higher level languages usually already have methods for reading more than 4 GB.
With memory mapping, you get parallel processing for free. The...
I am putting all my replies here, as this stupid website doesn't allow replying at arbitrary depth in a threaded conversation (and is still sending me email notifications even though I disabled them).
> Bear in mind that other options like memory mapped files already solve this problem, are more efficient...
I don't understand where you are all getting the idea that memory mapping is more efficient for dealing with large datasets. You memory-map a COMPRESSED (in 99% of cases) file and then what? You still need the UNCOMPRESSED data to work on, don't you? What if you need this data in video RAM?...
> Identify a reasonable use case where reading a large file like this actually makes sense?
Restoring a large physics simulation or an LLM training run from a snapshot after a failure.
@Danielix Klimax Video processing does not need to have the whole file in memory. Video is processed on a frame-by-frame basis. You could argue that it would be useful to load a whole chunk between two keyframes, but even that is rarely more than a few MB.
The same can be applied to every multi-GB format I can think of (database, data set, hi-res audio...). Even mapping the whole file in memory seems a waste of address space and resources (page table entries, on-demand I/O).
This is one of those issues where if you have to ask "what is the limit?" you...
Video processing (file or pipe).
From another angle, 4 GiB of data will take a long time to transfer; the fastest SSD I can find would take 250 milliseconds to read 4 GiB of data, while reading it over 400G Ethernet is going to take you on the order of 100 ms. Even just copying it around in DRAM is going to hurt you; a high end server CPU is able to do about 500 GiB/s memory throughput, so that's still 8 ms to read 4 GiB, or 16 ms to copy it.
You get 1 million clock cycles per gigahertz of CPU clock speed in...
So you are saying that reading an 8GB ISO will take what, 0.5 sec? And that's a problem how? I don't get the argument about clock cycles. It's not like the CPU will be busy doing the reads -- there's DMA and queues for that, which you set and forget until it's done.
Also, what's the point in doing chunked reads as you suggest if you want to, say, compute a file checksum? If you read an 8 GB file in 256 MB blocks, get to 51%, and the next read fails, you have just wasted a lot of compute cycles calculating the hash of...
Hanging a process for 500 milliseconds is an eternity in the multi-gigahertz era. And if you are reading off a traditional hard drive (with big data volumes, that is reasonable: a 30 TB SSD is prohibitively expensive), things go south easily. With a typical 120 MB/s SATA hard drive, loading 8 GB takes 70 seconds (or 70,000 milliseconds). More than a minute of hanging is more than enough for Windows to show “This program is not responding” and for the user to force-close it. I’d say you wouldn’t want that.
The point about CPU cycles is that you have a lot of CPU cycles to use up on each read, if you're working in 4 GiB chunks; the additional CPU cost of chunking your requests to the API into 4 GiB chunks (instead of issuing a single 20 GiB read to the OS) is negligible compared to the time (measured in CPU cycles) it takes to transfer 4 GiB of data.
In addition, all you're likely to be doing is causing the OS to do chunked reads for you; nothing I've got access to (NVMe, SCSI, or SATA) can actually do...
It strikes me that this could have been a very deliberate decision. My hunch is that memory-mapping a file is more efficient for large reads, so the goal here is to discourage people from doing something better handled that way. Obviously, not having access to the NT kernel source, this is wholly supposition; it could go through the same routines.
We recommend that you measure it rather than relying on intuition.
I doubt it; more likely there is some legacy 32-bit value path through the filesystem driver stack which is the limiting factor, given that reading >4GB wasn’t possible on filesystems of the era when ReadFile/WriteFile were written.
Also, I am not getting why memory mapping would be more efficient. You still have to page it in to read it, and you can get exceptions to boot. If you need to read >4GB and you have enough RAM, then there’s no reason not to.
Adding to Kalle Niemitalo‘s remarks: The SMB and NFS protocols use 32-bit read/write lengths. (AFP supports 64-bit lengths, it seems.)
In the Windows kernel, read lengths are 32-bit ULONG in NtReadFile, IO_STACK_LOCATION, and FLT_PARAMETERS. Likewise MDL::ByteCount. However IO_STATUS_BLOCK uses ULONG_PTR and could support 64-bit lengths.
I suppose it would be possible to define some IOCTL or FSCTL for reading larger amounts of data, without going through the normal IRP_MJ_READ. This would require a new SupportedFeatures bit so that the new interface is only used if all minifilters support it. But I doubt there is business justification for such a change.
In the vast majority of cases, user code is not going to process all of it at once; the processing is likely to be spread out over time. The only exception I can see would be video processing.