Some time ago, I discussed the technique of reserving a block of address space and committing memory on demand. In the code, I left the exercise
// Exercise: What happens if the faulting memory access
// spans two pages?
As far as I can tell, nobody has addressed the exercise, so I’ll answer it.
If the faulting memory access spans two pages, neither of which is present, then an access violation is raised for one of the pages. (The processor chooses which one.) The exception handler commits that page and then requests execution to continue.
When execution continues, it tries to access the memory again, and the access still fails because one of the required pages is missing. But this time the faulting address will be an address on the missing page.
In practice, what happens is that the access violation is raised repeatedly until all of the problems are fixed. Each time it is raised, an address is reported which, if repaired, would allow the instruction to make further progress. The hope is that eventually, you will fix all of the problems,¹ and execution can resume normally.
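The retry behavior can be modeled with a small portable simulation. This is only a sketch under stated assumptions: `Region`, `try_access`, `handle_fault`, and `access_with_retry` are hypothetical names, the page size is assumed to be 4096 bytes, and the simulated processor always reports the lowest missing page, which a real processor is not obliged to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_BYTES 4096u   /* assumed page size */
#define NUM_PAGES  8u      /* size of the hypothetical reserved region */

/* Hypothetical model of a reserved region: committed[i] is true once
   page i has been committed. */
typedef struct {
    bool committed[NUM_PAGES];
} Region;

/* Returns true if every page touched by [addr, addr + len) is committed.
   Otherwise stores an address on the first missing page in *fault_addr.
   Reporting the lowest missing page is an assumption of this model. */
static bool try_access(const Region *r, size_t addr, size_t len,
                       size_t *fault_addr)
{
    assert(len > 0 && addr + len <= NUM_PAGES * PAGE_BYTES);
    size_t first = addr / PAGE_BYTES;
    size_t last = (addr + len - 1) / PAGE_BYTES;
    for (size_t page = first; page <= last; page++) {
        if (!r->committed[page]) {
            *fault_addr = (page == first) ? addr : page * PAGE_BYTES;
            return false;
        }
    }
    return true;
}

/* Stand-in for the exception handler: commit only the faulting page. */
static void handle_fault(Region *r, size_t fault_addr)
{
    r->committed[fault_addr / PAGE_BYTES] = true;
}

/* Retry the access until it succeeds; returns the number of faults taken. */
static int access_with_retry(Region *r, size_t addr, size_t len)
{
    int faults = 0;
    size_t fault_addr;
    while (!try_access(r, addr, len, &fault_addr)) {
        handle_fault(r, fault_addr);
        faults++;
    }
    return faults;
}
```

In this model, a 4-byte access at offset `PAGE_BYTES - 2` into a freshly reserved region takes two faults, one per page, before it succeeds. The handler never needs to know that the access spanned two pages.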
Bonus chatter: For the x86-64 and x86-32 instruction sets, I think the most number of pages required by a single instruction is six, for the movsw instruction. This reads two bytes from es:rsi/esi, and writes them to ds:rdi/edi. If both addresses straddle a page, that’s four data pages. And the instruction itself is two bytes, so that can straddle two code pages, for a total of six. (There are other things that could go wrong, like an LDT page miss, but those will be handled in kernel mode and are not observable in user mode.)
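The count of six can be checked with a little arithmetic. The sketch below assumes 4096-byte pages, and the three base addresses are made up purely for illustration; `pages_touched` and `movsw_worst_case_pages` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size */

/* Number of distinct pages touched by the byte range [addr, addr + len). */
static unsigned pages_touched(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    return (unsigned)((addr + len - 1) / PAGE_BYTES - addr / PAGE_BYTES + 1);
}

/* Worst case for a 2-byte movsw that reads 2 bytes and writes 2 bytes.
   Each range starts on the last byte of a page, so it straddles two pages.
   The base addresses are illustrative and chosen so no pages are shared. */
static unsigned movsw_worst_case_pages(void)
{
    uint64_t code = 0x10000 + PAGE_BYTES - 1;   /* 2-byte instruction */
    uint64_t src  = 0x20000 + PAGE_BYTES - 1;   /* 2-byte read */
    uint64_t dst  = 0x30000 + PAGE_BYTES - 1;   /* 2-byte write */
    return pages_touched(code, 2) + pages_touched(src, 2)
         + pages_touched(dst, 2);
}
```

Each of the three ranges contributes two pages, which is where the total of six comes from.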
Bonus exercises: I may as well answer the other exercises on that page. We don’t have to worry about integer overflow in the calculation of sizeof(WCHAR) * (Result + 1) because we have already verified that Result is in the range [1, MaxChars), so Result + 1 ≤ MaxChars, and we also know that MaxChars = Buffer.Length / sizeof(WCHAR), so multiplying both sides by sizeof(WCHAR) tells us that sizeof(WCHAR) * (Result + 1) ≤ Buffer.Length.
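The same argument can be written as runtime checks. This is a sketch rather than the original code: `wchar_t` stands in for `WCHAR`, `buffer_length` plays the role of `Buffer.Length`, and `bytes_needed` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Computes sizeof(WCHAR) * (Result + 1), checking the bound argued above.
   wchar_t stands in for WCHAR; buffer_length stands in for Buffer.Length. */
static size_t bytes_needed(size_t buffer_length, size_t result)
{
    size_t max_chars = buffer_length / sizeof(wchar_t);

    /* Precondition from the text: Result is in [1, MaxChars). */
    assert(result >= 1 && result < max_chars);

    /* Result + 1 <= MaxChars, so the product cannot exceed buffer_length,
       and therefore cannot overflow size_t either. */
    size_t bytes = sizeof(wchar_t) * (result + 1);
    assert(bytes <= buffer_length);
    return bytes;
}
```

The second assertion can never fire once the first one holds, which is the point of the exercise.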
For the final exercise, we use CopyMemory instead of StringCchCopy because the result may contain embedded nulls, and we don’t want to stop copying at the first null.
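The difference is easy to see in a portable sketch. Here `memcpy` stands in for `CopyMemory`, and `copy_as_string` is a hypothetical stand-in that only mimics the stop-at-first-null behavior of a string copy; it is not the implementation of `StringCchCopy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Copies exactly `count` characters, embedded nulls included
   (memcpy stands in for CopyMemory). Returns characters copied. */
static size_t copy_as_memory(wchar_t *dst, const wchar_t *src, size_t count)
{
    memcpy(dst, src, count * sizeof(wchar_t));
    return count;
}

/* Copies the way a string copy would: stops at the first null and
   terminates the destination. Returns characters copied, not counting
   the terminator. A stand-in for illustration only. */
static size_t copy_as_string(wchar_t *dst, size_t dst_chars,
                             const wchar_t *src)
{
    size_t i = 0;
    while (i + 1 < dst_chars && src[i] != L'\0') {
        dst[i] = src[i];
        i++;
    }
    if (dst_chars > 0) {
        dst[i] = L'\0';
    }
    return i;
}
```

Given the six-character source `L"ab\0cd"` (terminator included), the memory copy transfers all six characters, while the string-style copy stops after `ab` and never sees `cd`.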
¹ Though it’s possible that your attempt to fix one problem may undo a previous fix, putting you into an infinite cycle of repair.
> This reads two bytes from es:rsi/esi, and writes them to ds:rdi/edi.
The segment registers are reversed. For x86-32 it’s `movsw es:[edi], ds:[esi]` i.e. from DS:ESI to ES:EDI.
Regarding the linked article about alignment and page faults on bank-switched graphics cards: how were buffers with 24 bits of data per pixel handled? Or were those modes just unsupported?
The issue is that since 64k bytes is not divisible by 3, and you usually need a pixel granularity if you aren’t using some kind of buffering…
I can't find a card out there that does packed 24 bit per pixel (as opposed to 32 bit per pixel with 8 unused bits, or separate R, G and B planes), and that also doesn't support either linear addressing of the entirety of VRAM or two independent windows into VRAM.
I therefore suspect that cards like you describe weren't handled because Microsoft never encountered them.
Note that x86 processors can't do pixel granularity accesses to 24 bit packed pixels - they offer 8 bit and 16 bit accesses up until the 80286, 32 bit in the 80386, and 64 bit added...
The AVX2 versions have a comment "A fault exits the instruction" (VGATHERDPS, VPGATHERDD), which is missing from the AVX512 versions. I'm not certain the instructions can be restarted, in any version. There is no architecturally-visible state changing while the instruction is running. In fact, the docs say "A given implementation of this instruction is repeatable - given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered" implying that it will re-gather the elements it had previously gathered and therefore all 32 pages must be present.
Interestingly, the...
@Fabian on VGATHER: That’s hilarious. Though technically, it’s not 34 simultaneously present pages. You presumably need only 4 simultaneously present pages (2 for the instruction, and 2 for the data being gathered). So you could cycle through the data pages 2 at a time and eventually finish. The funny thing about movsw is that you need 6 pages to be present simultaneously!
As of the introduction of VGATHER vector instructions (AVX2) and VSCATTER (AVX-512), the max number of page faults caused by a single x86 instruction has gone way up!
These use the same strategy as the REP MOVS/CMPS/SCAS etc. family of instructions: they are carefully specified so they can make partial progress up to the location of the first page fault, and then save their current state to registers so they can resume from there (rather than retrying from the beginning). Namely, they update RSI/RDI/RCX with the current source/dest pointers and remaining count "as they go". They don't literally increment it every...
I did use SVGAs with bank switching in the mid-90s with 24-bit pixels and the answer is that bank switches could and did happen in the middle of pixels.
In practice, in the ISA days, the bus was 8-bit anyway. As long as you wrote 24-bit pixels with 3 byte writes, no problem. If you tried to do unaligned 2- or 4-byte writes, yes, you couldn’t do that while crossing 64k boundaries (that would require bank switching) and expect it to work.
> As far as I can tell, nobody has addressed the exercise, so I’ll answer it.
I suspect the exercise was previously answered in the comments, but those comments have long since been deleted. It's true that there are no comments on that 2012 blog post now, but I do recall that this blog has gone through multiple backend migrations over its long history, and at one point all of the old comments were deleted. All of the other blog posts from that era have no comments on them as well. The oldest snapshot on The Wayback Machine...
Wasn’t that all originally explained by Mr. Chen himself when it happened? He might have access to the archived comments that can’t be published.
The original blog post had 14 comments. Unfortunately it doesn’t look like Wayback was able to capture the comments due to how the old blog site scripting worked.
Wayback Link to old blogs.msdn.com
"For the x86-64 and x86-32 instruction sets, I think the most number of pages required by a single instruction is six" --- depends on whether you count e.g. "rep movsw" as a single instruction or treat it as multiple instructions where rip doesn't change until rcx is zero; theoretically you can have nearly unlimited page faults with that one; the CPU trace flag triggers after every single data transfer so I'd say it's not a single instruction
the trickier question is now... what's the order of the 6 page faults with "movsw"... code first of course... then data... but then load/store...