What's the difference between SafeArrayAccessData and SafeArrayAddRef?

Once upon a time, there was SAFEARRAY, the representation of an array used by IDispatch and intended for use as a common mechanism for interchange of data between native code and scripting languages such as Visual Basic. You used the SafeArrayCreate function to create one, and a variety of other functions to get and set members of the array.

On the native side, it was cumbersome having to use functions to access the members of an array, so there is also the SafeArrayAccessData function that gives you a raw pointer to the array data. This also locks the array so that the array cannot be resized while you still have the pointer, because resizing the array could result in the memory moving. The idea here is that you lock the data for access, do your bulk access, and then unlock it. As an additional safety mechanism, an array cannot be destroyed while it is locked.
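The lock-access-unlock pattern can be sketched with a toy model. This is not the real API: `ToyArray` and the `Toy*` functions are hypothetical stand-ins for `SAFEARRAY`, `SafeArrayAccessData`, `SafeArrayUnaccessData`, and the resize and destroy functions. They exist only to show the rule that a locked array refuses to be resized or destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lockable array. Not the real SAFEARRAY.
struct ToyArray {
    std::vector<int> data;
    int lockCount = 0;
    bool destroyed = false;
};

// Models SafeArrayAccessData: take a lock, then hand out a raw pointer.
int* ToyAccessData(ToyArray& a) {
    ++a.lockCount;
    return a.data.data();
}

// Models SafeArrayUnaccessData: drop the lock taken by ToyAccessData.
void ToyUnaccessData(ToyArray& a) {
    assert(a.lockCount > 0);
    --a.lockCount;
}

// A resize is refused while locked, because growing could move the memory
// out from under the raw pointer.
bool ToyRedim(ToyArray& a, std::size_t newSize) {
    if (a.lockCount > 0) return false;
    a.data.resize(newSize);
    return true;
}

// Destruction is likewise refused while locked.
bool ToyDestroy(ToyArray& a) {
    if (a.lockCount > 0) return false;
    a.data.clear();
    a.destroyed = true;
    return true;
}

// Bulk access: lock, read through the raw pointer, unlock.
int SumLocked(ToyArray& a) {
    int* p = ToyAccessData(a);
    int total = 0;
    for (std::size_t i = 0; i < a.data.size(); ++i) total += p[i];
    ToyUnaccessData(a);
    return total;
}
```

While a pointer from `ToyAccessData` is outstanding, `ToyRedim` and `ToyDestroy` both return `false`. Once `ToyUnaccessData` balances the lock, they succeed again.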

This was the state of affairs for a while, until the addition of the SafeArrayAddRef function in the Windows XP timeframe. I don’t know the exact story, but from the remarks in the documentation, it appears to have been introduced to protect against malicious scripts.

Suppose you’re writing a scripting engine, and a script performs an operation on an array. Your scripting engine represents this as a SAFEARRAY, and your engine starts operating with the array. You then issue a callback back into the script (for example, maybe you are the native side of a for_each-type function), and inside the callback, the script tries to destroy the array. After the callback returns, you have a use-after-free vulnerability in the scripting engine because it’s operating on an array that has been destroyed.

You could update the scripting engine to perform a SafeArrayAccessData on the array, thereby locking it and preventing the array from being resized or destroyed while the native code is using it. But that also means that the callback won’t be able to, say, append an element to the array. The script that the callback is running would encounter a DISP_E_ARRAYISLOCKED error when trying to append. If your scripting engine ignores errors from SafeArrayRedim, then the script’s attempt to extend the array silently fails, and that will probably break the internal script logic. As for destruction, if your script engine ignores errors from SafeArrayDestroy, then the array will be leaked. But if your scripting engine meticulously checks for those errors, then the script will get an unexpected exception.
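To make the problem concrete, here is a toy model of a for_each-style native function that holds a lock across its callbacks. Everything here is hypothetical (`ToyArray`, `ToyAppend`, `ToyForEachLocked`); the real engine would see `DISP_E_ARRAYISLOCKED` where this sketch returns `false`.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a lockable array. Not the real SAFEARRAY.
struct ToyArray {
    std::vector<int> data;
    int lockCount = 0;
};

// Stand-in for a script-visible "append". It is refused while the array is
// locked, much as the real resize would fail with DISP_E_ARRAYISLOCKED.
bool ToyAppend(ToyArray& a, int value) {
    if (a.lockCount > 0) return false;
    a.data.push_back(value);
    return true;
}

// Native side of a for_each-style function that keeps the array locked
// for the whole loop, including while the callback runs.
void ToyForEachLocked(ToyArray& a, const std::function<void(int)>& callback) {
    ++a.lockCount;
    const int* p = a.data.data();
    const std::size_t n = a.data.size();
    for (std::size_t i = 0; i < n; ++i) callback(p[i]);
    --a.lockCount;
}

// A "script" callback that tries to append during iteration.
// Returns how many of its appends were refused.
int CountFailedAppends(ToyArray& a) {
    int failures = 0;
    ToyForEachLocked(a, [&](int v) {
        if (!ToyAppend(a, v * 10)) ++failures;
    });
    return failures;
}
```

Every append attempted from inside the callback is refused, so the script either sees an unexpected failure or, if the failure is swallowed, carries on with an array that is shorter than it believes.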

For a failure to destroy a locked array, I guess the scripting engine could put the array in a queue of arrays whose destruction has been deferred, and then, I guess, check every few seconds to see if the array is safe to destroy? But for a failure to extend a locked array, the scripting engine is kind of stuck. It can’t “try again later” because the script expects the appended element to be present.

To solve the problem while creating minimal impact upon existing code, the scripting team invented SafeArrayAddRef. This is similar to SafeArrayAccessData in that it returns you a raw pointer to the array data, but it does not lock the array object. The array object can still be resized or destroyed successfully, thereby preserving existing semantics. What it does is add a reference to the array data (the same data that you received a pointer to). Only when the last reference is released is the data freed.

For a resize, that means that new memory is allocated, and the values are copied across, but the old memory is not freed until a corresponding number of SafeArrayReleaseData and SafeArrayReleaseDescriptor calls have been made. (The AddRef adds a reference to both the data and descriptor, and you have to release both of them in separate calls.)

Note that even though the memory is not freed, it is nevertheless zeroed out. This avoids problems with objects that have unique ownership like BSTR. If the memory hadn’t been zeroed out, then when the array is resized, there would be two copies of the BSTR, one in the new array data, and an abandoned one in the old array data. The code that called SafeArrayAddRef still has a pointer to the old array data. Code operating on the new resized data might change the BSTR, which frees the old string, but the old data would still hold a pointer to the freed string, and reading it would be a use-after-free.
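The resize behavior can be sketched with another toy model. This is an illustration under stated assumptions, not how the real implementation works: `std::shared_ptr` stands in for the reference count on the array data, the hypothetical `ToyAddRef` models only the data half of `SafeArrayAddRef` (the descriptor reference is not modeled), and dropping the `shared_ptr` plays the role of `SafeArrayReleaseData`.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

using Block = std::vector<int>;

// Hypothetical stand-in for an array whose data block is reference-counted.
struct ToyArray {
    std::shared_ptr<Block> data = std::make_shared<Block>();
};

// Models the data half of SafeArrayAddRef: take an extra reference
// to the current data block without locking the array.
std::shared_ptr<Block> ToyAddRef(const ToyArray& a) {
    return a.data;
}

// Resize always succeeds. The values move to a new block and the old
// block is zeroed so that it no longer appears to own anything.
void ToyRedim(ToyArray& a, std::size_t newSize) {
    auto fresh = std::make_shared<Block>(newSize, 0);
    Block& old = *a.data;
    const std::size_t keep = std::min(old.size(), newSize);
    std::copy(old.begin(), old.begin() + keep, fresh->begin());
    std::fill(old.begin(), old.end(), 0);
    // The old block is freed here only if nobody else holds a reference.
    a.data = fresh;
}
```

A caller that took a reference before the resize still has valid memory to look at, but it sees zeros rather than the old values. The memory goes away only when that caller drops its reference.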

Next time, a brief digression, before we use this information to answer a customer question about SafeArrayAddRef.

Bonus chatter: Note however that if the code that called SafeArrayAddRef writes to the old data, then any data in that memory block is not cleaned up. So don’t write a BSTR or IUnknown or anything else that requires cleanup, because nobody will clean it up. (This is arguably a design flaw in SafeArrayAddRef, but what’s done is done, and you have to deal with it.)
