December 8th, 2009

Looking at Virtual Memory Usage

Brian Harry
Corporate Vice President

One of the big problems that I’ve talked about is virtual memory exhaustion and the resulting VS instability.  Today, VS is a 32-bit process – I’m sure that’s going to change (become 64-bit) at some point but it won’t for 2010.  A 32-bit process has 2GB of address space on a 32-bit OS.  You can throw the 3GB switch and get another GB at the cost of taking a GB away from the OS – it works OK but the OS doesn’t always like it – particularly if you are running server components on your machine (like IIS).  If you are on a 64-bit OS, 32-bit processes get a full 4GB of virtual address space without any penalty to the OS – handy eh?
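
As an aside, a quick way to see which of those limits applies to a given process is to ask Windows directly.  Here’s a minimal sketch using GlobalMemoryStatusEx – just an illustration of the idea, not part of our test tooling:

```cpp
// Minimal sketch: report the user-mode virtual address space available to
// the calling process.  On a 32-bit OS this is ~2GB (or ~3GB with the /3GB
// switch); a 32-bit large-address-aware process on a 64-bit OS sees ~4GB.
#include <windows.h>
#include <stdio.h>

int main()
{
    MEMORYSTATUSEX status = { 0 };
    status.dwLength = sizeof(status);

    if (GlobalMemoryStatusEx(&status))
    {
        // ullTotalVirtual: size of the user-mode address space for this process.
        // ullAvailVirtual: how much of it is still unreserved.
        printf("Total address space: %.2f GB\n",
               status.ullTotalVirtual / (1024.0 * 1024.0 * 1024.0));
        printf("Still available:     %.2f GB\n",
               status.ullAvailVirtual / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```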

In the end, many customers are still running on 32-bit OSes, and asking them to go change boot parameters (the 3GB switch) just isn’t an appropriate thing for us to do, so we’re focusing on making sure VS works well within the 2GB of address space that many customers have available.

This can be complicated by virtual address space fragmentation.  Fragmentation is caused by smallish gaps between other virtual address space allocations.  You can reach the point where there’s enough virtual address space left in total, but not enough contiguous space to satisfy an allocation.  So saying everything is OK as long as virtual memory usage stays below 2GB really isn’t realistic.  The app will become unstable well before that.
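
To make fragmentation concrete, here’s a minimal sketch that walks a process’s address space with VirtualQuery and reports the largest contiguous free region – an illustration of the idea, not our actual diagnostics:

```cpp
// Minimal sketch: walk this process's address space with VirtualQuery and
// report the largest contiguous free region.  With fragmentation, this number
// can be small even when the total free address space still looks healthy.
#include <windows.h>
#include <stdio.h>

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    SIZE_T largestFree = 0, totalFree = 0;
    const BYTE* addr = (const BYTE*)si.lpMinimumApplicationAddress;
    const BYTE* end  = (const BYTE*)si.lpMaximumApplicationAddress;

    MEMORY_BASIC_INFORMATION mbi;
    while (addr < end && VirtualQuery(addr, &mbi, sizeof(mbi)) == sizeof(mbi))
    {
        if (mbi.State == MEM_FREE)
        {
            totalFree += mbi.RegionSize;
            if (mbi.RegionSize > largestFree)
                largestFree = mbi.RegionSize;
        }
        addr = (const BYTE*)mbi.BaseAddress + mbi.RegionSize;
    }

    printf("Total free address space:      %lu MB\n",
           (unsigned long)(totalFree / (1024 * 1024)));
    printf("Largest contiguous free block: %lu MB\n",
           (unsigned long)(largestFree / (1024 * 1024)));
    return 0;
}
```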

Further, any test we do has to take into account that users will always do some things we didn’t expect, so running VM all the way up to the threshold and saying “good enough” just isn’t acceptable either.

So what do you do?

We’ve started by gathering some “real world apps”.  By this I mean some of you, our customers, have actually given us copies of your large apps.  We’re incredibly grateful for this because they are great tools to ensure that we are staying in touch with the kinds of apps our customers are building and how VS is working for them.  Oh, and we’re using a couple of internal MS apps as well.  We’ve chosen 6 of them (apps of various types: web, WPF, SharePoint, etc.) that are all large and that we believe characterize a large portion of our customer base.

We then chose a set of operations – load the solution, edit some code, design some forms, debug, check in some changes, and so on – that represent a realistic end-user interaction with each application.

And lastly, we’ve chosen a threshold of 1.5GB of VM.  We will not consider it acceptable until all 6 of the scenarios are able to execute without ever going over 1.5GB of VM.  We believe this will allow adequate room for users to do things we didn’t anticipate and still have a buffer before getting near the instability threshold.
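
For the curious, measuring this is conceptually simple: walk the target process’s address space with VirtualQueryEx and add up everything that isn’t free.  Here’s a minimal sketch of that idea (an illustration only – not the harness we actually use; the 1.5GB check is just the budget described above):

```cpp
// Minimal sketch: measure the "virtual bytes" of a target process (e.g.
// devenv.exe) by walking its address space with VirtualQueryEx and summing
// every region that is reserved or committed, then compare to the 1.5GB budget.
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

SIZE_T VirtualBytes(HANDLE process)
{
    SIZE_T used = 0;
    MEMORY_BASIC_INFORMATION mbi;
    const BYTE* addr = 0;

    while (VirtualQueryEx(process, addr, &mbi, sizeof(mbi)) == sizeof(mbi))
    {
        if (mbi.State != MEM_FREE)          // reserved or committed
            used += mbi.RegionSize;
        addr = (const BYTE*)mbi.BaseAddress + mbi.RegionSize;
    }
    return used;
}

int main(int argc, char** argv)
{
    const SIZE_T budget = 1536u * 1024 * 1024;              // 1.5GB threshold
    DWORD pid = (argc > 1) ? (DWORD)atoi(argv[1]) : GetCurrentProcessId();

    HANDLE process = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (!process)
    {
        printf("Can't open process %lu\n", (unsigned long)pid);
        return 1;
    }

    SIZE_T vm = VirtualBytes(process);
    printf("Process %lu virtual bytes: %lu MB %s\n",
           (unsigned long)pid,
           (unsigned long)(vm / (1024 * 1024)),
           vm > budget ? "(over the 1.5GB budget)" : "(within budget)");

    CloseHandle(process);
    return 0;
}
```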

Here’s a picture of where we currently stand with the 6 apps (btw, I’ve changed the names of the apps to protect the identity of people we are working with).  Most of these apps were RED when we started the VM effort a couple of months ago.  As you can see we’ve made some good progress but we aren’t done yet.

[Image: current VM status of the 6 benchmark apps]

Let’s take a look at some detail on one of the “bad ones”.  Here you can see the way we are tracking this: the “steps” in the scenario, the virtual memory after each step, and the comparison to VS 2008 SP1 (for all steps where there is an equivalent).  You’ll see there are more bars here than steps listed; I think that’s an artifact of the slideware (not all steps are listed).  We use this to look for any places where VM jumps significantly or varies substantially from VS 2008, to focus our investigations.

[Image: per-step VM for one of the problem apps, compared to VS 2008 SP1]
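
The analysis on top of those per-step readings is simple too.  As a rough illustration (the step names and numbers below are made up), flagging big jumps and big divergences from the VS 2008 SP1 baseline looks something like this:

```cpp
// Rough illustration with made-up step names and numbers: flag steps where VM
// jumps a lot in one step, or is well above the VS 2008 SP1 baseline reading.
#include <stdio.h>

struct StepReading
{
    const char* step;
    double currentMB;     // VM after this step in the current build
    double baselineMB;    // VM after the equivalent VS 2008 SP1 step (0 = none)
};

int main()
{
    // Hypothetical data - in practice these come from the measurement runs.
    StepReading readings[] = {
        { "Open solution",    450, 430 },
        { "Edit some code",   520, 470 },
        { "Design a form",    900, 610 },
        { "Start debugging", 1100,   0 },
    };

    double previous = 0;
    for (int i = 0; i < (int)(sizeof(readings) / sizeof(readings[0])); ++i)
    {
        const StepReading& r = readings[i];
        double jump  = r.currentMB - previous;
        double delta = (r.baselineMB > 0) ? r.currentMB - r.baselineMB : 0;

        printf("%-16s %6.0f MB", r.step, r.currentMB);
        if (jump > 200)  printf("   <-- large jump (+%.0f MB)", jump);
        if (delta > 150) printf("   <-- well above VS 2008 SP1 (+%.0f MB)", delta);
        printf("\n");

        previous = r.currentMB;
    }
    return 0;
}
```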

We also plot the VM growth from different builds against each other.  In this case, you can see that in the Nov 11th build (21111) the CB died halfway through the scenario because it ran out of VM, and that by Nov 25th (21125) we had improved things so that not only did the scenario complete, it was generally better than VS 2008 (let us remember that VS 2008 isn’t flawless, so we don’t blindly take parity with it as success).  As you can see, we still have a bit of work left to do to hit our 1.5GB goal.

[Image: VM growth through the scenario for builds 21111 and 21125 vs. VS 2008]

We also track improvements over time from build to build in more of a trend fashion:

[Image: build-to-build VM trend]

And lastly we use a “ledger” format for tracking fine-grained progress.  This is the best way I’ve found to track this kind of thing: identify the goal and your current status against it; identify the fixes currently in progress and the expected improvements (generally an informed guess); and identify the areas that are under investigation (the things that just don’t look right) along with the best wild-ass guess you can make about how much improvement “ought” to be available based on what it is vs. what you think it should be.  This gives you a way to see things moving through the pipeline, understand progress towards the goal and make priority trade-offs.

[Image: ledger of the VM goal, fixes in progress, and areas under investigation]

Hopefully that gives you some insight into how we are thinking about the VM issue and the progress we are making.

Brian
