How much data can you put on VSOnline?
I don’t get asked that question too often, but I do occasionally, and as the service matures I know I’ll get asked it more and more, so it’s been on my mind. I was looking at some data yesterday about some of our largest tenants. No, I wasn’t looking at any of their IP (I can’t), but I was looking at some meta-data to understand usage patterns so we can plan ahead and make sure the service provides a good experience as tenants grow.
So far, no customer has hit any limit on how much they can store in VSOnline, but there are limits, and I keep wondering how to help people understand what they are so they can factor them into their planning. For the purpose of this conversation there are 2 main kinds of storage that you use:
1) Blob store – this is the size of the files, attachments, etc. that are stored on the service. The files are compressed, so that affects the size. The blob store is, for all intents and purposes, unlimited for legitimate use (though we may from time to time impose limits to prevent abuse).
2) Meta-data store – Meta-data (version control version info, work item records, test execution results, etc.) is stored in a SQL Azure database. Today the limit on a SQL Azure database is 150GB. That’s a hard limit that we live with. SQL Azure has a road map for increasing that, and we are also working with them to get compression support (our stuff compresses incredibly well), so I don’t see this being a big issue for anyone anytime soon, but it’s always on my mind.
So the question I’ve struggled with is how do I answer the question “How much data can I put in VSOnline?” No one is ever going to be able to wrap their head around what the 150GB meta-data limit means. So I tend to think that people most easily relate to the size of their source code/documents/attachments and everything else kind of works out in the wash. Of course usage patterns can vary and you may have a very large number of work items or test results compared to others but so far, it’s the best measure I’ve been able to come up with.
So as I was looking at the data yesterday, here’s what I found about our largest tenant to date:
260GB compressed blob store – I usually estimate about a 3X compression ratio (varies depending on how much source vs binary you check in but, on average it’s pretty close). So that’s about 780GB of uncompressed data.
11GB of meta data – So, that puts them about 7% of the way to the limit on meta-data size – plenty of headroom there.
So if I extrapolate to how much data they could store before hitting the meta-data limit, assuming their blob and meta-data keep growing in the same proportion, I get: 150GB/11GB * 780GB = roughly 10.5TB. That’s a pretty promising number! There aren’t many orgs that have that much development data to store.
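For anyone who wants to redo the arithmetic with their own numbers, here is a minimal Python sketch of the extrapolation. The variable names are just illustrative, the 3X compression ratio is the rough estimate mentioned above rather than a measured value, and the whole thing assumes blob and meta-data keep growing in the same proportion.

```python
# Back-of-the-envelope extrapolation for the largest tenant.
# Names are illustrative; the figures are the rounded ones quoted in the post.
METADATA_LIMIT_GB = 150   # SQL Azure database size limit cited in the post
COMPRESSION_RATIO = 3     # rough estimate; varies with the source vs binary mix

blob_compressed_gb = 260  # compressed blob store
metadata_gb = 11          # meta-data store

blob_uncompressed_gb = blob_compressed_gb * COMPRESSION_RATIO
fraction_of_limit_used = metadata_gb / METADATA_LIMIT_GB

# Assumption: blob and meta-data grow in the same proportion until the limit.
projected_uncompressed_gb = METADATA_LIMIT_GB / metadata_gb * blob_uncompressed_gb

print(f"Uncompressed blob data today: {blob_uncompressed_gb} GB")
print(f"Meta-data limit used: {fraction_of_limit_used:.0%}")
print(f"Projected capacity: about {projected_uncompressed_gb / 1000:.1f} TB")
```

Because the inputs here are the rounded ones, the projected figure comes out a little over 10TB, in line with the number above but not necessarily identical to the last decimal.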
So, the next question on my mind was whether or not the blob to meta-data ratio was consistent across tenants. In other words, can everyone get this much data in, or do usage patterns vary enough that the results are significantly different? As you might imagine, the answer is yes, they do vary a lot. I looked at a number of other larger tenants and found that the ratio of compressed blob size to meta-data size varied between about 5 and 23 (turns out the largest tenant also had the largest ratio: 260GB/11GB is about 23). So if I take the most conservative number and do the same extrapolation (5 * 150GB of meta-data * 3X compression), I get about 2.2TB.
So right now, the best I can say is that today you can put in somewhere between 2.2TB and 10.5TB of uncompressed data, depending on usage patterns. Either way it’s a lot of data and no one is close to hitting any limits.
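If you want to plug in your own ratio, the range can be sketched as a small Python function. This is only an estimate under stated assumptions: the function name and parameters are hypothetical, the ratio is taken to mean compressed blob GB per GB of meta-data (my reading of the numbers, since 260GB/11GB is about 23), and the 150GB limit and 3X compression are the figures quoted in the post.

```python
def max_uncompressed_tb(blob_to_metadata_ratio,
                        metadata_limit_gb=150,
                        compression_ratio=3):
    """Estimate uncompressed data (in TB) storable before the meta-data limit.

    blob_to_metadata_ratio: compressed blob GB per GB of meta-data (assumed
    to stay constant as the tenant grows).
    """
    # Compressed blob size at the point the meta-data store is full.
    compressed_blob_gb = blob_to_metadata_ratio * metadata_limit_gb
    # Undo the estimated compression and convert GB to TB.
    return compressed_blob_gb * compression_ratio / 1000


# The low and high ends of the ratios observed across larger tenants.
for ratio in (5, 23):
    print(f"ratio {ratio}: about {max_uncompressed_tb(ratio):.1f} TB")
```

A tenant with lots of work items or test results relative to source would sit near the low end of the range; one that is mostly files and attachments would sit near the high end.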
A bit of a random thought for the day but I thought you might be curious.