Hidden Costs (Matt Gertz)

Anthony D. Green [MSFT]

October 6th, 20090 0

(Note: there was a grievous error in this post based on a bad assumption on my art regarding VB. Not feeling the need to hide my ignorance :-), I have instead made a number of edits in this post to bring it back to some semblance of reality.)

One thing that gets me annoyed with myself is realizing that the product or service I’ve just bought has some hidden costs that I didn’t anticipate. It might be as complicated as realizing that my plane ticket has all sorts of byzantine surcharges and luggage costs, or as simple as making budgeting for a new killer gaming PC and forgetting about the sales tax – it seems that there are always extra costs out there that will come back to bite you. Being (as I have said in past postings to this blog) an inherently lazy person, and being fully caught up in our “I must have the sparkly thing right now” culture, I’m apt to overlook the fine print in any purchase agreement unless I really force myself to slow down and pay attention to the details.

One of the times I really have to pay attention is when I’m reviewing code here at Microsoft. As a member of the release team that makes the final decisions as to what last-minute fixes are safe enough to be applied to an imminent release, it’s my job to take a look at the proposed fix and make sure that it doesn’t introduce more problems than it purports to solve – i.e., that it doesn’t add hidden costs.

On the rare occasion that I find a problem in someone’s code, it’s usually something that’s obviously dysfunctional – a case where a variable isn’t initialized properly, or where a different team has simultaneously made a change that will render this change dangerous. But some of the problems are more subtle, and might not be obvious if only minimal testing and review has been done. Consider the following code (very similar to a case I reviewed here at work the other day):

Public MyList As New List(Of String)

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

Dim s As New System.Text.StringBuilder

For i As Integer = 0 To MyList.Count – 1

s.Append(MyList(i))

End Sub

Edit: This is an error in my blog post – I hadn’t realized that VB, unlike many other languages, only gets the value of the stop condition once (thanks to Phil Wells for pointing this out). To make my point, I would have to be using code like this:

Dim s As New System.Text.StringBuilder

Dim i As Integer = 0

Do While i < MyList.Count() – 1

s.Append(MyList(i))

i += 1

Loop

The method Count() might still have a hidden cost, but in the For Loop case, I’m at least only hitting it once. The fact that VB only analyzes the stop condition once leads to a different issue to be aware of that I’ll cover in my next blog.

Seems pretty simple, yes? When a button is clicked, the code iterates through the strings in a list and appends them to a StringBuilder, creating (for whatever reason) one long string containing all of the contents. And, to cursory appearances, it will function correctly.

However, the code has hidden costs. You see, written this way, the value MyList.Count is going to be reevaluated on every iteration of the “For” loop, even though the value itself isn’t changing. Furthermore, my indexing into MyList, which is always with reference to the beginning of the list, will also have a non-zero impact. This impacts the performance of the code and will make my customers unhappy over the sloth-like operation of my application.

In this case, the cost isn’t really high, as Count() is a property wrapping something that’s stored as a private integer member, and index referencing also has tricks to speed it up, but there’s still a cost and it can add up. It’s even worse if the Count() call has to be calculated! Consider this really awful code where I’ve created my own “List” type:

Public Class MyLameObject

‘ For some unknown reason, I’m creating my own linked list instead of using a List(Of) generic.

‘ Please don’t do this for real!

Public data As String

Public NextMLO As MyLameObject

Public Sub Insert(ByVal mlo As MyLameObject)

If NextMLO IsNot Not hing Then

NextMLO = mlo

Else

mlo.NextMLO = NextMLO.NextMLO

NextMLO.NextMLO = mlo

End If

End Sub

Public Function Count() As Integer

Dim ct As Integer = 0

Dim current As MyLameObject = Me

Do While current IsNot Nothing

ct += 1

current = current.NextMLO

Loop

Return ct

End Function

Public Function Item(ByVal index As Integer) As MyLameObject

If index >= Count() OrElse index < 0 Then Return Nothing

Dim current As MyLameObject = Me

For i As Integer = 0 To index – 1

current = current.NextMLO

Return current

End Function

End Class

Public MyLameObjectHeader As New MyLameObject

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _

Handles Button1.Click

Dim s As New System.Text.StringBuilder

For i As Integer = 0 To MyLameObjectHeader.Count – 1

s.Append(MyLameObjectHeader.Item(i))

End Sub

Perhaps as a school assignment, I’ve created a singly-linked list which contains objects of type “MyLameObject” (instead of using optimized collections that we provide). I’ve also created a member function called “Count” which returns the number of objects in my list (starting from the current node) and another member function called “Item” which returns me a reference to a specific object in the list, as indexed from the current node. The code will compile and it will “work.” But, oh! It has so many hidden costs, and if the definition of MyLameObject exists in an assembly for which I don’t have the source code, I might not even realize it!

First, let’s look at button click handler. It’s calling Count() on every single iteration, and there will be a number of iterations equal to the value returned from Count(). Count(), in turn, walks the entire list every time it’s called. So, that means that if my list contains 1000 items, that means that I’ll be referencing a MyLameObject one million times! (This is what they refer to as an “O(n²) problem” in computer science classes.)

But wait, it gets worse: ~~the code for Item() calls Count() also on each iteration to verify that the index is not out of bounds – that brings us up to a count of 1001000 MyLameObject references~~.

Edit: As noted above, that isn’t true for VB, though it would be for many other languages

And, of course, Item() itself needs to walk the list to actual get to the desired item, on every iteration – that adds 500500 more object hits (using the n(n +1)/2 formula) , for a grand total of ~~1501500~~ 1500500 object hits. And all that just to concatenate 1000 strings! That, I fear, is going to slow down my code’s performance somewhat.

Now, of course I could add code to get track of values for Count(), and that would reduce my cost to a third of what it was. I could use some clever hashing algorithm to more easily get to the items while still being able to use method calls. I could enumerate the list myself instead of relying on “Count” or on “Item,” and that would really cut my costs down dramatically. Or, best of all, I could avoid reinventing the wheel and using the classes that VB.NET already supplies for me, since we do our best to optimize them.

But that’s not the point that I’m trying to make here.