October 7th, 2009

Getting Loopy (Matt Gertz)

In my last post, I talked about the hidden costs that can occur whenever you call out to methods, particularly in loops.  In looking at my examples, reader KG2V commented that another thing that folks need to be aware of is avoiding the assumption that the world (or, in this case, a list) is a static thing.  It’s a good point and it deserves some attention, particularly given that different languages will react in different ways to change.

To illustrate this point, let’s assume that we have a form with a ListBox on it, and that we also have a button on the form that, when clicked, runs the following VB code:

    ‘ Bad — don’t do this

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

        For i As Integer = 0 To ListBox1.Items.Count – 1

            ListBox1.Items.RemoveAt(i)

        Next

    End Sub

 

Ostensibly, we want to remove all of the items from the ListBox.  (Yes, normally we’d use Clear() to do that, but stay with me for a moment;  I’ll switch to more “real” cases in a few paragraphs.)  But if we run this code, we’ll get an exception.  Why?  Because whenever we remove an object, all of the indices of the objects after it will decrease by 1.  Let’s assume that there are three objects in the list –  A, B, and C – so that Count=3:

·         When i = 0, we’ll remove the object at index 0, which is A.  B now moves to index 0, and C now moves to index 1.

·         When i = 1, we’ll remove the object at index 1, which is C.  There is nothing after C, so nothing else changes.

·         When i =2, we’ll remove the object at index 2, which is… nothing.  There’s only one object left, which is B, and it’s at index 0.  So, we get an out-of-bounds exception.

In VB (as was pointed out to me in the comments section of my last point), the value of Count is only assessed once, right when the loop is encountered, so we will always loop through (in this case) three times.

But is the lack of reevaluation of Count the only issue?  Maybe not.  Let’s try the same method in C#, in which Count is evaluated on every iteration:

        // Bad — don’t do this.

        private void button1_Click(object sender, EventArgs e)

        {

            int i;

            for (i = 0; i < listBox1.Items.Count; i++)

            {

                listBox1.Items.RemoveAt(i);

            }

        }

 

·         When i = 0, we’ll remove the object at index 0, which is A.  B now moves to index 0, and C now moves to index 1.

o   Now, we’ll loop back to the top.  The value of i becomes 1, and the value of Count becomes 2.  The stop condition is not met, so we keep going.

·         When i = 1, we’ll remove the object at index 1, which is C.  There is nothing after C, so nothing else changes.

o   Again, we loop back to the top.  The value of i becomes 2, and the value of Count becomes 1.  The stop condition is met, and we exit the loop.

So, although no exception is thrown, the method also fails in C# — B is left behind.  So, the problem isn’t whether or not Count is evaluated; it’s that the elements in the list are changing indices on each deletion.

Sometimes doing things backwards is good

This is why, in the case where multiple removals need to be made on an indexed list, a smart programmer (all of whom are yawning right now as they read this) will work backwards.  That is, they start with the largest index and work down towards zero.  For example, imagine that we want to remove the 4th, 7th, and 9th element from a list.  We don’t want to do it this way:

        ‘ Bad — don’t do this

ListBox1.Items.RemoveAt(4)

ListBox1.Items.RemoveAt(7)

ListBox1.Items.RemoveAt(9)

 

Once we’ve removed the 4th item, then the element at 7 will have moved to 6 and the element at 9 will have moved to 8.  After we’ve removed the new 7th item, the element at 8 will move to 7, and so on.  We’d either be removing the wrong elements or we’d crash.  It’s far better to do them in reverse order:

ListBox1.Items.RemoveAt(9)

ListBox1.Items.RemoveAt(7)

ListBox1.Items.RemoveAt(4)

 

None of the indices up front will be affected by changes that are occurring behind them in the list – their indices won’t change, and this will work fine.

This issue comes up when you are (for example) deleting items from a listbox that supports multiselection.  You basically have two ways to get the selected items in a listbox: you can use the SelectedIndices member function, which returns a collection of indices, or you can use SelectedItems, which returns a collection of selected items.  Given either collection, there’s no way to say “remove all of these.” There is a “Remove” command on the resulting collections, but if you use it, you’ll just be removing the item from *that* collection and not from the ListBox, which has its own collection of items – essentially, you’ll just deselect the items. 

So, what do you do?  At first, it looks like you have two options; you could try removing them from the master list based on item, or based on object.  Let’s look at removal based on the item:

        ‘ Bad — don’t do this

        For Each index As Integer In ListBox1.SelectedIndices

            ListBox1.Items.RemoveAt(index)

        Next

 

This won’t work.  The selected indices are in numerical order (no matter in which order you selected them), so you’ll be sliding indices on each deletion and will subsequently delete the wrong things.  You won’t crash if you overreach on an index, because the For…Each loop will check for Nothing when it does a MoveNext internally and will exit gracefully, but this is actually worse since it may take longer before you notice a problem!

This won’t work either, and will potentially crash:

        ‘ Bad — don’t do this

        For index As Integer = 0 To ListBox1.SelectedIndices.Count – 1

            ListBox1.Items.RemoveAt(ListBox1.SelectedIndices(index))

        Next

 

Again, the indices for the downstream objects will decrease, you’ll delete the wrong things, and you’ll likely crash. 

You can, however, iterate down instead, as long as you are confident that the selected indices are in numerical order (as they will be in this case):

        ‘ This works:

        For index As Integer = ListBox1.SelectedIndices.Count – 1 To 0 Step -1

            ListBox1.Items.RemoveAt(ListBox1.SelectedIndices(index))

        Next

 

Having found something that works via iteration, let’s now consider the enumeration angle instead.  In general, I prefer to enumerate lists whenever possible instead of iterating through them.  This is not one of those times, though.  Consider this code, which will fail:

        ‘ Bad — don’t do this

        For Each item In ListBox1.SelectedItems

            ListBox1.Items.Remove(item)

        Next

 

“Well, c’mon, why doesn’t this work?” I hear you cry.  “I don’t care about indices in this case, I’m not even using them; I’m just removing specific items.”  True, but if you run this code, you will indeed get an exception essentially stating that the collection has been modified and the list of selected items is no longer valid for enumeration.  We can’t work around this by assigning the collection of selected items to another variable and enumerating on that, since that just creates a reference to the same collection.  In order for it to work, you’ve have to get the references out of the collection itself so that the ListBox is no longer part of the equation on the removal loop’s assessment.

        ‘ Really, this is just silly — don’t ever do this

        Dim selItems As New Collection

        For Each item In ListBox1.SelectedItems

            selItems.Add(item)

        Next

        For Each item In selItems

            ListBox1.Items.Remove(item)

        Next

 

Functionally, this will work; however, you shouldn’t do it for two reasons:

·         First of all, it’s stupid – you have to loop through twice.  Occasionally, I do need to clone lists, like when creating an editable list from a read-only list – I’ve done that in a previous blogpost on this site, in fact.   But my motives are not so noble here; I’m using this for no other reason than allowing enumeration to work.

·         Second, each call to RemoveItem has a hidden cost – namely, the list has got to go find that item.  In this case, you can bet that there’s good hashing going on that doesn’t make that as painful as it might be, but do you really want to count on that, given the better numeric alternative above?

No, just there’s really no good way to do removal by enumeration that I’ve ever found.  And since each object is totally clueless about whether it itself is selected or not, you can’t even enumerate through the collection yourself and delete it if it’s marked as selected.

Of course, I’ve just talked about deletions, since they are more interesting than additions due to the crash potential.  However, additions can also mess you up in a loop due to similar index changes, and I’ve actually run into cases where a list may have both insertions and deletions done, so that I wasn’t even able to leverage the change in list size as a clue to figure out what had happened to the data.  The bottom line is, whenever doing an operation within a loop, you need to understand whether or not that operation will collide with your loop conditions and assumptions.  Generally, you control this in your code, but in multithreaded environments, where changes to an existing store of data might be made from a different thread “simultaneously” with your thread, you really need to understand how thread-safe your code is with respect to data changes.  (But that is a story for another time.)

‘Til next time,

  –Ma

0 comments