March 27th, 2009

Implicit Line Continuation in VB 10 (Tyler Whitney)

Things are always changing.  I was at the Washington State History Museum with my daughter a couple weeks ago.  One of the exhibits features pictures of various sites that were taken many years ago.  Then it contrasts them with contemporary pictures taken of the same locations.  It was really interesting how much things changed and how quickly.

I have a couple computers in my office that I had when I was a kid.  My wife suggested that I move my little museum to a place where it would be more appreciated—which meant out of the house 😉  So here they sit in my office.  I have an Apple ][+ that is signed by Stephen Wozniak.  He was on campus some time ago and he graciously signed it for me.  But the reason I bring it up is because every once in a while I fire it up to remember what the programming experience was like, way back when.  Having that machine in my office is a nostalgic reminder of how things change. 

And Visual Basic has certainly seen its share of change.  It’s in the context of some of my current work on the VB compiler team that I thought I’d write about a little change we are doing for Dev 10. 

What’s in a line

VB is a line-oriented language.  That is, we use the carriage return as our statement termination token.  You are no doubt familiar with other languages that use an explicit statement terminator–like C which uses the ‘;’.  A carriage return in VB is, for the purposes of analogy anyway, similar to the ‘;’ in C*.  But why have a statement terminator symbol in the first place?

One reason is ambiguity.  A common complaint about compilers goes like this: if the compiler knows that the terminator is missing, why not put it in for you rather than bother you with an error?  Part of the reason is that it isn’t necessarily clear where it should go.  The compiler has just reached the end of the road, as far as the current statement goes, and there may have been multiple places along the way where a terminator could have made sense.  If the compiler silently inserted it for you it may be right part of the time.  But it could also be wrong, silently changing the meaning of your program in ways you didn’t expect.

I tried to explain why we have statement terminator tokens in a recent Channel 9 interview.  The example I used went something like this:

                On Thursday Beth coded feature1 and feature2 and feature3 on Friday.

It’s a bad sentence.  But the issue I’ll focus on is ambiguity.  When did feature2 get written?  It could have been on Thursday.  It could have been on Friday.  We can fix the ambiguity with some punctuation.

                On Thursday Beth coded feature1 and feature2.  And feature3 on Friday.

It’s still ugly.  But punctuation at least addresses the ambiguity issue.

We have the same problem in programming languages.  For instance, what does this mean:

Return  1

+foo()

 

Does it mean Return 1, or does it mean Return 1+foo()?

We can avoid the ambiguity by introducing punctuation to mark the end of each statement, e.g.:

Return 1;

+foo();

 

To terminate or not to terminate: that is the question

When you first consider the issue of allowing whitespace in a line-oriented language like VB, it seems like it would boil down to letting the scanner eat all of the whitespace and be done with it.  But the problem is more complex.  One way to think about the issue is to put the same problem in a different context.

VB uses the carriage return as a statement terminator.  C# uses the semicolon.  Attempting to make VB read through carriage returns as if they were expendable whitespace is similar to getting C# to read through semicolons as if they didn’t always mean we are at the end of a statement.  Parsing through carriage returns in VB for this:

                Dim x as integer = 1 +

                2

 

is roughly like trying to parse this in C#:

 

                int x =  1 +;

                2;           

 

The problem is being able to tell when a statement completion token means that we are at the end of the statement vs. when it doesn’t.  We have to approach an existing grammar and decide how to provide this flexibility without creating a lot of risk for all the existing code out there that will be compiled by the new compiler.

We decided to mitigate risk and keep the feature simple by limiting implicit line continuation to easy-to-understand cases.  We chose tokens where it would be easy to infer that an implicit line continuation could occur.  For instance, it is clear that x = 1+  isn’t a ‘finished’ statement.  So when we parse the ‘+’ we will peek through the statement terminator (the carriage return) to see if we can continue the expression.
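The "peek through the terminator" idea can be sketched in a few lines of Python.  This is only an illustration, not the compiler's code: the token set below is a hypothetical, deliberately tiny subset, tokens are split on whitespace only, and blank lines are simply skipped.

```python
# Hypothetical, deliberately small set of tokens after which a line break
# is read as a continuation. The real VB 10 list is longer.
CONTINUATION_TOKENS = {"+", "-", "*", "/", "&", "=", ",", "(", "{"}


def logical_lines(source):
    """Join physical lines into logical statements.

    A line break ends the statement unless the last token on the line is
    in CONTINUATION_TOKENS; in that case we peek through the break and
    keep reading the same statement.
    """
    statements = []
    pending = []  # tokens of the statement currently being built
    for line in source.splitlines():
        tokens = line.split()  # simplification: whitespace-separated tokens
        if not tokens:
            continue  # simplification: blank lines are skipped
        pending.extend(tokens)
        if tokens[-1] not in CONTINUATION_TOKENS:
            statements.append(" ".join(pending))
            pending = []
    if pending:  # source ended in the middle of a statement
        statements.append(" ".join(pending))
    return statements


print(logical_lines("Dim x As Integer = 1 +\n2"))
print(logical_lines("Return 1\n+foo()"))
```

Under this rule the Return 1 / +foo() example from earlier stays two statements, because its first line ends in 1 rather than in a token that implies continuation.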

We don’t capture every scenario.  Given our cost and time constraints around the feature, we tried to capture the most common cases that would provide the most bang for the buck.  We also avoided the ones that just led to problems.  Here are some examples of problems you could have if we had decided to allow implicit continuation anywhere.  I take these from some analysis that Lucian Wischik (also on the VB compiler team) did on our grammar:

With y

                A=x

                .xfield

End With

 

If we allowed implicit continuation before the ‘.’ we would have problems knowing what the period belongs to.  For example, it could be interpreted as:

                With y

                                A=x.xfield

                End With

Or

                With y

                                A=x

                                .xfield

                End With

 

If we allowed a line break after every keyword, we could run into problems where a set of tokens could be interpreted in different ways:

 

Do

                While condition

                End While

Loop

 

Could also be interpreted as:

 

                Do While condition

                End While ← this would become a syntax error

                Loop

 

Here’s some more fun with keywords and whitespace.  Given the following:

 

                Do

                Loop

Until

Foo

 

Should it be interpreted as:

                Do

                Loop Until Foo

 

Or instead as:

 

                Do

                Loop

                Until ← a method call

                Foo ← another method call
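To get a feel for how quickly the readings multiply, here is a small Python sketch (an illustration only, with a hypothetical helper name) that lists every way adjacent physical lines could be grouped into statements if a continuation were allowed at any line break.  Each gap between two lines is either a statement end or a continuation, so n lines give 2^(n-1) candidate groupings.

```python
from itertools import product


def candidate_groupings(lines):
    """Return every way to merge adjacent lines into statements.

    Assumes `lines` is a non-empty list of physical lines.
    """
    results = []
    # Each gap between two lines is either a statement break (True)
    # or an implicit continuation (False).
    for breaks in product([True, False], repeat=len(lines) - 1):
        statements = [lines[0]]
        for line, is_break in zip(lines[1:], breaks):
            if is_break:
                statements.append(line)  # start a new statement
            else:
                statements[-1] += " " + line  # continue the previous one
        results.append(statements)
    return results


for grouping in candidate_groupings(["Loop", "Until", "Foo"]):
    print(grouping)
```

For the three lines Loop, Until and Foo this produces four groupings, including both of the readings shown above.  A parser that allowed continuation everywhere would need some rule to pick one of them.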

 

And finally for an (even more) contrived example:

 

Sub Main()

End

Sub

 

Is this an End statement inside a Sub Main() and the user has just started typing in a new Sub?  Or is it End Sub?  Remember that we don’t just get to parse the finished text.  We have to parse as the user is entering the text in the first place so we can offer appropriate IntelliSense.

 

And so it goes for other examples. 

 

There is still a place for the explicit line continuation character.  You may have occasion to use it when you want to split a line in a way that implicit line continuation doesn’t accommodate.  For instance:

 

'This works:

Dim list As New List(Of Integer) From

{1, 2, 3, 4, 5}

 

But breaking the line before From instead of after it doesn’t work implicitly, because we don’t allow a continuation before the From keyword in this context.  We need to use an explicit line continuation here:

 

Dim list As New List(Of Integer) _

From {1, 2, 3, 4, 5}

 

So sweet was ne’er so fatal – Othello Act V, Sc. II

I was asked a question on the Channel 9 interview about how this feature is tested since it seems like it could be one of these hair-pulling things to make sure we haven’t broken anything.

One help is that we have the advantage of having a good test bed.  For language-specific tests alone, we have about 25,000 tests covering 1.4 million scenarios.  Our testers created a tool that can inject carriage returns into some of our existing tests after the tokens we know can imply line continuation.  Then the test is run to make sure it compiles and runs the same way it did before.  There is also the testing that is done by a tester armed with the spec and the grammar, who tries to find ways to break the compiler and the IntelliSense experience.  Tests are also hand-crafted to test various line-continuation scenarios.
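The injection idea can be sketched roughly as follows.  This Python snippet is not the testers' tool, just a guess at its general shape: the token list is a hypothetical subset, and splitting on whitespace ignores string literals and comments, which a real tool would have to respect.

```python
# Hypothetical subset of tokens after which a line break may be injected.
BREAK_AFTER = {"+", ",", "(", "="}


def inject_line_breaks(statement):
    """Return variants of `statement`, each with one line break inserted
    after a token that can imply line continuation."""
    tokens = statement.split()  # simplification: whitespace-separated tokens
    variants = []
    # Skip the last token: a break after it would not split the statement.
    for i, token in enumerate(tokens[:-1]):
        if token in BREAK_AFTER:
            head = " ".join(tokens[: i + 1])
            tail = " ".join(tokens[i + 1 :])
            variants.append(head + "\n" + tail)
    return variants


for variant in inject_line_breaks("Dim x As Integer = 1 + 2"):
    print(variant)
    print("---")
```

Each variant would then be compiled and run, and the result compared against the behavior of the original single-line statement.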

It is gratifying to finally get implicit line continuation into the language.  There has been a desire to do it for some time, but it usually had to take a place in line behind other priorities.  But now it will see the light of day.

It’s fascinating how things have changed over the years.  Hopefully for the better.  When the Visual Studio 10 beta becomes available I hope you’ll give implicit line continuation a shot and let us know how it goes.

-Tyler
