The PDC and Application Compatibility, but still no Hosting
The PDC has happened, which means two things. I can post some of my (slightly self-censored) reactions to the show, and I can talk about what we’ve disclosed about Whidbey and Longhorn more freely. In this particular case, I had promised to talk about the deep changes we’re making in Whidbey to allow you to host the CLR in your process. As you’ll see, I got sidetracked and ended up discussing Application Compatibility instead.
But first, my impressions of the PDC: The first keynote, with Bill, Jim & Longhorn, was guaranteed to be good. It had all the coolness of Avalon, WinFS and Indigo, so of course it was impressive. In fact, throughout all the sessions I attended, I was surprised by the apparent polish and maturity of Longhorn. In my opinion, Avalon looked the most mature and settled. Indigo also looked surprisingly real. WinFS looked good in the keynote, where it was all about the justification for the technology. But in the drill-down sessions, I had the sense that it’s not as far along as the others. Hopefully all the attendees realize that Longhorn is still a long way off. It’s hard to see from the demos, but a lot of fundamental design issues and huge missing pieces remain. Incidentally, I still can’t believe that we picked WinFX to describe the extended managed frameworks and WinFS to describe the new storage system. One of those names has got to go.
I was worried that the Whidbey keynote on Tuesday would appear mundane and old-fashioned by comparison. But to an audience of developers, Eric’s keynote looked very good indeed. Visual Studio looked better than I’ve ever seen it. The device app was so easy to write that I feel I could build a FedEx-style package tracking application in a weekend. The high point of this keynote was ASP.NET. I hadn’t been paying attention to what they’ve done recently, so I was blown away by the personalization system and by the user-customizable web pages. 
If I had seen a site like that, I would have assumed the author spent weeks getting it to work properly. It’s hard to believe this can all be done with drag-and-drop. In V1, ASP.NET hit a home run by focusing like a laser beam on the developer experience. Everyone put so much effort into building apps, questioning why each step was necessary, and refining the process. It’s great to see that they continue to follow that same discipline. In the drill-down sessions, over and over again I saw that focus resulting in a near perfect experience for developers. There are some other teams, like Avalon, that seem to have a similar religion and are obtaining similar results. (Though Avalon desperately needs some tools support. Notepad is fine for authoring XAML in demos, but I wouldn’t want to build a real application this way). Compared to ASP.NET, some other teams at Microsoft are still living in the Stone Age. Those teams are still on a traditional cycle of building features, waiting for customers to build applications with those features, and then incorporating any feedback. Beta is way too late to find out that the programming model is clumsy. We shouldn’t be shirking our design responsibilities like this. Anyway, the 3rd keynote (from Rick Rashid & Microsoft Research) should have pulled it all together. I think the clear message should have been something like:
- Whidbey is coming next and has great developer features. After that, Longhorn will arrive and will change everything. Fortunately, Microsoft Research is looking 10+ years out, so you can be sure we will increasingly drive the whole industry.
This should have been an easy story to tell. The fact is that MSR is a world class research institution. Browse the Projects, Topics or People categories at http://research.microsoft.com and you’ll see many name brand researchers like Butler Lampson and Jim Gray. You will see tremendous breadth in the areas under research, from pure math and algorithms to speech, graphics and natural language. There are even some esoterica like nanotech and quantum computing. We should have used the number of published papers and other measurements to compare MSR with other research groups in the software industry, and with major research universities. And then we should have shown some whiz-bang demos of about 2 minutes each. Unfortunately, I think instead we sent a message that “Interesting technology comes from Microsoft product groups, while MSR is largely irrelevant.” Yet nothing could be further from the truth. Even if I restrict consideration to the CLR, MSR has had a big impact. Generics is one of the biggest features added to the CLR, C# or the base Frameworks in Whidbey. This feature was added to the CLR by MSR team members, who now know at least as much about our code base as we do. All the CLR’s plans for significantly improved code quality and portable compilers depend on a joint venture between MSR and the compiler teams. To my knowledge, MSR has used the CLR to experiment with fun things like transparent distribution, reorganizing objects based on locality, techniques for avoiding security stack crawls, interesting approaches to concurrency, and more. SPOT (Smart Object Personal Technology) is a wonderful example of what MSR has done with the CLR’s basic IL and metadata design, eventually leading to a very cool product. In my opinion, Microsoft Research strikes a great balance between long term speculative experimentation and medium term product-oriented improvements. I wish this had come across better at the PDC.
In the 6+ years I’ve been at Microsoft, we’ve had 4 PDCs. This is the first one I’ve actually attended, because I usually have overdue work items or too many bugs. (I’ve missed all 6 of our mandatory company meetings for the same reason). So I really don’t have a basis for comparison. I guess I had expected to be beaten up about all the security issues of the last year, like Slammer and Blaster. And I had expected developers to be interested in all aspects of security. Instead, the only times the topic came up in my discussions is when I raised it. However, some of my co-workers did see a distinct change in the level of interest in security. For example, Sebastian Lange and Ivan Medvedev gave a talk on managed security to an audience of 700-800. They reported a real upswing in awareness and knowledge on the part of all PDC attendees. But consider a talk I attended on Application Compatibility. At a time when most talks were overflowing into the hallways, this talk filled less than 50 seats of a 500 to 1000 seat meeting room. I know that AppCompat is critically important to IT. And it’s a source of friction for the entire industry, since everyone is reluctant to upgrade for fear of breaking something. But for most developers this is all so boring compared to the cool visual effects we can achieve with a few lines of XAML. Despite a trend to increased interest in security on the part of developers, I suspect that security remains more of an IT operations concern than it does a developer concern. And although the events of the last year or two have got more developers excited about security (including me!), I doubt that we will ever get developers excited about more mundane topics like versioning, admin or compatibility. This latter stuff is dead boring. That doesn’t mean that the industry is doomed. Instead, it means that modern applications must obtain strong versioning, compatibility and security guarantees by default rather than through deep developer involvement. 
Fortunately, this is entirely in keeping with our long term goals for managed code. With the first release of the CLR, the guarantees for managed applications were quite limited. We guaranteed memory safety through an accurate garbage collector, type safety through verification, binding safety through strong names, and security through CAS. (However, I think we would all agree that our current support for CAS still involves far too much developer effort and not enough automated guarantees. Our security team has some great long-term ideas for addressing this.) More importantly, we expressed programs through metadata and IL, so that we could expand the benefits of reasoning about these programs over time. And we provided metadata extensibility in the form of Custom Attributes and Custom Signature Modifiers, so that others could add to the capabilities of the managed environment without depending on the CLR team’s schedule. FxCop (http://www.gotdotnet.com/team/fxcop/) is an obvious example of how we can benefit from this ability to reason about programs. All teams developing managed code at Microsoft are religious about incorporating this tool into their build process. And since FxCop supports adding custom rules, we have added a large number of Microsoft-specific or product-specific checks.
Churn and Application Breakage
We also have some internal tools that allow us to compare different versions of assemblies so we can discover inadvertent breaking changes. Frankly, these tools are still maturing. Even in the Everett timeframe, they did a good job of catching blatant violations like the removal of a public method from a class or addition of a method to an interface. But they didn’t catch changes in serialization format, or changes to representation after marshaling through PInvoke or COM Interop. As a result, we shipped some unintentional breaking changes in Everett, and until recently we were on a path to do so again in Whidbey. As far as I know, these tools still don’t track changes to CAS constructs, internal dependency graphs, thread-safety expectations, exception flow (including a static replacement for the checked exceptions feature), reliability contracts, or other aspects of execution. Some of these checks will probably be added over time, perhaps by adding additional metadata to assemblies to reveal the developer’s intentions and to make automated validation more tractable. Other checks seem like research projects or are more appropriate for dynamic tools rather than static tools. It’s very encouraging to see teams inside and outside of Microsoft working on this. I expect that all developers will eventually have access to these or similar tools from Microsoft or 3rd parties, which can be incorporated into our build processes the way FxCop has been.
Sometimes applications break when their dependencies are upgraded to new versions. The classic example of this is Win95 applications which broke when the operating system was upgraded to WinXP. Sometimes this is because the new versions have made breaking changes to APIs. But sometimes it’s because things are just different. The classic case here is where a test case runs perfectly on a developer’s machine, but fails intermittently in the test lab or out in the field. 
The difference in environment might be obvious, like a single processor box vs. an 8-way. Yet all too often it’s something truly subtle, like a DLL relocating when it misses its preferred address, or the order of DllMain notifications on a DLL_THREAD_ATTACH. In those cases, the change in environment is not the culprit. Instead, the environmental change has finally revealed an underlying bug or fragility in the application that may have been lying dormant for years. The managed environment eliminates a number of common fragilities, like the double-free of memory blocks or the use of a file handle or Event that has already been closed. But it certainly doesn’t guarantee that a multi-threaded program which appears to run correctly on a single processor will also execute without race conditions on a 32-way NUMA box. The author of the program must use techniques like code reviews, proof tools and stress testing to ensure that his code is thread-safe. The situation that worries me the most is when an application relies on accidents of current FX and CLR implementations. These dependencies can be exceedingly subtle. Here are some examples of breakage that we have encountered, listed in the random order they occur to me:
- Between V1.1 and Whidbey, the implementation of reflection has undergone a major overhaul to improve access times and memory footprint. One consequence is that the order of members returned from APIs like Type.GetMethods has changed. The old order was never documented or guaranteed, but we’ve found programs including our own tests which assumed stability here.
- Structs and classes can specify Sequential, Explicit or AutoLayout. In the case of AutoLayout, the CLR is free to place members in any order it chooses. Except for alignment packing and the way we chunk our GC references, our layout here is currently quite predictable. But in the future we hope to use access patterns to guide our layout for increased locality. Any applications that predict the layout of AutoLayout structs and classes via unsafe coding techniques are at risk if we pursue that optimization.
- Today, finalization occurs on a single Finalizer thread. For scalability and robustness reasons, this is likely to change at some point. Also, the GC already perturbs the order of finalization. For instance, a collection can cause a generation boundary to intervene between two instances that are normally allocated consecutively. Within a given process run, there will likely be some variation in finalization sequence. But for two objects that are allocated consecutively by a single thread, there is a high likelihood of predictable ordering. And we all know how easy it is to make assumptions about this sort of thing in our code.
- In an earlier blog (http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/e55664b4-6471-48b9-b360-f0fa27ab6cc0), I talked about some of the circumstances that impact when the JIT will stop reporting a reference to the GC. These include inlining decisions, register allocation, and obvious differences like X86 vs. AMD64 vs. IA64. Clearly we want the freedom to chase better code quality with JIT compilers and NGEN compilers in ways that will substantially change these factors. Just yesterday an internal team reported a GC bug on multi-processor machines only that we quickly traced to confusion over lifetime rules and bad practice in the application. One finalizable object was accessing some state in another finalizable object, in the expectation that the first object was live because it was the this argument of an active method call.
- During V1.1 Beta testing, a customer complained about an application we had broken. This application contained unmanaged code that reached back into its caller’s stack to retrieve a GCHandle value at an offset that had been empirically discovered. The unmanaged code then transitioned into managed and redeemed the supposed handle value for the object it referenced. This usually worked, though it was clearly dependent on filthy implementation details. Unfortunately, the System.EnterpriseServices pathways leading to the unmanaged application were somewhat variable. Under certain circumstances, the stack was not what the unmanaged code predicted. In V1, the value at the predicted spot was always a 0 and the redemption attempt failed cleanly. In V1.1, the value at that stack location was an unrelated garbage value. The consequence was a crash inside mscorwks.dll and Fail Fast termination of the process.
- In V1 and V1.1, Object.GetHashCode() can be used to obtain a hashcode for any object. However, our implementation happened to return values which tended to be small ascending integers. Furthermore, these values happened to be unique across all reachable instances that were hashed in this manner. In other words, these values were really object identifiers or OIDs. Unfortunately, this implementation was a scalability killer for server applications running on multi-processor boxes. So in Whidbey Object.GetHashCode() is now all we ever promised it would be: an integer with reasonable distribution but no uniqueness guarantees. It’s a great value for use in HashTables, but it’s sure to disappoint some existing managed applications that relied on uniqueness.
- In V1 and V1.1, all string literals are Interned as described in http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/7943b9be-cca9-41e1-8a83-3d7a0dbba270. I noted there that it is a mistake to depend on Interning across assemblies. That’s because the other assembly might start to compose a String value which it originally specified as a literal. In Whidbey, assemblies can opt-in or opt-out of our Interning behavior. This new freedom is motivated by a desire to support faster loading of assemblies (particularly assemblies that have been NGEN’ed). We’ve seen some tests fail as a result.
- I’ve seen some external developers use a very fragile technique based on their examination of Rotor sources. They navigate through one of System.Threading.Thread’s private fields (DONT_USE_InternalThread) to an internal unmanaged CLR data structure that represents a running managed thread. From there, they can pluck interesting information like the Thread::ThreadState bit field. None of these data structures are part of our contract with managed applications and all of them are sure to change in future releases. The only reason the ThreadState field is at a stable offset in our internal Thread struct today is that its frequency of access merits putting it near the top of the struct for good cache-line filling behavior.
- Reflection allows highly privileged code to access private members of arbitrary types. I am aware of dozens of teams inside and outside of Microsoft which rely on this mechanism for shipping products. Some of these uses are entirely justified, like the way Serialization accesses private state that the type author marked as [Serializable()]. Many other uses are rather questionable, and a few are truly heinous. Taken to the extreme, this technique converts every internal implementation detail into a publicly exposed API, with the obvious consequences for evolution and application compatibility.
- Assembly loading and type resolution can happen on very different schedules, depending on how your application is running. We’ve seen applications that misbehave based on NGEN vs. JIT, domain-neutral vs. per-domain loading, and the degree to which the JIT inlines methods. For example, one application created an AppDomain and started running code in it. That code subsequently modified the private application directory and then attempted to load an assembly from that directory. Of course, because of inlining the JIT had already attempted to load the assembly with the original application directory and had failed. The correct solution here is to disallow any changes to an AppDomain’s application directory after code starts executing inside that AppDomain. This directory should only be modifiable during the initialization of the AppDomain.
- In prior blogs, I’ve talked about unhandled exceptions and the CLR’s default policy for dealing with them. That policy is quite involved and hard to defend. One aspect of it is that exceptions that escape the Finalizer thread or any ThreadPool threads are swallowed. This keeps the process running, but it often leaves the application in an inconsistent state. For example, locks may not have been released by the thread that took the exception, leading to subsequent hangs. Now that the technology for reporting process crashes via Watson dumps is maturing, we really want to change our default policy for unhandled exceptions so that we Fail Fast with a process crash and a Watson upload. However, any change to this policy will undoubtedly cause many existing applications to stop working.
- Despite the flexibility of CAS, most applications still run with Full Trust. I truly believe that this will change over time. For example, in Whidbey we will have ClickOnce permission elevation and in Longhorn we will deliver the Secure Execution Environment or SEE. Both of these features were discussed at the PDC. When we have substantial code executing in partial trust, we’re going to see some unfortunate surprises. For example, consider message pumping. If a Single Threaded Apartment thread has some partial trust code on its stack when it blocks (e.g. Monitor.Enter on a contentious monitor), then we will pump messages on that thread while it is blocked. If the dispatching of a message requires a stack walk to satisfy a security Full Demand, then the partially trusted code further back on the stack may trigger a security exception. Another example is related to class constructors. As you probably know, .cctor methods execute on the first thread that needs access to a class in a particular AppDomain. If the .cctor must satisfy a security demand, the success of the .cctor now depends on the accident of what other code is active on the thread’s stack. Along the same lines, the .cctor method may fail if there is insufficient stack space left on the thread that happens to execute it. These are all well understood problems and we have plans for fixing them. But the fixes will necessarily change observable behavior for a class of applications.
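To make the Object.GetHashCode example from the list above concrete, here is a minimal sketch of the mistake. It is written in Python rather than managed code, purely as a language-neutral illustration; the Widget class and both index functions are hypothetical names invented for this sketch, not anything from the CLR or the Frameworks. It shows what goes wrong when an application treats a hash code as if it were a unique object identifier.

```python
class Widget:
    """Hypothetical stand-in for any object whose hash gets misused as an ID."""

    def __init__(self, name, bucket):
        self.name = name
        self._bucket = bucket

    def __hash__(self):
        # A hash only promises reasonable distribution, never uniqueness.
        return self._bucket


def index_by_hash(widgets):
    # Fragile: uses the hash code itself as the key, as if it were an OID.
    # Two distinct objects with the same hash silently overwrite each other.
    return {hash(w): w for w in widgets}


def index_by_object(widgets):
    # Robust: the dict hashes the key and then compares for equality,
    # so colliding hash codes still end up in separate entries.
    return {w: w.name for w in widgets}


a, b = Widget("a", 7), Widget("b", 7)   # distinct objects, identical hash
fragile = index_by_hash([a, b])          # one entry survives
robust = index_by_object([a, b])         # both entries survive
```

The fragile version works for exactly as long as the platform happens to hand out unique values, which is the accident that V1 and V1.1 provided.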
I could fill a lot more pages with this sort of list. And our platform is still in its infancy. Anyway, one clear message from all this is that things will change and then applications will break. But can we categorize these failures and make some sense of it all? For each failure, we need to decide whether the platform or the application is at fault for each case. And then we need to identify some rules or mechanisms that can avoid these failures or mitigate them. I see four categories.
Category 1: The application explicitly screws itself
The easiest category to dispense with is the one where a developer intentionally and explicitly takes advantage of a behavior that s/he knows is guaranteed to change. A perfect example of this is #8 above. Anyone who navigates through private members to unmanaged internal data structures is setting himself up for problems in future versions. The responsibility (or irresponsibility in this case) lies with the application. In my opinion, the platform should have no obligations. But consider #5 above. It’s clearly in this same category, and yet opinions on our larger team were quite divided on whether we needed to fix the problem. I spoke to a number of people who definitely understood the incredible difficulty of keeping this application running on new versions of the CLR and EnterpriseServices. But they consistently argued that the operating system has traditionally held itself to this sort of compatibility bar, that this is one of the reasons for Windows’ ubiquity, and that the managed platform must similarly step up. Also, we have to be realistic here. If a customer issue like this involves one of our largest accounts, or has been escalated through a very senior executive (a surprising number seem to reach Steve Ballmer), then we’re going to pull out all the stops on a fix or a temporary workaround. In many cases, our side-by-side support is an adequate and simple solution. Customers can continue to run problematic applications on their old bits, even though a new version of these bits has also been installed. For instance, the config file for an application can specify an old version of the CLR. Or binding redirects could roll back a specific assembly. But this technique falls apart if the application is actually an add-in that is dynamically loaded into a process like Internet Explorer or SQL Server. 
It’s unrealistic to lock back the entire managed stack inside Internet Explorer (possibly preventing newer applications that use generics or other Whidbey features from running there), just so older questionable applications can keep running. It’s possible that we could provide lock back at finer-grained scopes than the process scope in future versions of the CLR. Indeed, this is one of the areas being explored by our versioning team. Anyway, if we were under sufficient pressure I could imagine us building a one-time QFE (patch) for an important customer in this category, to help them transition to a newer version and more maintainable programming techniques. But if you aren’t a Fortune 100 company or Steve Ballmer’s brother-in-law, I personally hope we would be allowed to ignore any of your applications that are in this category.
Category 2: The platform explicitly screws the application
I would put #6, #7 and #11 above in a separate category. Here, the platform team wants to make an intentional breaking change for some valid reason like performance or reliability. In fact, #10 above is a very special case of this category. In #10, we would like to break compatibility in Whidbey so that we can provide a stronger model that can avoid subsequent compatibility breakage. It’s a paradoxical notion that we should break compatibility now so we can increase future compatibility, but the approach really is sensible. Anyway, if the platform makes a conscious decision to break compatibility to achieve some greater goal, then the platform is responsible for mitigation. At a minimum, we should provide a way for broken applications to obtain the old behavior, at least for some transition period. We have a few choices in how to do this, and we’re likely to pick one based on engineering feasibility, the impact of a breakage, the likelihood of a breakage, and schedule pressure:
- Rely on side-by-side and explicit administrator intervention. In other words, the admin notices the application no longer works after a platform upgrade, so s/he authors a config file to lock the application back to the old platform bits. This approach is problematic because it requires a human being to diagnose a problem and intervene. Also, it has the problems I already mentioned with using side-by-side on processes like Internet Explorer or SQL Server.
- For some changes, it shouldn’t be necessary to lock back the entire platform stack. Indeed, for many changes the platform could simultaneously support the old and new behaviors. If we change our default policy for dealing with unhandled exceptions, we should definitely retain the old policy, at least for one release cycle.
- If we expect a significant percentage of applications to break when we make a change, we should consider an opt-in policy for that change. This eliminates the breakage and the human involvement in a fix. In the case of String Interning, we require each assembly to opt-in to the new non-interned behavior.
- In some cases, we’ve toyed with the idea of having the opt-in be implicit with a recompile. The logic here is that when an application is recompiled against new platform bits, it is presumably also tested against those new bits. The developer, rather than the admin, will deal with any compatibility issues that arise. We’re well set up for this, since managed assemblies contain metadata giving us the version numbers of the CLR and the dependent assemblies they were compiled against. Unfortunately, execution models like ASP.NET work against us here. As you know, ASP.NET pages are recompiled automatically by the system based on dependency changes. There is no developer available when this happens.
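The opt-in choices in the list above can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions, not the CLR's actual mechanism: AssemblyInfo, NEW_BEHAVIOR_VERSION, and use_new_behavior are hypothetical names, and the version tuple simply stands in for the "compiled against" version recorded in assembly metadata.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical: the platform version in which the changed behavior first ships.
NEW_BEHAVIOR_VERSION = (2, 0)


@dataclass
class AssemblyInfo:
    name: str
    # Platform version recorded in metadata when the assembly was compiled.
    built_against: Tuple[int, int]
    # None means the assembly author expressed no preference either way.
    explicit_opt_in: Optional[bool] = None


def use_new_behavior(asm, implicit_opt_in_on_recompile=False):
    # An explicit choice by the assembly author always wins.
    if asm.explicit_opt_in is not None:
        return asm.explicit_opt_in
    # Optionally treat a recompile against the new bits as an implicit opt-in,
    # on the theory that a recompiled application was also retested.
    if implicit_opt_in_on_recompile:
        return asm.built_against >= NEW_BEHAVIOR_VERSION
    # Default: keep the old behavior so existing applications keep working.
    return False
```

The automatic-recompile problem shows up directly in this sketch: a system that rebuilds pages on its own would bump built_against without any developer having retested anything, so the implicit rule would opt the application in behind everyone's back.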
Before we look at the next two categories of AppCompat failure, it’s worth taking a very quick look at one of the techniques that the operating system has traditionally used to deal with these issues. Windows has an AppCompat team which has built something called a shimming engine. Consider what happened when the company tried to move consumers from Win95/Win98/WinMe over to WinXP. They discovered a large number of programs which used the GetVersion or the preferred GetVersionEx APIs in such a way that the programs refused to run on NT-based systems. In fact, WinXP did such a good job of achieving compatibility with Win9X systems that in many cases the only reason the application wouldn’t run was the version check that the program made at start up. The fix was to change GetVersion or GetVersionEx to lie about the version number of the current operating system. Of course, this lie should only be told to programs that need the lie in order to work properly. I’ve heard that this shim which lies about the operating system version is the most commonly applied shim we have. As I understand it, at process launch the shimming engine tries to match the current process against any entries in its database. This match could be based on the name, timestamp or size of the EXE, or of other files found relative to that EXE like a BMP for the splash screen in a subdirectory. The entry in the database lists any shims that should be applied to the process, like the one that lies about the version. The shimming engine typically bashes the IAT (import address table) of a DLL or EXE in the process, so that its imports are bound to the shim rather than to the normal export (e.g. Kernel32!GetVersionEx). In addition, the shimming engine has other tricks it performs less frequently, like wrapping COM objects up with intercepting proxies. It’s easy to see how this infrastructure can allow applications for Win95 to execute on WinXP. However, this approach has some drawbacks. 
First, it’s rather labor-intensive. Someone has to debug the application, determine which shims will fix it, and then craft some suitable matching criteria that will identify this application in the shimming database. If an appropriate shim doesn’t already exist, it must be built. In the best case, the application has some commercial significance and Microsoft has done all the testing and shimming. But if the application is a line of business application that was created in a particular company’s IT department, Microsoft will never get its hands on it. I’ve heard we’re now allowing sophisticated IT departments to set up their own shimming databases for their own applications but this only allows them to apply existing shims to their applications. And from my skewed point of view the worst part of all this is that it really won’t work for managed applications. For managed apps, binding is achieved through strong names, Fusion and the CLR loader. Binding is practically never achieved through DLL imports. So it’s instructive to look at some of the techniques the operating system has traditionally used. But those techniques don’t necessarily apply directly to our new problems. Anyway, back to our categories...
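The shimming flow described above (match the launching process against a database, then rebind selected imports to shims) can be modeled in a few lines. This is a toy Python model under loud assumptions: ShimEntry, apply_shims, and the dictionary standing in for an import table are all hypothetical, and nothing here reflects the real shimming engine's data formats or APIs. The version tuples are only illustrative return values.

```python
from dataclasses import dataclass, field


@dataclass
class ShimEntry:
    # Matching criteria: here just the EXE name and size, for simplicity.
    exe_name: str
    exe_size: int
    # Import name -> replacement callable to bind in place of the real export.
    shims: dict = field(default_factory=dict)


def real_get_version():
    return (5, 1)   # what the newer operating system actually reports


def lying_get_version():
    return (4, 0)   # what the old application expects to see


def apply_shims(exe_name, exe_size, import_table, database):
    """Return a patched copy of import_table for a process being launched."""
    patched = dict(import_table)   # never mutate the normal bindings
    for entry in database:
        same_name = entry.exe_name.lower() == exe_name.lower()
        if same_name and entry.exe_size == exe_size:
            # Rebind the listed imports to their shims for this process only.
            patched.update(entry.shims)
    return patched
```

Only the matched process sees the lie; every other process keeps the normal binding. The sketch also makes the managed-code limitation visible: it works only because the application reaches the platform through an import table that can be patched.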
Category 3: The application accidentally screws itself
Category 4: The platform accidentally screws the application
Frankly, I’m having trouble distinguishing these two cases. They are clearly distinct categories, but it’s a judgment call where to draw the line. The common theme here is that the platform has accidentally exposed some consistent behavior which is not actually a guaranteed contract. The application implicitly acquires a dependency on this consistent behavior, and is broken when the consistency is later lost. In the nirvana of some future fully managed execution environment, the platform and tools would never expose consistent behavior unless it was part of a guarantee. Let’s look at some examples and see how practical this is. In example #1 above, reflection used to deliver members in a stable order. In Whidbey, that order changes. In hindsight, there’s a simple solution here. V1 of the product could have contained a testing mode that randomized the returned order. This would have exposed the developer to our actual guarantees, rather than to a stronger accidental consistency. Within the CLR, we’ve used this sort of technique to force us down code paths that otherwise wouldn’t be exercised. For example, developers on the CLR team all use NT-based (Unicode) systems and avoid Win9X (Ansi) systems. So our Win9X Ansi/Unicode wrappers wouldn’t typically get tested by developers. To address this, our checked/debug CLR build originally considered the day of the week and used Ansi code paths every other day. But imagine chasing a bug at 11:55 PM. When the bug magically disappears on your next run at 1:03 AM the next morning, you are far too frazzled to think clearly about the reason. Today, we tend to use low order bits in the size of an image like mscorwks.dll or the assembly being tested, so our randomization is now more friendly to testing. In example #2 above, you could imagine a similar perturbation on our AutoLayout algorithms when executing a debug version of an application, or when launched from inside a tool like Visual Studio. 
For example #4, the CLR already has internal stress modes that force different and aggressive GC schedules. These can guarantee compaction to increase the likelihood of detecting stale references. They can perform extensive checks of the integrity of the heap, to ensure that the write barrier and other mechanisms are effective. And they can ensure that every instruction of JITted managed code that can synchronize with the GC will synchronize with the GC. I suspect that these modes would do a partial job of eradicating assumptions about lifetimes reported by the JIT. However, we will remain exposed to significantly different code generators (like Rotor’s FJIT) or execution on significantly different architectures (like CPUs with dramatically more registers). In contrast with the above difficulty, it’s easy to imagine adding a new GC stress mode that perturbs the finalization queues, to uncover any hidden assumptions about finalization order. This would address example #3.
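The finalization stress mode is easy to picture with a toy model. The sketch below is purely illustrative and assumes nothing about how the CLR's finalization queues are actually implemented; FinalizationQueue, register, drain and stress_seed are made-up names. It shows the essential trick: every pending finalizer still runs exactly once, but under stress the order within a batch is shuffled.

```python
import random


class FinalizationQueue:
    """Toy model of a finalization queue (hypothetical, not the CLR's design)."""

    def __init__(self, stress_seed=None):
        self._pending = []
        self._stress_seed = stress_seed  # None means no perturbation

    def register(self, name, finalizer):
        self._pending.append((name, finalizer))

    def drain(self):
        """Run every pending finalizer once and return the order that was used."""
        batch, self._pending = self._pending, []
        if self._stress_seed is not None:
            # Same seed, same order: a failure under stress stays reproducible.
            random.Random(self._stress_seed).shuffle(batch)
        order = []
        for name, finalizer in batch:
            finalizer()
            order.append(name)
        return order


if __name__ == "__main__":
    closed = []
    queue = FinalizationQueue(stress_seed=7)
    for name in ("stream", "file", "handle"):
        queue.register(name, lambda n=name: closed.append(n))
    print(queue.drain())
```

Code that assumes the stream is always finalized before the file it wraps would work by accident with stress_seed=None and would have a chance of being caught with a seed supplied.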
Customer Debug Probes, AppVerifier and other tools
It turns out that the CLR already has a partial mechanism for enabling perturbation during testing and removing it on deployed applications. This mechanism is the Customer Debug Probes feature that we shipped in V1.1. Adam Nathan’s excellent blog site has a series of articles on CDPs, which are collected together at http://blogs.gotdotnet.com/anathan/CategoryView.aspx/Debugging. The original goal of CDPs was to counteract the black box nature of debugging certain failures of managed applications, like corruptions of the GC heap or crashes due to incorrect marshaling directives. These probes can automatically diagnose common application errors, like failing to keep a marshaled delegate rooted so it won’t be collected. This approach is so much easier than wading through dynamically generated code without symbols, because we tell you exactly where your bugs are. But we’re now realizing that we can also use CDPs to increase the future compatibility of managed applications if we can perturb current behavior that is likely to change in the future. Unfortunately, example #6 from above reveals a major drawback with the technique of perturbation. When we built the original implementation of Object.GetHashCode, we simply never considered the difference between what we wanted to guarantee (hashing) and what we actually delivered (OIDs). In hindsight, it is obvious. But I’m not convinced that we aren’t falling into similar traps in our new features. We might be a little smarter than we were five years ago, but only a little. Example #10 worries me for similar reasons. I just don’t think we were smart enough to predict that changing the binding configuration of an AppDomain after starting to execute code in that AppDomain would be so fragile. When a developer delivers a feature, s/he needs to consider security, thread-safety, programming model, key invariants of the code base like GC reporting, correctness, and so many other aspects. 
It would be amazing if a developer consistently nailed each of these aspects for every new feature. We’re kidding ourselves if we think that evolution and unintentional implicit contracts will get adequate developer attention on every new feature. Even if we had perfect foresight and sufficient resources to add perturbation for all operations, we would still have a major problem. We can’t necessarily rely on 3rd party developers to test their applications with perturbation enabled. Consider the unmanaged AppVerifier experience. The operating system has traditionally offered a dynamic testing tool called AppVerifier which can diagnose many common unmanaged application bugs. For example, thanks to uploads of Watson process dumps from the field, most unmanaged application crashes can now be attributed to incorrect usage of dynamically allocated memory. Yet AppVerifier can use techniques like placing each allocation in its own page or leaving pages unmapped after release, to deterministically catch overruns, double frees, and reads or writes of freed memory. In other words, there is hard evidence that if every unmanaged application had just used the memory checking support of AppVerifier, then two out of every three application crashes would be eliminated. Clearly this didn’t happen. Of course, AppVerifier can diagnose far more than just memory problems. And it’s very easy and convenient to use. Since testing with AppVerifier is part of the Windows Logo compliance program, you would expect that it’s used fairly rigorously by ISVs. And, given its utility, you would expect that most IT organizations would use this tool for their internal applications. Unfortunately, this isn’t the case. Many applications submitted for the Windows Logo actually fail to launch under AppVerifier. In other words, they violate at least one of the rules before they finish initializing. 
The Windows AppCompat team recognizes that proactive tools like AppVerifier are so much better than reactive mitigation like shimming broken applications out in the field. That’s why they made the AppVerifier tool a major focus of their poorly attended Application Compatibility talk that I sat in on at the PDC. (Aha! I really was going somewhere with all this.) There’s got to be a reason why developers don’t use such a valuable tool. In my opinion, the reason is that AppVerifier is not integrated into Visual Studio. If the Debug Properties in VS allowed you to enable AppVerifier and CDP checks, we would have much better uptake. And if an integrated project system and test system could monitor code coverage numbers, and suggest particular test runs with particular probes enabled, we would be approaching nirvana.
Looking at development within Microsoft, one trend is very clear: Automated tools and processes are a wonderful supplement for human developers. Whether we’re talking about security, reliability, performance, application compatibility or any other measure of software quality, we’re now seeing that static and dynamic analysis tools can give us guarantees that we will never obtain from human beings. Bill Gates touched on this during his PDC keynote, when he described our new tools for statically verifying device driver correctness, for some definition of correctness. This trend was very clear to me during the weeks I spent on the DCOM / RPCSS security fire drill. I spent days looking at some clever marshaling code, eventually satisfying myself that it worked perfectly. Then someone else wrote an automated attacker and discovered real flaws in just a few hours. Other architects and senior developers scrutinized different sections of the code. Then some researchers from MSR who are focused on automatic program validation ran their latest tools over the same code and gave us step-by-step execution models that led up to crashes. Towards the end of the fire drill, a virtuous cycle was established. The code reviewers noticed new categories of vulnerabilities. Then the researchers tried to evolve their tools to detect those vulnerabilities. Aspects of this process were very raw, so the tools sometimes produced a great deal of noise in the form of false positives. But it’s clear that we were getting real value from Day One and the future potential here is enormous. One question that always comes up, when we talk about adding significant value to Visual Studio through additional tools, is whether Microsoft should give away these tools. It’s a contentious issue, and I find myself going backwards and forwards on it. One school of thought says that we should give away tools to promote the platform and improve all the programs in the Windows ecology. 
In the case of tools that make our customers’ applications more secure or more resilient to future changes in the platform, this is a compelling argument. Another school of thought says that Visual Studio is a profit center like any other part of the company, and it needs the freedom to charge what the market will bear. Given that my job is building a platform, you might expect me to favor giving away Visual Studio. But I actually think the profit motive is a powerful mechanism for making our tools competitive. If Visual Studio doesn’t have P&L responsibility, their offering will deteriorate over time. The best way to know whether they’ve done all they can to make the best tools possible, is to measure how much their customers are willing to pay. I want Borland to compete with Microsoft on building the best tools at the best price, and I want to be able to measure the results of that competition through revenue and market penetration. In all this, I have avoided really talking about the issues of versioning. Of course, versioning and application compatibility are enormously intertwined. Applications break for many reasons, but the typical reason is that one component is now binding to a new version of another component. We have a whole team of architects, gathered from around the company, who have been meeting regularly for about a year to grapple with the problems of a complete managed versioning story. Unlike managed AppCompat, the intellectual investment in managed versioning has been enormous. Anyway, Application Compatibility remains a relatively contentious subject over here. There’s no question that it’s a hugely important topic which will have a big impact on the longevity of our platform. But we are still trying to develop techniques for achieving compatibility that will be more successful than what Windows has done in the past, without limiting our ability to innovate on what is still a very young execution engine and set of frameworks. 
I have deliberately avoided talking about what some of those techniques might be, in part because our story remains incomplete. Also, we won’t realize how badly AppCompat will bite us until we can see a lot of deployed applications that are breaking as we upgrade the platform. At that point, it’s easier to justify throwing more resources at the problem. But by then the genie is out of the bottle: the deployed applications will already depend on brittle accidents of implementation, so recovery will be painfully breaking. In a world where we are always under intense resource and schedule pressure, the needs of AppCompat must be balanced against performance, security, developer productivity, reliability, innovation and all the other “must haves”. You know, I really do want to talk about Hosting. It is a truly fascinating subject. I’m much more comfortable talking about non-preemptive fiber scheduling than I am talking about uninteresting topics like implicit contracts and compatibility trends. But Hosting is going to have to wait at least a few more weeks.