Tips and Tricks to debug large applications
Hi, I am Vaishnavi Sannidhanam, test lead on the Phoenix compiler backend team. Phoenix is a platform and analysis framework. For more information on Phoenix you can listen to the talk given by one of our architects, Andy Ayers, on Channel 9 https://channel9.msdn.com/Showpost.aspx?postid=396461.
Our compiler test cases include compiling large applications such as Windows, SQL, Office and Visual Studio. At times, things don’t go as expected and we hit bugs either in the compiler or in the product we are compiling. In cases like these, debugging the plain old way by attaching a debugger is very difficult, since we don’t know most of the code base we are compiling and hence can’t step through its logical flow. I want to share two techniques that we use to counter these challenges.
Runtime Error Checks
Why is this technique useful?
This technique is very useful for finding runtime problems early: the compiler inserts checks into the generated code, and those checks report a problem at the point where it happens when the program runs. It is useful to have turned on even if there are no odd failures you are trying to debug.
Compile your code with the /RTC runtime error checks. They can help you find unintended data loss when a value is assigned to a smaller type (/RTCc), use of uninitialized variables (/RTCu), and overruns and underruns of local arrays, stack pointer corruption and other stack corruption (/RTCs).
You can find more information about this at http://msdn2.microsoft.com/en-us/library/8wtf2dfz(VS.80).aspx
This technique assumes that you are working with unoptimized (debug) builds, since /RTC cannot be combined with compiler optimizations (the /O switches) and therefore cannot be used on optimized (release) builds.
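As a small illustration of the data-loss check, here is a minimal sketch. The function names are made up for this example. As I understand the /RTC documentation, the first function is reported at run time under /RTCc whenever the value does not fit in the smaller type, even though the conversion is written as a cast, while masking first, as in the second function, tells the check that the truncation is intended.

```cpp
// Hypothetical helpers for illustration only.
// Assumed build line for trying the check: cl /RTCc /Od example.cpp

// Under /RTCc, this is reported at run time when 'value' has bits set
// above the low 8, because the conversion silently drops them.
unsigned char low_byte_unchecked(int value) {
    return (unsigned char)value;
}

// Masking makes the truncation explicit, so the value always fits
// and the /RTCc check has nothing to report.
unsigned char low_byte_masked(int value) {
    return (unsigned char)(value & 0xFF);
}
```

Calling low_byte_unchecked(0x101) in an /RTCc build is the kind of call that should trip the check, while low_byte_masked(0x101) simply yields 0x01.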
Binary Search
Why is this technique useful?
This technique comes in very handy when an automated test is failing. You can automate the search to zero in on the bad bits, saving developers and testers time.
1. Isolate to binary: Perform a binary search by mixing binaries (DLLs) from the last known good build and the bad build, then rerun the test. Keep repeating until you have isolated the broken binary.
2. Isolate to obj: Once the broken binary has been identified, compile its objs without LTCG (Link Time Code Generation, i.e. without the /GL switch), then mix and match the objs from the good and the bad builds to find the obj file that is causing the problem.
3. Zero in on bug: Now that you know the faulty obj, if the bug you are chasing also happens in debug builds, you can diff the sources of the working version of the obj against the broken one.
On the other hand, if the bug happens only when optimizations are turned on, i.e. it exists only in retail builds, there can be three causes: turning on optimizations produces bad code gen (compiler bug); optimization exposes a latent application issue such as a buffer overrun or an uninitialized variable (application bug); or the application makes an assumption that does not hold in the presence of optimizations, where it is perfectly legal for the compiler to make the transformation (application bug).
A few examples of application bugs that we commonly see exposed in the presence of optimizations are:
i = i++ + i++;
If ‘i’ starts at 1, its value after the statement could be 2, 3 or 4, and all of these are valid because the behavior of such an expression is undefined.
*p = foo(p)
If foo changes p in the above case, the result is again not guaranteed: the language leaves the order of evaluation unspecified, so the LHS could be evaluated before the call to foo or after it, and the outcome could vary between optimized and unoptimized builds.
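One way to remove the ambiguity is to spell out the order you want. The sketch below is only an illustration: this foo takes the pointer by reference so that it can change it, which is an assumed signature, and both wrapper names are made up.

```cpp
// Hypothetical stand-in for foo: advances the caller's pointer and
// returns the value to store.
int foo(int*& p) {
    ++p;
    return 7;
}

// Store through the pointer as it was BEFORE the call.
void store_then_advance(int*& p) {
    int* target = p;      // capture the destination first
    *target = foo(p);     // foo may move p; target is unaffected
}

// Store through the pointer as it is AFTER the call.
void advance_then_store(int*& p) {
    int value = foo(p);   // let foo move p first
    *p = value;           // then write through the updated pointer
}
```

Either version behaves the same way in debug and retail builds, because the order no longer depends on how the compiler chooses to evaluate the two sides of the assignment.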
p < p + i
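Under the C standard this can be optimized to 0 < i, because pointer arithmetic that overflows is undefined and the compiler may assume it never happens. However, if the user code relies on this comparison as an overflow check, you could see different behavior in optimized and unoptimized builds.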
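If the comparison was meant as a bounds check, one way to write it without relying on pointer overflow is to compare the requested length against the space that is left. This is a minimal sketch with a made-up function name, and it assumes that p and end point into the same buffer with p <= end.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if 'len' bytes starting at 'p'
// fit before 'end'. Assumes p and end point into the same buffer
// and that p <= end, so (end - p) is well defined and non-negative.
bool fits_in_buffer(const char* p, const char* end, std::size_t len) {
    return len <= static_cast<std::size_t>(end - p);
}
```

Because no pointer is ever pushed past the end of the buffer, there is nothing here for the optimizer to fold away.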
Missing the volatile qualifier in multithreaded apps is another example of a bug that shows up only in optimized builds and not in unoptimized ones. This is because the compiler does not maintain the ordering of references to non-volatile objects in optimized builds (http://msdn2.microsoft.com/en-us/library/12a04hfd(VS.80).aspx).
To help debug code like this, it is very helpful to binary search across the functions in that obj: turn off all optimizations for a function or set of functions with #pragma optimize("", off) and rerun the test. Once you have identified which function is causing the problem, you can examine the diff of the source code for that function between the good and the bad versions. More info can be found at http://msdn2.microsoft.com/en-us/library/chh3fb0k(VS.80).aspx
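Here is a minimal sketch of what that looks like in a source file. The functions are hypothetical stand-ins for the contents of the suspect obj, and the _MSC_VER guard is only there so the snippet also builds with other compilers.

```cpp
// Hypothetical functions standing in for the contents of the suspect obj.

int scale(int x) {               // compiled with the command-line optimizations
    return x * 3;
}

#ifdef _MSC_VER
#pragma optimize("", off)        // functions defined after this line are unoptimized
#endif

int sum_scaled(const int* values, int count) {   // the current suspect
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += scale(values[i]);
    }
    return total;
}

#ifdef _MSC_VER
#pragma optimize("", on)         // back to the optimizations from the command line
#endif

int average_scaled(const int* values, int count) {   // optimized again
    return count > 0 ? sum_scaled(values, count) / count : 0;
}
```

If the test passes with the pragma pair around sum_scaled and fails without it, that function is where to look. Otherwise, move the pair to the next set of functions and rerun.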
This technique assumes that you have a good build that you can use as a reference to perform the binary search against the bad build. There also should not be any API changes between the good build and the broken build.
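The mix-and-match search from steps 1 and 2 can be automated along these lines. This is a sketch under stated assumptions, not our actual tooling: it assumes exactly one component (DLL or obj) is broken, and the callback is a hypothetical hook that you would implement to assemble a mixed build and rerun the failing test.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical hook: build a mix in which components [lo, hi) come from
// the bad build and all others come from the good build, rerun the test,
// and return true if the test fails.
using FailsWithBadRange = std::function<bool(std::size_t lo, std::size_t hi)>;

// Returns the index of the single component whose bad-build version
// makes the test fail. Assumes count >= 1 and exactly one culprit.
std::size_t find_culprit(std::size_t count, const FailsWithBadRange& fails) {
    std::size_t lo = 0;
    std::size_t hi = count;              // the culprit is always in [lo, hi)
    while (hi - lo > 1) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (fails(lo, mid)) {
            hi = mid;                    // culprit is in the lower half
        } else {
            lo = mid;                    // culprit is in the upper half
        }
    }
    return lo;
}
```

Each iteration halves the candidate range, so the number of mixed builds you need to test grows with the logarithm of the number of components rather than with the number itself.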
I hope you enjoyed this post and that it helps you with your debugging efforts.
Visual C++ Team