Linkai Yu, Senior Application Development Manager, discusses his experience with enterprise applications that fail after being deployed in production and how Premier Developer works with customers to resolve and prevent these types of issues.
Even Though Applications Go Through Rigorous Tests
It’s true that most applications go through rigorous QA testing before they are deployed to a production data center. Even with thorough load and performance testing, however, unrepresentative test data or a failure to accurately reproduce real user behavior can lead to oversights that surface as major production issues.
How many times have you run into a situation where a problem happens intermittently in production but QA is never able to reproduce it? Teams often disagree about the source of these problems. Developers might blame the IT team for network and platform issues, or the DBA for poorly written queries and a slow database server. The IT and database teams may deny that anything is wrong and blame the application developers. Meanwhile, users grow tired of repeatedly reporting the problem, the business suffers, and the issues never seem to get resolved.
When this kind of operational gridlock affects integrated production systems, unresolved problems can stack up until even simple issues become hard to resolve. It is often difficult to tell which problems are related to each other and which are not. Because of the complexity and the dependencies involved, every attempt to fix one issue risks further compromising stability. If you find yourself in this situation, Premier Developer can help you work across teams and environments to isolate and resolve the problems.
How Microsoft Premier Services Can Help
While Premier Services tends to focus on proactive services that prevent production issues, such as scalability labs and optimization clinics, we also provide world-class support for reactive issues and critical outages when they occur.
First, let me explain how we organize our technical expertise in multiple tiers so that we can be responsive and cover the broadest spectrum of technologies while remaining specialized. Our first-tier support professionals have product-related knowledge, such as IIS configuration and Internet Explorer Group Policy Objects (GPOs). If an issue can’t be resolved with configuration changes or other first-tier troubleshooting tools, a second, escalation tier engages. These engineers have deep debugging skills and can track the problem down in your code without needing to be intimately familiar with your application. Our support organization is grouped by product and development SDK, so, for example, a WCF web service issue can be handled by an engineer who knows how to debug .NET code and understands the WCF framework.
In addition to debugging skills, we have many debugging tools and extensions tailored to particular products or SDKs. These tools know how to traverse product-specific data structures and uncover information that would otherwise be difficult to dig out manually. Combining this organization with these debugging techniques allows us to track down virtually any problem, provided the right debugging data and traces are captured in a timely fashion.
As an example, we recently helped a customer resolve an issue in their application, which was consistently crashing after encountering an out-of-memory exception. Production support was rebooting the machine every night so that the memory leaked by the application would be reclaimed, and they had been doing this for many years before they came to us for help. Our support team took two steps. First, they set up Performance Monitor counters to confirm and isolate the memory leak. Second, they used DebugDiag (Debug Diagnostic Tool), a free tool from Microsoft, to track the memory leak. Normally, DebugDiag reveals the code responsible for the leaking allocations. In this case, the tool caught the memory leak but did not reveal the name of the responsible function, so deeper analysis was required. Our support team set up WinDbg, a debugger available for download from Microsoft, set a breakpoint on the malloc() memory allocation function in the C/C++ runtime library, and scripted the debugger to log the callstack to a text file each time the breakpoint was hit. We handed the log file over to the application developer, who found the leaking code very quickly because he knew the code well enough to recognize the callstacks.
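A log produced by a breakpoint on malloc() can contain a huge number of stacks. In WinDbg, this kind of logging is typically set up with `.logopen` and a breakpoint command that prints the stack and resumes, something like `bp ucrtbase!malloc "k; g"`. The module that exports `malloc` depends on the CRT version the application links against, so treat that module name as an assumption. The sketch below shows one way to summarize such a log before handing it to developers: it groups identical callstacks and ranks them by how often they occur. It assumes a hypothetical, pre-processed log format with one `module!function+offset` frame per line and a blank line between stacks. Raw `k` output also contains header lines and addresses that would need to be stripped first. The function names `parse_stacks` and `top_allocation_sites` are illustrative and are not part of any Microsoft tool.

```python
from collections import Counter
from typing import List, Tuple

# A callstack, top frame (the allocator) first.
Stack = Tuple[str, ...]


def parse_stacks(log_text: str) -> List[Stack]:
    """Split a callstack log into stacks.

    Assumed (hypothetical) format: one frame per line, such as
    'myapp!Cache::Add+0x41', with a blank line between stacks.
    """
    stacks: List[Stack] = []
    current: List[str] = []
    for raw in log_text.splitlines():
        frame = raw.strip()
        if frame:
            current.append(frame)
        elif current:
            # A blank line closes the stack collected so far.
            stacks.append(tuple(current))
            current = []
    if current:
        # The log may not end with a blank line.
        stacks.append(tuple(current))
    return stacks


def top_allocation_sites(log_text: str, depth: int = 3, n: int = 5) -> List[Tuple[Stack, int]]:
    """Rank stacks by how often their top `depth` frames occur.

    Comparing only the top frames groups allocations made from the
    same call site even when the deeper callers differ.
    """
    counts = Counter(stack[:depth] for stack in parse_stacks(log_text))
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample log with three logged allocations.
    sample = """\
ucrtbase!malloc
myapp!CacheEntry::Create+0x2a
myapp!Cache::Add+0x41

ucrtbase!malloc
myapp!Parser::Parse+0x10
myapp!main+0x5

ucrtbase!malloc
myapp!CacheEntry::Create+0x2a
myapp!Cache::Add+0x41
"""
    for stack, hits in top_allocation_sites(sample):
        # Print each site as "count allocator <- caller <- caller's caller".
        print(hits, " <- ".join(stack))
```

A call site that appears far more often than expected is a candidate for the leak, but it is not proof. The script counts allocations, not blocks that were never freed, so someone who knows the code still has to confirm the finding. In this case, that was the role the application developer played.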
The Best Way to Add Value to Your Business Is by Collaborating with Microsoft Premier
As this example shows, your developers know their applications well, but they may lack expertise in deeper technical layers such as memory management, OS internals, and debugging. Debugging these problems is also far from trivial when production is the only environment where they appear. Through support collaboration, your team can stay focused on its core goals while partnering efficiently with Microsoft to overcome difficult support issues that affect your business. Microsoft Premier can contribute deep system-level knowledge of memory management, synchronization, network management, process and multitasking management, exception handling, security, and more. Collaboration between your application developers and Microsoft Premier is the most effective way to find and resolve application performance and stability issues quickly.
While traditional QA is an absolute necessity before applications are deployed to production, problems can still occur. When they do, they have a greater impact on the business and are more complex to troubleshoot than issues caught before release. Premier Developer can help you find issues that even the best QA testing might miss, keep service disruptions to an absolute minimum, and recommend proactive services to improve operational health and optimize your software development lifecycle.