July 24th, 2018

Feature Flags Give Crash Reporting a Serious Boost

This is a guest post from Eyal Keren, Co-founder and CTO at Rollout.io.

Let me guess: you’re using at least one crash reporting solution to track the crashes in your iOS and Android apps. It’s an easy guess because we all do this. What I wouldn’t attempt to guess is what you do after a crash is reported, because that is team- and organization-specific.

Regardless of the specifics of your crash resolution flow, there is a technique that can assist you in investigating and mitigating the impact of crashes: remote feature flag management.

In this post, I’ll show you how feature flags can make your life easier and greatly reduce the impact of potential crashes when you release new features.

What Are Feature Flags, Exactly?

The idea behind feature flags is relatively simple, but still powerful. Developers add special configurable parameters (flags) in the source code and these parameters affect the application’s functionality (features). Thus, “feature flags.” In other words, by changing the value of a specific feature flag, you can enable or disable entire features of your application.

Feature flags are controlled remotely, and their values can be changed easily, even by non-technical staff. Feature flags therefore let you control application functionality in real time without going through the entire deployment flow.
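To make the idea concrete, here is a minimal sketch of a flag lookup. The FlagStore class and its method names are hypothetical and not part of any particular SDK; the sketch assumes that some remote service pushes values into update() and that every flag has a compiled-in default.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory flag store; a real setup would sync values from a remote service.
class FlagStore {
    private final Map<String, Boolean> values = new ConcurrentHashMap<>();

    // Called when a new value arrives from the (assumed) remote configuration.
    void update(String flagName, boolean enabled) {
        values.put(flagName, enabled);
    }

    // Returns the remote value if one is known, otherwise the compiled-in default.
    boolean isEnabled(String flagName, boolean defaultValue) {
        return values.getOrDefault(flagName, defaultValue);
    }
}
```

Falling back to a default means the app behaves predictably even when it has not yet received any remote values.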

Combining Feature Flags With Crash Reporting

Now that you know what feature flags are, let’s see how this technique can be combined with crash reporting to release new features with less risk.

Imagine that you want to add a new feature X in your Android application.

As a developer, you code the feature and put it behind a feature flag such that your users will see feature X only when this specific feature flag is enabled. Then you build and distribute the new version of your mobile app with this feature disabled for all users. Once the application is deployed, you can start gradually enabling feature X for segments of your users. In theory, you can enable it for all users at once, but it’s usually a good idea to start with a smaller subset. This way, you will limit the number of users affected by a potential crash.

If no issues are detected, great! You will gradually roll the feature out to all users. Later, once the flag for feature X is no longer needed, developers can remove it from the code.
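One common way to implement a gradual rollout is to hash each user into a stable bucket and enable the feature for buckets below the current percentage. The sketch below assumes that approach; GradualRollout, bucketFor, and isEnabledFor are illustrative names, and a hosted feature flag service may pick its segments differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical percentage rollout based on a stable hash of the flag name and user ID.
class GradualRollout {
    // Maps a user to a stable bucket in [0, 100) for a given flag.
    static int bucketFor(String flagName, String userId) {
        CRC32 crc = new CRC32();
        crc.update((flagName + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    // True when the user falls inside the currently enabled percentage.
    static boolean isEnabledFor(String flagName, String userId, int rolloutPercent) {
        return bucketFor(flagName, userId) < rolloutPercent;
    }
}
```

Because the bucket depends only on the flag name and the user ID, a user who got feature X at 10% keeps it when the rollout grows to 50%, and setting the percentage to 0 turns the feature off for everyone.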

If Issues Are Discovered, the Benefit Is Huge

We all hope for a smooth release, but in many cases issues crop up. Once you deploy a new release, you can start receiving new crash reports. Sometimes these crashes will manifest themselves right away, but it’s also possible that issues will become apparent only after a considerable number of users have started using feature X.

Without feature flags, the only way to mitigate this situation would be to roll the application release back, which can be cumbersome. But if you’re using feature flags, you can simply disable feature X for all affected users. This way, you can mitigate the impact of this crash almost immediately and allow for investigation, fix, and deployment of the fix to be carried out without affecting all of your users.

That’s a huge win. In fact, there are three different benefits here:

    1. By gradually enabling feature X for your users, you limit the number of users affected by a crash.
    2. You can quickly disable feature X for the affected users, thus reducing the impact of a crash.
    3. If you can keep feature X enabled for the users not affected by a crash, you reduce its impact even further. In other words, if 10% of your users experience issues and you can roll the feature back only for them, 90% will be able to keep using it while you investigate and fix the issues.

It’s not always possible to claim the third benefit; doing so will depend on the specifics of the crash’s root cause. But the first two benefits alone more than justify the investment in feature flags for the majority of projects.
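The third benefit requires a way to describe the affected users. As a rough sketch, assume the crash turns out to be limited to certain Android API levels. The SegmentedFlag class below is hypothetical and only illustrates the idea of keeping a feature on for everyone outside the affected segment.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical flag that stays enabled except for a blocked segment of users.
class SegmentedFlag {
    private final boolean enabledByDefault;
    private final Set<Integer> blockedApiLevels;

    SegmentedFlag(boolean enabledByDefault, Set<Integer> blockedApiLevels) {
        this.enabledByDefault = enabledByDefault;
        // Defensive copy so later changes to the caller's set do not leak in.
        this.blockedApiLevels = new HashSet<>(blockedApiLevels);
    }

    // The feature is on only if it is enabled overall and the device is not in the blocked segment.
    boolean isEnabledFor(int apiLevel) {
        return enabledByDefault && !blockedApiLevels.contains(apiLevel);
    }
}
```

If the root cause cannot be tied to an identifiable segment like this, the fallback is the second benefit: disable the feature for everyone who has it.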

Feature Flags and Crash Reporting: A Quick Example

Now I’ll show you a practical example of a crash reporting solution augmented with feature flags management. I’ll be using Visual Studio App Center to monitor crashes in an Android application.

1. Setting up the Application to Monitor Crashes

The application itself is simple. It fetches the latest questions from the Stack Overflow public API and shows them in a list.

When a specific question is clicked, its body is shown on a new screen alongside the avatar and the nickname of the user who asked the question.

Somewhere inside the code, I have included this method that enables the transition from the questions list screen to the single question details screen:

@Override
public void onQuestionClicked(Question question) {
    showDetailsOnNewScreen(question);
}

Now I would like to start monitoring the app’s analytics and crashes with App Center. To enable integration with App Center, I basically need to add just one line of code to the application’s onCreate() method:

@Override
public void onCreate() {
    super.onCreate();
    AppCenter.start(this, Constants.APP_CENTER_KEY, Analytics.class, Crashes.class);
}

So far, so good! The application works seamlessly.

2. Releasing New Features Controlled by Feature Flags

Imagine that I have this application in production and would like to test a new feature. Instead of showing question details on a new screen, I want to show them in a dialog. However, before I roll this feature out to all users, I want to test it on a small subset of them. Showing a variant to a subset of users is also the basis of A/B testing, which is another common use case for feature flags.

To do this, I’ll modify the aforementioned method which handles clicks on questions in a list in the following manner:

@Override
public void onQuestionClicked(Question question) {
    if (mFeatureFlagsManager.shouldShowDetailsInDialog()) {
        showDetailsInDialog(question);
    } else {
        showDetailsOnNewScreen(question);
    }
}

As you can see, depending on the value of a new feature flag, question details will be shown either on a new screen or in a dialog.

The FeatureFlagsManager object encapsulates and abstracts all the logic related to remote control of feature flags and exposes the flags for the rest of the application to consume. (The details of its internal implementation are outside the scope of this post.)
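For readers who want something concrete to picture, here is one possible shape for such an object. This is a hypothetical sketch, not the implementation used in the sample app or any vendor's API: the flag name, the onRemoteValue() entry point, and the name=value text format are all assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a flags manager; the real one would talk to a remote flag service.
class FeatureFlagsManager {
    static final String SHOW_DETAILS_IN_DIALOG = "show_details_in_dialog"; // assumed flag name

    // TreeMap keeps the flags sorted by name so the text snapshot is stable.
    private final Map<String, Boolean> flags = new TreeMap<>();

    FeatureFlagsManager() {
        // New features start disabled until the remote configuration says otherwise.
        flags.put(SHOW_DETAILS_IN_DIALOG, false);
    }

    // Assumed entry point for values received from the remote flag service.
    synchronized void onRemoteValue(String flagName, boolean enabled) {
        flags.put(flagName, enabled);
    }

    synchronized boolean shouldShowDetailsInDialog() {
        return flags.getOrDefault(SHOW_DETAILS_IN_DIALOG, false);
    }

    // One "name=value" line per flag, suitable for logs or diagnostics.
    @Override
    public synchronized String toString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Boolean> entry : flags.entrySet()) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

A readable toString() is worth the few extra lines, because it gives you a one-call snapshot of every flag whenever the flag state needs to be recorded.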

Assuming that FeatureFlagsManager and showDetailsInDialog(Question) both work as intended, I’m done. Now I can release a new version of the application.

However, before I do that, I’d like to make one additional change. I will modify Application’s onCreate() method in the following way:

@Override
public void onCreate() {
    super.onCreate();
    // Register the listener before AppCenter.start() so that it is already
    // in place when App Center begins processing crashes.
    Crashes.setListener(new AbstractCrashesListener() {
        @Override
        public Iterable<ErrorAttachmentLog> getErrorAttachments(ErrorReport report) {
            // Attach the current feature flag values as a plain-text file.
            ErrorAttachmentLog features = ErrorAttachmentLog.attachmentWithText(
                    mFeatureFlagsManager.toString(),
                    "Features.txt");
            return Collections.singletonList(features);
        }
    });
    AppCenter.start(this, Constants.APP_CENTER_KEY, Analytics.class, Crashes.class);
}

The goal of this additional code is to attach information about the used feature flags to future crash reports. You’ll see why this is useful in a minute.

After implementing the feature, putting it behind the feature flag and integrating feature flag information into crash reporting, I feel confident enough to deploy the changes into production. I will keep this new feature disabled by default and perform a gradual rollout to users.

3. Investigating the Cause of a Crash

Let’s assume that at some point after I start the rollout, I see detailed crash reports with stack traces and more in my App Center crash reporting dashboard.

In the most general case, I would need to initiate a full diagnostic process because I wouldn’t be able to attribute the crashes to specific features. However, now that I’m using feature flags integrated with the App Center crash service, I can perform a preliminary investigation of the crashes right from the crash reporting dashboard.

The dashboard shows two users affected by this crash, so I’ll navigate to the crash details and click the attachments tab.

Here I find the list of the feature flags with their respective values at the time of the crash. That’s exactly the information that I attached to the crash report earlier. Having it here is super useful because now I can correlate crashes with features.

If all users affected by this crash have this feature flag enabled, that might indicate the crash itself was caused by this specific feature. For this simple tutorial app, there were just two affected users, but for a real app there may be more reports. The more crash reports of a specific type associated with a specific feature flag, the higher the probability that something about this feature caused the crashes.
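With more than a handful of reports, this comparison can be automated. The sketch below assumes that each attachment is plain text with one name=value line per flag; that format, and the FlagCrashCorrelation class itself, are assumptions made for illustration. It computes, for each flag, the share of crash reports in which the flag was enabled.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that summarizes flag values across crash report attachments.
class FlagCrashCorrelation {
    // Returns, for each flag, the fraction of crash reports in which it was enabled.
    static Map<String, Double> enabledShare(List<String> attachments) {
        Map<String, Integer> enabledCounts = new TreeMap<>();
        for (String text : attachments) {
            for (String line : text.split("\n")) {
                String[] parts = line.trim().split("=", 2);
                if (parts.length != 2) continue; // skip blank or malformed lines
                int enabled = Boolean.parseBoolean(parts[1].trim()) ? 1 : 0;
                enabledCounts.merge(parts[0].trim(), enabled, Integer::sum);
            }
        }
        Map<String, Double> share = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : enabledCounts.entrySet()) {
            share.put(entry.getKey(), entry.getValue() / (double) attachments.size());
        }
        return share;
    }
}
```

A share close to 1.0 is most telling when the flag is enabled for only a small fraction of all users. If a flag is already on for everyone, it will appear in every crash report regardless of whether it caused the crash.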

4. Mitigating the Impact of a Crash

Now that I can relate crash reports to individual features, I can toggle the feature flag’s value and disable the feature for my users. Then I’ll have as much time as I need to properly investigate the crash and fix it.

In this case, I have just one new feature, so it’s fairly obvious that this feature is responsible for the crash. However, when working with a real application, I will usually deploy multiple new features in each release. With several feature flags in the app, the ability to quickly correlate crashes with specific features becomes critical for mitigating a crash’s impact quickly.

Conclusion

Now that you know how to integrate feature flags with your App Center crash service, you can significantly reduce the impact of crashes. Such an integration makes investigating crashes quicker and easier and allows for almost immediate rollback of problematic features.

In this article, I concentrated on just one use case of feature flags. However, the feature flags technique has additional important use cases which provide even more benefits. The most successful enterprises and startups use this technique to reduce risks associated with feature delivery.

Go ahead and start using feature flags and App Center in your application. They are easy to integrate and can prove extremely beneficial.

Eyal Keren is co-founder and CTO of Rollout.io, where his focus is software craftsmanship, continuous design and TDD. Prior to Rollout, he spent 8 years as a software engineer at Intel. He earned a degree with honors in computer software engineering from the Israel Institute of Technology, but his love of code really began back in early childhood, when he discovered a BASIC book in an IBM XT box and began copying the English letters.

 

 
