Guest Blog | A Mobile DevOps Retrospective, Part III: Measurement, the Last Mile
This is Part III in a special guest series from Greg Shackles, Principal Engineer at Olo. New to the series? Start here.
In my last post, I shared how automation, one of DevOps’ key concepts, frees our team to solve real problems, and how it’s created a cross-team “culture of ownership.” Everyone—from our developers to our sales managers—is laser-focused on quality.
Measurement is a big piece of the DevOps puzzle, central to continuous development and improvement. But, measurement is an ambiguous term.
To help you get started, I’ll walk through how (and why) my team at Olo, where we develop apps for 150+ restaurant brands, measures our apps to learn, iterate, and improve.
We use a variety of tools, services, and technologies in our development process, and Visual Studio App Center gives us the freedom to choose the services that work best for us (Test and Distribute) while continuing to use third-party tools for other parts of our CI/CD pipeline. This post walks through some best practices around measurement, which you can easily apply to any service you might be using, including App Center.
Level-Setting: What is Measurement?
In conversations with other developers, I often hear measurement and monitoring used synonymously with creating alerts for when things go wrong. That’s not quite right.
Measurement is about asking questions and establishing the necessary feedback loops to answer them. It’s using actual data to challenge (or validate) your assumptions.
At the bare minimum, all developers should be able to answer fundamental questions about their apps and users, such as:
- Are people using your app?
- Are they being successful or running into issues?
- How long does it take to complete your app’s critical flows?
- Is anyone using your latest features, or did you miss the mark?
- Do people like your app?
- Are the answers to these getting better or worse across versions of the app?
What you learn from these foundational questions guides your next round of questions. You don’t—and won’t—need to know all the “right” questions or things to track immediately, and it’s easier to get started than you might think.
Separating Signals from Noise: What We Measure
While what you measure is unique to your development process, your apps, and your goals, we’ve established three feedback channels applicable to any developer: error reporting, analytics, and app store reviews.
You need error and crash reporting, full stop. When you “measure” your error and crash frequency, you:
- Guarantee your users return: As a consumer, you likely don’t tolerate subpar mobile experiences, whether it’s faulty UI, confusing menus, or, the least forgivable of all, crashes and failure to load. You’ve heard the stats; only 16% of users give failing apps the benefit of the doubt, and, anecdotally, we see that only a small fraction of users report errors. You can’t (and shouldn’t) rely on your users to report issues; you need to proactively monitor, catch, and mitigate as many problems as possible.
- Save money (figurative and literal): Errors are always expensive, costing you some combination of time, money, and reputation. This is even more true in mobile development where even if you identify and fix a production issue quickly, you still need to package and ship an update to the app stores, wait for approval, and hope your users update (or, spend time and resources promoting the new release).
At Olo, we have dozens of apps in production at all times, so knowing about issues ASAP is of utmost importance.
Beyond the standard error report details, we automatically tag every error from our apps with different dimensions to help us quickly understand a given error’s scope and severity. For each error we report, we:
- Tag it with the Git commit used to build the app, and which version of our white-label app platform the app uses.
- Indicate if we handled the error gracefully (by presenting the user with some sort of error message), or if the error fully crashed the app.
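As a sketch of the tagging described above, here is what attaching those dimensions to an error report might look like. This is a Python illustration (our apps are Xamarin/C#, but the idea is language-agnostic), and the constant names, field names, and report shape are assumptions, not Olo’s actual schema:

```python
import traceback

# Hypothetical values; in practice these are injected at build time and
# attached automatically by the crash-reporting SDK.
GIT_COMMIT = "abc1234"
PLATFORM_VERSION = "2.5.0"

def report_error(exc, handled):
    """Build an error report tagged with scope and severity dimensions."""
    return {
        "message": str(exc),
        "stack": "".join(traceback.format_exception_only(type(exc), exc)),
        "tags": {
            "git_commit": GIT_COMMIT,
            "platform_version": PLATFORM_VERSION,
            # "handled": we showed the user a friendly error message;
            # "crash": the error took the whole app down.
            "severity": "handled" if handled else "crash",
        },
    }

# Example: a gracefully handled failure while loading a menu
report = report_error(ValueError("menu failed to load"), handled=True)
```

Because every report carries these tags, you can later filter and aggregate by commit, platform version, or severity without guessing.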
We search and report on these tags to get deeper insights into the types of problems we’re having and what’s causing them, tracking trends over time to make sure we’re improving from release to release.
Our error reports include tags for app name, version of our platform, and commit details.
After setting up reliable error reporting, you move into the “fun” part: understanding how users engage with your app and validating or disproving your assumptions. This is where those “bare minimum” questions that any developer should be able to quickly answer about their production apps come into play.
We’re an e-commerce app, so we want to make it fast*, easy, and convenient to select items and make purchases. Our baseline metrics, the ones that get to the heart of whether we’re meeting our goals and helping our users, include how long it takes for a user to complete an order and how frequently users who start an order successfully check out. For a productivity app, you might want to monitor how long users spend in your app, the number of times users access your app each day, or how much content they create.
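As a sketch, a start-to-checkout conversion metric like the one above can be computed from an event log. The event names and log shape here are assumptions for illustration, not Olo’s actual analytics schema:

```python
# Hypothetical event log; real data would come from your analytics pipeline.
events = [
    {"user": "a", "event": "order_started"},
    {"user": "a", "event": "checkout_completed"},
    {"user": "b", "event": "order_started"},
    {"user": "c", "event": "order_started"},
    {"user": "c", "event": "checkout_completed"},
]

def conversion_rate(events, start="order_started", finish="checkout_completed"):
    """Fraction of users who started an order and went on to check out."""
    started = {e["user"] for e in events if e["event"] == start}
    finished = {e["user"] for e in events if e["event"] == finish}
    # Only count completions from users who actually started an order.
    return len(started & finished) / len(started) if started else 0.0

rate = conversion_rate(events)  # 2 of the 3 users who started an order converted
```

Tracking this number across releases tells you whether changes to the ordering flow are actually helping.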
At Olo, we also want to know how long users spend waiting for our app to execute actions. We know how quickly our servers respond, but that’s only part of the equation; it doesn’t take any mobile network latency, connectivity issues, screen load time, etc. into account. How long are users actually waiting for menus to load or to add something to their cart? The last thing we want to do is leave our users staring at loading spinners.
We add custom instrumentation into all of our apps to collect this aggregate data, tracking timings for every network request, database call, view model load, and more. We tag each metric by the brand, specific network or database call, and platform (iOS or Android), and we push this data into Datadog.
From there, we slice and dice information and easily visualize exactly how well our apps behave on real devices and real networks, rather than make assumptions based on ideal testing circumstances. Seeing our server-side metrics alongside our app metrics gives us a complete picture of our platform in one place.
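The timing instrumentation described above can be sketched as a small wrapper that records a duration plus its tags. This is a hedged Python illustration; in production the metrics would be flushed to Datadog via its client library, and the metric names and tags here are illustrative, not Olo’s real schema:

```python
import time
from contextlib import contextmanager

# Collected measurements; in production these would be shipped to Datadog.
metrics = []

@contextmanager
def timed(metric_name, **tags):
    """Time a block of work and record its duration with the given tags."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        metrics.append({"metric": metric_name, "ms": elapsed_ms, "tags": tags})

# Example: time a (stubbed) menu request, tagged by brand and platform
with timed("app.network.request", brand="some-brand", endpoint="menu", platform="ios"):
    time.sleep(0.01)  # stand-in for the real network call
```

Because every timing carries tags, you can later break down latency by brand, endpoint, or platform and see exactly where users are waiting.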
In addition to lower level measurements, tracking higher level metrics, like user demographics or most popular devices, is equally important and extremely powerful. App Center makes this a breeze, giving you a full view into your users’ experience, allowing you to track sessions and custom events, segmented by user, operating system, device type, and more.
*Note: “fast” is a misnomer, as things can (almost) always be faster, but measuring gives us a starting point for making and tracking improvements.
You’ll find our real-time analytics streaming on displays across our HQ. The first thing visitors see when entering Olo is a map showing orders being placed by our users across all brands as they happen in real time.
This constant visibility holds us accountable for delivering the best possible user experience, and we’ve extended this responsibility to our entire team. Measurement isn’t just alerts for when things go wrong; it’s a way to give everyone visibility into our system doing its job, as well as an opportunity to celebrate our successes. After a long day or week, watching our users (successfully!) place orders around the country goes a long way towards bringing our team together.
Our live order map on display at Olo HQ.
For more information, and step-by-step details about how we instrument our apps and how to apply the same approach to yours, check out my Instrumenting Your Mobile Monitoring Strategy session from Xamarin Evolve 2016.
App Store Reviews
User reviews can’t be your only feedback channel, but you definitely can’t, and shouldn’t, ignore them. Reviews are a direct line to your users, but, as any developer knows, users are far more likely to leave negative reviews than positive ones, and it takes a long time to get enough positive reviews to offset a few bad reviews.
Trust me, I’ve been there. We often inherit our customers’ prior apps and, while we rebuild them, the old reviews stick around.
In the scenario below, we took over a project that was getting pummeled with bad reviews: unresponsive, prone to crashes, and generally just hard to use; you name it, we saw it.
Our client’s app reviews before and after fixing critical issues: better quality, fewer overall reviews.
After our release, the negative reviews largely disappeared, but the overall volume of reviews shrank along with them.
Our analytics showed increased user engagement and more sessions, but a clear drop in review frequency. Most users aren’t motivated to leave reviews when apps behave as expected.
Bottom line: don’t give your users a reason to leave a bad review. You’ll end up spending quite a while devoting energy to winning back a star.
Prompting your users to leave reviews is a tempting solution, but, if you don’t do it thoughtfully, you risk aggravating your users even more.
- Don’t: ask users to leave a review while they’re trying to complete an action in your app. For us, that means when users launch the app, search for locations, or view menus; basically, any point before they complete their purchase.
- Do: find opportunities to nudge users to review without interrupting them. Even better, identify when they’re most likely to be happy with their experience and feeling great about your app. Experiment and test different options, and track what works best for you. For us, this means a simple prompt after users successfully place their order and while they wait for it to be delivered or ready for pick-up.
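The do/don’t guidance above boils down to a simple gate. This Python sketch is illustrative only; the flags and the one-prompt limit are assumptions, not Olo’s actual rules:

```python
# Illustrative gating logic for review prompts.
def should_prompt_for_review(order_just_completed, prompted_this_version):
    """Nudge only at the happy moment (order placed), and never nag twice."""
    return order_just_completed and not prompted_this_version

# Mid-flow (browsing a menu): never interrupt
browsing = should_prompt_for_review(order_just_completed=False,
                                    prompted_this_version=False)

# Right after a successful order, first time this version: a good moment to ask
after_order = should_prompt_for_review(order_just_completed=True,
                                       prompted_this_version=False)
```

The exact conditions are worth experimenting with; the point is that the prompt fires only when the user is likely to be happy and never mid-task.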
We’re extremely strategic about when, why, and how we ask for reviews, because we know that users rate the full end-to-end experience, not just the app. For example, if a customer has a particularly good or bad experience with their delivery driver or in-store pick-up, this will manifest in their reviews. While many of these “real world” components are often out of our control, we can control our users’ in-app experience.
Ensuring you deliver the best in-app experience is critical as a developer, but it’s important to remember that it’s all a unified experience from the user’s perspective: don’t focus solely on your app—do whatever you can to make the end-to-end user journey as smooth as possible.
We’ve created a dedicated Slack channel named #app-reviews that automatically posts all reviews for any of our apps (through an integration with appFigures). Our entire company hangs out here, checking in to see what people are saying about our apps, good and bad. Just like analytics, this is a great opportunity to celebrate our wins and spark cross-team conversations about where we can improve.
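To give a feel for what lands in that channel, here is a sketch of formatting a review into a Slack-style message. In practice the appFigures integration does this for us; the field names and message shape below are assumptions for illustration:

```python
# Sketch of the kind of message an #app-reviews channel might receive.
def format_review_message(review):
    """Render a star rating plus the review text as a one-shot message."""
    stars = "★" * review["rating"] + "☆" * (5 - review["rating"])
    return f"[{review['app']}] {stars}\n\"{review['title']}\": {review['body']}"

message = format_review_message({
    "app": "Example Brand",
    "rating": 4,
    "title": "Great ordering experience",
    "body": "Fast and easy to use.",
})
```

Posting these where the whole company can see them is what turns reviews from a developer chore into a shared conversation.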
Our #app-reviews company-wide Slack channel.
The Cultural Snowball Effect
Measurement doesn’t just help us understand our system and UX to make us better product owners, it spurs conversations and understanding across teams, making it easier for others to share ideas and feel like they’re actively participating. At Olo, we have cross-departmental discussions between Engineering, Customer Success, Product, Sales, and more.
This culture is (wonderfully) infectious, and quickly leads to a desire for, and commitment to, more measurement. It creates tangible signals that can be understood across the entire team, and establishes a shared vernacular for things that might have been difficult to understand or relate to previously. The answers to the questions you’ve already asked become a conversation; before you know it you’ve got a whole new set of follow-up questions, and the process continues.
I like to compare measurement and alerting to a scientific concept called stigmergy*, which describes how one action leaves behind data that informs future decisions. That’s effectively what we’re doing with measurement and alerting: we leave behind data to help ourselves and future teammates understand what’s happening. Build enough context into your alerts so that when one goes off, anyone coming in to investigate can quickly understand the situation. When a new person joins the team, we have concrete data we can point to, which allows our new hires to visualize how everything works.
*Stigmergy was first observed in termites. Scientists discovered termites leave behind pheromones that other termites detect, allowing them to construct complex nests (e.g. solve problems) by building on the actions of what came before, despite being simple organisms. I know it might sound silly (my own team certainly chuckled when I first explained it) but it works!
Final Thoughts: Continuous Measurement
We measure in several ways, but each channel (error reporting, analytics, app reviews, and so on) gives us invaluable insight into our apps and ensures that we’re giving our users the best experience.
You can’t improve what you don’t measure.
Start small, implementing a few simple measurements, learn from them, and iterate. And repeat, repeat, repeat.
To learn more about why I’m so passionate about DevOps, check out my first post: A Mobile DevOps Retrospective, Part I: 150+ Apps Later.
If you haven’t already, create your Visual Studio App Center account, connect your first app, and start shipping better apps now.
Have an account? Log in and let us know what you’re working on!
About the Author
Greg Shackles is an experienced developer, Microsoft MVP, Xamarin MVP, and spends his days as Principal Engineer at Olo. When he’s not coding, he hosts the Gone Mobile podcast, organizes the NYC Mobile .NET Developers Group, and he wrote Mobile Development with C# to help developers use their existing skills to create native mobile apps.