Frontend testing: How Userlane adopted Cypress to improve their product

Startups often rally behind the saying “Move Fast, Break things”. Yet, it’s not an excuse to break the quality of your product while moving fast. In this article, we interviewed Tobias Müller, a senior software engineer at Userlane, and Gleb Bahmutov, the VP of Engineering at Cypress.io. Together, they shared how the team at Userlane uses Cypress to increase stability, improve performance, and ensure quality for their CI/CD process while maintaining developer happiness.

Tobias, can you tell us what Userlane is?

Userlane is a startup from Munich, Germany. It was founded four years ago, and today we have around 80 engineers. Userlane is a product that runs on top of another website’s UI and shows users how to interact with it. A good example would be using SAP for tracking time. This is where Userlane comes into play: we run on top of SAP and provide an assistant that shows the user how to track time by highlighting elements, showing an example, and walking them through the process.

How does it work? Is it compatible with a wide range of JavaScript frameworks?

We provide two different techniques, either:

Integrate the JavaScript snippet that injects Userlane’s script into the host website.
Use the browser extension which will automate the integration process.

Our tool is framework agnostic, but the core algorithm for selecting DOM nodes knows how to deal with different JavaScript frameworks. For instance, if a DOM node has a randomly generated ID, we need to make sure that we can still identify the correct node even if the ID has changed, no matter what framework the underlying host uses.
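To make the idea of surviving a randomly generated ID more concrete, here is a minimal sketch of one possible heuristic. It is not Userlane’s actual algorithm, which the interview does not describe. The `ElementInfo` shape, the `looksGenerated` rule, and the list of "stable" attributes are all assumptions chosen for illustration.

```typescript
// Hypothetical, simplified description of a DOM element (an assumption,
// not Userlane's internal model).
interface ElementInfo {
  tag: string;
  id?: string;
  attributes: { [name: string]: string };
}

// Attributes assumed to stay the same across page loads, in order of preference.
const STABLE_ATTRIBUTES = ["data-testid", "name", "aria-label"];

// Assumed heuristic: an ID looks generated if it contains a run of four or
// more digits, or a hex-like token of eight or more characters.
function looksGenerated(id: string): boolean {
  return /\d{4,}/.test(id) || /[0-9a-f]{8,}/i.test(id);
}

// Builds a CSS selector that avoids IDs that look randomly generated.
function buildSelector(el: ElementInfo): string {
  if (el.id !== undefined && el.id !== "" && !looksGenerated(el.id)) {
    return "#" + el.id;
  }
  for (const attr of STABLE_ATTRIBUTES) {
    const value = el.attributes[attr];
    if (value !== undefined) {
      // JSON.stringify adds the double quotes around the attribute value.
      return el.tag + "[" + attr + "=" + JSON.stringify(value) + "]";
    }
  }
  // Last resort: the tag name alone (usually too broad for real use).
  return el.tag;
}

const stableSelector = buildSelector({ tag: "button", id: "save", attributes: {} });
const fallbackSelector = buildSelector({
  tag: "input",
  id: "ember1234",
  attributes: { name: "email" },
});
```

In this sketch the button keeps its hand-written ID, while the input with the framework-style ID `ember1234` is located through its `name` attribute instead. A production algorithm would likely combine many more signals, such as text content and position in the tree.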

What is your End to End (E2E) strategy when testing your product?

We need an E2E strategy because it’s not enough for us to just test our own software; we must also make sure that it can run on top of other pages. There are so many technologies out there, and when you are building a tool like ours that runs on top of other pages, you encounter many different scenarios. When you release a new version, it has to work.

In our early days as a startup, when our team was quite small, most of our E2E tests were done manually. We simply went to our customers’ pages and manually tested our different scenarios. As you can imagine, that didn’t scale well. Then, about a year ago, we started with serious automated E2E testing. One very important part of that is our Sandbox.

The Sandbox is a first-layer app where we try to reproduce everything we see on the web. Some of its pages are SPAs, some are static pages, and others are specific to a framework (React, Vue, or Angular) or written in vanilla JavaScript. We reproduce a lot of scrolling behaviors, among other things. We have built some complex algorithms that deal with all these scenarios.

We also use Sandbox as our main source for testing. We reproduce the behaviors of our customer pages inside our Sandbox.

How is it done technically? What kind of E2E tooling are you using?

In the beginning, our first step in automated testing was TestCafe. We chose TestCafe mainly because one of the developers had experience with that tool. For very simple test cases, it worked well. But once the suite grew to 70 test cases, within a couple of weeks, we started noticing some limitations. Our CI pipelines were taking almost an hour to run, and parallelizing those 70 test cases was challenging.

We also noticed that some of our tests were flaky and failed randomly. Waiting an hour for the CI pipeline to finish and then finding out that some tests had failed only because they were flaky wasn’t an option for us. TestCafe wasn’t working for us anymore. Our team was unproductive and frustrated. We had to look for something new.

Another thing that wasn’t working for us was Selenium. It didn’t work well without the CI pipeline. We needed to set it up correctly, run it, and maintain it.

At the beginning of 2020, we did some research and found out about Cypress. Learning that Cypress didn’t rely on Selenium convinced us to give it a try. We also liked the documentation: it’s well written, and it provides great onboarding videos with basic tutorials on how to get started and set everything up.

Another reason that made us adopt Cypress is their amazing parallelization feature. Just by adding two parameters to the run command, you can use their dashboard to parallelize your workload.

Gleb, can you tell us more about Cypress?

Cypress is a test runner for anything that runs in a browser. Produced by Cypress.io in Atlanta, GA (USA), it is praised by web developers for its cross-platform GUI, which includes everything needed to test modern web apps, and for its support for multiple browsers. Users create a test file in JavaScript to automate user interactions and test the outcomes in a website running live in a browser. Tests can be run interactively on the developer’s machine, or as part of a CI/CD process running on a testing cluster. Cypress records and saves screenshots and videos as it goes to ease the debugging process.

A test file in Cypress, with the Test Runner executing the tests and a Chrome browser with the website being debugged.

Although Cypress was created in 2014 as proprietary licensed software, since 2018 it has been available as open-source software under the MIT license. All the documentation and examples are also open source. Releasing Cypress as open source was instrumental in building trust amongst the userbase, as technology they could adopt and be sure it would always remain available. It also spawned an active community of contributors around the Cypress GitHub repository, accelerating the development process. Of course, this also entailed a shift in the business model from license revenue, but Cypress.io (the company) has been successful in generating revenue streams from services on top of the testing framework: storing and managing test results, hosting test screenshots and videos, orchestrating test matrices on clusters, and so on.

How is Cypress organized?

Cypress itself is an Electron-based desktop application. We chose this framework so that we could natively support multiple platforms from the same codebase. It also simplifies the install experience for users (made even easier thanks to a tiny npm package that downloads the desktop application for the user’s system). Cypress also features a built-in browser engine based on Chromium to streamline the testing experience (and users can also run tests in any installed browser like Chrome, Firefox or Edge). It’s a big application, but we’ve made sure to keep the architecture clean so that it’s easy to maintain and easy to test.

As we were developing Cypress, our main goal was for the Test Runner to be able to access applications in a way that yields consistent results, and to avoid “flake” caused by timing fluctuations and issues like that. By running tests directly in the browser, the app can pause while tests run, which eliminates a major source of flake.

The companion Cypress web application provides a dashboard for subscribers to monitor and manage their testing. It’s a React application, with data managed in Postgres via a GraphQL API, which lets us iterate quickly. The back-end services are written in Node.js. The architecture allows us to react in real time to sudden spikes in demand, which was an issue we had to deal with as demand for Cypress grew. The original application was written in CoffeeScript, but refactoring it in TypeScript has provided benefits in maintenance and reliability.

A schema of the Cypress architecture

Back to you Tobias, how did you migrate from TestCafe to Cypress at Userlane?

The migration process was smooth. We booked a day for an internal workshop with three people from our team, and it took us only that one day to get everything set up and CI configured. We also managed to migrate a few test cases and run them with Cypress. After one week, we had migrated all 70 of our test cases to Cypress.

One thing worth mentioning: thanks to Cypress and its amazing parallelization feature, we managed to reduce our CI pipeline from 60 minutes to only 2 minutes and 30 seconds, using 12 machines.
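A quick back-of-the-envelope calculation puts these figures in perspective. The sketch below uses only the numbers quoted above and assumes perfectly even load balancing with no startup overhead, which is an idealization. The variable names are illustrative.

```typescript
// Figures quoted in the interview.
const beforeSeconds = 60 * 60;    // 60-minute pipeline
const afterSeconds = 2 * 60 + 30; // 2 minutes 30 seconds
const machines = 12;

// What splitting the old 60-minute run evenly over 12 machines would give
// (idealized: perfect balancing, zero overhead).
const idealParallelSeconds = beforeSeconds / machines; // 300 s = 5 min

// Overall speedup actually reported.
const observedSpeedup = beforeSeconds / afterSeconds; // 24x

// Part of the speedup not explained by the machine count alone.
const residualFactor = observedSpeedup / machines; // 2x
```

Under these assumptions, parallelization alone would account for a drop to about 5 minutes. The reported 2 minutes and 30 seconds suggests the suite also ran roughly twice as fast per machine after the migration, although the interview does not say where that extra factor came from.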

We all know that E2E testing can be complicated. What challenges did you encounter when writing E2E tests?

The main issue we were facing was with flaky tests. This is an issue we see more than other companies because our tool runs on top of other Web pages. We need to make sure that our tool and all its dependencies are loaded before triggering anything. This caused us a lot of headaches.

In the beginning, almost 50% of our tests were failing just because they were flaky. To fix that, we had to learn how to write good tests, especially when parallelizing them. When two tests ran at the same time and accessed the same resources, we saw a lot of inconsistencies. We ended up cloning our staging environment and creating a dedicated testing environment. In this test environment, every single test case creates its own microenvironment: we call it a “Property”. Each test can create its own Property and act on it by creating, generating, or modifying test data, without interfering with anything else.

We use Kubernetes to manage and run the test environments. We generate a dedicated testing environment and create new entities, and every entity is independent of everything else. An entity is a collection of users, Properties, and so on.
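The isolation idea can be sketched in a few lines. This is an in-memory illustration only, not Userlane’s implementation: the `TestEnvironment` class, the `Property` shape, and the ID scheme are hypothetical, and a real setup would create these entities through the backend of the dedicated test environment.

```typescript
// Hypothetical shape of a per-test microenvironment.
interface Property {
  id: string;
  data: { [key: string]: string };
}

// Stand-in for the dedicated test environment (an assumption for illustration).
class TestEnvironment {
  private properties: { [id: string]: Property } = {};
  private nextId = 1;

  // Every call returns a fresh Property with a unique ID, so two tests
  // never share mutable state, even if they have the same name.
  createProperty(testName: string): Property {
    const property: Property = { id: testName + "-" + this.nextId, data: {} };
    this.nextId += 1;
    this.properties[property.id] = property;
    return property;
  }

  count(): number {
    return Object.keys(this.properties).length;
  }
}

// Two tests running "in parallel" each work on their own Property.
const sharedEnvironment = new TestEnvironment();
const propertyA = sharedEnvironment.createProperty("login-test");
const propertyB = sharedEnvironment.createProperty("login-test");
propertyA.data["language"] = "de";
propertyB.data["language"] = "en";
```

Because each test writes only to the Property it created, the two writes to `language` cannot overwrite each other, which is the kind of interference that shared staging data would allow.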

What kind of benefits do you get from using this architecture and from using Cypress?

We got a lot of benefits. First, it made our tests reproducible. Secondly, if something goes wrong, it’s very easy to debug by inspecting the respective entity.

With Cypress 5.0, we took advantage of the native retry feature. This was a game changer for our team because it allowed us to deal with flaky tests. It may sound like a trivial thing, but it was amazing for us because we were able to get rid of most of our flaky tests.
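To illustrate what a test-level retry does, here is a minimal, framework-independent sketch. It is not Cypress’s implementation. In Cypress itself, as far as its documentation describes, retries are enabled through the `retries` configuration option rather than hand-written code. The `runWithRetries` helper and the sample tests below are hypothetical.

```typescript
interface RetryResult {
  passed: boolean;
  attempts: number;
}

// Runs testFn once, then up to `retries` more times if it throws.
function runWithRetries(
  testFn: (attempt: number) => void,
  retries: number
): RetryResult {
  let attempts = 0;
  while (attempts <= retries) {
    attempts += 1;
    try {
      testFn(attempts);
      return { passed: true, attempts: attempts };
    } catch (err) {
      // Failed attempt: the loop continues while retries remain.
    }
  }
  return { passed: false, attempts: attempts };
}

// A flaky test: fails on the first attempt only (e.g. a script not loaded yet).
const flakyTest = (attempt: number): void => {
  if (attempt === 1) {
    throw new Error("element not loaded yet");
  }
};

// A genuinely broken test: fails on every attempt.
const brokenTest = (): void => {
  throw new Error("real bug");
};

const flakyResult = runWithRetries(flakyTest, 2);   // passes on attempt 2
const brokenResult = runWithRetries(brokenTest, 2); // fails after 3 attempts
```

The flaky test passes on its second attempt, while the broken test still fails after all attempts are used. This is why retries can hide timing noise without masking real regressions, provided the number of retries stays small.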

We are using the Cypress Dashboard, and it’s great to see the curve of failing tests go down each week!

Tobias recently shared Userlane’s use of Cypress in a webcast. You can watch it on the Cypress Blog.

Thank you, Tobias, for sharing this experience. Gleb, one last question. Do you have some additional learnings to share from your Cypress experience?

I joined Cypress three years ago, after being impressed as a user of the software. In that time, I’ve learned several lessons:

Focus on empathy. Being able to wear the hat of your users helps you design features, prioritize needs, and understand what “quality” really means. Empathy means talking to and understanding your customers, and translating what you’ve learned into code.
Good tools, and especially good open-source tools, live and die by documentation. We invested early in good documentation, and our engineers update docs as part of the development process – we don’t have a separate docs team by design.