How Canary Tests Can Improve Your Quality

Let's take a look at how canary tests work.

Canary tests add an extra layer of quality assurance by exposing a new release to a small group of real users before everyone else receives it.

How to Run Canary Tests

Canary tests send a small percentage of production traffic to an environment running the new version and run a predefined list of tests against it. These typically include infrastructure, performance, or even user acceptance tests. Once the tests pass, the canary version of the application can be deployed to all production users.

Many development teams maintain a separate branch in their source repository for canary releases. When a new version is released, an automated build process is triggered on Jenkins or another CI/CD platform. The build is then deployed to canary users using one of many different approaches, depending on infrastructure preferences.

Some of the most popular ways to run canary tests include:

Load Balancers: You could set up two load balancers, where the first receives 95% of traffic and the second receives 5%. Canary deployments would be limited to the second load balancer, and if the tests pass, it’s deployed to both.
AWS: You can split traffic with Route 53 weighted records or with Auto Scaling groups (ASGs). When using ASGs, you can update the ASG with the canary Amazon Machine Image (AMI) without worrying about downtime. Rolling back is as easy as restoring the old AMI to the ASG.
Kubernetes: Kubernetes supports a similar approach to AWS, using container images (such as Docker images) instead of AMIs.
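
Whichever approach you choose, the underlying idea is a weighted split. The following is a minimal Python sketch of that idea only, not the configuration of any real load balancer, DNS service, or cluster. The backend names "stable" and "canary" and the 95/5 weights are assumptions taken from the load balancer example above.

```python
import random
from collections import Counter

# Hypothetical backend names and weights. A real setup would configure these
# in the load balancer, DNS, or cluster rather than in application code.
WEIGHTS = {"stable": 95, "canary": 5}


def pick_backend(weights, rng):
    """Return one backend name, chosen in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, requests, seed=0):
    """Route `requests` simulated requests and count where each one lands."""
    rng = random.Random(seed)
    return Counter(pick_backend(weights, rng) for _ in range(requests))


if __name__ == "__main__":
    total = 10_000
    counts = simulate(WEIGHTS, total)
    for name in WEIGHTS:
        print(f"{name}: {counts[name] / total:.1%}")
```

Because each request is assigned at random, the canary share only approximates 5% over many requests. Sticky routing per user would need a different approach, such as hashing a user ID.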

Once the canary release is live, the quality assurance team monitors a variety of endpoints to ensure the application behaves as it should. They can either roll back the canary release if an error occurs or deploy to all users if the tests pass and the code is deemed production ready.
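
The roll back or promote decision can be sketched as a comparison between metrics from the stable version and the canary. This is an illustrative Python example under stated assumptions: the Metrics fields, the function name canary_verdict, and every threshold value are hypothetical and would need tuning for a real application.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(stable, canary, min_requests=500,
                   max_error_rate_increase=0.01, max_latency_ratio=1.2):
    """Return "wait", "rollback", or "promote" for a canary release."""
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    if canary.error_rate - stable.error_rate > max_error_rate_increase:
        return "rollback"  # canary fails noticeably more often than stable
    if canary.p95_latency_ms > stable.p95_latency_ms * max_latency_ratio:
        return "rollback"  # canary is noticeably slower than stable
    return "promote"


if __name__ == "__main__":
    stable = Metrics(requests=10_000, errors=20, p95_latency_ms=200.0)
    canary = Metrics(requests=600, errors=2, p95_latency_ms=210.0)
    print(canary_verdict(stable, canary))
```

Comparing the canary against the stable version, rather than against fixed limits, keeps the check meaningful when overall traffic or latency shifts during the day.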

When to Use Feature Flags

Feature flags are another popular approach to canary testing that focuses on specific features. Using code rather than releases, feature flags enable development teams to turn specific features on and off for specific users. These tools are most helpful for business stakeholders who need to test new features before committing them to everyone.

Some of the most popular feature flag use cases include:

Early Access Programs: Stakeholders often want feedback on early features before releasing them to all production users. Early access programs use feature flags to expose those features to a select group and gather valuable feedback to fine-tune releases.
Blocking Features: Some applications may need to block features from certain users (e.g. for regulatory reasons), and feature flags make the process as easy as flipping a switch rather than building out extensive user-facing options.
A/B Testing Features: Competing features can be easily A/B tested without exposing either feature to the entire user base. For example, you can test which version of a specific feature performs better.
New vs. Power Users: Some features may be too complex for new users, but necessary for power users. Feature flags let you disable advanced features for new users to avoid confusion while providing them to users who could take advantage of them.
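
The use cases above can be sketched with a small flag check. This is a minimal Python illustration, not the API of any particular feature flag library. The Flag fields, the group and region values, and the function names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    name: str
    rollout_percent: int = 0  # share of remaining users who get the feature
    allowed_groups: set = field(default_factory=set)   # e.g. early access
    blocked_regions: set = field(default_factory=set)  # e.g. regulatory blocks


def bucket(flag_name, user_id):
    """Map a user to a stable bucket in 0..99 for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id, group=None, region=None):
    """Decide whether `flag` is on for this user."""
    if region in flag.blocked_regions:
        return False  # blocking wins over every other rule
    if group in flag.allowed_groups:
        return True  # early access members or power users
    return bucket(flag.name, user_id) < flag.rollout_percent


if __name__ == "__main__":
    flag = Flag("new_checkout", rollout_percent=10,
                allowed_groups={"early_access"}, blocked_regions={"XX"})
    print(is_enabled(flag, "user-1", group="early_access"))
    print(is_enabled(flag, "user-1", group="early_access", region="XX"))
```

Hashing the flag name together with the user ID keeps each user's result stable between requests. The same bucket value could also assign users to A/B variants, for example buckets 0 to 49 for one version and 50 to 99 for the other.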

Many different tools make it easy to incorporate feature flags into an application. For example, the Rollout gem makes it easy to add this functionality to a Ruby on Rails application without adding technical debt. Commercial companies, like LaunchDarkly, also provide these capabilities across a wide range of platforms for a monthly subscription.

Adding Automation to Load Testing

Performance testing is a critical component of canary tests. Test engineers can be reasonably confident that code covered by unit and integration tests will hold up in production, but performance is always a wild card because it depends on so many external factors. Canary tests are one of the best ways to see how a release performs under real conditions without risking problems for most users.

Most test engineers are familiar with protocol-based load testing using tools like JMeter, but keep in mind their limitations. For instance, protocol-based load tests don't execute JavaScript, which means that they may not paint an accurate picture of performance for JavaScript-heavy, single-page applications.
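
To make the limitation concrete, here is a rough Python sketch of what a protocol-level load test measures. It is not how JMeter works internally. The helper names, the request counts, and the example URL in the usage note are hypothetical. Each request fetches the raw HTTP response and nothing more, so time spent running JavaScript in a browser never shows up in the numbers.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Issue one HTTP GET and read the body; no JavaScript is executed."""
    with urlopen(url, timeout=10) as response:
        response.read()


def timed(action):
    """Run `action` once and return (seconds_taken, succeeded)."""
    start = time.perf_counter()
    try:
        action()
        ok = True
    except Exception:
        ok = False
    return time.perf_counter() - start, ok


def run_load(action, total_requests, concurrency):
    """Run `action` `total_requests` times (at least 1) across threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed(action), range(total_requests)))
    durations = sorted(d for d, _ in results)
    failures = sum(1 for _, ok in results if not ok)
    # Approximate 95th percentile of the sorted durations.
    p95 = durations[max(0, int(len(durations) * 0.95) - 1)]
    return {"requests": total_requests, "failures": failures, "p95_seconds": p95}
```

A call such as run_load(lambda: fetch("https://canary.example.com/"), 200, 20) would send 200 requests, 20 at a time, to a placeholder address. The reported latency covers only the server's response, not rendering in the browser.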

That’s not a problem for LoadNinja. It simplifies load testing by enabling anyone to build tests with a record-and-replay approach. After building load tests (without the need for dynamic correlation), it's easy to scale them across tens of thousands of real browser instances to get a picture of performance that is as close as possible to actual production environments.

LoadNinja’s Record and Replay Capabilities

A quality assurance process may involve running browser-based load tests with LoadNinja, at scale, on a canary release candidate before deployment. When these tests pass, you can release the canary deployment and monitor production users to confirm everything works as expected. You can then deploy the canary release to all production users.
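
That sequence amounts to a series of gates, where each stage must pass before the next one starts. The Python sketch below shows only the ordering logic. The stage names and the placeholder checks are hypothetical, and in practice each check would call out to the load testing tool, the monitoring system, or the deployment platform.

```python
def release(stages):
    """Run (name, check) stages in order; stop at the first failing check."""
    completed = []
    for name, check in stages:
        if not check():
            return {"released": False, "failed_at": name, "completed": completed}
        completed.append(name)
    return {"released": True, "failed_at": None, "completed": completed}


if __name__ == "__main__":
    # Placeholder checks that always pass; real ones would query other tools.
    stages = [
        ("load_test_release_candidate", lambda: True),
        ("monitor_canary_users", lambda: True),
        ("deploy_to_all_users", lambda: True),
    ]
    print(release(stages))
```

Stopping at the first failure means a release that fails its load test never reaches canary users, and a canary that misbehaves never reaches everyone else.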

An Early Warning System for Real-World Testing

Canary tests are a great way to check that an application behaves as expected in a production environment. While test engineers may be confident in unit and integration tests, it's hard to know how an application will perform under load in a production environment. The combination of canary tests and LoadNinja can help catch most regressions before they reach every user.

See how easy it is to get started with load testing today!
