How we test and roll out new product features

When we roll out a new feature, things don't always go 100% right. This is where A/B testing comes in handy. Our new configuration system helps us do that.

By Team member, 13/03/2019

3 minutes read

In the early days, as a smaller company, when we changed something - whether it was something huge or just a single word on a screen - we would just go ahead and release it immediately to everyone. This made sense at the time, but it meant we couldn't slowly roll out a new product, or have customers opt in to a feature. So if we released something and it wasn't quite right, Cuvva could have been in a lot of trouble.

We really needed a way to test the effects of a change on a small set of customers (i.e. A/B testing) and a way to slowly release things to more people over time. This would give us more confidence to try things out before we were certain they were right, and would help reduce the risk of things going wrong - whether from a legal or compliance perspective, or just by confusing users. It would also mean we could completely disable entire parts of the app in an emergency (luckily this has never been necessary!).

The answer came in the form of our new configuration system. It lets us control how many people will have each feature enabled. Sometimes it will be for something simple like making a custom app icon available to everyone - but we only set it at certain times of the year (like Christmas). Other times it could be used to support the rollout of a whole new feature or product only to a certain percentage of customers (like our new travel insurance product).

Our Christmas app icon.

How it works in theory

We use "flags" to control the behaviour of the apps. For instance, when we were rolling out travel insurance, if we had enabled the "travel" flag for a customer, then that customer was able to purchase travel insurance policies. ✈️

We didn't want to manually apply a set of flags to specific customers every time, or have to do lots of manual processing. Flags should follow a set of rules that we define so we know they hit a percentage of customers automatically - even as more customers join - rather than us having to actually set it up separately for a specific set of people (i.e. we want it to be "stateless"). The biggest challenge is that the system needs to be really versatile, as there is quite a bit we need it to do...

A flag might need to be applied differently depending on whether you're a staff member or a customer (e.g. normally we'll give access to all staff, while maybe only 5-10% of our customers).
Some flags may be strictly kept to the random percentages, and some may have the ability for staff members or users themselves to override them.
Flags may be incompatible with a specific app version, so we have to version every flag individually to make sure apps don't break, or behave unexpectedly.
Flags should be overridable, but only when we configure them to allow it. And we should be able to change that setting after an override is set.
The API call to see what flags are on should be quick - it'll be requested a lot!
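To make those requirements a bit more concrete, here is a minimal sketch of what a flag definition could look like. It is an illustration only, not our actual schema: the `Flag`, `Rule` and `Audience` types, the `RuleFor` helper and the single integer `MinAppBuild` are all hypothetical names and assumptions.

```go
package main

import "fmt"

// Audience says who a rule applies to. Hypothetical names for illustration.
type Audience string

const (
	Staff    Audience = "staff"
	Customer Audience = "customer"
)

// Rule is a hypothetical per-audience rollout rule.
type Rule struct {
	RolloutBasisPoints int  // 10000 basis points = 100% of subjects
	AllowOverride      bool // whether stored overrides are honoured right now
}

// Flag is a hypothetical flag definition.
type Flag struct {
	Name        string
	MinAppBuild int // assumption: app versions compared as one integer build number
	Rules       map[Audience]Rule
}

// RuleFor returns the rule for an audience. The boolean is false when the
// app build is too old for the flag or no rule exists for that audience.
func (f Flag) RuleFor(a Audience, appBuild int) (Rule, bool) {
	if appBuild < f.MinAppBuild {
		return Rule{}, false
	}
	r, ok := f.Rules[a]
	return r, ok
}

func main() {
	travel := Flag{
		Name:        "travel",
		MinAppBuild: 100,
		Rules: map[Audience]Rule{
			Staff:    {RolloutBasisPoints: 10000, AllowOverride: true}, // all staff
			Customer: {RolloutBasisPoints: 500},                        // 5% of customers
		},
	}
	r, ok := travel.RuleFor(Customer, 120)
	fmt.Println(r.RolloutBasisPoints, ok) // 500 true
}
```

Keeping the rules in the definition itself, rather than per customer, is what lets a flag stay "stateless": new customers fall under the same rules without any extra setup.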

As technical briefs go, it's not massive, but there is some complex logic to program 😅. Generally speaking, the more flexible the system, the harder it is to plan and build.

How it works in practice

Take this weird example. Say we had a flag called "blue_payment_button". This flag, when turned on, changes the "Pay now" button to blue instead of green. And say we want to test the effect this has by rolling it out to 10% of customers. When the app requests which feature flags are on, a customer may be within the selected 10%, so you'd initially think the flag will be on.

But once the service checks its database, it might find there's an override present. That means someone has explicitly requested it to be a specific value for a specific customer, even if normally it would be something else.

So now you'd think the flag will be off. But wait! The configuration on the feature flag doesn't allow for it to be overridden right now. So the end result is that the flag is turned on for that customer.
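The three steps in that example can be sketched as one small function. This is a simplified illustration under our own assumptions rather than the real service code: `Resolve` and its parameters are hypothetical, and a `nil` override stands for "no override stored".

```go
package main

import "fmt"

// Resolve is a hypothetical helper that decides the final flag value.
//   inRollout       - whether the subject falls inside the rollout percentage
//   override        - explicit value stored for this subject, nil if none
//   overrideAllowed - whether the flag's configuration honours overrides right now
func Resolve(inRollout bool, override *bool, overrideAllowed bool) bool {
	// An override only wins while the flag is configured to allow it.
	if override != nil && overrideAllowed {
		return *override
	}
	// Otherwise fall back to the percentage rollout.
	return inRollout
}

func main() {
	off := false

	// The case from the example: in the 10%, override says "off",
	// but overrides are not allowed, so the flag stays on.
	fmt.Println(Resolve(true, &off, false)) // true

	// If overrides were allowed, the stored override would win.
	fmt.Println(Resolve(true, &off, true)) // false
}
```

Because the override is stored separately from the permission to use it, switching `overrideAllowed` later changes the outcome without touching the stored value.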

Under the hood, the first step is a check for whether a "subject" is in the X% for the rollout. In this case, the "subject" is either a user ID or a device ID.

Then we can use it in our service to do a quick check against the intended value. In this case, the `Value` is our intended percentage rollout in basis points.
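As a rough sketch of how a check like this could work (an illustration, not the actual service code), one common approach is to hash the flag name and the subject into a stable bucket between 0 and 9,999 and compare it with `Value`. The hashing scheme, the `Bucket` function and the `Rollout` type are assumptions made for this example; only `Value` and the use of basis points come from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// 10000 basis points = 100%.
const basisPointsTotal = 10000

// Bucket maps a flag name and subject (a user ID or device ID) to a stable
// number in [0, 10000). The same inputs always give the same bucket, so
// nothing has to be stored per subject. Hashing with SHA-256 is an
// assumption here; the modulo adds a tiny bias that is negligible for
// this purpose.
func Bucket(flag, subject string) uint32 {
	sum := sha256.Sum256([]byte(flag + ":" + subject))
	return binary.BigEndian.Uint32(sum[:4]) % basisPointsTotal
}

// Rollout is a hypothetical rule whose Value is the intended percentage
// rollout in basis points (for example, 1000 = 10%).
type Rollout struct {
	Value uint32
}

// Includes reports whether the subject falls inside the rollout.
func (r Rollout) Includes(flag, subject string) bool {
	return Bucket(flag, subject) < r.Value
}

func main() {
	tenPercent := Rollout{Value: 1000}
	for _, subject := range []string{"user_1", "user_2", "device_abc"} {
		fmt.Println(subject, tenPercent.Includes("blue_payment_button", subject))
	}
}
```

Including the flag name in the hash means a subject who lands in the 10% for one flag is not automatically in the 10% for every other flag.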

This is just one of the cool things we've been working on at Cuvva HQ. If you want to get involved in helping us build insurance the way it should be, head over to our jobs page. We're hiring! 🚀
