Cloze testing at Cuvva: giving our users the mic

Grace, our Content Designer, explains how our users have influenced our copy
By Team member, 16/09/2021
4 minute read

Whilst 2020 was a strange year for most people (us included), it was a big year for content design at Cuvva HQ. It was the year that we finally took the plunge into content testing — the year we took the content off the nicely designed, beautifully functioning screen to ask our users what they really thought about how we speak.

We hired a researcher whose sole focus was testing our content (hi, Chloe 👋). With her leading the way, we’ve tried a few different methods including the highlighter method, using emoji rating scales to measure sentiment, and comprehension tests (like the five second test). But one of my favourite (and most insightful) tests of the year was our first cloze test.

Cloze testing

Cloze testing works by removing certain words from your writing and asking users to fill in the gaps to test comprehension. A score of 60% of words answered correctly is usually considered good.
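If it helps to see the arithmetic behind that rule of thumb, here’s a rough sketch of how a cloze score could be tallied. It’s purely illustrative: the yes/no results are made up, and the 60% threshold is just the rule of thumb mentioned above.

```python
# A minimal sketch of cloze scoring, using made-up example data.
# Each gap is marked True if the participant's word matched the
# intended word (or something with the same meaning), False otherwise.
gap_results = [True, True, False, True, False, True, True, False, True, True]

score = sum(gap_results) / len(gap_results)

# Roughly 60% or more is the usual rule of thumb for good comprehension.
if score >= 0.6:
    print(f"{score:.0%} - comprehension looks OK")
else:
    print(f"{score:.0%} - the copy probably needs another look")
```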

What we used it for

We used it to help us on a tricky screen in our new monthly-subscription car insurance flow.

We found that when it comes to annual car insurance, users have a very specific idea of how it should work. The industry has always worked in much the same way, so users’ expectations have been allowed to solidify over time. This means that when we came in with a new product that shook things up a bit, one of the biggest content challenges we had to overcome was changing perceptions of how insurance should work.

One place in particular where we had our work cut out for us was explaining the payments for our monthly subscription product. We’d seen in general usability testing that people were getting stuck. We decided to test the content in isolation using cloze testing to get a better idea of why people were struggling.

The method

The general rule of thumb in cloze testing seems to be removing every 6th word, but we found this didn’t work for us. A lot of the words we’d removed were ‘and’ or ‘the’ — words which didn’t contribute much to a user’s comprehension of the overall piece. So instead we removed words strategically, choosing to omit words that had a big impact on the overall meaning of the content.
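To make that difference concrete, here’s a small illustration (a Python sketch, with an invented sentence) of blanking every 6th word versus blanking handpicked, meaning-carrying words:

```python
# A small illustration of the two approaches, using an invented sentence.
sentence = ("Your policy renews every month and your payment is taken "
            "on the same day each time").split()

# Rule-of-thumb approach: blank every 6th word.
every_sixth = [
    "____" if (i + 1) % 6 == 0 else word
    for i, word in enumerate(sentence)
]

# Strategic approach: blank handpicked words that carry the meaning.
key_words = {"renews", "month", "payment", "day"}
strategic = ["____" if word in key_words else word for word in sentence]

print(" ".join(every_sixth))   # tends to hit small connecting words like 'and' or 'the'
print(" ".join(strategic))     # removes the words that carry the meaning
```

In this toy example the every-6th-word rule only blanks ‘and’ and ‘the’, which is exactly the problem we ran into.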

We recruited 8 users to complete the research in 3 parts.

In the first part, we gave the cloze screen to users and asked them to tell us what word they thought went in each gap, recording ‘yes’ or ‘no’ based on whether they got it right. (We accepted words that weren’t exactly what we’d chosen but had the same or similar meaning.) We also noted any incorrect answers they gave.

In the second part, we presented the same users with the completed screen (with no words missing), and asked them to read through it. Once they’d finished, we asked them to explain how the payments worked to test their comprehension.

Finally, in the third part, we gave the users the cloze screen once more (with the same words missing) and asked them to repeat part one.

What we learnt

From this research we got a great deal of insight.

Firstly, it was extremely interesting to see the cloze scores from the first part of the research. As we know, the majority of users don’t read every word and are unlikely to engage properly with every screen in a flow. Add to that the fact that we’re selling car insurance via mobile, and you really have to consider what information you put where.

There were some gaps that no users answered correctly — which gave us a good indication of where we needed to switch things up.

Secondly, it was very telling to see where users had got words wrong on the second cloze test, especially where they scored highly on the comprehension part of the task. For them to have seen a completed screen with all the right words, and still get them wrong on the second cloze, told us we needed to rethink not only those specific words, but the context within which they sit.

Finally, and perhaps most excitingly, we could analyse the language people used when filling in the blanks. Their incorrect suggestions were in many cases more useful than their correct answers, as they gave us insight into how people were thinking about payments, and what information they expected to see.

Considering these findings in light of users’ comprehension, as measured in the second part of the research, gave us a clear sense of the mental framework these users were bringing to our product.

What we did with the results

We changed some of the language we used to talk about payments. Previously, we’d been talking about ‘billing’ because it best reflected what happened each month. But we found this just confused our users. The term ‘payment’ took a lot less processing for them to understand. We also stopped talking about ‘cycles’ and instead just referred to the monthly policies. It seems that introducing the concept of cycles was useful to us internally, but it just created more terminology for users to wrap their heads around.

We also changed the hierarchy of information, and cut some stuff out. Initially, we’d reminded users of their monthly policy schedule so that we could align the payments to that. But we found people weren’t expecting a reminder of how policies worked at this stage — and the information just served to confuse them.

Finally, this allowed us to start building a framework for what users expect from an insurance product of this kind. We’re still testing and learning, but we’ve already been able to review language elsewhere in the app to better meet our users’ needs. Most notably, I was able to redesign the product intro screens to better align our product with users’ expectations, subsequently increasing conversion by 17%.

If you’re interested in learning more, you can watch a talk I did for Content Design Ireland about the content-specific testing we do here at Cuvva.
