What’s Your Number: How UX Rating Systems Work

Two thumbs up. 4 stars. A+. Our world revolves around rating systems. Whether we’re reading online reviews of a product, choosing a hotel, or even gauging our own intellect, we rely on rating systems to give us an idea of capabilities or expectations. Some rating systems are complex, while others are based solely on subjective “gut feelings.”

If you’re building a product with UX in mind, the end goal is to have it be useful, usable and desirable. It might seem like a product’s usability would be hard to measure, which is true to an extent. When you describe why you like a brand or product, sometimes it’s hard to put your finger on exactly what makes you like it. It’s a combination of elements, and all you know is that you just like it. Luckily, rating systems do exist to measure a product’s usability, providing different benchmarks and goals to keep in mind when developing a product.

There are four main metrics used to measure different aspects of usability: SUS, NPS, SEQ and TCR. Three of the four metrics are determined by the user’s opinion after user testing. Between users describing where they have problems and our own observation of any issues, you can be confident that the data reflects the true state of a system, usable or not. Overall, high scores are consistent with good usability sessions, and low scores are consistent with poor usability sessions.

SUS - System Usability Scale

The System Usability Scale (SUS) is an evaluation tool that allows us to assess the usability of a variety of systems. The scale consists of ten statements, each with five response options that range from “Strongly Disagree” to “Strongly Agree,” rated from 1 to 5. The SUS can be used with small sample sizes to measure effectiveness, efficiency and satisfaction. It can also be used to establish a benchmark for a system and to track changes in usability after revisions, all with a single score. Participants are asked to complete the SUS questions after all interactions with the system.

SUS Questions

I think that I would like to use this system frequently.
I found the system unnecessarily complex.
I thought the system was easy to use.
I think that I would need the support of a technical person to be able to use this system.
I found the various functions in this system were well integrated.
I thought there was too much inconsistency in this system.
I would imagine that most people would learn to use this system very quickly.
I found the system very cumbersome to use.
I felt very confident using the system.
I needed to learn a lot of things before I could get going with this system.

The SUS score is out of 100. Participants’ scores are averaged for each usability session to track our progress, and are also aggregated to calculate an overall system score per product. The global average SUS is 68, and our products usually score around 80.
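The text above does not spell out how ten answers on a 1-to-5 scale become a number out of 100. The sketch below follows the standard SUS scoring procedure: odd-numbered (positively worded) statements contribute the response minus 1, even-numbered (negatively worded) statements contribute 5 minus the response, and the sum is multiplied by 2.5. The function names and the sample responses are illustrative assumptions, not taken from any particular tool.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant.

    `responses` holds the ten answers in questionnaire order,
    each an integer from 1 (Strongly Disagree) to 5 (Strongly Agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Statements 1, 3, 5, 7, 9 are positively worded.
            total += response - 1
        else:
            # Statements 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - response
    return total * 2.5


def session_sus(participants):
    """Average the per-participant SUS scores for one usability session."""
    scores = [sus_score(p) for p in participants]
    return sum(scores) / len(scores)


# Made-up responses from two participants.
session = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
]
print(session_sus(session))  # 87.5
```

Because the negatively worded statements are reversed before summing, a participant who answers 3 to everything lands at exactly 50.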

NPS - Net Promoter Score

The Net Promoter Score (NPS) is a measure of customer loyalty that is based on one question:

How likely is it that you’ll recommend this product to a friend or colleague?

By asking this question, we are able to get a sense of the system through the user’s eyes, and whether or not it is really a product they would want to use. The response options range from “Not at all likely” to “Extremely likely,” rated from 0 to 10. Responses are then divided into the following categories:

Promoters (responses from 9-10) are enthusiastic about the product. They are loyal customers who will refer others to your brand.
Passives (responses from 7-8) are satisfied with the product but unenthusiastic, which could leave them vulnerable to the competition.
Detractors (responses from 0-6) are unhappy with the product and could potentially damage your brand through negative word-of-mouth.
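These three categories are conventionally combined into a single number: the percentage of promoters minus the percentage of detractors, which gives a score between -100 and 100. Passives count toward the total number of responses but toward neither percentage. A minimal sketch, with an illustrative function name and made-up responses:

```python
def net_promoter_score(responses):
    """Return the NPS (-100 to 100) for a list of 0-10 responses."""
    if not responses:
        raise ValueError("at least one response is required")
    if any(r not in range(0, 11) for r in responses):
        raise ValueError("each response must be an integer from 0 to 10")

    promoters = sum(1 for r in responses if r >= 9)   # 9-10
    detractors = sum(1 for r in responses if r <= 6)  # 0-6
    # Passives (7-8) only affect the denominator.
    return 100 * (promoters - detractors) / len(responses)


# Made-up responses: 4 promoters, 2 passives, 2 detractors.
print(net_promoter_score([10, 9, 9, 8, 7, 6, 3, 10]))  # 25.0
```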

SEQ - Single Ease Question

The Single Ease Question (SEQ) is a rating scale that allows us to assess how difficult participants find a given task. We ask this question immediately after a user attempts to complete a task or set of tasks, so that it remains in context with that specific part of the system. The response options range from “Very Difficult” to “Very Easy,” rated from 1 to 7.

Overall, how difficult or easy did you find this task?

For each usability session, responses from all tasks are averaged and presented as an overall score on a scale from 1 to 7. The SEQ score for each individual task or set of tasks is also reported for each usability session, to see how each task was viewed across participants. SEQ is also reported as a percentage for easier understanding. Just as with our other scores, we also aggregate all scores for a product, for an ongoing measure of this metric.
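The reporting described above can be sketched as follows. How the 1-to-7 score is turned into a percentage is not stated here, so this sketch assumes the mean is simply divided by the maximum rating of 7; other conventions exist and would give different percentages. The task names, ratings and function name are hypothetical.

```python
from statistics import mean

SEQ_MAX = 7  # "Very Easy" on the 1-7 scale


def seq_summary(ratings_by_task):
    """Summarize SEQ ratings for one usability session.

    `ratings_by_task` maps a task name to the list of 1-7 ratings
    that participants gave for that task.
    """
    per_task = {task: mean(ratings) for task, ratings in ratings_by_task.items()}
    all_ratings = [r for ratings in ratings_by_task.values() for r in ratings]
    overall = mean(all_ratings)
    return {
        "per_task": per_task,
        "overall": overall,
        # Assumption: percentage = mean rating / maximum rating.
        "overall_pct": 100 * overall / SEQ_MAX,
    }


# Hypothetical session with two tasks and three participants.
summary = seq_summary({
    "Create account": [7, 6, 6],
    "Find invoice": [5, 4, 5],
})
print(summary["overall"])                # 5.5
print(round(summary["overall_pct"], 1))  # 78.6
```

Keeping the per-task means next to the overall figure makes it easy to see which task is pulling a session's score down.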

There isn’t a global average for SEQ, but we usually aim for a score over 80%. Currently our products average an SEQ of about 82.5%.

TCR - Task Completion Rating

The Task Completion Rating (TCR) is a data point that is assigned after each task is attempted. It defines the level at which the task was completed on a scale of 0 to 2: 0 means the task was not completed successfully, 1 means the task was completed successfully but help was needed, and 2 means the task was completed successfully without help. In order to score a participant with a TCR, there need to be predefined success criteria (a beginning and an end state). Gathering this data helps us pinpoint the areas of the system that work well, and the areas that don’t.

TCR is reported as a score on a scale from 0 to 2 and also as a percentage. We calculate the average TCR for each individual task or set of tasks, as well as the average of all TCRs for a usability session. We additionally aggregate all product TCR scores for an ongoing measure of this metric.
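As a rough illustration of that reporting, the sketch below averages a set of 0-to-2 ratings and expresses the result as a percentage. It assumes the percentage is the mean divided by the maximum rating of 2, which is the most direct reading but is not confirmed by the text. The function name and sample ratings are made up.

```python
from statistics import mean

TCR_MAX = 2  # task completed successfully without help


def tcr_percent(ratings):
    """Return the average TCR as a percentage of the maximum rating.

    Each rating is 0 (not completed), 1 (completed with help)
    or 2 (completed without help).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("each rating must be 0, 1 or 2")
    # Assumption: percentage = mean rating / maximum rating.
    return 100 * mean(ratings) / TCR_MAX


# Made-up ratings for one task across eight participants.
print(tcr_percent([2, 2, 1, 2, 0, 2, 2, 1]))  # 75.0
```

The same function works for a single task, a whole session, or all sessions for a product, depending on which ratings are passed in.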

There isn’t a global average for TCR either, but again we usually aim for a score over 80%. Currently our products average a TCR of about 83%.

What if you have low scores?

Sometimes low scores are easy to remedy (a change of wording here, a different look to a button there), but sometimes it can be a bit tougher, such as when a framework isn’t being understood or a participant is having a hard time seeing the value in a feature. We always recommend iterative testing: continuing to test revisions until scores reach a set threshold and participants successfully use the system. It might take one concept validation session, or it might take five for more complex systems. The key is to keep testing until the system is usable.