A couple of weeks ago we released some research that explored 29 common treatments seen in ecommerce optimisation. Treatments ranged from simple adjustments, such as changing a button, to more complicated implementations like product recommendations.
The only way we can learn about the incremental effects of these treatments is to run them as an experiment.
Fortunately, over the last four years we have conducted tens of thousands of experiments. Any single experiment tells us only so much about a treatment's effect, but by categorising thousands of experiments and aggregating their results we can build a pretty good understanding of the likely effect of each treatment.
This might seem like an academic exercise but we believe it's very important. For the first time, instead of making decisions on where to put resources based on "industry best practice", hyperbolic case studies, or internal argument, we have a quantitative benchmark of what works.
As with any analysis there are some important details to understand.
Firstly, in our paper we measure the success of a treatment in terms of the incremental effect it has on revenue per visitor (RPV).
This is not a standard measurement within the industry but we love it! It combines the proportion of visitors who convert with the amount each converter spends. It’s a great metric because from a business point of view it can be converted to gross revenue (by multiplying by the number of visitors in the experiment). It is also a great metric from a statistical point of view as assumptions of independence are maintained, which is crucial for an accurate understanding of a treatment’s impact.
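To make the metric concrete, here is a minimal sketch of how RPV might be computed for one experiment arm. All numbers are hypothetical and purely illustrative; they are not taken from the paper.

```python
# Minimal sketch of revenue per visitor (RPV); all numbers are hypothetical.
def rpv(total_revenue: float, visitors: int) -> float:
    """RPV collapses conversion rate and spend per converter into one ratio."""
    return total_revenue / visitors

# Hypothetical arm: 10,000 visitors, 300 of whom convert, spending 50 each.
visitors = 10_000
converters = 300
avg_order_value = 50.0
total_revenue = converters * avg_order_value  # 15,000

print(rpv(total_revenue, visitors))  # 1.5
# Equivalently: conversion rate (0.03) * average order value (50.0) = 1.5.
# Multiplying RPV by visitor count recovers gross revenue: 1.5 * 10,000 = 15,000.
```

This is why RPV translates so directly to the business: scale it by traffic and you are back to gross revenue.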
The second important detail concerns the numbers we report for each treatment. Within the paper we talk about the mean impact on RPV. This can be interpreted as the effect you can expect from implementing a treatment of this kind on a site. However, it is not the whole story.
Some treatments, such as a full-page redesign, have a much wider range of effects. For these kinds of experiments, although the mean may be very close to zero, there is quite a high chance of a large positive or negative effect - especially compared with a treatment such as changing the colour of an element, whose effects fall within a very narrow range around zero. Both the mean impact and the variability associated with a treatment should be taken into account when assessing its likely impact.
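The point about variability can be illustrated with a small simulation. The two distributions below are hypothetical stand-ins for a wide-variance treatment (a full-page redesign) and a narrow-variance one (a colour change); the means and standard deviations are made up for illustration, not taken from the paper.

```python
# Sketch: two hypothetical treatments with the same mean effect on RPV (~0)
# but very different variability. Parameters are illustrative only.
import random

random.seed(42)

def simulate_effects(mean: float, sd: float, n: int = 100_000) -> list[float]:
    """Draw n hypothetical treatment effects from a normal distribution."""
    return [random.gauss(mean, sd) for _ in range(n)]

redesign = simulate_effects(mean=0.0, sd=0.05)    # wide spread of outcomes
colour = simulate_effects(mean=0.0, sd=0.005)     # effects cluster near zero

def prob_big_effect(effects: list[float], threshold: float = 0.03) -> float:
    """Chance the effect is 'big' in either direction (|effect| > threshold)."""
    return sum(abs(e) > threshold for e in effects) / len(effects)

print(prob_big_effect(redesign))  # roughly 0.55: big swings are common
print(prob_big_effect(colour))    # roughly 0.0: nothing far from zero
```

Both treatments have a mean effect of zero, yet one routinely produces large wins and losses while the other barely moves the needle - which is exactly why the mean alone is not the whole story.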
One way to summarise both the mean and the variability in a single number is to judge a treatment on the probability that it will have any positive effect. We call this the probability of uplift. Of the treatments we outlined, only 8 have a probability of uplift above 50% - you can see the data on this here. This may surprise some people, but it will be no shock to those experienced in online experimentation. It's hard to change user behaviour, and even harder to change it in a way that benefits revenue.
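Under a simple normal model of the effect estimate, the probability of uplift is just the mass of the distribution above zero. The sketch below shows that calculation with illustrative numbers of our own choosing; it is one reasonable way to compute such a figure, not necessarily the paper's exact method.

```python
# Sketch: "probability of uplift" = P(effect > 0), assuming the effect
# estimate is approximately normal. Numbers are illustrative, not the paper's.
from statistics import NormalDist

def probability_of_uplift(mean_effect: float, std_error: float) -> float:
    """P(effect > 0) under a normal model of the treatment effect."""
    return 1.0 - NormalDist(mean_effect, std_error).cdf(0.0)

# A small positive mean with wide uncertainty: barely better than a coin flip.
print(round(probability_of_uplift(0.01, 0.05), 3))  # 0.579
# The same mean with tight uncertainty: much more confidence in an uplift.
print(round(probability_of_uplift(0.01, 0.01), 3))  # 0.841
```

Note how the same mean effect yields very different probabilities of uplift depending on the variability - the single number folds both pieces of information together.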
By running experiments we stack the deck in our favour, learning what works and what doesn't. Without experimentation we are throwing mud at a wall - without even looking to see what sticks.
Good luck and we hope this research helps you achieve your optimisation goals.