You're looking for a new screen, and then you see what everyone else sees: a banner ad for the latest baking dish. Just what you wanted to see... or should we show you a banner that actually suits you? We test new features to find out what helps you in your search. Jannis, Data Engineer, told us how testing works and how we develop:

How can we improve otto.de? This is the question that drives us as digital analysts at OTTO. We look for problems, uncover potential and test our latest features. Together with our colleagues, we develop otto.de for our customers and thus contribute to OTTO's sales growth.
In our company, e-commerce is divided into "products": one interdisciplinary team of specialists looks after the otto.de storefront, for example, while another looks after search. In each of these teams, we support the entire customer journey, from the moment a user enters the site to the completed purchase at checkout, with the aim of improving the experience on otto.de. To arrive at valuable results, we follow a fixed scheme:
First comes the problem, and it has to be identified before it can be fixed. But how do we do that? We use interviews with our users and potential analyses to identify development opportunities on otto.de.
Once a solution has been found, the specialist team starts developing the feature and we start designing the test. But what do we want the new feature to achieve? We approach this question with a robust hypothesis: "By showing personalised banners, 10% more users in the test group will look at products from the range." To formulate the hypothesis, we pick a key performance indicator (KPI) that the feature affects, such as the click-through rate. We also use a runtime estimate to determine how long we need to test to get valid results: a storefront test needs a shorter test period than a change to the checkout process, because far more users see the storefront.
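To make the runtime estimate concrete, here is a minimal sketch in Python using statsmodels' power analysis. The baseline click-through rate, the hypothesised uplift and the daily traffic figure are illustrative assumptions, not OTTO's actual numbers:

```python
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_ctr = 0.05   # assumed current click-through rate
target_ctr = 0.055    # hypothesised 10% relative uplift

# Cohen's h effect size for comparing two proportions.
effect_size = proportion_effectsize(target_ctr, baseline_ctr)

# Required sample size per group for a two-sided test
# at significance level 0.05 and 80% statistical power.
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)

# A high-traffic page (storefront) fills the required sample faster
# than a low-traffic one (checkout), hence the shorter runtime.
daily_visitors_per_group = 50_000  # hypothetical traffic figure
print(f"Need ~{n_per_group:,.0f} visitors per group, "
      f"i.e. ~{math.ceil(n_per_group / daily_visitors_per_group)} days")
```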
Does the new feature have a positive impact on the user experience, yes or no? The idea of A/B testing really can be broken down that simply: visitors to the website are randomly divided into two groups. One group is shown the status quo, while the test group sees the change. Or, to be more specific: one group still sees baking dishes, while the other is recommended the latest gaming monitors. Target group splitting creates the two groups for the A/B test. A cookie tells us whether you are visiting otto.de for the first time; if so, you are randomly assigned to one of the two groups. With the defined KPIs, we can then measure whether we have helped you find monitors.
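One common way such a split can be implemented is sketched below: a deterministic hash of the cookie ID buckets each visitor into control or test, so the assignment stays stable across visits. This is a simplified illustration, not OTTO's actual assignment logic; the function name and the 50/50 split are assumptions:

```python
import hashlib

def assign_group(visitor_id: str, experiment: str) -> str:
    """Deterministically bucket a visitor into 'control' or 'test'."""
    # Hashing the cookie ID together with the experiment name keeps the
    # split stable across visits and independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # map the hash to a bucket 0..99
    return "test" if bucket < 50 else "control"  # 50/50 split

print(assign_group("cookie-abc-123", "personalised-banners"))  # e.g. 'test'
```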
If you found a new screen thanks to our recommendation, the test was a success, right? Unfortunately, it's not that simple. We use descriptive analysis to see what else our feature does: Does it work differently for different product groups? How is the feature received on different devices? And do different age groups react differently to the change?
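A quick sketch of what such a descriptive breakdown could look like with pandas; the tiny inline dataset is made up purely for illustration:

```python
import pandas as pd

# Hypothetical per-visitor results: group assignment, device,
# product category and whether the visitor clicked the banner.
df = pd.DataFrame({
    "group":    ["test", "control", "test", "control", "test", "control"],
    "device":   ["mobile", "mobile", "desktop", "desktop", "mobile", "desktop"],
    "category": ["gaming", "baking", "gaming", "gaming", "baking", "baking"],
    "clicked":  [1, 0, 1, 0, 0, 1],
})

# Click-through rate broken down by group and device: does the
# feature behave differently on mobile than on desktop?
ctr_by_segment = (
    df.groupby(["group", "device"])["clicked"]
      .mean()
      .unstack("device")
)
print(ctr_by_segment)
```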
We manage the accumulated data using technologies such as Hadoop, Google Cloud or our own OTTO business intelligence tools. We use SQL to query all the data relevant to the test and process it with Python or PySpark, although languages such as R or Scala can also be used. Once the data is structured, we analyse it with inferential statistical methods such as the Wilcoxon test or the t-test. Only when all the data has been analysed and all the test results are in do we make recommendations for action: perhaps the feature should be developed further, or perhaps it was a complete success and we can roll it out to all users.
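As a sketch of that analysis step: scipy offers both tests, where the Mann-Whitney U test is the independent-samples form of the Wilcoxon test that applies when comparing two separate visitor groups. The synthetic Poisson data below merely stands in for a real engagement metric:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical per-user engagement metric (e.g. product views) for the
# control and test groups; real data would come from the SQL query.
control = rng.poisson(lam=2.0, size=10_000)
test = rng.poisson(lam=2.1, size=10_000)

# Welch's t-test compares the group means without assuming equal variances.
t_stat, t_p = stats.ttest_ind(test, control, equal_var=False)

# The Mann-Whitney U test makes no normality assumption about the metric.
u_stat, u_p = stats.mannwhitneyu(test, control, alternative="two-sided")

print(f"t-test:       p = {t_p:.4f}")
print(f"Mann-Whitney: p = {u_p:.4f}")
```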
As you can see, testing new ideas is important so that we are constantly learning. We want people to spend their time on our site efficiently and see content that adds value, and we only show you baking dishes when you really need them. You can find out more about the development of otto.de in our tech blog.