Clustering shop visitors based on profile attributes (e.g., shopping behavior) can be used to provide personalized product rankings, with the aim of displaying more relevant products at the top positions of an online shop. While our approach allows us to reuse most parts of our regression-based ranking model, the personalized setting confronted us with new challenges on the way to actually providing the desired user segments and rankings.
Consumers face many choices when shopping online. While browsing a product ranking, e.g., the list of all shoes or dresses in an online shop, the customer can choose between different sorting options: ascending or descending price or rating, newest products, or product popularity, with the latter often being the default. To present users with more relevant products in a web shop, we believe personalizing this ranking is very useful. As an example (Fig. 1), consider two women of the same age but with very different characteristics and interests. The aim is to provide both users with a product ranking that fits their needs.
In this article, we present personalization options for product rankings, obstacles that can occur along the way, and the solution approach we successfully implemented at the Otto Group.
The Otto Group is a globally active group of retailers and retail-related service providers, present in more than 30 countries. With online sales of 8.1 billion euros out of 14.3 billion euros in total sales (2019/20), the Otto Group is one of the world’s largest online retailers. The Otto Group data.works GmbH is responsible for storing, anonymizing, and handling retail data in a GDPR-compliant way, as well as for providing cutting-edge machine learning and data solutions.
Historically, product rankings were based on a large set of manual rules and descriptive metrics. In order to provide personalized product rankings, we decided to separate the problem into two parts: the ranking problem itself and the personalization problem.
Our starting point was a predictive but non-personalized ranking. The underlying idea is that products with a high probability of being bought should be ranked higher than products with lower expected sales. The implemented model therefore provides a sales prediction for each product for the upcoming day. The items are then ranked in descending order, on the assumption that this approach leads to better conversion rates and an increased sales volume.
From a theoretical point of view, there are three different approaches in the area of learning to rank: pointwise, pairwise, and listwise. We chose a pointwise method, which assumes that each item in the training data has a numerical or ordinal score, in our case the expected sales or revenue. The ranking problem is thereby reduced to a simple regression problem.
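The pointwise idea can be sketched in a few lines: a plain regressor predicts next-day sales per product, and products are simply ranked by the prediction. The features, target, and model choice below are illustrative assumptions on synthetic data, not the actual production setup.

```python
# Minimal sketch of pointwise learning to rank as regression.
# Features, target, and model are illustrative, using synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n_products = 200

# Toy feature matrix: columns could stand for e.g. recent views,
# price attractiveness, recency of the product
X = rng.random((n_products, 3))
# Toy target: (synthetic) next-day sales, correlated with the first feature
y = 50 * X[:, 0] + 10 * rng.random(n_products)

model = GradientBoostingRegressor().fit(X, y)
predicted_sales = model.predict(X)

# The ranking is just the product indices sorted by predicted sales, descending
ranking = np.argsort(-predicted_sales)
```

Because the model is an ordinary regressor, any off-the-shelf regression technique and tooling can be reused, which is exactly what makes this reduction attractive.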
Having set the target and the model type, we can now focus on creating useful features from product details as well as transaction and tracking data, e.g.:
To evaluate the quality of our predictions, we chose two main approaches. First, to improve and evaluate the model’s performance offline, we used a common metric for ranking problems called normalized discounted cumulative gain (nDCG). It measures the gain of the predicted ranking relative to the perfect ranking, in our case the actual sales numbers in descending order. Using historic data, we were able to show that our new ranking outperformed the previous rule-based baseline ranking in terms of nDCG.
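The metric itself is compact enough to write out. A minimal implementation, using actual sales counts as the relevance scores, might look like this (a textbook sketch, not the evaluation code used in production):

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain: relevance discounted by log2 of the position."""
    positions = np.arange(1, len(relevances) + 1)
    return np.sum(np.asarray(relevances, dtype=float) / np.log2(positions + 1))

def ndcg(predicted_order, true_relevance):
    """nDCG: DCG of the predicted ranking divided by DCG of the ideal ranking.
    `true_relevance[i]` is e.g. the actual sales count of product i;
    `predicted_order` lists product indices in predicted rank order."""
    true_relevance = np.asarray(true_relevance, dtype=float)
    actual = dcg(true_relevance[predicted_order])
    ideal = dcg(np.sort(true_relevance)[::-1])  # perfect ranking: sales descending
    return actual / ideal

# A ranking that matches the sales order exactly scores 1.0
sales = [10, 3, 7, 0]
assert ndcg([0, 2, 1, 3], sales) == 1.0
```

The logarithmic discount encodes the intuition that getting the top positions right matters far more than the order further down the list.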
However, we did not know whether the rankings were actually more useful in practice and whether the general approach was effective in the web shop. To evaluate customer behavior and satisfaction, we therefore implemented multiple AB tests, splitting web shop visitors into two groups to compare the rule-based ranking with the new prediction-based ranking. After several rounds of incremental development and continuous improvement, we measured significant uplifts across multiple shops in the desired metrics, e.g., conversion or overall sales. This led us to the next big step: personalization.
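A standard way to judge whether an uplift in conversion is significant rather than noise is a two-proportion z-test on the two test groups. The sketch below uses invented numbers for illustration; it is not the analysis pipeline described in the article.

```python
# Two-proportion z-test on conversion rates of two AB test groups.
# All numbers are invented for illustration.
from math import sqrt
from statistics import NormalDist

def conversion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference in conversion rates between
    group A (e.g. rule-based ranking) and group B (prediction-based)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical outcome: 4.8% vs. 5.4% conversion on 10,000 visitors each
p_value = conversion_z_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
```

Checks like this complement the offline nDCG evaluation: the metric tells you the ranking is closer to the ideal order, while the test tells you whether customers actually respond to it.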
While thinking about how to provide users with a ranking that suits their needs and interests, we can differentiate between a variety of options:
Following the classification scheme above, we decided to implement an implicit personalization for clusters of customers, testing different kinds of data but without real-time adaptation. Realizing a ranking solution on an individual level would have required larger-scale adaptations in the shops: providing an individual ranking for each shop visitor would mean retrieving rankings on the fly, leading to many changes.
Thus, we decided to create personalized rankings based on user groups and split the problem into two parts: clustering shop visitors into user segments and predicting segment-specific rankings using regression (Fig. 2). For the latter step, we can reuse the existing model from our starting point with minor adaptations and keep a shared code base for global and personalized rankings.
For the generation of user segments, shop visitors are clustered based on profile attributes from previous shop interactions, like interest in specific assortments or price sensitivity. In order to create a user segment-based ranking, features are computed for different segments, and personalized product ranking scores are predicted for each segment.
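The segmentation step can be sketched with a standard clustering algorithm. The attribute names, the number of clusters, and the use of k-means below are illustrative assumptions, not the documented production setup.

```python
# Sketch of segment generation: cluster visitors on profile attributes.
# Attribute semantics, k, and the algorithm choice are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy visitor profiles: columns could stand for e.g. price sensitivity,
# interest in sportswear, interest in formal wear
profiles = rng.random((1_000, 3))

# Scale attributes so no single one dominates the distance metric
X = StandardScaler().fit_transform(profiles)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
segments = kmeans.labels_  # one segment ID per visitor

# Downstream, features and ranking scores are then computed per segment ID,
# and new visitors are assigned via kmeans.predict(...)
```

A fitted model like this also covers the assignment problem mentioned later: a previously unseen visitor with known profile attributes can be mapped to a segment with a single `predict` call.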
During the realization of the personalized product rankings, we came across a number of challenges. While some had been expected in advance, others came as a surprise.
When developing the segmentation model, we encountered the challenge of how to evaluate the quality of specific profile attributes and the resulting segments. Similarity of visitors in the profile attribute space does not necessarily reflect similar shopping behavior. While one could simply presume that the resulting clusters lead to useful segments for personalized rankings, we decided to evaluate the suitability of profile attributes as well as segments by looking at the corresponding purchases in historic data.
Splitting the visitors into user segments results in higher sparsity in the feature tables as well as fewer purchases for computing the target of the regression problem. If the user segments differ in size, the situation can be even more extreme for smaller clusters. This can lead to the problem that there is almost no “positive” signal for a large number of products, and countermeasures have to be taken to deal with the sparsity.
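One common countermeasure for such sparsity, shown here as an assumption rather than the solution actually deployed, is to shrink a segment's sparse target estimate toward the global estimate, so that small segments lean on the global signal:

```python
# Empirical-Bayes-style shrinkage of a per-segment sales rate toward the
# global mean. This is an illustrative technique, not the documented solution.

def shrunk_target(segment_sales, segment_n, global_mean, prior_strength=50):
    """Blend a segment's own sales-per-observation rate with the global rate.
    Small segments (segment_n << prior_strength) stay near global_mean;
    large segments are dominated by their own signal."""
    return (segment_sales + prior_strength * global_mean) / (segment_n + prior_strength)

# A segment with only 5 observations stays close to the global mean of 0.8...
small = shrunk_target(segment_sales=2.0, segment_n=5, global_mean=0.8)
# ...while a segment with 5,000 observations keeps its own rate of 1.2
large = shrunk_target(segment_sales=6000.0, segment_n=5000, global_mean=0.8)
```

The `prior_strength` parameter is a tuning knob: it states how many observations a segment needs before its own data outweighs the global prior.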
Obviously, we need a way to identify visitors and assign them to the corresponding user segment. Our shop tracking data contains a visitor ID, but we would also like to recognize a visitor coming from a different context; login events can be used to identify whether visitors actually belong to the same account. In order to assign a visitor to a segment, we need information from previous sessions (or from within the current session). The transition from a previously unknown visitor to one assigned to a behavioral segment can mean that at the beginning of a session no personalized ranking is used until the corresponding segment has been retrieved. This might lead to undesired effects like varying rankings of the same product category, potentially confusing the visitor.
In our approach, we automatically cluster shop visitors into behavioral clusters. Once a trained segmentation model exists, previously unknown visitors can easily be assigned to one of the user segments. However, if the segmentation model is re-trained later, the resulting model and the segment ID mapping for visitors will, in general, look different. Thus, whenever the segmentation is updated, known visitors might receive “wrong” rankings based on their previous segment ID until an update of the mapping has arrived. Timing of workflows is therefore an important issue, as the rankings provided for user segments should match the user mapping active in the shop. In the worst case, bad timing can lead to displaying the ranking of another user segment, potentially performing worse than a non-personalized solution would have.
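One way to soften this re-training problem, sketched here as an illustrative approach rather than what is necessarily deployed, is to relabel the new clusters so their IDs line up with the most similar old clusters, by solving an optimal assignment on centroid distances:

```python
# Keep segment IDs roughly stable across retrains: match new cluster
# centroids to old ones via an optimal assignment (illustrative sketch).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

old_centroids = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 9.0]])
# The retrained model found similar clusters, but in a different order
new_centroids = np.array([[5.1, 4.9], [0.2, 8.8], [0.1, -0.1]])

cost = cdist(new_centroids, old_centroids)      # pairwise centroid distances
new_idx, old_idx = linear_sum_assignment(cost)  # minimize total distance
relabel = dict(zip(new_idx, old_idx))           # new cluster ID -> old segment ID

# Here relabel maps new cluster 0 -> old 1, 1 -> old 2, 2 -> old 0
```

Such a relabeling reduces, but does not eliminate, the window in which visitor mappings and segment rankings can disagree, so workflow timing remains important.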
During our various AB tests, we also had to deal with a number of obstacles. Regular checks such as keeping test groups balanced and persistent aside, we learned that there are several places where something can go wrong, even though the overall architecture does not appear very complex. From our experience, it is useful to check whether segments are assigned to visitors correctly, whether their updates work properly in the shop, and whether the corresponding rankings are used throughout the session. A personalized service leads to a complex situation for checks and debugging, as it is not sufficient to inspect isolated contexts. Building automated monitoring of whether the correct rankings are actually used in the shop can help to identify problems quickly.
After analyzing potential paths towards personalized product rankings, we decided to implement a pragmatic solution with implicit personalized rankings for behavioral user segments. The solution fits our existing product ranking approach well and does not require extensive changes to the shop systems of our online retailers. During our journey we learned that even such a straightforward solution for personalization has its complexity, resulting in a number of challenges.
We are currently using our ranking solutions in different versions, non-personalized and personalized, for several shops of the Otto Group, depending on their specific needs. Each shop can use its own target value (like turnover or sales quantity) in order to meet specific requirements. In future work, situational profile attributes could additionally be taken into account for user segmentation. Other potential next steps could address real-time adaptations or personalization on an individual level, i.e., creating a personalized ranking for each shop visitor.
This article was originally published on Medium and can be read there as well.