December 03, 2018

OTTO goes AWS - Part 2

Part 2: Experiment "Decentralized Operation"

One of the key questions during the AWS migration was whether all services that had previously been provided by a central Operations / Platform Engineering team could be successfully decentralized. Previously, we had a traditional hosting contract with a provider who managed the infrastructure for us in the form of provisioned VMs. In OTTO E-Commerce, there were several Platform Teams (P-Teams) that developed and operated infrastructure services for the Development / Feature Teams (F-Teams) based on this.

For example, OTTO operated a Mesos platform for the microservices of the F-Teams, along with several MongoDB clusters that served as databases for each F-Team. These central P-Teams developed and operated both. They also ran many services needed for development, such as an LDAP server for authentication and Jenkins for deployments. With the migration to AWS, many business services no longer need a central operations team at OTTO.

Take databases as an example. With high-level AWS services such as DynamoDB, the F-Teams can start using a database immediately without having to worry about operational tasks such as updates and maintenance. Pretty much any service that was previously managed centrally can be replaced by a counterpart in AWS. Instead of Jenkins, for example, teams can use CodePipeline in conjunction with CodeBuild to deploy their services. If that is not enough for a team, it can run Jenkins itself on EC2 or use an entirely different solution. AWS's Shared Responsibility Model makes it easy to see what this change means for AWS customers.

How much more central do you need?

An ambitious goal of the internal migration team was to decentralize as many of the services previously managed centrally at OTTO as possible. To achieve this and further strengthen team autonomy, we chose an account structure from the outset in which each team fully manages at least two AWS accounts (one for Live and one for Nonlive) and deploys its services into them. As the project progressed, however, it became clear that full decentralization does not make sense everywhere. For databases, VMs and deployment tools, it was easy to argue that these are now the responsibility of the development teams: the teams' requirements and preferences differ too much, and even before the migration the central P-Teams found it difficult to meet the developers' wishes.
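
The account structure described above, with at least one Live and one Nonlive account per team, can be sketched roughly as follows. This is only an illustration: the team names and the `team-stage` alias scheme are assumptions made for the example, not OTTO's actual naming or tooling.

```python
from dataclasses import dataclass

# The two stages named in the article.
STAGES = ("live", "nonlive")


@dataclass(frozen=True)
class TeamAccount:
    """One AWS account owned and fully managed by a single team."""
    team: str
    stage: str

    @property
    def alias(self) -> str:
        # Hypothetical naming convention, chosen only for this sketch.
        return f"{self.team}-{self.stage}"


def accounts_for(teams):
    """Return one account per team and stage."""
    return [TeamAccount(team, stage) for team in teams for stage in STAGES]


# Hypothetical team names.
accounts = accounts_for(["search", "checkout"])
for account in accounts:
    print(account.alias)
```

Keeping Live and Nonlive in separate accounts gives each team a hard boundary between production and everything else, while the team remains the sole owner of both.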

For other services, such as administration of the DNS root zones, maintenance and further development of cross-team processes (account creation, user creation), authentication servers, etc., it still makes sense to have a cross-team unit that takes care of them. If these were decentralized, every team would incur the same effort even though the requirements do not differ between teams. No development team can handle such tasks on the side, and the work would always compete with functional features for priority. A dedicated 'Service Integration' team was therefore created within the project. There is also a team that takes care of overarching security aspects, advises the other teams on these topics, and uses checks in the accounts (e.g. the CIS benchmarks) to ensure that basic rules such as encryption and the non-accessibility of internal services from the outside are followed.
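
In principle, an account-level check for the basic rules mentioned above (encryption, no outside access to internal services) might look like the following. This is a minimal sketch over a hypothetical in-memory inventory: the `Resource` type, its fields and the rule names are assumptions, and the code neither implements the CIS benchmarks nor calls any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical inventory entry for a resource in a team account."""
    account: str
    name: str
    encrypted_at_rest: bool
    publicly_accessible: bool


def find_violations(resources):
    """Return (account, resource name, rule) tuples for every broken rule."""
    violations = []
    for resource in resources:
        if not resource.encrypted_at_rest:
            violations.append((resource.account, resource.name, "encryption-at-rest"))
        if resource.publicly_accessible:
            violations.append((resource.account, resource.name, "no-public-access"))
    return violations


# Hypothetical inventory data.
inventory = [
    Resource("search-live", "orders-table", True, False),
    Resource("search-nonlive", "debug-bucket", False, True),
]
violations = find_violations(inventory)
for violation in violations:
    print(violation)
```

Running the same rule set against every account is what makes a central team worthwhile here: the rules do not differ between teams, so there is no benefit in each team writing its own.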

Even though the teams are now responsible for operations and the associated incident management themselves, there is still a small central on-call team where the alerting and communication strands converge in the event of a technical problem. The teams themselves are responsible for the alerting process, i.e. for sending alarms or warnings to a central monitoring system. The same applies to troubleshooting, since the know-how about the application and infrastructure is available in the development teams.
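
To illustrate this split of responsibilities, here is a sketch of how a team might format an alarm before handing it to a central monitoring system. The article does not describe the real message format or transport, so the function `build_alert`, the field names and the two severity levels are all assumptions.

```python
import json
from datetime import datetime, timezone

# Assumed severity levels, based on the "alarms or warnings" wording above.
SEVERITIES = ("warning", "alarm")


def build_alert(team, service, severity, summary, now=None):
    """Serialize one alert as JSON for a (hypothetical) central monitoring system."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    payload = {
        "team": team,          # the owning team, which also does the troubleshooting
        "service": service,
        "severity": severity,
        "summary": summary,
        "timestamp": timestamp,
    }
    return json.dumps(payload, sort_keys=True)


# Hypothetical example values with a fixed timestamp for reproducibility.
message = build_alert(
    "search",
    "product-api",
    "alarm",
    "5xx rate above threshold",
    now=datetime(2018, 12, 3, tzinfo=timezone.utc),
)
print(message)
```

Including the owning team in every message lets the on-call team route a problem to the people who know the application and its infrastructure.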

How closely does this tie me to the cloud service provider?

When it comes to using the services offered by AWS, we made a conscious decision for our area. Instead of building our applications so that they could also be moved to other clouds or even a local data center without much additional migration effort, we opted to take advantage of the cloud as far as we can and to use managed services from AWS where it makes sense.

We protect sensitive customer data and strategic business data by consistently encrypting all data both at rest and in transit, even within the private network segments. To get the most out of the cloud in terms of operation, flexibility and cost, we have therefore consciously accepted the commitment to the service provider. A later migration (in whole or in part) to other cloud providers remains possible, since they offer similar concepts and services and we already did a lot of preliminary work in the Lhotse project by breaking down our monolith into hundreds of small microservices that can be deployed quickly. However, if you built your services up front so that they could run anywhere with minimal migration effort, you would have to rely on abstraction layers that you manage yourself, and thus replicate AWS. The operating expenses, especially the central ones, would then be at the same level as before. The flexibility to try something out quickly would also disappear, since the services would first have to be provided by a central team, which can quickly become a bottleneck.

Nevertheless, we have set up our applications and especially the communication channels between our applications in such a way that we can also seamlessly integrate other cloud services or classically hosted services. There will be more on this in an upcoming blog article on 'Inter-Backend Communication'. Thus, despite leveraging the advantages of AWS, the strengths of various other cloud providers can still be leveraged.

Cultural changes

In the end, the biggest challenges were not technical but cultural and organizational. Because the F-Teams no longer depend on central P-Teams, they are free to choose the services they want to use. On the other hand, the F-Teams are now also obliged to operate these services themselves. Even though the development teams at OTTO are very similar in composition and work according to the same technological and methodological principles, they reacted very differently to these new tasks.

One of the main tasks of the central migration team at OTTO was therefore to prepare the teams for these new tasks and to point out the advantages alongside the disadvantages that employees often saw first. Through the AWS migration, the developers have grown much closer to their operations colleagues and in many cases have integrated them into their F-Teams. In the process, both sides can learn from each other and broaden their horizons. There will be more on this exciting topic soon in the third part of this series.

Conclusion

A few months after completing the migration, we can say that our experiment with decentralized operation has largely been a success. We have not managed to decentralize operations completely, but apart from a few centralized services, the F-Teams now independently manage their applications from development to operation, with everything that entails. As expected, the teams differ in how they use the cloud services on offer. Most teams use the managed services offered by AWS and thus save operating expenses, while a few teams with special requirements manage a small part of their services themselves and accept higher operating expenses in return.

4 Comments

  • […] lighthouse projects fail so publicly. In addition, OTTO's tech blog has an update on the AWS migration, and at Zalando you can learn about the micro-frontend strategy of […]

  • Leha
    07.12.2018 11:44

    Unfortunately I have to agree with Martin here. Companies like yours have deliberately decided against AWS and use Azure, since Amazon is actually your competitor. That said, Amazon is currently degenerating into a Chinese junk shop anyway.

  • Sebastian Aberle
    07.12.2018 13:34

    Hello Leha & Martin,
    we see AWS primarily as an IT service provider, and of course we also looked at other clouds beforehand and compared them against our requirements. At the time of the evaluation, AWS offered the greatest flexibility, scope and development potential. That does not rule out using other providers in the future, since the others are also constantly evolving. Our top priority, however, is always the benefit, flexibility and speed that the respective provider offers us for operating and further developing our shop. We found that developing and operating services like those available in AWS ourselves would not make us faster or more flexible, and that we could not provide and operate these services ourselves in the variety and quality on offer. It was therefore only logical for us to buy these services. With the e-commerce platform itself, on the other hand, we found that only in-house development gives us the greatest flexibility, and that nobody else can offer the functionality we need and respond flexibly to new features. This led to the Lhotse platform, on which today's shop is based.

  • Martin
    05.12.2018 10:28

    Very nice. Now even the last one who would have had the potential to build a competitor to Amazon is getting into bed with them. Congratulations! What a disgrace, what a shame.

Written by

Sebastian Aberle
