For several years, we have been building otto.de in independent teams that continuously put their changes live without having to coordinate with each other. A wide range of different tests helps us to deploy these changes quickly and without fear of errors. However, one class of tests has not yet gained the traction it deserves in the industry: consumer-driven contracts, or CDCs. So I'd like to use our recent findings to write about them here.
For all types of data (product data, user information, purchases, discount promotions, etc.), we usually have one system that has sovereignty over it. If other systems also want to use this data, they request a copy asynchronously in the background from a provided interface and transfer it automatically into their own database. This way we avoid long request cascades between systems, which is good for our response times to customer requests and very helpful in keeping the overall architecture resilient.
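The replication idea can be sketched in a few lines. All names here are hypothetical stand-ins: a background job pulls the records it needs from the providing system's interface and upserts them into its own store, so customer-facing requests never have to call the other system.

```python
# Sketch of asynchronous data replication; names are illustrative only.
def replicate(fetch_feed, local_store):
    """Copy the provider's current records into our own database (here: a dict)."""
    for record in fetch_feed():
        local_store[record["id"]] = record  # upsert by primary key
    return local_store

# Stand-in for the provider's interface; in reality this would be an HTTP call.
def fake_price_feed():
    return [
        {"id": "p1", "price": 1999},
        {"id": "p2", "price": 4950},
    ]

store = {}
replicate(fake_price_feed, store)
print(store["p1"]["price"])  # -> 1999
```

In production this job would run on a schedule, decoupling the consumer's availability from the provider's.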
Up to now, the servers at our company have communicated almost exclusively via REST-like HTTP interfaces. So there is one system that provides data (the server) and one or more systems that are supposed to retrieve the data from there (the clients).
The server provides an HTTP endpoint - for example, for current product prices - and a client can make HTTP requests against it. Because the endpoints are generally protected against unauthorized access, keystores and credentials still belong in the picture, but essentially that's it:
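The setup above can be sketched from the client's side. The endpoint URL and token below are purely hypothetical; the point is that the client builds an authenticated HTTP request against the server's interface. We only construct the request here rather than sending it.

```python
import urllib.request

# Hypothetical price endpoint and placeholder token, for illustration only.
request = urllib.request.Request(
    "https://prices.example.internal/products/p1/price",
    headers={
        "Authorization": "Bearer <token-from-keystore>",
        "Accept": "application/json",
    },
)

print(request.get_method())          # -> GET
print(request.get_header("Accept"))  # -> application/json
```

In the real pipeline the token would come from the team's keystore, never from the code.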
The server's interface usually has some kind of documentation or specification, and test coverage deemed appropriate by the server team. Nevertheless, dependencies on implementation details still creep into the client: teams rely on JSON elements always appearing in a certain order, or they can't handle new data fields. Naturally, everyone involved resolves to be extra attentive. Or they are convinced that these kinds of mistakes only happen to others. Yet it happens again and again that something that worked before suddenly doesn't work anymore: a bug. This kind of glitch is also inconspicuous enough that it can sit in the live system for quite some time before anyone notices it.
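A contract test that guards against exactly these brittle dependencies pins down only what the client actually relies on. Here is a minimal sketch in the tolerant-reader style, with a hypothetical price payload: it asserts the fields the client uses and deliberately ignores element order and unknown fields.

```python
import json

# Fields this client actually depends on (hypothetical example contract).
REQUIRED_FIELDS = {"productId": str, "price": int, "currency": str}

def check_contract(payload: str) -> dict:
    """Fail only if a field the client needs is missing or has the wrong type."""
    data = json.loads(payload)
    for field, expected_type in REQUIRED_FIELDS.items():
        assert field in data, f"missing field: {field}"
        assert isinstance(data[field], expected_type), f"wrong type: {field}"
    return data

# The server may reorder elements or add new fields without breaking us:
response = '{"currency": "EUR", "productId": "p1", "price": 1999, "newField": true}'
data = check_contract(response)
print(data["price"])  # -> 1999
```

A test like this documents the client's real expectations far more precisely than a prose specification can.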
So I wish we would at least notice it quickly:
However, this test only runs when the client team's pipeline is triggered, for example by a code change. If the team is currently working on another service, an incompatible change on the server side could still go live unnoticed.
So my wish list is actually a bit longer:
When the client team's test runs in the server team's pipeline, it's called a CDC test. The concept was popularized by an article published on martinfowler.com in 2006, but then seems to have been somewhat forgotten again.
This is a shame, because it effectively prevents a problematic interface change in the server from going live at all:
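The effect described here can be sketched as a gate in the server pipeline. The test runners below are hypothetical stand-ins: every consumer contributes one test, and a single failure blocks the deployment before the incompatible change can go live.

```python
# Sketch of a CDC gate in the server pipeline; names are illustrative only.
def may_deploy(cdc_tests) -> bool:
    """Deploy only if every consumer's contract test passes."""
    return all(test() for test in cdc_tests)

# Stand-ins for two client teams' tests:
search_team_test = lambda: True     # contract still satisfied
checkout_team_test = lambda: False  # would break on the new payload

print(may_deploy([search_team_test]))                      # -> True
print(may_deploy([search_team_test, checkout_team_test]))  # -> False
```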
In the context of otto.de, such a test was initially placed in the central Artifactory, e.g. as a jar file, fetched by the server pipeline, and executed there. Some teams provided shell scripts that, for example, determine and download the correct version of the jar.
A disadvantage of this approach is that the runtime dependencies of the test must be present in the pipeline. When we introduced Java 8, for example, it bothered us that some server pipelines were still running on Java 6. Other teams' tests, in turn, rely on a Chrome binary or X11 libraries, which the server team then has to provide in its pipeline. In response, some teams wrapped their jar files in Docker images. This reduced the problem somewhat, although the server team now of course has to contend with the age-old Docker-in-Docker problem, and Docker version changes also tend to bring incompatibilities of their own. For the same reason, an experiment with Pact was over fairly quickly when one team was keen to introduce a newer version that didn't work with the other team's version.
It's also annoying that the server team has to keep the client test's credentials available in its pipeline, because credentials are of course not allowed to live in the code. So these credentials exist in parallel to the ones that the client team and the server team have to keep in their environments anyway in order to authenticate and validate requests.
So while some wishes were fulfilled, my wish list got longer almost as fast:
Despite the shortcomings, and in complete disregard of my wish list, this remained the state of our CDC tests for almost six years. It wasn't really good, but it was good enough.
So, in order for the CDC tests to keep working, the server team also had to drill holes in its firewalls. In addition, with infrastructure as code, the network path between client and server is now subject to potentially continuous change: a new way to accidentally break clients.
So the server team now already has to ...
That was a good time to rethink the distribution of tasks.
The client team now deploys the test as a Lambda function, as an EC2 instance, or as part of its system. An API gateway allows the server team to launch the test with an HTTP request. Since the test is deployed together with the production code, it always has the correct version. The server team doesn't have to download anything, and the only technical dependency is the ability to send an HTTP request from the pipeline - which, thanks to curl or wget, is no challenge for any team. The credentials for the test are in the client's keystore anyway, and now finally all the wishes on the list have come true: