A little more than a year ago, things at work were completely different for me than they are now. I was a Java programmer working on a big, monolithic piece of software. There was also some JavaScript for the frontend, MongoDB, some Groovy for scripting Gradle, and the occasional Bash script, but if you had called me a 100% Java programmer, I would not have objected. Don't get me wrong, I enjoyed my job and I always loved to work with Java, but as you may relate to, after years of hacking Java, I was a little tired of it. As you may also relate to, I had little hope that the situation would change significantly anytime soon. As it turned out, I was wrong about that. Hugely wrong.
A lot has been written about polyglot programming. I did not read much of it, so forgive me if I miss something obvious. For me the definition is simple: polyglot programming is when a team of developers works with a set of different programming languages.
To understand my team's approach I will have to take you on a little digression on architecture first.
As for the issue of the monolithic software, a lot has changed in quite a short period of time. We moved from rather big vertically decomposed systems to micro-verticals and microservices. See for example Guido Steinacker's post on that.
Also, we are now creating a new and much more flexible infrastructure based on Mesos and Docker. Simon Monecke has written a post about that and about how we deploy to that environment with LambdaCD. His post is only the first of a series of three.
Our new Mesos-based infrastructure is not only a suitable runtime for our classical web applications (micro or not), but also for Apache Spark. Which brings us to the first of the two lambdas I promised.
"Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data."
In our team, which was founded in early 2015, we work on such an architecture. Figure 1 shows a typical lambda architecture with:
Early lambda architecture implementations often suffer from using different technologies for the two streams of data. The batch processing is usually conducted with Hadoop MapReduce, which is fundamentally unsuited to stream processing, so the streaming part has to be implemented with a different technology. While this offers the advantage of allowing you to choose different solutions for different problems, it can also make you quite inefficient if the data you process in batch and in streaming is essentially the same.
Apache Spark is there to close that gap. It provides a sane API and a huge base of data processing libraries (e.g. MLlib) that can be used for both batch and stream processing. Spark features immutable, distributed data structures and tries to keep as much of the data in memory as possible. Spark programs can be treated as microservices in their own right. You end up with a bunch of different jobs which can be developed, deployed, scheduled, scaled and run independently.
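To make the benefit of sharing code between the two layers concrete, here is a minimal, plain-Python sketch of the lambda architecture idea described above. This is not Spark code, and all names (products, event shapes, function names) are made up for illustration: one transformation function feeds both the batch view over the master dataset and the incremental realtime view, and a query merges the two.

```python
from collections import Counter

def count_clicks(events):
    """The one shared transformation: count clicks per product.
    In a unified setup this same logic serves batch and streaming alike."""
    return Counter(event["product"] for event in events)

# Batch layer: periodically recompute the view from the complete master dataset.
master_dataset = [{"product": "shoes"}, {"product": "hats"}, {"product": "shoes"}]
batch_view = count_clicks(master_dataset)

# Speed layer: apply the *same* logic incrementally to recent events
# that the last batch run has not yet seen.
recent_events = [{"product": "hats"}]
realtime_view = count_clicks(recent_events)

def query(product):
    # Serving layer: merge the comprehensive batch view
    # with the low-latency realtime view at query time.
    return batch_view[product] + realtime_view[product]

print(query("shoes"))  # 2, entirely from the batch view
print(query("hats"))   # 2: 1 from batch + 1 from the speed layer
```

In the pre-Spark world sketched in the previous paragraph, `count_clicks` would have existed twice, once as a MapReduce job and once in the streaming technology; the point of a unified engine is that it exists once.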
Spark is written in Scala, which is also the native language in which to program Spark. This is why Scala is now part of our polyglot portfolio. Our team contains not only developers, but also data scientists with a tradition of prototyping and developing models with Python. Luckily, there also exists a Python API for Spark.
The Lambda architecture is a very powerful pattern, but to build and maintain a production system with it means you have a lot of requirements for which Spark is not necessarily the best fit. Instead we use microservices to satisfy requirements like:
For the sake of simplicity, the figure only sketches the first three services. We implement these services with Clojure, which brings us to the second lambda.
Clojure is a Lisp, which is in turn based on the Lambda Calculus. Lisp is a programming language that dates as far back as 1958. The Lambda Calculus is even older and was introduced in 1936.
Our services share a minimal framework. Like our team, we named it after Nikola Tesla: tesla-microservice. As other teams at OTTO are now also using our framework, we published the source code on GitHub. Read on for details about Clojure as a programming language.
Yes, interestingly, you are right. As mentioned above, we are currently replacing our Jenkins CI server with pipelines driven by LambdaCD: a continuous delivery pipeline in code. Clojure code, that is. It feels amazingly good to have pipelines that can be executed on any machine and that are also unit-tested. But this is not the topic now. I heard a rumour that the follow-up to Simon's article will contain real code samples and will be released as soon as next week.
If you take everything together, you get the picture in Figure 3. Not in the picture are Graphite for metrics, Elasticsearch for logs, and ZooKeeper for configuration. It might appear a little messy at first sight, but it isn't. It is a straightforward, data-centric architecture that uses the most pragmatic tool for any given task. All components are easily interchangeable. Last but not least: it is a lot of fun to work with.
I fear I may have bored you with all that contextual information, so let's get back on topic: the programming languages.
Scala really is a great language. It has tons of interesting features. From a Java programmer's perspective, Scala feels like two huge steps forward. Data structures are immutable. Functional programming is well supported. There is pattern matching and countless other cool features. All that makes it very interesting and fun to work with.
At the same time it feels like a little step back. Scala is complicated. All too often a pair of developers will look at each other and say: "Did we really just spend two hours getting the types right when converting that data representation into that other one?" Also, Scala will frequently give you a hard time when you try to understand what some library code does. Chances are the authors of the library fancy a different set of language features and syntax options than you do.
I would not call myself an expert on Scala, so I will illustrate my feelings about it with three analogies:
Python is a nice and friendly little language that has been around for quite a while now. By using indentation instead of braces, it removes a lot of visual bloat. It is friendly to beginners, and it is also understood by many, many techies who would not call themselves software developers. I just asked one of our data scientists, Stephan, what he thinks about Python, and he said:
"Python not only enables me to solve almost all of my computational problems but it is also, unlike R, a lingua franca to communicate with computer scientists."
That's what it is: a language equally well suited to supporting the rather exploratory work of a data scientist and to fulfilling all the requirements of reliably operating production software. The support for Python in Spark makes it a first-class language for us to build prototypes and to bring prototypes to production with little or no conversion overhead. I'm not sure we will ever find performance problems bad enough to require a migration to Scala, but if so, then that's what we will do.
Clojure is by far our favourite language. Clojure sets out to make the simple easy. And it does. A pair saying "Oh, I think we are done." is often followed by a disbelieving "But that was only five minutes!" And then they move on to the next task.
As a Lisp, Clojure features a syntax that is fundamentally different from everything most Java, Scala and Python programmers are accustomed to. This can be perceived as a hurdle when migrating to Clojure. In our experience it is only a low hurdle, though. Most developers, once they have crossed it, are excited by the simplicity, brevity and expressiveness of Clojure code.
What is particularly puzzling at very first sight is the different use of parentheses. A pair of parentheses defines a list. That list can be either data or code. If it is code, the first symbol in the list is treated as a function and all the following elements are its arguments. Sounds complicated? It isn't. Here are some examples of how to translate code:

println("foo")

becomes

(println "foo")

1 + 2

becomes

(+ 1 2)

The slightly more complicated formula

1 + 3 * 2

becomes

(+ 1 (* 3 2))

which demonstrates the simplicity of the syntax: as the order of execution is clear, mathematical operators do not have to be complected with precedence rules. (Yes, "to complect" is a word.)
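The same property makes prefix notation trivially easy to evaluate mechanically. As an illustration (a sketch in plain Python rather than Clojure; the function and variable names are my own), a complete evaluator for such nested prefix expressions fits in a few lines, precisely because there are no precedence rules to encode:

```python
import operator

# Map operator symbols to functions. Extending this "language" means
# adding an entry here; no precedence table ever needs touching.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(expr):
    """Evaluate a nested prefix expression given as (op, arg, arg, ...) tuples."""
    if not isinstance(expr, tuple):
        return expr  # a plain number evaluates to itself
    op, *args = expr
    # Like Clojure's operators, this folds over any number of arguments.
    result = evaluate(args[0])
    for arg in args[1:]:
        result = OPS[op](result, evaluate(arg))
    return result

# (+ 1 (* 3 2)) from above:
print(evaluate(("+", 1, ("*", 3, 2))))  # 7
```

Note that the nesting of the tuples alone determines the order of evaluation, which is exactly the point being made about the parentheses.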
From my own experience and from that of many of my colleagues I can now confidently say: it is only syntax. You get used to it quickly. Looking at the code of a library will frequently surprise you, too, but in a good way: you will be surprised by how little code is actually necessary to do the trick.
There is no static type system in Clojure (although an optional one exists), and it turns out you don't need it. Clojure is data-centric instead. With immutability as the default, Clojure offers software transactional memory to manage mutable state. This makes it much, much harder to shoot yourself in the foot with shared mutable state. Also, it has very cool and simple concurrency features built in. Because Clojure is homoiconic, macros can be written in the language itself, which makes Clojure easily extensible. There are extensions for logic programming, matrix computation, pattern matching and many more.
I could go on with this list, but I will finish here. Just one more thing: All this goodness comes as a JVM-language. So the complete ecosystem of the Java world is only one import away (just like in Scala).
Should you try polyglot programming? Yes, you should. But isn't that inefficient? No, it is not.
Let me say it with another analogy.
Polyglot programming is a lot like pair programming: if you have not tried it, it is intuitive to assume it means a lot of overhead and is of doubtful benefit. But as with pair programming, the overall productivity of a team is not diminished. Quite the contrary is true.
So go for it.
There is always new stuff to try. As I mentioned, there are persistence technologies like Cassandra and Datomic. There are a lot of programming languages yet to be tried. And as soon as we feel a little safer with Spark and understand it better than we do now, we will look at Clojure APIs for it, e.g. Sparkling.
Last year I visited the EuroClojure conference in the beautiful city of Kraków. It was by far the most useful conference I have attended yet (see my reports of day one and day two). This very week I will go to Barcelona, where this year's EuroClojure takes place. This time I won't go alone but will take a few colleagues with me. I am really looking forward to that. See you there!
Great piece, Christian. I hesitate to enter one more language as I don't feel absolutely fluent in those I currently use. There is still so much to learn in Python, Django and libraries. When do you stop with one and skip to another?
Hi Pavel, thanks for the feedback.
As for the stopping: The only language I stopped using is Java. That was after years of working with it. I do not miss it.
For your current situation, I think trying a different language can only improve your Python skills, as it will deepen your understanding of programming languages in general.