
This post is going to revolve around a domain-specific language (DSL) which is part of one of our services. But instead of a detailed introduction to the language itself, I’m going to talk about the process we established around operating it and touch a bit on how parsing the DSL works. This seems far more interesting, and the concepts I’m going to describe are more generally applicable than the language’s features and syntax. I am part of the tracking team of otto.de: We provide APIs to collect tracking data - think web analytics - for other teams within the company. The data is processed and enriched in our streaming data pipeline before being loaded into Adobe Analytics and other downstream systems.


Operating a DSL

One of those later processing steps involves transforming the data into a format which is accepted by the analytics tool. This is where the DSL comes into play: It allows users to define rules on how the input data is to be “mapped” into the output format. The input may contain deeply nested JSON structures, but the output needs to be in a flat form. Embedding this mapping as a simple DSL allows non-technical users to define how the data is transformed without having to write actual code. The language requires a lot of domain-specific knowledge about the analytics tool and its intricacies, so it makes a lot of sense not to involve developers when the mapping rules need to be changed. Conversely, having non-technical colleagues change a production service poses certain risks. To keep those risks in check, we established some safeguards as part of the process of changing the mapping rules:

  1. The “rule” files containing the code written in the DSL are part of the service’s source code. As such they are versioned within our version control system (VCS). This makes changes very transparent: It is obvious what was changed, when, and by whom. In case of an error, rollbacks are as easy as running $ git revert. The downside of this is having our (non-technical) users work with git, which poses its own challenges. An IDE like IntelliJ lends a hand here, providing an approachable interface to the VCS.
  2. Testing for syntactic and semantic errors is part of the service's CI/CD pipeline. This helps catch minor mistakes before the service is even deployed to the development environment (see the test sketch after this list). The mapping syntax can also be verified via a simple web form exposed on the service’s status page.
  3. Mapping changes trigger a new deployment of the service. At this point we sacrifice a bit of velocity, but we can be confident in the current state of the system: There won’t be any bad surprises due to changes made at runtime. This has a similar effect as having the mapping files versioned in git: It is transparent what is currently running in production. The rule files currently see a lot of changes, so this might become a showstopper in the near future; we are looking into ways to improve on that front.
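
A minimal sketch of what such a pipeline check could look like, using ScalaTest. MappingParser.parse, the file location and the .mapping extension are hypothetical stand-ins for the actual service code:

import org.scalatest.funsuite.AnyFunSuite

import scala.io.Source

class MappingRulesSpec extends AnyFunSuite {

  // Hypothetical location of the versioned rule files.
  private val ruleFiles = new java.io.File("src/main/resources/mappings")
    .listFiles()
    .filter(_.getName.endsWith(".mapping"))

  test("every mapping rule file parses without errors") {
    for (file <- ruleFiles) {
      val source = Source.fromFile(file)
      // A parse failure here fails the build before anything is deployed.
      try assert(MappingParser.parse(source.mkString).isSuccess, s"in ${file.getName}")
      finally source.close()
    }
  }
}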

Parsing a DSL

The DSL itself is syntactically similar to the YAML format, but is obviously more specific. For example, the language provides helper functions for array and string manipulation and implements basic conditionals:
MerchandisingVars: 
  DefaultFunction: replace($mEvar, "|", "~")
  Vars: 
    var1: /access_path
    var6: ./topic_id
    var19: concat(findFirst(./assortment, /assortment), ./search_assortment, ";")

... 

Events: 
  event1: "1" if /breakpoint_change == "true"
  event2: "1" if /first_visit == "true"
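
To make this concrete, here is a hypothetical input message and the flat output the rules above would produce, assuming that '/' resolves from the message root and './' relative to the current context:

Input:  {"access_path": "search", "topic_id": "shoes", "breakpoint_change": "true", "first_visit": "false"}

Output: var1   = "search"
        var6   = "shoes"
        event1 = "1"

event2 stays unset because /first_visit is not "true", and the DefaultFunction would additionally replace any '|' in the merchandising vars with '~'.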

As we are not 100% on par with the official YAML specification, the mapping files are parsed by our own parser. The Scala parser-combinator library allows this to be implemented rather concisely: The actual parser is only about 200 lines of source code. A parser-combinator approach encourages splitting the parser code into small, composable functions. The library provides the “plumbing” to stick those functions back together to build a powerful but comprehensible program. For example, this is how the parser for the “replace” function looks:

/**
  * Parses the replace function: 'replace(./foo, "ü", "ue")'
  */
def replaceFnParser: Parser[ReplaceFn] =
  "replace(" ~>
    (expressionParser <~ ",") ~
    (stringLiteralParser <~ ",") ~
    (stringLiteralParser <~ ")") ^^ {
      case expression ~ substring ~ replacement =>
        ReplaceFn(expression, substring, replacement)
    }
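
The two building blocks it references could look roughly like this. This is a sketch under assumed grammar rules, not the production code:

// The AST nodes the parsers produce.
sealed trait Expression
case class PathExpression(path: String) extends Expression
case class ReplaceFn(expression: Expression, substring: String, replacement: String)

// A path expression, e.g. '/access_path' or './topic_id'.
def expressionParser: Parser[Expression] =
  """\.?/[\w/]+""".r ^^ PathExpression

// A double-quoted string literal, e.g. "ue" (escaping omitted for brevity).
def stringLiteralParser: Parser[String] =
  "\"" ~> """[^"]*""".r <~ "\""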

The example shows how easily multiple small parsers, for expressions and string literals, are composed into a larger, more complex parser. We chose parser combinators because they are a good fit for our usual programming model: functional, readable and highly composable. Parser combinators are ahead of seemingly simpler solutions like regular expressions, which are neither easily composed nor very readable. We specifically opted for the Scala parser-combinator library because it is a mature implementation, which was even part of the Scala standard library before becoming a community-maintained library. Scala being the team’s primary programming language made it easy to integrate the library; there was no need for any intermediate representation format. The parser reads in the mapping files and parses them directly into Scala objects when the service starts up.
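
That startup step boils down to something like the following. The surrounding object, the file name and the top-level mappingFileParser are assumptions; only parseAll comes straight from the library:

import scala.util.parsing.combinator.RegexParsers

object MappingParsers extends RegexParsers {
  sealed trait MappingRule // stand-in for the parsed rule AST

  // replaceFnParser and its building blocks as shown above, plus a
  // (hypothetical) top-level parser for whole mapping files:
  def mappingFileParser: Parser[List[MappingRule]] = ???
}

// On startup: read a mapping file and parse it straight into Scala objects.
val input = scala.io.Source.fromFile("mappings/merchandising.mapping").mkString
val rules = MappingParsers.parseAll(MappingParsers.mappingFileParser, input) match {
  case MappingParsers.Success(parsed, _)   => parsed
  case noSuccess: MappingParsers.NoSuccess => sys.error(s"invalid mapping: ${noSuccess.msg}")
}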

[Figure: Mapping overview]

Once the service starts consuming input messages, those rules, now encoded in objects, are applied to the incoming messages by an interpreter. The interpreter encapsulates most of the service’s business logic in a component separate from the DSL parser, which again favours testability and maintainability. Zooming out further, this interpretation is done in a single pipeline step of a Kafka Streams based service.
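
Zooming back in, here is a minimal sketch of how such an interpreter could be wired into the Kafka Streams Scala DSL. The topic names and MappingInterpreter are illustrative stand-ins; the actual service contains more steps:

import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._

val builder = new StreamsBuilder()

builder
  .stream[String, String]("tracking-enriched")                  // incoming messages
  .mapValues(message => MappingInterpreter.run(rules, message)) // apply the parsed rules
  .to("tracking-mapped")                                        // flat output for downstream systems

val topology = builder.build()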

Final thoughts 

This gave a rough overview of how we integrate a simple DSL into our operations process. I also highlighted some of the advantages of parser combinators and how they fit into our programming model. There are of course other approaches to such problems. For example, I could imagine a graphical interface which lets users configure how data is to be mapped from one format to another, though I doubt this would have been a sustainable solution for our backend-heavy team. We established a solid foundation with parser combinators and Scala, which we can now build and optimise upon.

Written by Franz Neubert, Former Software Engineer at OTTO
