The software that handles transformation and data delivery at OTTO is called ProPHET (Product Data Partner Feed Handling & Export Tool). This article shows the advantages of switching to a domain-specific language (DSL) to describe the transformation of this data.
When exporting shop data to price search engines, the operators specify precisely the format in which the product data must be submitted. For example, one operator wants the item price as a number in cents, while another wants it as a string with a comma and a euro sign. The requirements for deliverability, shipping costs and images also differ widely.
Previously, the data was prepared by chains of functions. To pick up the example above: first, a function was created that removed special characters. This function had an output attribute, which in turn could serve as the input attribute for the next function, which turned an amount in euro cents (e.g. 2999) into a decimal number (29.99). Another function turned this into a string in German number format ("29,99"). Finally, a further function appended the euro sign ("29,99€").
The original design focused on the reusability of each step of this process. Each function generated an output attribute from an input attribute, which was also available to all other functions.
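The chained design described above can be sketched roughly as follows. This is an illustration in Python, not ProPHET code: the function names and the attribute names (price_raw, price_clean and so on) are hypothetical. It shows how each step reads one attribute and writes another, so a single target value leaves several intermediate attributes behind.

```python
from decimal import Decimal

def strip_non_digits(value: str) -> str:
    """Step 1: remove special characters, keeping digits only."""
    return "".join(ch for ch in value if ch.isdigit())

def cents_to_decimal(value: str) -> Decimal:
    """Step 2: turn euro cents (2999) into a decimal amount (29.99)."""
    return Decimal(int(value)) / Decimal(100)

def to_german_format(value: Decimal) -> str:
    """Step 3: render with two decimals and a decimal comma."""
    return f"{value:.2f}".replace(".", ",")

def append_euro_sign(value: str) -> str:
    """Step 4: append the euro sign."""
    return value + "€"

# (function, input attribute, output attribute); all attribute names are hypothetical.
CHAIN = [
    (strip_non_digits, "price_raw", "price_clean"),
    (cents_to_decimal, "price_clean", "price_decimal"),
    (to_german_format, "price_decimal", "price_de"),
    (append_euro_sign, "price_de", "price_de_eur"),
]

def run_chain(product: dict) -> dict:
    """Run every step in order; each output attribute is stored next to the inputs."""
    attrs = dict(product)
    for func, source, target in CHAIN:
        attrs[target] = func(attrs[source])
    return attrs

result = run_chain({"price_raw": "2999 ct"})
# result["price_de_eur"] is "29,99€", and result also holds three intermediate attributes.
```

Every intermediate attribute stays available to other functions, which is what made the steps reusable in the first place.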
However, the system had a few weaknesses:
The third point could certainly have been countered for quite a while with more hardware. But clarity and usability are also fundamental requirements for a system, and so the team came up with the idea of offering a different approach to the employees who maintain the exports to the individual partners.
In the old system, functions are created as follows. For each individual step in the transformation, a separate function must be created, saved, and then linked to the other functions using input and output attributes. Each of these actions in turn requires round trips to the server: for saving, for editing, and for picking the created attributes for the next step.
As a solution, the team proposed a domain-specific language (DSL) that would enable the department to transform the data in a clear editor. After initial preliminary considerations, we decided to present the idea to the department at an early stage in order to obtain feedback.
We countered the initial skepticism about using something like a programming language by pointing out that what the department was already doing in Excel every day went far beyond simple programming. We had no doubt that they could handle it.
After some preliminary considerations, the decision was made to implement the DSL with Groovy, because custom parsers can be created very easily in Groovy and it integrates very well with the rest of the system, which runs on the JVM.
In the frontend, CodeMirror is used, a flexible code editor written in JavaScript that can be easily configured and extended. It offers code completion and syntax highlighting, and syntax errors can be marked in the edited code and given a meaningful error message.
To evaluate the functions, they are first precompiled. For execution on a given product, they are then bound to a new context that contains all the variables needed for the computation. Each DSL script returns exactly one result.
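The idea of compiling once and binding a fresh context per product can be illustrated with a small sketch. It uses Python's built-in compile() and eval() as a stand-in for the Groovy mechanism, which the article does not show. The script text, the product data and the helper run_for_product are assumptions made for this example.

```python
from types import SimpleNamespace

# Hypothetical script: exactly one expression, hence exactly one result.
SCRIPT_SOURCE = "attr.var_name.strip().upper()"

# Compile once, up front.
COMPILED_SCRIPT = compile(SCRIPT_SOURCE, "<dsl-script>", "eval")

def run_for_product(product: dict):
    """Bind a new context for this product and evaluate the precompiled script."""
    context = {"attr": SimpleNamespace(**product)}
    return eval(COMPILED_SCRIPT, {}, context)

products = [{"var_name": " notebook "}, {"var_name": "tablet"}]
results = [run_for_product(p) for p in products]
```

The compiled script is shared across all products; only the small context object changes from one evaluation to the next.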
The DSL scripts use the control structures that Groovy already provides, such as if, else, etc. Apart from string and arithmetic operations, all commands and libraries that are not required are blacklisted and thus unavailable from the outset. In addition, custom DSL functions are defined that are needed for transforming the data, such as GREATER_THAN, EXTRACT_ALL, SUBLIST, GET_LIST_ITEM, MATCHES, REPLACE, etc. Their names follow the notation of the Excel macro functions.
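A rough sketch of restricting what a script can reach while offering custom functions is shown below. Note the differences from the article: it is written in Python rather than Groovy, and instead of a blacklist it takes the simpler route of exposing only an explicit set of names. The parameter lists of GREATER_THAN, MATCHES and SUBLIST are assumptions, since the article names the functions but not their signatures. Emptying the builtins like this is an illustration and not a real security boundary.

```python
import re

# Hypothetical signatures; the article does not document the parameters.
def GREATER_THAN(value, limit):
    return value > limit

def MATCHES(value, pattern):
    return re.fullmatch(pattern, value) is not None

def SUBLIST(items, start, end):
    return items[start:end]

DSL_FUNCTIONS = {
    "GREATER_THAN": GREATER_THAN,
    "MATCHES": MATCHES,
    "SUBLIST": SUBLIST,
}

def evaluate(source: str, attr: dict):
    """Evaluate one expression that can only see the DSL functions and attr."""
    code = compile(source, "<dsl-script>", "eval")
    # With empty builtins, names such as open or __import__ cannot be resolved.
    sandbox_globals = {"__builtins__": {}, **DSL_FUNCTIONS}
    return eval(code, sandbox_globals, {"attr": attr})

is_expensive = evaluate("GREATER_THAN(attr['price_cents'], 1000)", {"price_cents": 2999})
```

A script that calls anything outside the exposed set, for example open(...), fails with a NameError instead of running.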
The functions written in the DSL are passed an object for processing, in which all relevant information about a product is stored. The scripts access this information using ordinary dot notation (e.g. attr.var_description to use the description of the product).
Within a script, each DSL function automatically uses the result of the previous function as its input value. Only where this is not desired is a constant specified or a value taken from the passed object with the product attributes. This increases readability and makes the program flow clearer.
CONCAT("Hello ", "World") // "Hello World" will be the current state
REPLACE("Hello", "Moin") // "Moin World"
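How such an implicit current state might work can be sketched in a few lines. This is a Python illustration, not the Groovy implementation: the class Script is hypothetical, and the rule that REPLACE with two arguments works on the current state while three arguments name the input explicitly is an assumption derived from the examples in this article.

```python
import re

class Script:
    """Holds the current state that each DSL call reads and overwrites."""

    def __init__(self):
        self.state = None

    def CONCAT(self, *parts):
        self.state = "".join(parts)
        return self.state

    def REPLACE(self, *args):
        if len(args) == 2:
            # No explicit input: operate on the result of the previous call.
            pattern, replacement = args
            value = self.state
        else:
            # Explicit input given as the first argument.
            value, pattern, replacement = args
        self.state = re.sub(pattern, replacement, value)
        return self.state

script = Script()
script.CONCAT("Hello ", "World")   # current state: "Hello World"
script.REPLACE("Hello", "Moin")    # current state: "Moin World"
final_state = script.state
```

The last state is the single result of the script, which fits the rule that each DSL script returns exactly one value.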
Within a script all necessary transformations are now performed to create the target attribute. Further intermediate steps are no longer necessary. For efficiency in processing, the Groovy shell object is reused and the functions are kept compiled. Thus, only the context must be passed and the result must be retrieved.
For example, a DSL script could look like this:
// Does the product have a name?
if (NOT_EMPTY(attr.var_name)){ // "Testmarke® Notebook, Elektron™
    // Replace all characters except word characters, whitespaces, periods and commas
    // The result of the REPLACE will be the current state
    REPLACE(attr.var_name, "[^\w\s,\.]", "") // "Testmarke Notebook, Elektron
    // Split the current state value at whitespaces into a list
    EXTRACT_ALL("[^\s]*") // "Testmarke|Notebook,|Elektron,"
    // Take the first entry from the list
    GET_LIST_ITEM(FIRST) // Testmarke
}
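For readers who want to trace the individual steps of this script, here is an equivalent walk-through in plain Python using the standard re module. It is only an approximation of the DSL semantics: the function name first_word_of_name is made up, and the split uses \S+ rather than [^\s]* because Python's findall would otherwise also return empty matches.

```python
import re

def first_word_of_name(name):
    """Mirror the steps of the DSL script: clean, split at whitespace, take the first item."""
    # NOT_EMPTY: skip products without a name.
    if not name:
        return None
    # REPLACE: drop everything except word characters, whitespace, periods and commas.
    cleaned = re.sub(r"[^\w\s,.]", "", name)
    # EXTRACT_ALL: collect all runs of non-whitespace characters.
    tokens = re.findall(r"\S+", cleaned)
    # GET_LIST_ITEM(FIRST): take the first entry, if there is one.
    return tokens[0] if tokens else None

brand = first_word_of_name("Testmarke® Notebook, Elektron™")
```

In Python, \w matches Unicode letters by default, so umlauts in product names survive the cleaning step while symbols such as ® and ™ are removed.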
For both transparency of operations and flexibility, this represents a huge step forward. In addition, the language can grow along with the requirements of the business department on this foundation.
This changeover prevents the proliferation of functions that are chained together. Currently, only about one fifth of the previously generated attributes still have to be calculated. Correspondingly fewer have to be kept in main memory and persisted in the database. Each transformation is now self-contained, which virtually eliminates unforeseen dependencies. The employees have a convenient editor in which they can carry out their transformation from start to finish. With a preview function on any article, they can immediately see the result, which makes comprehensive functional testing possible.
The enthusiasm of the specialist department during the presentation and briefing on the new DSL was proof for us that the project was a success.