Best Practices for DSLs and Model Driven Development
I finally took the time to read Markus Voelter's article Best Practices for DSLs and Model-Driven Development in detail. Short conclusion: excellent article! A must-read for everyone involved in the design and development of Domain-Specific Languages (DSLs) and Model Driven Development (MDD) tools.
This post provides a short summary of the best practices listed in Markus' article, including some remarks from yours truly.
The best practices are grouped into three categories:
- Designing DSLs: best practices in language design.
- Processing models: best practices in making the models expressed with DSLs executable by either code generation or model interpretation.
- Process and organization: best practices for process and organization around MDD.
Sources for the language
For technical DSLs the source for the language is often an existing framework, library, architecture or architectural pattern. Building the DSL is mainly about formalizing the knowledge. In case of a business domain DSL the source for the language is the knowledge of the domain experts.
In my experience the principles of Domain-Driven Design are very useful in DSL design, especially when domain experts are the source for your language.
Do not create another Turing-complete, general purpose language. Try to focus the language on the "what" of a domain instead of the "how", i.e. make your languages declarative.
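To illustrate the "what" versus "how" distinction, here is a minimal sketch (all names and rules are invented for illustration): the declarative part is plain data stating *what* must hold, while a single generic engine supplies the *how*.

```python
# Hypothetical declarative validation DSL: the rules say WHAT must hold;
# the generic checker below supplies the HOW, once, for all rules.
RULES = [
    {"field": "age", "min": 0, "max": 130},
    {"field": "name", "required": True},
]

def validate(record, rules=RULES):
    """Generic engine: interprets the declarative rules against a record."""
    errors = []
    for rule in rules:
        value = record.get(rule["field"])
        if rule.get("required") and value is None:
            errors.append(f"{rule['field']} is required")
        if value is not None and "min" in rule and value < rule["min"]:
            errors.append(f"{rule['field']} below {rule['min']}")
        if value is not None and "max" in rule and value > rule["max"]:
            errors.append(f"{rule['field']} above {rule['max']}")
    return errors
```

Adding a rule means editing the model, not the engine, which is exactly the kind of declarativeness the best practice asks for.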
Notation, Notation, Notation
Notation, or concrete syntax, is extremely important when building DSLs! You will only be successful if you can tailor your notations to fit the domain. I prefer to see the structure of a Domain-Specific Language as variable along two different axes: the range of variation and the notation.
Graphical vs. Textual Notation
Editors for textual notations are much easier to build (more tool support) and textual models integrate more easily with existing source code management and build infrastructures. However, for certain kinds of information a graphical notation is better. Recommendation: if usable graphical editors are a lot of work to build, first stabilize the concepts and abstractions of the language with very simple editors.
In many systems different viewpoints use different notations. You can even mix the two forms of notations by using different DSL renderings or projections.
The meaning of a language, the semantics, is often defined in two ways:
- it is explained to the language users in prose and with examples, and
- it is tied down to the execution platform by the code generator or interpreter.
Identify the set of viewpoints relevant for describing the different concerns of a system, because:
- a software system usually cannot be described with one notation for all relevant aspects.
- the development process requires different aspects to be described by different roles at different times (clean separation of concerns).
I often call this way of modeling a system multi-modeling or multi-DSL development. As Markus points out, the modularization of languages is similar to the modularization of software systems, so the same rules apply: strong internal cohesion, few external interfaces, and generally as little coupling as possible.
It is important to partition the overall model into separate "model units" to keep DSL editors and model processors scalable. Things to keep in mind in this context:
- which partition changes as a consequence of specific changes of the model (changing a name might require changes to all by-name references).
- where are links stored?
- how/where/when to control reference/link storage?
Language evolution is important in an MD* project! If you change the language, make sure that you also have a way of adapting model processors as well as existing models. Tools are important in this context.
My two recommendations on DSL evolution / maintenance (point 6 in the linked article):
- use a good DSL tool.
- implement models specified in different DSLs with different loosely coupled engines.
Markus also points at the importance of tools and the use of multiple DSLs. He formulates it nicely: using a set of well-isolated viewpoint-specific DSLs prevents rippling effects on the overall model in case something changes in one DSL.
The fallacy of generic languages
Using predefined languages makes you spend most of your time thinking about how your domain concepts can be shoehorned into the existing language. In practice, in most cases it is much better to define your own DSL. However, don't reinvent the wheel! If a suitable language exists, either use the existing language, or make sure your own implementation is compatible as far as possible. See also my article DSL in the context of UML and GPL for some remarks on this topic.
Learn from 3GLs
To become a good DSL designer, it is useful to have broad knowledge about existing programming language paradigms. Four examples:
- Most languages need the notion of scoping.
- Specialization can also be applied to domain concepts.
- The notion of namespaces is found in many DSLs.
- Many DSLs contain the notion of instantiation.
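Two of these 3GL concepts, namespaces and by-name references with scoping, can be sketched in a few lines. This is a hypothetical model and resolver, invented for illustration:

```python
# Hypothetical sketch: namespaces and scoped by-name references, two 3GL
# concepts that recur in DSLs. Model elements are keyed by qualified name.
model = {
    "billing.Invoice": {"kind": "entity"},
    "billing.Line":    {"kind": "entity", "ref": "Invoice"},          # local name
    "crm.Customer":    {"kind": "entity", "ref": "billing.Invoice"},  # qualified
}

def resolve(referrer, ref):
    """Resolve a reference: try the referring element's own namespace
    first (local scope), then treat the reference as fully qualified."""
    namespace = referrer.rsplit(".", 1)[0]
    local = f"{namespace}.{ref}"
    if local in model:
        return local
    if ref in model:
        return ref
    raise LookupError(f"unresolved reference {ref!r}")
```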
Who are the first class citizens?
Markus points at two different styles of language design:
- one advocates big languages with first class support for many different domain concepts.
- the other advocates minimal languages with few but powerful primitive features, from which bigger features are constructed by combination.
Make sure your language design is consistent and ensure that the well-known concepts in a domain are first class.
In the context of this best practice I like the following quote:
It is good to have a simple language, but it is not good to sacrifice its expressiveness to the point where most of the time the programmer has to encode the concepts that he really needs indirectly with those available in the language.
– Andrej Bauer
Libraries are collections of instances of your DSL, intended for reuse. They can also be used to limit language complexity. The language can stay small, libraries contain bigger features constructed by combination (see previous point). In this way users can add / change all kind of things by changing the model instead of the language.
This approach can be compared to adaptive modeling or the knowledge level pattern in Domain-Driven Design (third alternative in the linked article).
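A minimal sketch of the library idea, with invented names: the library is just a set of reusable model fragments, and a small expansion step inlines them into user models, so the language itself never grows.

```python
# Hypothetical sketch: a "library" is a reusable collection of model
# elements; user models compose them instead of the language adding features.
LIBRARY = {
    "Address": {"fields": ["street", "city", "zip"]},
    "Money":   {"fields": ["amount", "currency"]},
}

def expand(model, library=LIBRARY):
    """Inline library types referenced by a user model via 'use:' markers."""
    expanded = {}
    for name, element in model.items():
        fields = []
        for f in element["fields"]:
            if f.startswith("use:"):                 # reference into the library
                fields.extend(library[f[4:]]["fields"])
            else:
                fields.append(f)
        expanded[name] = {"fields": fields}
    return expanded

order = {"Order": {"fields": ["id", "use:Money", "use:Address"]}}
```

Users change behavior by editing the model (or the library), not the language, which keeps the language small and stable.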
DSL tooling should support all aspects (versioning, tagging, merging, etc.) of working collaboratively on models. Make sure the tools you use support all of these using the languages’ concrete syntax.
If you target developers with your DSLs you should make sure that the models and metaware interoperate with the rest of the development tools. In this case textual DSLs have a clear advantage.
If you target business / domain experts, repository-based systems are often capable of addressing these issues. For business users, pessimistic locking (and consequently no need for comparing and merging) might be easier to understand.
Good languages and notations are not enough, you have to provide good tool support for them too. The same holds for the "meta developers".
Interpretation vs. Code Generation
Most people tend towards code generation. However, interpretation is also a valid option.
The advantages of code generation:
- Perceived as simpler because the generated code can be inspected.
- The templates can be extracted from manually coded example applications.
- Generated code is easier to debug than an interpreter (debugging an interpreter requires conditional breakpoints all the time).
- Generated code can be tailored more closely to the task at hand, and hence can be smaller and/or more efficient.
- A code generator can work with any target platform/language. Code generation leaves no trace in the resulting system.
The advantages of interpretation:
- Changes in the model don’t require an explicit regeneration/rebuild/retest/redeploy step, significantly shortening the turnaround time.
- It is even possible for models to be changed from within the running application. See "Why MDSD isn’t fast enough and how to fix it" for more on this subject.
- Since no artifacts are generated, the build times can be much reduced.
- Depending on the specific case, an interpreter and the model can be smaller than generating code.
I agree with all points except for the debugging part. When using an interpreter you can set breakpoints in your model (as it is runtime available) making it easier to debug instead of more difficult. You also debug at model level instead of code level, making it easier to understand and fix the problem.
I think model interpretation / model execution / the use of engines has the following additional advantages:
- Models and generated artifacts (code) cannot be out of sync.
- An approach using model interpretation really abstracts away from the code level. You model, put that model in an engine, and everything works as expected. There is no need to work with or know anything about code (for domain experts this is an advantage; for programmers it can be seen as a disadvantage).
- Deployment can be made so much easier! Think Model-Execution-as-a-Service.
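The interpretation approach can be sketched in a few lines. In this hypothetical example (names invented), a state-machine model is plain data and the engine walks it directly, so a model edit takes effect on the next run without any regenerate/rebuild/redeploy step:

```python
# Hypothetical sketch of model interpretation: the model is plain data,
# the engine executes it directly -- no code generation step involved.
MODEL = {
    "initial": "draft",
    "transitions": {
        ("draft", "submit"):  "review",
        ("review", "accept"): "published",
        ("review", "reject"): "draft",
    },
}

class Interpreter:
    def __init__(self, model):
        self.model = model
        self.state = model["initial"]

    def fire(self, event):
        """Advance the machine according to the model's transition table."""
        key = (self.state, event)
        if key not in self.model["transitions"]:
            raise ValueError(f"no transition for {event!r} in state {self.state!r}")
        self.state = self.model["transitions"][key]
        return self.state
```

Note that a breakpoint set inside `fire` is effectively a breakpoint at model level, which is the debugging advantage mentioned above.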
Rich Domain-Specific Platform
Do not generate unnecessary code. Work with a manually implemented, rich domain specific platform, which is used by the generated code.
See this article on MDE workbenches for a visualization of the position of a domain framework / platform in MDE.
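A hypothetical sketch of the idea (class names invented): the hand-written platform carries the logic once, and the part a generator would emit shrinks to thin declarations against it.

```python
# --- manually implemented, rich domain-specific platform ---
class Entity:
    """Generic base: validation logic lives here, written once by hand."""
    fields: tuple = ()

    def __init__(self, **values):
        for field in self.fields:
            if field not in values:
                raise ValueError(f"missing field {field!r}")
        self.__dict__.update(values)

# --- what the generator would emit: almost no code at all ---
class Customer(Entity):
    fields = ("name", "email")

class Invoice(Entity):
    fields = ("number", "total")
```

The thinner the generated layer, the less there is to regenerate, test, and debug.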
Check first and separate
Check model constraints as early as possible and make them as domain-specific as possible. Check different constraints on different parts of the model at different times in the model processing chain.
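As a hypothetical illustration (model shape invented), a structural constraint on a state-machine model can be checked on the model itself, before any generator or interpreter runs, and reported in the domain's own vocabulary:

```python
# Hypothetical sketch: an early, domain-specific constraint check that runs
# on the model, long before code generation or interpretation.
def check_state_machine(model):
    """Report structural errors in the domain's vocabulary."""
    errors = []
    for (src, event), dst in model["transitions"].items():
        if src not in model["states"]:
            errors.append(f"transition on {event!r} starts in undeclared state {src!r}")
        if dst not in model["states"]:
            errors.append(f"transition on {event!r} targets undeclared state {dst!r}")
    return errors

GOOD = {"states": ["a", "b"], "transitions": {("a", "go"): "b"}}
BAD  = {"states": ["a"],      "transitions": {("a", "go"): "b"}}
```

Compare the error message to what a compiler of the generated code would say: "undeclared state" is actionable for a modeler, a 3GL type error is not.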
Don’t modify generated code
Generated code should be a throw-away product. Add extension points into the generated code, using the composition features provided by your target language.
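This is often realized with the generation gap idiom; a minimal sketch with invented names: the generated class can be regenerated at will, and manual code lives only in a subclass that overrides designated extension points.

```python
# --- generated file (throw-away; never edited by hand) ---
class InvoiceBase:
    def total(self, line_amounts):
        amount = sum(line_amounts)
        return self.apply_discount(amount)

    def apply_discount(self, amount):
        """Extension point; the generated default is a no-op."""
        return amount

# --- manually written file (survives regeneration) ---
class Invoice(InvoiceBase):
    def apply_discount(self, amount):
        # hand-written business rule: 10% off above 100
        return amount * 0.9 if amount > 100 else amount
```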
Control manually written code
If generated code changes, manual code can become erroneous. To solve this you can generate checks / constraints against the code base. You can also generate code that is never executed, but coerces the IDE into providing a quickfix.
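A hypothetical sketch of the generated-checks idea (names invented): the generator emits a conformance check that fails fast when manually written code drifts from what the model promises.

```python
# --- manually written code somewhere in the code base ---
class PaymentService:
    def authorize(self, amount):
        return amount > 0

# --- generated guard: verify the manual class still matches the model ---
EXPECTED = {"PaymentService": ["authorize", "capture"]}

def check_conformance(cls, expected=EXPECTED):
    """Return the model-required members the manual class fails to provide."""
    return [m for m in expected[cls.__name__] if not hasattr(cls, m)]
```

Run as part of the build, such a check turns silent drift into an immediate, explainable failure.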
Care about generated code
When integrating with generated code, you will have to read the generated code, understand it, and you will also have to debug it at some point. Hence, make sure generated code adheres to the same standards as manually written code.
Make the code true to the model
Make sure promises made by the model are kept by the code. For example, if all dependencies in your model are checked, don’t let manual code introduce dependencies that are not present in the model.
Viewpoints are introduced above as a best practice for designing DSLs. They are also important when processing models. For example, check constraints separately for different viewpoint models, use separate interpreters or code generators for different viewpoints (or mix them up), etc.
Note that if you fail to have separate generators (or interpreters / engines) per viewpoint, you introduce viewpoint dependencies "through the back door", effectively creating a monolith again.
In my opinion this is an important best practice. A while ago I wrote an extensive post about architecture requirements for Service-Oriented Business Applications (SOBA) as a basis for using separate engines for the different service types of a SOBA (as explained more literally here). It’s important to keep using such best practices from other software engineering disciplines (e.g. SOA, Domain-Driven Design) when following a model-driven paradigm.
Overall Configuration Viewpoint
It is a good idea to have a separate model that captures all the configuration, like what model parts to validate, generate code for, etc.
Care about templates
Code generation templates will be one of your central assets. Make sure you use well-known modularization techniques for them when they grow and become non-trivial. By generating code against meaningful frameworks (see "Rich Domain-Specific Platform" above), the overall amount of template code required is reduced.
M2M transformations to simplify generators
Code generators can be simplified by using intermediate model-to-model transformations, i.e. transform a model into a more detailed model (you can repeat this) and transform that model into code.
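A toy sketch of such a pipeline (entity shape and names invented): the intermediate model-to-model step enriches the model with derived detail, so the final code-emitting step stays near-mechanical.

```python
# Hypothetical two-stage pipeline: model -> more detailed model -> code.
def m2m_add_defaults(entity):
    """Stage 1: model-to-model -- add derived information to the model."""
    detailed = dict(entity)
    detailed["fields"] = ["id"] + entity["fields"]   # every entity gets a key
    return detailed

def m2t_emit(entity):
    """Stage 2: model-to-text -- a trivial template over the detailed model."""
    args = ", ".join(entity["fields"])
    return f"class {entity['name']}:\n    def __init__(self, {args}): ..."

source = {"name": "Customer", "fields": ["name", "email"]}
code = m2t_emit(m2m_add_defaults(source))
```

Because the enrichment happens in a model-to-model step, the template never has to reason about defaults; it just prints what it is given.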
M2M transformations for simulation
Model-to-model transformations can also be used to transform Domain-Specific Languages into a modeling language for which suitable validation or proof tools exist.
Allow for adaptations
MDD benefits from economies of scale. You will benefit a lot when you can reuse a DSL (and its associated generator / interpreter) multiple times. While reuse is hard, make sure you provide means for implementing unexpected variability in a non-invasive way.
If you want to use multiple model layers (like in the Model Driven Architecture) you should start from the bottom. First define a DSL that resembles your system’s software architecture. The abstractions used in the DSL are architectural concepts of your target architecture. In subsequent steps, build on top of that stable basis abstractions that are more business-domain specific.
If you need to add information to a model created by a model-to-model transformation before it is processed by the next step, you should use an annotation model; never change a generated artifact directly.
There’s a tendency to use action semantic languages (ASLs) to describe system behavior on model level. However, the abstraction level of ASLs is not fundamentally different from a 3GL. On the plus side: action languages stay on the model level and can be part of model refactoring and validation.
Implementing behavior can become more efficient:
- Classify behavior into different kinds (e.g. state-based, business-rule based) and provide specific DSLs for those classes (resulting in multi-DSL development).
- Use business domain specific DSLs for suitable classes of behavior.
Don’t forget testing
Constraint checks are a form of test. Test the generator using a test model that covers all features of the language and write tests against this model. Assuming the generator is well tested and mature, there's no need to verify the generated code. However, you should write tests to make sure the model semantics are as expected.
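A hypothetical sketch of testing semantics rather than text (toy generator, names invented): execute the generated artifact against a small test model and assert on its behavior.

```python
# Hypothetical toy generator for an expression model: ("add", l, r) or a number.
def generate(model):
    """Emit target-language text for an expression model."""
    if isinstance(model, tuple):
        op, left, right = model
        assert op == "add", f"unknown operator {op!r}"
        return f"({generate(left)} + {generate(right)})"
    return str(model)

test_model = ("add", 1, ("add", 2, 3))
generated = generate(test_model)
result = eval(generated)   # run the generated code: check behavior, not text
```

Asserting on `result` (the semantics) rather than on the exact generated string keeps the test stable when the generator's formatting changes.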
A while ago I wrote an article on quality in Model Driven Development covering model validation, model checking, and model-based testing.
Process and Organization
Don’t do waterfall again when using MDD in your project. You need to iterate when developing the metaware, i.e. build a little bit of language, a little bit of generator, and develop a small example model to verify what you just did.
Co-evolve concepts and language
In cases where you do a real domain analysis, you have to find out which concepts the language (i.e. DSL) should contain. Make sure you evolve the language in real time as you discuss the concepts.
See also my article about designing DSLs using the concepts of Domain-Driven Design.
Documentation is still necessary
To be successful with MDD you have to communicate to the users how the DSL and the processors work. You have to document:
- the language structure and syntax.
- how to use the editors and generators.
- how and where to write manual code and how to integrate it.
- platform / framework decisions (if applicable).
I really want to emphasize this best practice. I see a lot of initiatives in the field of MDD, but most are very technology-oriented. Markus adds a very useful tip: as hardly anybody reads reference documentation, make sure the majority of your documentation is example-driven or task-based.
Make sure you do regular model reviews. Recurring mistakes should be covered by a constraints check or should lead to language changes.
I use this often in practice, for me reviews are a very important mechanism to provide input for the language design process.
Let people do what they are good at
MDD offers a chance to let everybody do what they are good at. Markus gives two examples. For a full overview of the possible roles in MDD see my article "Roles in Model Driven Engineering".
Domain Users Programming?
If your language is suitable for the domain, domain users can ‘program’ on their own. However, instead of expecting domain users to independently specify domain knowledge, you might want to pair a developer and a domain expert. If that doesn’t work either, you can at least communicate using the models.
Domain Users vs. Domain Experts
When building business DSLs, people from the domain can play two different roles:
- Domain experts can participate in the domain analysis and the definition of the DSL itself.
- Domain users can use the DSL to express specific domain knowledge. They are typically not as experienced as domain experts.
Make sure you actually cross-check with real domain users whether they are able to work with the language.
Metaware as a product
Usually there’s a split in roles in MDD. The meta-team develops the metaware. The project team uses the metaware in actual projects. To make this work you should consider the metaware as a product, i.e. well-defined release schedules, tracking of requirements and issues, errors are fixed quickly, documentation, support staff, etc.
Markus formulates a specific best practice (which can be difficult in practice): exchange people between these teams.
Make sure that the organizational structure, and the way project cost is handled, is compatible with cross-cutting activities. In a strictly project-focused organization it can be very difficult to find resources for developing the metaware.
Forget Published Case Studies
The only real way to find out whether DSLs and MDD are good for you is to do a prototype.
Although I have provided a lengthy summary I think you will benefit a lot from reading the full article!
Markus Voelter: "Best Practices for DSLs and Model-Driven Development", in Journal of Object Technology, vol. 8, no. 6, September-October 2009, pp. 79-102. http://www.jot.fm/issues/issue_2009_09/column6/