“Domain Specific Languages” by Martin Fowler

In 2011, Martin Fowler wrote “Domain Specific Languages” published by Addison-Wesley. Examples of domain specific languages you might have encountered before are regular expression specifications, Makefiles, Direct3D’s high level shader language (HLSL), OpenGL’s GL shader language (GLSL), the Wavefront object file format (.obj) or input specifications to the compiler tools lex (.l) and yacc (.y). Fowler defines a domain specific language as “a computer programming language of limited expressiveness focused on a particular domain.” Domain specific languages (DSLs) have been around for a long time, but to date there hasn’t been any general treatment of the techniques and characteristics of DSLs in general, as opposed to the traits of a particular DSL. Fowler’s book is a good first entry.

The book is organized into six parts over 56 chapters: Narratives, Common Topics, External DSL Topics, Internal DSL Topics, Computational Models and Code Generation. The chapters are titled as listed below. The book clocks in at roughly 600 pages, including the index, front matter, etc. With 56 chapters, it may sound like somewhat of a tome, but most of the book is organized as a reference and each chapter expositing a particular technique used to implement a DSL.

Part I: Narratives
1. An Introductory Example
2. Using Domain-Specific Languages
3. Implementing DSLs
4. Implementing an Internal DSL
5. Implementing an External DSL
6. Choosing Between Internal and External DSLs
7. Alternative Computational Models
8. Code Generation
9. Language Workbenches
Part II: Common Topics
10. A Zoo of DSLs
11. Semantic Model
12. Symbol Table
13. Context Variable
14. Construction Builder
15. Macro
16. Notification
Part III: External DSL Topics
17. Delimiter-Directed Translation
18. Syntax-Directed Translation
19. BNF
20. Regex Table Lexer (by Rebecca Parsons)
21. Recusive Descent Parser (by Rebecca Parsons)
22. Parser Combinator (by Rebecca Parsons)
23. Parser Generator
24. Tree Construction
25. Embedded Translation
26. Embedded Interpretation
27. Foreign Code
28. Alternative Tokenization
29. Nested Operator Expression
30. Newline Separators
31. External DSL Miscellany
Part IV: Internal DSL Topics
32. Expression Builder
33. Function Sequence
34. Nested Function
35. Method Chaining
36. Object Scoping
37. Closure
38. Nested Closure
39. Literal List
40. Literal Map
41. Dynamic Reception
42. Annotation
43. Parse Tree Manipulation
44. Class Symbol Table
45. Textual Polishing
46. Literal Extension
Part V: Alternative Computational Models
47. Adaptive Model
48. Decision Table
49. Dependency Network
50. Production Rule System
51. State Machine
Part VI: Code Generation
52. Transformer Generation
53. Templated Generation
54. Embedment Helper
55. Model-Aware Generation
56. Model Ignorant Generation
57. Generation Gap

Fowler subdivides the world of DSLs into internal and external. An internal DSL is one implemented within the syntactical constructs of some other language, called the host language. In my blog post on description helpers for Direct3D, I showed how to create Direct3D resources using method chaining. You could think of this technique as a small internal DSL hosted in C++ for creating Direct3D resources. An external DSL is one implemented entirely outside of a particular programming language. It will often consist of a lexical analyzer, parser and result in some sort of generated code when the DSL is processed. This is typically what most people think of when they think of a domain specific language, often with the lexer implemented using lex and the parser implemented using yacc.

So why would you go to the trouble of implementing a DSL? In many cases, it can be easier to express elements of the problem domain more succinctly in a DSL than in a general purpose programming language. This is similar to the way that certain problems in mathematics or physics become “trivial” once the problem has been converted to a different domain. For instance, filtering certain frequency content of time-varying signals is trivial in frequency domain and is the reason why the Fourier transform is so popular. Filtering these signals in the time domain requires a complicated convolution operation. You transform the signal into frequency domain, perform the filtering operation and transform back into the time domain. Similarly, with some problems in computing its simpler to conceive of the solution in a domain specific language and use that language to drive the computation of the solution. Regular expressions are an example of such a DSL that makes certain computing tasks easier. Its much easier to match a complex input against a regular expression than it is to write your own state machine that handles the complexity of the input.

Fowler does a good job of covering these topics, but there is one hotbed of domain specific language activity that he misses completely: C++ and template metaprogramming. Fowler admits his disdain for C++ in the book, but this only places more responsibility on him to consult with someone from the C++ community to contribute something on DSLs. Instead, he skips over the DSL work going on in C++ entirely. This is a glaring omission and probably the weakest element of the book. Consider this example of a JSON parser written by Colin Rundel using Boost.Spirit, an internal DSL for creating recursive descent parsers:

    1 #include <iostream>
    2 #include <vector>
    3 #include <map>
    4 #include <utility>
    5 
    6 #include <boost/mpl/print.hpp>
    7 #include <boost/spirit/include/qi.hpp>
    8 #include <boost/spirit/include/karma.hpp>
    9 #include <boost/fusion/include/std_pair.hpp> 
   10 #include <boost/fusion/include/adapt_struct.hpp> 
   11 #include <boost/fusion/container/map.hpp> 
   12 #include <boost/variant/recursive_variant.hpp>
   13 
   14 namespace karma = boost::spirit::karma;
   15 namespace qi = boost::spirit::qi;
   16 namespace ascii = boost::spirit::ascii;
   17 namespace phoenix = boost::phoenix;
   18 
   19 namespace json {
   20     struct object;
   21     struct array;
   22 
   23     typedef boost::variant< 
   24         boost::recursive_wrapper<object>, 
   25         boost::recursive_wrapper<array>, 
   26         std::string, double, int, bool > value;
   27 
   28     typedef std::pair< std::string, value > keyval;
   29     struct object : std::map< std::string, value> {};
   30 }
   31 
   32 BOOST_FUSION_ADAPT_STRUCT(
   33     json::keyval,
   34     (std::string, first)
   35     (json::value, second)
   36 );
   37 
   38 namespace json {
   39     struct array : std::vector<value>{};
   40 
   41     template< typename Iter > 
   42     struct json_grammar : qi::grammar< Iter, value(), ascii::space_type > {
   43         json_grammar() : json_grammar::base_type(start) {
   44             using qi::lit;
   45             using qi::lexeme;
   46             using qi::double_;
   47             using qi::int_;
   48             using qi::bool_;
   49             using ascii::char_;
   50 
   51             quoted_string = lexeme['"' >> +(char_ - '"') >> '"'];
   52             object_rule   = lit('{') >> pair_rule % ',' >> '}';
   53             pair_rule     = quoted_string >> ':' >> value_rule;
   54             array_rule    = lit('[') >> value_rule % ',' >> ']';
   55             value_rule    = object_rule | array_rule | quoted_string
   56                             | double_ | int_ | bool_ ;
   57             start         = value_rule;
   58         }
   59 
   60         qi::rule< Iter, value(),        ascii::space_type > start;
   61         qi::rule< Iter, std::string(),  ascii::space_type > quoted_string;
   62         qi::rule< Iter, value(),        ascii::space_type > value_rule;
   63         qi::rule< Iter, object(),       ascii::space_type > object_rule;
   64         qi::rule< Iter, keyval(),       ascii::space_type > pair_rule;
   65         qi::rule< Iter, array(),        ascii::space_type > array_rule;
   66     };
   67 }
   68 
   69 int main(int argc, char **argv) {
   70     using namespace std;
   71     string text( "{\"test\" : 123, \"test2\": {\"key\" : \"hello\"}}"); 
   72     json::json_grammar< string::const_iterator > parse_grammar;
   73     json::value json_tree;
   74     string::const_iterator beg = text.begin();
   75     string::const_iterator end = text.end();
   76     bool r = qi::phrase_parse(beg, end, parse_grammar, ascii::space, json_tree);
   77 
   78     if (!r) {
   79         cout << "-------------------------\n";
   80         cout << "Parse Failed!\n";
   81         cout << "-------------------------\n";
   82         return(1); 
   83     }
   84     cout << "-------------------------\n";
   85     cout << "Parse Succeeded!\n";
   86     cout << "-------------------------\n";
   87 
   88     return(0);
   89 }

At just around 100 lines, including simple test harness, this is a very concise JSON parser and yet it is also quite complete. The power of template metaprogramming in C++ is what makes Spirit and other internal DSLs possible. This is where all the exciting developments of C++ have been occurring, but because Fowler stopped paying attention to C++ about 10 years ago, he completely misses the party.

Aside from this one glaring omission, Fowler does a good job of explaining many elements of domain specific language implementation. He has lots of good advice for things you should do and when. Each chapter in Parts II-VI is organized as a pattern catalog in a fashion similar to the chapters of his book “Refactoring: Improving the Design of Existing Code”. There is a short summary of the pattern described, a code snippet or object diagram illustrating the pattern, a discussion of how the pattern works and when to use it and finally an example of the pattern in use. Most of the examples are in Java or C#, with the occasional example in other languages where appropriate.

Occasionally, Fowler digresses into a meta-discussion of the book itself. While I feel this is appropriate for a book’s preface, I don’t feel it is appropriate for the main body of the book. These digressions are short and don’t significantly detract from the primary content of the book.

Overall I would recommend this book to anyone that is interested in learning more about techniques for implementing a domain specific language.

3 Responses to ““Domain Specific Languages” by Martin Fowler”

  1. Seth Says:

    Had to fix up the example in order to get it to compile:

    Change

        struct keyval : std::pair {};
        struct object : std::map {};
    

    into

        struct object : std::map {};
        typedef std::pair keyval;
    

    Like

    • legalize Says:

      Thanks! I updated the post to credit the original author (Colin Rundel) and so that it compiled with boost 1.42 and VC++ 2005. Boost.Spirit has undergone a number of bug fixes since 1.42, so if you intend to use Boost.Spirit for current development I would recommend the latest boost release.

      Like

  2. Seth Says:

    Oh hey. I just found this post – again. I happen to have written a more complete attempt at JSON parsing from the RFC text recently. This one _does_ handle unicode, null and escapes. https://github.com/sehe/spirit-v2-json/blob/master/JSON.hpp

    I see that Rundel’s version had some elegant tweaks (`struct array : std::vector{};`, e.g.) that I’ll be sure to remember for when I revisit it. Sometime…

    Like


Leave a comment