In 2011, Martin Fowler wrote “Domain Specific Languages” published by Addison-Wesley. Examples of domain specific languages you might have encountered before are regular expression specifications, Makefiles, Direct3D’s high level shader language (HLSL), OpenGL’s GL shader language (GLSL), the Wavefront object file format (.obj
) or input specifications to the compiler tools lex (.l
) and yacc (.y
). Fowler defines a domain specific language as “a computer programming language of limited expressiveness focused on a particular domain.” Domain specific languages (DSLs) have been around for a long time, but to date there hasn’t been any general treatment of the techniques and characteristics of DSLs in general, as opposed to the traits of a particular DSL. Fowler’s book is a good first entry.
The book is organized into six parts over 56 chapters: Narratives, Common Topics, External DSL Topics, Internal DSL Topics, Computational Models and Code Generation. The chapters are titled as listed below. The book clocks in at roughly 600 pages, including the index, front matter, etc. With 56 chapters, it may sound like somewhat of a tome, but most of the book is organized as a reference and each chapter expositing a particular technique used to implement a DSL.
Part I: Narratives |
1. An Introductory Example |
2. Using Domain-Specific Languages |
3. Implementing DSLs |
4. Implementing an Internal DSL |
5. Implementing an External DSL |
6. Choosing Between Internal and External DSLs |
7. Alternative Computational Models |
8. Code Generation |
9. Language Workbenches |
Part II: Common Topics |
10. A Zoo of DSLs |
11. Semantic Model |
12. Symbol Table |
13. Context Variable |
14. Construction Builder |
15. Macro |
16. Notification |
Part III: External DSL Topics |
17. Delimiter-Directed Translation |
18. Syntax-Directed Translation |
19. BNF |
20. Regex Table Lexer (by Rebecca Parsons) |
21. Recusive Descent Parser (by Rebecca Parsons) |
22. Parser Combinator (by Rebecca Parsons) |
23. Parser Generator |
24. Tree Construction |
25. Embedded Translation |
26. Embedded Interpretation |
27. Foreign Code |
28. Alternative Tokenization |
29. Nested Operator Expression |
30. Newline Separators |
31. External DSL Miscellany |
Part IV: Internal DSL Topics |
32. Expression Builder |
33. Function Sequence |
34. Nested Function |
35. Method Chaining |
36. Object Scoping |
37. Closure |
38. Nested Closure |
39. Literal List |
40. Literal Map |
41. Dynamic Reception |
42. Annotation |
43. Parse Tree Manipulation |
44. Class Symbol Table |
45. Textual Polishing |
46. Literal Extension |
Part V: Alternative Computational Models |
47. Adaptive Model |
48. Decision Table |
49. Dependency Network |
50. Production Rule System |
51. State Machine |
Part VI: Code Generation |
52. Transformer Generation |
53. Templated Generation |
54. Embedment Helper |
55. Model-Aware Generation |
56. Model Ignorant Generation |
57. Generation Gap |
Fowler subdivides the world of DSLs into internal and external. An internal DSL is one implemented within the syntactical constructs of some other language, called the host language. In my blog post on description helpers for Direct3D, I showed how to create Direct3D resources using method chaining. You could think of this technique as a small internal DSL hosted in C++ for creating Direct3D resources. An external DSL is one implemented entirely outside of a particular programming language. It will often consist of a lexical analyzer, parser and result in some sort of generated code when the DSL is processed. This is typically what most people think of when they think of a domain specific language, often with the lexer implemented using lex
and the parser implemented using yacc
.
So why would you go to the trouble of implementing a DSL? In many cases, it can be easier to express elements of the problem domain more succinctly in a DSL than in a general purpose programming language. This is similar to the way that certain problems in mathematics or physics become “trivial” once the problem has been converted to a different domain. For instance, filtering certain frequency content of time-varying signals is trivial in frequency domain and is the reason why the Fourier transform is so popular. Filtering these signals in the time domain requires a complicated convolution operation. You transform the signal into frequency domain, perform the filtering operation and transform back into the time domain. Similarly, with some problems in computing its simpler to conceive of the solution in a domain specific language and use that language to drive the computation of the solution. Regular expressions are an example of such a DSL that makes certain computing tasks easier. Its much easier to match a complex input against a regular expression than it is to write your own state machine that handles the complexity of the input.
Fowler does a good job of covering these topics, but there is one hotbed of domain specific language activity that he misses completely: C++ and template metaprogramming. Fowler admits his disdain for C++ in the book, but this only places more responsibility on him to consult with someone from the C++ community to contribute something on DSLs. Instead, he skips over the DSL work going on in C++ entirely. This is a glaring omission and probably the weakest element of the book. Consider this example of a JSON parser written by Colin Rundel using Boost.Spirit, an internal DSL for creating recursive descent parsers:
1 #include <iostream>
2 #include <vector>
3 #include <map>
4 #include <utility>
5
6 #include <boost/mpl/print.hpp>
7 #include <boost/spirit/include/qi.hpp>
8 #include <boost/spirit/include/karma.hpp>
9 #include <boost/fusion/include/std_pair.hpp>
10 #include <boost/fusion/include/adapt_struct.hpp>
11 #include <boost/fusion/container/map.hpp>
12 #include <boost/variant/recursive_variant.hpp>
13
14 namespace karma = boost::spirit::karma;
15 namespace qi = boost::spirit::qi;
16 namespace ascii = boost::spirit::ascii;
17 namespace phoenix = boost::phoenix;
18
19 namespace json {
20 struct object;
21 struct array;
22
23 typedef boost::variant<
24 boost::recursive_wrapper<object>,
25 boost::recursive_wrapper<array>,
26 std::string, double, int, bool > value;
27
28 typedef std::pair< std::string, value > keyval;
29 struct object : std::map< std::string, value> {};
30 }
31
32 BOOST_FUSION_ADAPT_STRUCT(
33 json::keyval,
34 (std::string, first)
35 (json::value, second)
36 );
37
38 namespace json {
39 struct array : std::vector<value>{};
40
41 template< typename Iter >
42 struct json_grammar : qi::grammar< Iter, value(), ascii::space_type > {
43 json_grammar() : json_grammar::base_type(start) {
44 using qi::lit;
45 using qi::lexeme;
46 using qi::double_;
47 using qi::int_;
48 using qi::bool_;
49 using ascii::char_;
50
51 quoted_string = lexeme['"' >> +(char_ - '"') >> '"'];
52 object_rule = lit('{') >> pair_rule % ',' >> '}';
53 pair_rule = quoted_string >> ':' >> value_rule;
54 array_rule = lit('[') >> value_rule % ',' >> ']';
55 value_rule = object_rule | array_rule | quoted_string
56 | double_ | int_ | bool_ ;
57 start = value_rule;
58 }
59
60 qi::rule< Iter, value(), ascii::space_type > start;
61 qi::rule< Iter, std::string(), ascii::space_type > quoted_string;
62 qi::rule< Iter, value(), ascii::space_type > value_rule;
63 qi::rule< Iter, object(), ascii::space_type > object_rule;
64 qi::rule< Iter, keyval(), ascii::space_type > pair_rule;
65 qi::rule< Iter, array(), ascii::space_type > array_rule;
66 };
67 }
68
69 int main(int argc, char **argv) {
70 using namespace std;
71 string text( "{\"test\" : 123, \"test2\": {\"key\" : \"hello\"}}");
72 json::json_grammar< string::const_iterator > parse_grammar;
73 json::value json_tree;
74 string::const_iterator beg = text.begin();
75 string::const_iterator end = text.end();
76 bool r = qi::phrase_parse(beg, end, parse_grammar, ascii::space, json_tree);
77
78 if (!r) {
79 cout << "-------------------------\n";
80 cout << "Parse Failed!\n";
81 cout << "-------------------------\n";
82 return(1);
83 }
84 cout << "-------------------------\n";
85 cout << "Parse Succeeded!\n";
86 cout << "-------------------------\n";
87
88 return(0);
89 }
At just around 100 lines, including simple test harness, this is a very concise JSON parser and yet it is also quite complete. The power of template metaprogramming in C++ is what makes Spirit and other internal DSLs possible. This is where all the exciting developments of C++ have been occurring, but because Fowler stopped paying attention to C++ about 10 years ago, he completely misses the party.
Aside from this one glaring omission, Fowler does a good job of explaining many elements of domain specific language implementation. He has lots of good advice for things you should do and when. Each chapter in Parts II-VI is organized as a pattern catalog in a fashion similar to the chapters of his book “Refactoring: Improving the Design of Existing Code”. There is a short summary of the pattern described, a code snippet or object diagram illustrating the pattern, a discussion of how the pattern works and when to use it and finally an example of the pattern in use. Most of the examples are in Java or C#, with the occasional example in other languages where appropriate.
Occasionally, Fowler digresses into a meta-discussion of the book itself. While I feel this is appropriate for a book’s preface, I don’t feel it is appropriate for the main body of the book. These digressions are short and don’t significantly detract from the primary content of the book.
Overall I would recommend this book to anyone that is interested in learning more about techniques for implementing a domain specific language.
17-May-2011 at 3:56 pm
Had to fix up the example in order to get it to compile:
Change
into
LikeLike
17-May-2011 at 4:57 pm
Thanks! I updated the post to credit the original author (Colin Rundel) and so that it compiled with boost 1.42 and VC++ 2005. Boost.Spirit has undergone a number of bug fixes since 1.42, so if you intend to use Boost.Spirit for current development I would recommend the latest boost release.
LikeLike
22-May-2013 at 3:31 pm
Oh hey. I just found this post – again. I happen to have written a more complete attempt at JSON parsing from the RFC text recently. This one _does_ handle unicode, null and escapes. https://github.com/sehe/spirit-v2-json/blob/master/JSON.hpp
I see that Rundel’s version had some elegant tweaks (`struct array : std::vector{};`, e.g.) that I’ll be sure to remember for when I revisit it. Sometime…
LikeLike