Neal Ford: So, good morning everybody. I am Neal Ford. This is Martin
Fowler. We are going to be fighting our slide show all morning this
morning. So, that should be interesting. We are here to talk about
language-oriented programming and here are some questions that we are
going to answer.
Why is there so much XML mixed in with my Java code? Why won't
everybody shut up already about Ruby on Rails? It seems like any time
you talk to someone about Ruby on Rails they are irrationally exuberant
about it, and there has got to be some reason for that. Why do things
like aspects exist? Why do we actually need something like aspects if
Java is enough? Is there an evolutionary step beyond object-oriented
programming? Is the abstraction layer we have been using so far
sufficient for the problems that we are trying to solve today? And
what the heck is language-oriented programming anyway? That is what
Martin and I are here to talk about: this idea of language-oriented
programming.
For the past 20 years or so, we have been trying to model the world
with trees. The way we model the world in object-oriented languages
is to build hierarchies, trees of how we model things in code, and it
turns out that that works out pretty well because most of
the world is tree shaped. Most of the world is hierarchical. You can
fit things into trees pretty easily but that abstraction breaks down
sometimes because a lot of times, we try to model the real world. We
model it like this with a nice tree abstract picture of trees, and of
course this is idealized. This is not the way the world really looks.
The world really looks like this, tangled branches and interconnections
and all sorts of other things that are really, really hard to model in
these kinds of idealized pictures. So what we have been trying to do is
model the world with hierarchies but hierarchies fall down at some
point and so we invent things like aspects. The red line that you see
here represents aspects, which cut through the tree-shaped hierarchies
that we built to try to model things in the world. But that just adds
complexity to the problem we are trying to solve, and one of the things
that we are trying to do when we build abstractions is to kill off
complexity. Well, how have we done this in the past? If
you look, for example, at assembly language, how many people here still
write assembly language for their day job? Yeah, that is what I
thought. Nobody writes in that any more because it is too low a level
of abstraction. So, what we have always done in the past is take our
abstractions and raise them a few levels. We do not write in assembly
language any more because it just takes too long to get anything done.
It is way too low a level of abstraction and so we build abstractions
on top of that. In fact, if you think about your hard drive, it is
really just a spinning platter with 1s and 0s on it. We never think of
it that way either. We have all these nice metaphors and abstractions
on top of it. So, what we are suggesting is that maybe it is time to
upgrade our abstraction layer one more step toward language rather than
just hierarchies to be able to represent stuff.
Martin Fowler: So, one, one…sorry.
Neal Ford: Go ahead. Go ahead.
Martin Fowler: One way of thinking about this is that the
object-oriented stuff and the kinds of abstractions that we build with
our object-oriented thinking are abstractions of really allowing us to
build up a vocabulary. We are able to create our own words that talk
about the problem space that we are working with. When we think about
languages in the way we talk to each other, it is not just about
vocabulary, it is also about how we put the words together, the grammar
of how we speak, and so one of the things we are beginning to think
about is okay, we know how to now build up these vocabularies, how can
we better do the combination of these things? How can we start thinking
about the grammar side as well as the vocabulary side?
Neal Ford: What we are talking about here is changing abstraction
mechanisms, modeling the world with language instead of hierarchies,
because in some cases language actually makes a better modeling
mechanism than hierarchies do.
Well, what we are talking about here is using abstractions from the
past, objects and generics and aspects and all those things to build
better abstractions. We are not talking about throwing away
object-oriented programming. Clearly, object-oriented programming has
done a lot of good stuff for us, but for a lot of the problems we are
trying to solve it still resides at too low a level of abstraction and
that is what we are trying to do is raise our abstraction level one
more step higher, which is what we have always done to solve problems
in the computer science world.
So, why language? Why choose this as our new favorite form of
abstraction mechanism? Well, it turns out that the human brain is
really, really good at supplying context and context is a really
important concept when you are talking about using language as an
abstraction mechanism. Here is a classic example of a DSL: an Iced
Decaf Triple Vanilla Skim with whip latte. Of course, this is the
Starbucks DSL. This is how you order coffee at Starbucks. In fact, when
new employees come to work at Starbucks, the first thing they have to
do is learn the Starbucks DSL and if you say it to them incorrectly
they repeat it back to you in the correct format. There is a very
strict structure to the way people at Starbucks talk, and there are all
these rules. There are over a million possible ways you can order
coffee at Starbucks, and they have a very exacting way of talking
about that. I will let Martin talk about this one, because I have no
idea what this means.
Martin Fowler: Exactly, but the point is not so much the content of
the example as the notion that it is again a combination of words for a
specific contextual area, arranged in a particular way, and that is how
it carries its meaning. There is also a sense of flow with this: as you
use and combine things together, you are able to express yourself much
more clearly, and you need both the vocabulary and the way of combining
things to make that work.
Neal Ford: This is really just a shorthand mechanism for human
communication. When you think about this, this cricket example, if you
were talking with your friends about cricket, think how cumbersome it
would be to start at first principles all the time. There is a religion
called sport, which is where you gather groups of people together as a
team, and they play each other on something called a pitch and there is
a ball with a bat. I mean it would take so long to have any kind of
conversation that it would be useless to have the conversation and yet,
that is exactly what we do when we talk to APIs and frameworks. We
start at the lowest level possible of understanding and have to explain
every single detail in code to our framework because it does not have
any sort of context built into it, including your own business. If you
are a Java ace and you go to work for a new business, day one, the
hardest job that you face is learning the DSL for the business that you
are going to work for. Every business has their own domain-specific
language and it is very tightly keyed to the kind of problems they
solve in their business and the kind of work that you do.
Martin Fowler: And this is where this kind of thinking ties in with
the classic object-oriented ideas of domain modeling, or indeed a lot
of what database people do with data modeling. In all of these cases,
you are trying to understand some particular business domain and build
up that vocabulary and talk about how the various ideas fit together.
But what these techniques have not typically done is talk about how we
express these combinations. Again, the vocabulary is what we have
focused on, and what we have yet to focus on is the grammar side.
Neal Ford: So, all complex human endeavors have their own DSL. It
is all about this implicit context that the brain is really, really
good at supplying when you have a conversation. If someone in your
line of work has a discussion with you, they do not have to start over
from first principles every time because you share that same context,
and that is what we are trying to convey to our APIs and frameworks in
the computer world, because people are really, really good at
recognizing implicit context. Our brains are wired up specifically to
be able to do that. So what we are talking about here, you may say,
well, that is just another kind of API. What is the distinguishing
factor between a DSL and an API? I am going to show you a couple of
examples of this. Here is a good example of an API. This is what it
would look like if you had to order coffee at Starbucks using a
Java framework for ordering coffee, and the interesting thing you will
notice about this example is Coffee latte = new Coffee(Size.VENTI),
latte.setFatContent, latte… Notice how much repetition there is there.
We repeat the object over and over and over again, because
Java is a completely context-free language. You have to tell it over
and over again what the intent is for the code that we are executing
here. It is almost as if Java retains nothing, and we have to repeat
ourselves over and over again to say this is what I want you to do.
Remember, this is what we are talking about still. It is still the same
context… versus the DSL, which has an implicit context. Notice that the
word coffee never actually shows up here and yet most people when they
start reading this, when they get two or three words into it understand
there is an implicit context here that we are actually talking about
coffee. DSLs always have an implicit context that shows up either not
at all or shows up in a very, very light way and usually at the most
one time, so that you do not have to supply that context over and over
again.
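The API slide itself is not reproduced in this transcript. As a rough
sketch of the kind of code being described, assuming a hypothetical
Coffee class (only Coffee, Size.VENTI and setFatContent are named in
the talk; the other setters, the FatContent enum and describe() are
invented here for illustration), the repeated context looks something
like this:

```java
import java.util.Locale;

// Hypothetical reconstruction of an API-style coffee order.
// Only Coffee, Size.VENTI and setFatContent come from the talk;
// every other name below is an assumption made for illustration.
class CoffeeApiDemo {
    public static void main(String[] args) {
        Coffee latte = new Coffee(Size.VENTI);
        // Note how "latte" has to be restated on every single line.
        latte.setFatContent(FatContent.NON_FAT);
        latte.setDecaf(true);
        latte.setShots(3);
        latte.setIced(true);
        latte.setWhip(true);
        latte.setFlavor("vanilla");
        System.out.println(latte.describe());
    }
}

enum Size { TALL, GRANDE, VENTI }

enum FatContent { WHOLE, TWO_PERCENT, NON_FAT }

class Coffee {
    private final Size size;
    private FatContent fatContent = FatContent.WHOLE;
    private boolean decaf, iced, whip;
    private int shots = 1;
    private String flavor = "";

    Coffee(Size size) { this.size = size; }

    // Traditional setters: each returns void, so calls cannot be chained.
    void setFatContent(FatContent fatContent) { this.fatContent = fatContent; }
    void setDecaf(boolean decaf) { this.decaf = decaf; }
    void setIced(boolean iced) { this.iced = iced; }
    void setWhip(boolean whip) { this.whip = whip; }
    void setShots(int shots) { this.shots = shots; }
    void setFlavor(String flavor) { this.flavor = flavor; }

    // Renders the order roughly the way it would be spoken at the counter.
    String describe() {
        StringBuilder sb = new StringBuilder();
        if (iced) sb.append("iced ");
        if (decaf) sb.append("decaf ");
        sb.append(shots).append("-shot ");
        if (!flavor.isEmpty()) sb.append(flavor).append(' ');
        if (fatContent == FatContent.NON_FAT) sb.append("skim ");
        sb.append(size.name().toLowerCase(Locale.ROOT)).append(" latte");
        if (whip) sb.append(" with whip");
        return sb.toString();
    }
}
```

Seven statements all start with the same variable name, whereas the
spoken order never mentions the subject at all.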
Martin Fowler: And what is happening here is further development in
something that has been a constant part of the software development
space for quite a while: we have been concentrating on how to make
the code we write more readable and more expressive. I remember, 10 or
15 years ago, arguing with some people about whether it is worth
putting a lot of effort into good naming of objects and methods. How
important is it to have methods that really convey what they do?
Increasingly over the years, people have realized that it is important
to have very clear method names and very clear class names. By thinking
about how you name things well, you can reduce the need for other forms
of documentation, and that clarity of code becomes very important. Now,
with the interest in DSLs people are beginning to say, “Okay, we have
got that part of things sorted out”, but if we look at the example on
the previous slide, it is still hard to read, because of all of this
repetition, because of this lack of context. So, how can we take a step
forward to again make things a lot more readable and clear? Because the
real art of programming in the computer is not the communication with
the computer, it is the communication with other human beings who are
going to have to read that code now and in the future. I have often
liked to be quoted as saying that any damn fool can write a program
that a computer can understand, but good programmers write code that
humans can understand, and this is really part of this “how do we
communicate with humans” drive.
Neal Ford: And in fact, if you look at this previous example that we
are talking about, this is one of the reasons business people do
not like to try to read source code: it is obfuscated because
there is so much repeated context. The meaning here gets lost in noise.
All that end users who do not understand Java see here is noise drowning
out the actual content of what you are talking about, whereas this
is very boiled down. This is the way that people actually talk, not the
way that people usually write code. So, let us talk about some
nomenclature, let us create some definitions here, and this is actually
Martin’s definition of a domain-specific language, a limited form of
computer language designed for a specific class of problem.
Martin Fowler: I would not call it my definition. This has been
around in the software world for quite a while. One of the things that
we will see is that domain-specific language, as a term, has very
fuzzy boundaries. There are things that are very clearly a
domain-specific language, and things that are very clearly not a
domain-specific language, but there is a very large overlap area. It is
like classifying something as blue or green where things are clearly
blue, things are clearly green, but there are some colors where I can
argue with my wife endlessly about whether it is blue or green, and
that is definitely the case with domain-specific languages. But, one
key property of domain-specific languages is that they have a narrow
focus. You could not write a programming system entirely using a single
domain-specific language, because the range is just too small. The idea
is that you combine one domain-specific language with other
domain-specific languages, and usually with a general-purpose
programming language as well, in order to actually get stuff done.
Neal Ford: And this is actually one of the places where Martin and I
disagree very slightly, because I do not believe Martin thinks that this
Starbucks DSL actually falls within the definition of a domain-specific
language.
Martin Fowler: Yeah, I tend to use domain-specific language to
specifically mean a software language. So, it is only something that we
can actually execute on the computer, and when we talk about things
that humans use, that we have not necessarily formalized to that
degree, I tend to use a term like domain language or something a little
bit broader, so as to make a distinction between the specific software
construct and the more fuzzy real world construct.
Neal Ford: I am much more liberal with my definition. I believe
that any language that describes a problem domain is a domain-specific
language and we just agree to disagree about that.
Martin Fowler: Yeah. You are just wrong.
Neal Ford: Yes. The other term and, I think this is actually your term.
Martin Fowler: No, this is not my term either.
Neal Ford: So, you’ve stolen this one as well.
Martin Fowler: I try my best not to invent new terms, but you
cannot help it from time to time. So, language-oriented programming: I
first came across this term in an article by Sergey Dmitriev. He is
one of the founders of JetBrains, who came up with the IntelliJ tool.
As we will see shortly they are working on some very interesting stuff
in this space and he got the term from some obscure academic paper. So,
I do not know where it came from, but I like the term very much because
it talks about this shift of moving from thinking about vocabulary,
which is objects, to the notion of a language that combines vocabulary
and grammar, and so I felt language-oriented programming was a good
term. Also, I was looking very much for a generic term that would
stretch across a whole range of different styles and that was not owned
by a company. And so language-oriented programming seemed to make a good
fit for that because it is not something that is tied to a particular
product.
Neal Ford: And this also kind of ties in the idea that what we
are talking about here is an evolutionary step beyond just
object-oriented programming: actually using objects as building blocks
for the next abstraction layer, which is to use languages as an
abstraction mechanism. There are a couple of different fundamental
types of DSL. You can pretty much separate DSLs into two broad
categories, one of which is the internal DSL, built using the
underlying syntax of a base language.
Martin Fowler: Right. So, it is easy to quickly define both of them
first I think. So, I use the terms internal and external and these are
terms I decided to use, where external is minilanguages in the UNIX
tradition and internal are languages that are really expressions within
a programming language, but written in a way that has a language feel.
So, with an internal DSL you are completely operating within your host
language. If you are programming in Java, your DSL is Java. If you are
programming in Ruby your DSL is Ruby. One of the strongest traditions
of this style of programming is in the Lisp world. You talk to Lisp
people about how they construct Lisp programs, and they often think in
terms of building up languages and this is one of the many reasons why
this kind of stuff is seen as very, very old school, because Lisp
people were doing this 30, 40 years ago. So, it is nothing new in that
sense, but the new thing is perhaps putting more attention into it,
particularly in places where people have not put much attention to it
in the past.
Neal Ford: In fact, I think this is so prevalent in Lisp, because
Lisp as a language is so horrific to actually code in, the first thing
you want to do is hide that under some extra layers of abstraction, so
you can hopefully get away from all the parentheses and that sort of
stuff. So, they almost had to invent this style of programming to get
away from the core syntax, because it is so confusing, it is so
daunting.
Martin Fowler: All the Lisp people out there, he said that,
not me.
Neal Ford: An external DSL is the opposite kind of DSL, which is
built using your own grammar, a lexer and a parser, and generated code
of some kind.
Martin Fowler: And this is the traditional UNIX style little
language, where you have to say how something operates and you
configure a little language to work things through. Very, very common
in the UNIX world, and very often UNIX people will talk about how they
will put together some little language in order to drive a particular
program or configure a particular programming environment, and
they think nothing of pulling out lex and yacc and twisting them
together and producing the stuff that they need.
Neal Ford: So, let us look at some examples of internal DSLs. Is
anybody already doing this in the mainstream world? If you discount the
very interesting stuff that has been there in the Lisp community around
this area, is anybody really doing this for real today right now? Well,
a good example of this is Ruby on Rails. If you look at Ruby on Rails
code, it is very, very declarative. In fact, most of the code that you
see here is not technically Ruby code at all. It is this DSL they have
written in Rails. Ruby is actually a very popular target for building
internal DSLs because the language has very, very loose syntax rules
and you can get by with a lot of stuff that you cannot get by with
in a more strongly typed, statically-typed language that has much
stricter rules about its syntax and the way that its code looks.
Martin Fowler: The important thing here is that, I do not know
how familiar you are with Ruby, this is actually all valid Ruby
code, but it feels
like a different kind of language. It is like you have invented whole
new key words and ways of putting them together. So, as a result, you
feel like you are in a different language to the actual Ruby language
itself. And at one level, yes it is all Ruby. But it does not feel
quite like that, and here again, we are talking about this very fuzzy
question of what the difference is between an internal DSL and an API.
The fuzzy boundary for internal DSLs is between APIs and the language,
but the essence of it is, I think, a sense that you just do not feel as
if you are in regular Ruby. You feel like you have extended the
language, in this case to do something slightly different. And it is
partly about the syntax, as Neal said, but it is also partly about the
features of the language, certain programming constructs. In
particular, the ability to have closures is very, very useful for doing
this kind of thing. And one of the things
that makes Ruby very interesting and particularly in comparison with
something like Lisp, is that Lisp gives you a limited set of mechanisms
to work with but those mechanisms work really, really well. Ruby gives
you a very wide range of mechanisms. In some of the work that I have
been doing, experimenting for a book I am working on about this topic, I
took a very simple DSL and ended up implementing it 20-odd different
ways in Ruby, using different combinations of language constructs. Some
of them worked well, some of them worked less well, but the really
interesting thing is how many different ways you can take a single DSL
and work it out with Ruby, because Ruby gives you so many
options to work with. If you are working in Lisp, you have equivalent
power, but a smaller range of options, and if you are working with Java,
you also have a smaller range of options, but also less power, because
it does not give you some of these alternatives that you might need.
Neal Ford:
In fact, as Martin said, one of the things that makes Ruby so effective,
and Rails uses it very effectively, is this idea of a closure. The last
line of code you see here, before_destroy, actually takes in a block of
code, but it is nicely contextual, so it is very easy to read right in
the context of where it is being defined, rather than having to create a
new class, an anonymous inner class, and attach a lot of handlers and
that sort of stuff. A lot of behavior can be defined just inline using
closures, the way that Ruby supports this idea of closures. And I think
that the DSL portion of Rails is one of the reasons that people are so
irrationally exuberant about it. How many times has someone come up to
you almost foaming at the mouth about how much they loved
using a Java web framework of some kind? And yet you see these Rails
guys do this all the time, and part of the reason for that is that the
tool they are using - Rails - is perfectly suited for the problem they
are trying to solve. This is a domain-specific language for building
web applications that offer persistence. It is very, very highly tuned
to do that and so the tool fits into your hand really well as you write
code in it, there is very little friction between what you want to
accomplish and what the tool allows you to do. You do not have to do a
lot of workarounds and lots of other stuff; you can just express the
intent of what you want to do very clearly and very succinctly using
the DSL portion of Rails. That movement directly toward intent is
really important: the more friction you can remove between your intent
and the way that you realize that intent, the better.
Another good example of where this is being used in the world right
now is the expectations in mock object libraries. Virtually every mock
object library that you see does this. I have got a couple of snippets
here of JMock; EasyMock does the same thing, and even in the .NET world
Rhino Mocks does the same thing. So does Mocha in the Ruby world. The
reason that you see expectations written this way goes exactly to what
we were talking about for DSLs. They address a limited problem domain,
which is setting expectations on a mock object. Think about how many
lines of code this would take if this were written in a more
traditional sort of Java API style. You would have at least five
different lines of set code here, set this, set this, set this, set
this, and what that actually does is obscure the intent of what you are
trying to do here, which is set an expectation to say that this thing
expects, this thing wants, with this method, etc. Notice how much
context has been drained away from setting this expectation in
JMock, and what you are left with is just the intent of what you are
trying to accomplish.
Martin Fowler: A few interesting things here with
this. Again notice this, this is using Java. We are not using some
weird language like Ruby or Lisp to do this kind of thing. So, again it
brings out the point that this kind of internal DSL work can be done in
a relatively straightforward language. It is also important to notice
that the language kind of looks different, even in terms of formatting,
in fact most obviously in terms of formatting. Because now we have got
these cascades of methods with dots on each line, which can look very
ugly. I remember the first time I looked at code that had been
written this way, it kinda looked weird, but you get used to it fairly
quickly, and then you begin to appreciate how useful it can be.
It is also worth mentioning at this point that although I use the
term internal DSL here, you will also hear some people, particularly
the mock people, use the term embedded DSL. In fact,
embedded DSL has a longer history of usage than internal DSL. I avoid it
because it gets confused with embedded languages in the sense that, say,
VBA is an embedded language in Microsoft Word, and so, because
embedded has these two meanings, I decided to use internal DSL to focus
on these things. It is also important to realize here that this is very
much again the way of manipulating an object model. In the end, these
are just objects that are being wired together in a particular way.
There is no reason you could not use a regular API to do this. Indeed,
what is going on under the covers in the JMock library is a regular
API, and they originally built it with a traditional API. What they did
with this expression syntax is add a layer over that API that allows
you to create these expectations in a more friendly and readable
format.
Neal Ford: If you think about how this is implemented it is not
like this is some sort of mysterious rocket science implementation,
especially the way this is chained together because in Java for all of
your traditional set methods instead of returning void, which is kind
of a waste of a perfectly good return value, why not just return this?
That allows you to chain together a series of method calls like this
and achieve this kind of what we are calling a fluent interface which
we will talk about in just a second. In fact, we will talk about it
now, fluent interfaces where you treat lines of code as sentences
because in English and in most spoken languages a sentence is a
complete unit of thought and this idea of a fluent interface comes from
readability, and I am going to give Martin credit for this term too, but
he will probably deny this one as well, this fluent interface.
Martin Fowler: I get half credit for this.
Neal Ford: Okay.
Martin Fowler: The origin of this came from, I was at a workshop
with Eric Evans, you might have heard of Eric Evans. He wrote the
excellent book ‘Domain-Driven Design’, and one of the things about
myself and Eric is that we are both ex-Smalltalkers, and we did a lot
of Smalltalk programming in the mid 90s, in fact that is where we first
met, working together on a Smalltalk project. And one of the things
that we talked about during the course of the workshop that Eric
particularly lamented was the fact that when he worked with APIs in
Java, they did not seem to have the same flow that a lot of the better
Smalltalk APIs seemed to have. We would try to explain what we meant,
and it was in that discussion that we came up with the term fluent
interface. We liked the term because it brought up that notion of a
language. If you read a regular API, it seems to have that kind of
stuttery quality of somebody that does not really speak your language
properly the way… probably more lack of the other languages… being
British, I cannot say anything, but you know what I mean, in the sense
of somebody speaking a language that they are not really comfortable
with. What we wanted to see was much more of this sense of fluency
and flow. And Smalltalk people, unlike Lisp people, never really talked
about defining languages in Smalltalk. They talked about building
domain models and putting them together, but they still had, a good
number of them still had this very strong notion of trying to make that
model really work in this flowing way. I remember sitting down and
doing some programming once with Ward Cunningham and was really quite
taken aback at the way he would rearrange what I was doing with an API
just to make it read better and make things fit together much better. I
did not really understand what he was doing at the time; it is something
I only really appreciated later on, and again, it is this push towards
readability that does it. And so we felt that fluent interface was a
good term for thinking of that. You are taking an API and then making
it flow a bit more. Tactics, particular mechanisms like having set
methods that return themselves, that is, return the object you have
just changed, so that you can cascade the method calls, are a common
thing to do, and in fact that was default practice in Smalltalk, but
that is a mechanism. The real aim of what we are trying to do is to have
something that has this readability, this sense of flow, which is a
very hard to define thing, it is not a precise thing, but it is what we
are trying to achieve. Now, fluent interface and an internal DSL are
really two ways of looking at the same thing depending on where you
come from, whether you think of yourself as “I am trying to create a
language here” or “I just have an API and I am just trying to make it
more readable and more useful.”
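The mechanism mentioned here, set methods that return the object they
just changed, is small enough to show directly. This is a minimal
sketch only: the Latte class and its fields are hypothetical names
chosen for illustration, not code from the talk or from any library.

```java
// Minimal sketch of method chaining via setters that return "this".
// Latte and its fields are hypothetical, chosen only for illustration.
class ChainingDemo {
    public static void main(String[] args) {
        // Reads as one sentence instead of three separate statements.
        Latte latte = new Latte()
                .setSize("venti")
                .setShots(3)
                .setIced(true);
        System.out.println(latte);
    }
}

class Latte {
    private String size = "tall";
    private int shots = 1;
    private boolean iced;

    // Each setter returns the object it just changed instead of void,
    // so the next call can be chained directly onto the result.
    Latte setSize(String size) { this.size = size; return this; }
    Latte setShots(int shots) { this.shots = shots; return this; }
    Latte setIced(boolean iced) { this.iced = iced; return this; }

    @Override
    public String toString() {
        return (iced ? "iced " : "") + shots + "-shot " + size + " latte";
    }
}
```

As the talk stresses, the chaining is only a mechanism; whether the
result actually reads fluently depends on how the methods are named
and ordered.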
Neal Ford: And readability is critical. Everybody is well familiar
with the statistic that lines of code are read two-and-a-half times as
often as they are written. So, even if it takes a little bit longer to
write a line of code to make it more readable there is a big payoff at
the end because anyone who reads it can actually understand what you
are talking about in a much clearer way. So, let us look at a concrete
example of building a fluent interface, like I said this does not have
to be any sort of rocket science or any kind of brand new technique or
anything, this is the kind of code most of us deal with all day every
day. This is a traditional sort of API kind of code in Java where you
are creating a Car object and you are associating a
MarketingDescription with it. So, you create a Car object and you
create a MarketingDescription and you setType and setSubType and set
all these attributes, and finally, you set the description to that Car
object. You can very easily convert this into a fluent API that looks
like this. All this really is, is Car.describedAs, which is
instantiating a Car object, and each one of the dots here is really just
what used to be the setters that you had in the API style of code. What
you are doing here is actually creating set methods that are aware of
the context in which you are operating, which allows you to create a
sentence out of this Java code. What is nice about this is that your business
analyst or whoever is consuming this code now actually has a fighting
chance to be able to read this code because it is much, much closer to
the way they talk about Cars and MarketingDescriptions rather than
being in this kind of stilted, almost very formal old English style of
writing code which we are accustomed to in the Java World with all this
repeated context and all this extra noise in terms of sets and
properties and that sort of stuff.
Martin Fowler: There is a writer on graphic displays called Edward
Tufte, who is highly regarded as one of the best people to read when it
comes to presenting visual information, and he says that when you are
doing this in a chart or diagram, it is very important to remove the
noise from the diagram. Everything on
the diagram should convey some meaning. There should not be any
extraneous stuff, and that is really what we are trying to do with the
code, remove all the extraneous stuff. The second thing that I tend to
do a lot when I am doing this which is not at all obvious from
something like in the final picture that we have here, is that I
typically create an additional object as a layer over the basic API and
I refer to this as an expression builder object. The problem is that if
you put all these methods on the Car class and you looked at the API of
the Car class, it would look very odd because the methods do not make
much sense sitting there in the javadoc on their own. The methods look
meaningless because they are robbed of their context. So, what I would
like to do is to have an expression builder object that I put over my
regular API, which is purely designed to support the fluent interface.
And then I have a way of getting that, the Car object back out when I
am done and that is the kind of mechanism I use to implement it. In
that way, the regular object can have a normal API with nice-looking
javadoc and the fluent stuff is nicely contained in this builder. And I
can play all sorts of tricks inside the builder in order to make the
language flow, because the builder's whole focus is on language.
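The slide code is not captured in this transcript, so the following is
only a minimal Java sketch of the expression builder idea described
above. Car, MarketingDescription, describedAs, setType and setSubType
come from the talk; CarBuilder, the color attribute, build() and the
sample values are assumptions made for illustration.

```java
// Regular API: plain setters, ordinary-looking javadoc.
class MarketingDescription {
    private String type;
    private String subType;
    private String color; // hypothetical attribute, not named in the talk

    public void setType(String type) { this.type = type; }
    public void setSubType(String subType) { this.subType = subType; }
    public void setColor(String color) { this.color = color; }

    @Override
    public String toString() { return color + " " + subType + " " + type; }
}

class Car {
    private MarketingDescription description;

    public void setDescription(MarketingDescription description) {
        this.description = description;
    }

    public MarketingDescription getDescription() { return description; }

    // Single entry point into the fluent layer; the rest of Car is untouched.
    public static CarBuilder describedAs() { return new CarBuilder(); }
}

// Expression builder: exists purely to support the fluent interface.
class CarBuilder {
    private final MarketingDescription description = new MarketingDescription();

    public CarBuilder type(String type) { description.setType(type); return this; }
    public CarBuilder subType(String subType) { description.setSubType(subType); return this; }
    public CarBuilder color(String color) { description.setColor(color); return this; }

    // Hands the ordinary Car object back once the expression is finished.
    public Car build() {
        Car car = new Car();
        car.setDescription(description);
        return car;
    }
}

class FluentCarDemo {
    public static void main(String[] args) {
        Car car = Car.describedAs().type("coupe").subType("sport").color("red").build();
        System.out.println(car.getDescription()); // prints "red sport coupe"
    }
}
```

Keeping the chained methods on CarBuilder means the javadoc for Car and
MarketingDescription still reads like an ordinary API, which is the
separation described above.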
The jMock libraries are an excellent example of that structure.
Again, they have their regular API and then they have the fluent stuff
that they place on top of it through just a couple of extra classes
that they add to the structure. They take quite a sophisticated approach
to this that uses quite a number of tricks to make things flow well.
One in particular that is very nice is that depending on where you are
in your expression certain terms are legal or not and they do this by
use of interface, multiple interfaces on the same object. And the nice
consequence of this is that if you are in a good editor, the
IntelliSense that you get leads you through building the expression,
and so, it has that as a sort of usability thing when you are writing
it.
Neal Ford: Absolutely, if you are writing jMock code it is very
well designed because every time you hit dot the things that you see
there are the things that you are interested in doing next with that
expectation that you are setting. And I want to re-emphasize the point
here that this is not some sort of magical thing that we are creating
here, all the building blocks are already there in the Java language
and doing this, it is more an attitude shift of creating these fluent
APIs more than anything else because there is no rocket science
technology at all here, it is really just the intent of let us see how
readable we can make code rather than how obfuscated we can make it
which seems to be the default in the Java world for some reason.
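The following sketch shows one way the multiple-interface trick can
work. It is not jMock's actual API: every name here (NameStage,
ArgumentStage, ResultStage, ExpectationBuilder, expectOnce) is
hypothetical, and willReturn just produces a description string so the
example stays self-contained.

```java
import java.util.Arrays;

// Each interface exposes only the terms that are legal at that point
// in the expression, so editor completion offers only the next step.
interface NameStage {
    ArgumentStage method(String name);
}

interface ArgumentStage {
    ResultStage with(Object... args);
}

interface ResultStage {
    String willReturn(Object value);
}

// One object implements every stage, but callers only ever hold a
// reference typed as the stage they are currently in.
class ExpectationBuilder implements NameStage, ArgumentStage, ResultStage {
    private String name;
    private Object[] args;

    public static NameStage expectOnce() { return new ExpectationBuilder(); }

    public ArgumentStage method(String name) {
        this.name = name;
        return this;
    }

    public ResultStage with(Object... args) {
        this.args = args;
        return this;
    }

    public String willReturn(Object value) {
        return "expect " + name + Arrays.toString(args) + " -> " + value;
    }
}

class ExpectationDemo {
    public static void main(String[] args) {
        String expectation =
            ExpectationBuilder.expectOnce().method("lookup").with("key").willReturn(42);
        System.out.println(expectation); // prints "expect lookup[key] -> 42"

        // ExpectationBuilder.expectOnce().willReturn(42) would not compile,
        // because NameStage does not declare willReturn.
    }
}
```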
There is another good example of this style of coding that exists
out in the world. There is an open source library you can download from
Google called Hamcrest, which is really just fluent interface wrappers
for JUnit matchers. So, this is an add-on to JUnit that allows you to
say things like, I guess at Google when they do good work they give
them biscuits because all their examples are in terms of biscuits. So,
I guess that is what they are doing with their millions of dollars is
buying biscuits, but you can say for example assertThat(theBiscuit,
is(equalTo(myBiscuit))) and that is a much more readable line of code
than the kind of formal "assert equals this, that" syntax that you
normally see. And all this is a very light patina over the existing
JUnit matcher classes that are already there. In fact, if we look at the
latest versions of jMock, they incorporate the Hamcrest library to make
their expectations even more expressive. So there are lots of people
out in the world that are kind of embracing this style of coding. There
are some building blocks for internal DSL. As we emphasized you do not
have to do anything special to create these kind of fluent interfaces,
but there are some building blocks that you can use. Languages with
looser rules tend to make better DSL bases, simply because the looser
rules allow you to get closer to English and closer to this goal of
fluency because Java has some pretty strict rules about where its
punctuation goes, and every line has to have a semicolon and all those
rules that make up Java. And so, you see a lot of people pursuing this
style of coding in these more dynamic languages like Groovy or Ruby and
of course JRuby, which allows you to write Ruby code that runs on the
JVM, and here is a good example of this. You can add a support for time
intervals in Groovy because let us face it, the java.util.Calendar
class is pretty broken in Java. It is different from all the other
classes. The sets work completely differently from every other class in
Java. They cannot seem to get date stuff exactly right in Java because
they had java.util.Date and then they said, no, let us take a mulligan on
that. Let us throw it away and redo it. So they duplicated that whole
thing and said no, use Calendar instead.
What you can use in Groovy is this thing called a Category, which
allows you to essentially add new methods to built-in classes like
Integer. So, I have got a category here IntegerWithTimeSupport that
lets you add method calls to Integer and the goal here is to actually
create this line of code that you see in red there
2.days.fromToday.at(4.pm) that returns a java.util.Calendar class
instantiated at, as you probably guessed, two days from today with the
time set at 4 pm. This is a perfect example of what we were talking
about before is using the building blocks that we already have
java.util.Calendar and the APIs that exist in Java and building a
fluent interface on top of that and the nice thing that Groovy allows
us to do here, Ruby does as well, is to do things like add methods to
numbers so that you can create a time-interval support for core objects
like Integers in Java.
Martin Fowler: Yeah, there are a number of things that I think help
make a language more DSL friendly. Looser rules are important. A lot of
this is syntactic stuff like dropping things that are not necessary. It
is amazing how much difference in readability you
get if you do not have to put parentheses, particularly empty
parentheses pairs around things. Little things like that often make a
lot of difference. Being able to add methods to existing classes within
a context allows you to reorganize things and gives you more
flexibility. Closures are really very, very important for more
sophisticated cases. Another thing that is very important is the
ability to have literal collection structures, particularly lists and
HashMaps, and be able to easily put those in as literals. In fact, that
is the core structure of Lisp. It is the ability to write lists
easily, and similarly you need that, I think, in order to be able to do
DSLs well. It is also useful to be able to easily construct symbolic
types as opposed to using strings everywhere or enums because again it
will allow a shorter, more compact way of representing things.
Neal Ford: And this is an example of open classes, which is where you
add behavior to core types like Integers and the stuff that you
normally cannot touch in Java, but it is very handy to be able to add
behavior to them, for example, add this idea of time intervals to the
Integer class so that you can actually say 2. something and have that
instantiate the time interval for you based on that method on Integer.
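Plain Java cannot add methods to Integer, so the closest equivalent
needs static helper methods and parentheses. This sketch, with
hypothetical names (TimeDsl, DayOffset, PendingDate, days, am, pm),
approximates 2.days.fromToday.at(4.pm) on top of java.util.Calendar.

```java
import java.util.Calendar;

// Static helpers stand in for the methods Groovy would add to Integer.
class TimeDsl {
    public static DayOffset days(int count) { return new DayOffset(count); }

    // Convert a 12-hour clock value to the 24-hour value Calendar expects.
    public static int pm(int hour) { return hour == 12 ? 12 : hour + 12; }
    public static int am(int hour) { return hour == 12 ? 0 : hour; }
}

class DayOffset {
    private final int days;

    DayOffset(int days) { this.days = days; }

    public PendingDate fromToday() {
        Calendar date = Calendar.getInstance();
        date.add(Calendar.DAY_OF_MONTH, days);
        return new PendingDate(date);
    }
}

class PendingDate {
    private final Calendar date;

    PendingDate(Calendar date) { this.date = date; }

    // Fixes the time of day and returns the plain java.util.Calendar.
    public Calendar at(int hourOfDay) {
        date.set(Calendar.HOUR_OF_DAY, hourOfDay);
        date.set(Calendar.MINUTE, 0);
        date.set(Calendar.SECOND, 0);
        date.set(Calendar.MILLISECOND, 0);
        return date;
    }
}

class TimeDslDemo {
    public static void main(String[] args) {
        Calendar meeting = TimeDsl.days(2).fromToday().at(TimeDsl.pm(4));
        System.out.println(meeting.getTime());
    }
}
```

With static imports this reads days(2).fromToday().at(pm(4)), which is
about as close as Java gets; the remaining parentheses are exactly the
kind of noise the Groovy version avoids.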
Martin Fowler: Lots of little things, but each one like the
parentheses, does not seem that important when you mention it on its
own and yet it makes a difference in getting something that reads
clearly.
Neal Ford: So, when you think about how much it would obfuscate
this example if we had to put parentheses at the end of each one of
these things that would just be noise that would help destroy the
fluency of the line of code that we were trying to create here. So,
dynamic languages tend to make better building blocks just because they
have support for some of these building block elements that we are
talking about. Of course, it is not required that you use these, but
they make generally better base for internal DSLs. External DSLs are
the next thing, the other category of DSLs that we have here which are
written in a different language than the main host language of the
application and transformed using some form of compiler or interpreter
into some executable code of some kind. And these can be plain text
files like configuration files in the Apache world, in UNIX world, XML
documents, I think that is one of the reasons why Java is overrun with
XML right now is we really want some sort of external syntax and we
have kind of settled upon XML because it is so easy to parse, and of
course now we are overrun with XML. Every framework has at least one
XML configuration document and you end up with a non-trivial
application and you have four or five different dialects of XML
document, each of which is essentially their own language because when
you create a grammar for an XML file you are simply defining the
grammar for an external language that is embodied in that XML document.
Martin Fowler: This is a very important point because people will
often say, well, do we actually use DSLs, and I would argue that most
Java projects use a lot of DSLs and they are all embedded in these
endless XML files that float around, and they are an external DSL, you
have to parse them and bring them in, and usually, the framework will
do that for you. We have kind of gravitated to XML because
there are very easy tools available for the syntactic analysis of the
XML files, but the resulting XML is not very clear, and you do not
really get great readability because there are a lot of noise words
involved. And again you have that lack of flow. But I think the fact
that the XML files are so pervasive in the Java world, in the .NET
world is a testament to the fact that we need and want to express
things in domain-specific languages, but the mechanism we have used
actually is not terribly good.
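To make the readability point concrete, here is a sketch of a small
plain-text external DSL and a hand-written parser for it. The route
syntax, the RouteDslParser class and the handler names are all invented
for illustration and do not come from any real framework. In XML the
same information might be written as
<route path="/cars" handler="CarController"/>.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RouteDslParser {
    // Parses lines of the hypothetical form: route <path> to <handler>
    // Blank lines and lines starting with '#' are ignored.
    public static Map<String, String> parse(String script) {
        Map<String, String> routes = new LinkedHashMap<>();
        int lineNumber = 0;
        for (String rawLine : script.split("\n")) {
            lineNumber++;
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] words = line.split("\\s+");
            boolean wellFormed = words.length == 4
                    && words[0].equals("route")
                    && words[2].equals("to");
            if (!wellFormed) {
                throw new IllegalArgumentException(
                        "line " + lineNumber + ": expected 'route <path> to <handler>'");
            }
            routes.put(words[1], words[3]);
        }
        return routes;
    }

    public static void main(String[] args) {
        String script = "# routes for the sample application\n"
                + "route /cars to CarController\n"
                + "route /plans to RatePlanController\n";
        System.out.println(parse(script));
    }
}
```

The parser is a few lines because the syntax was chosen to be trivially
splittable; a richer syntax would need a real grammar.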
Neal Ford: And so what we have done is sacrificed readability for
parsability in the XML world. You would never show an XML document to
an end user to try to explain anything to them because it would
frighten them to death. All those pointy places on it looks like a
porcupine, looks like if you touched it you can cut yourself on it
because it looks hazardous to consume, and one of my favorite quotes
about XML is actually a Dave Thomas quote, he said that XML is really
just data dressed up like a hooker. So, building blocks for external
DSLs, you have parser generators and we have a wealth of these in the
Java world right now. We have Antlr, which the first time I saw it I
thought had something to do with ant, but it has nothing whatsoever to
do with ant, and JavaCC and Yacc and SableCC, including some really
sophisticated tools in the Antlr world now, like one called AntlrWorks,
which makes it very easy to work with these grammars that you create.
Martin Fowler: Yeah, the interesting thing about these tools, however,
is that
there is not very much out there that is written about how to use them
well. Like most people I did a little compiler class when I was at
college for a few weeks and promptly forgot most of what I was taught
there, and as a result I never got terribly comfortable with using
these tools. Over the last few months as part of the research for the
book I am working on, I had been very much burying my head in these
kinds of things, looking at example little languages in the UNIX
world, in the Java world, and seeing how people use these kinds of
tools. And one thing that is very, very obvious to me is (a) the tools
are actually not that hard to use once you get the hang of them, but
(b) there is very, very little stuff out there to help you get the hang
of them. So, it is a real struggle to learn how to use these. Now, the
situation is improving on some fronts, in particular with the Antlr
toolset. There is a book that just appeared in ‘The Pragmatic
Programmer’ series that talks about how to use Antlr. There is a very
nice IDE called AntlrWorks that gives you things like syntax
highlighting, refactoring of grammars, visualizations of what gets
recognized by the grammar files, and a specialized debugger, and that
will certainly help
to make it a lot easier to use this kind of stuff. But there is still a
big gap there between the actual tools and the knowledge to be able to
exploit them effectively. I am hoping that some of the work I am doing
will help fill that gap, but at the moment I do have to warn you on
that. If you want to play around with these kinds of external tools,
for the moment I would definitely recommend Antlr because of the book
and because of the IDE. It helps you get into it and use it a lot more
easily. Unfortunately, what is not in the book is really very much
advice about how to choose and design a DSL appropriately. It
tells you a lot about how Antlr works, but not a huge amount on how to
use it for DSL work.
Neal Ford: Internal DSLs have some advantages because you have the
full power of the underlying language and you have full access to
sophisticated tools like IDEs. When you are defining a fluent interface
in Java, you can use IntelliJ or Eclipse and you get code insight and
all the other things that we are accustomed to in the Java world
because we have all these sophisticated tools for working with Java
code and of course an internal DSL if it is written in Java is
fundamentally a Java program just expressed in a different way.
Martin Fowler: And that is both the plus and the minus because in
many ways what you want to do with the DSL is kind of restrict your
level of expression because that way you make fewer mistakes. And so,
sometimes having that full blown-ness can be confusing, but of course
if you are very familiar with that then it becomes much less of an
issue.
Neal Ford: The disadvantage of internal DSLs is it is hard to
write these in modern “curly brace” languages because you just cannot
boil away a lot of the context. Java is very strict about parentheses
and periods and semicolons and that sort of punctuation that is
required by the Java language. You are limited by the syntax and the
semantics of the language, so for example if you are doing this purely
in Java, you cannot add methods to the Integer class to allow you to
express time ranges and using Integers as the ultimate object that you
are calling your method on. And you have to understand the base
language or you are in syntax trouble. This is a classic blunder by
people from the PHP world that look at Rails and go, oh Rails, this is
a much more sophisticated form of PHP. I will start doing that and they
read some recipes from the Rails book somewhere, but they do not
understand Ruby syntax and they very quickly get in trouble because
ultimately you are writing Ruby code even though it does not look
much like Ruby code, that is what you are writing because it is an
internal DSL sitting on top of this base language of Ruby.
External DSLs have the advantage that you are free to use any form
you like, just let your imagination be your guide. That is actually an
advantage and a disadvantage because you have infinite possibilities
for what this thing can look like. You are limited really only by your
ability to parse the language that you create into something that you
can produce code from. Like Martin said, he has created 20 different
versions of the same DSL experimenting with exactly how does this look
and how does it work.
Martin Fowler: And then the parse point is important because there
are certain things that you can do to make parsing easier and things
you can do to make it harder. And again, this is part of the stuff that
is not often very clearly indicated out there.
Neal Ford: The disadvantage of external DSLs is that you have to build
the translator and that is no small feat because you have to understand
grammars and parsing and even though the tools make that easier, it is
still a daunting task. You lack the support of your base language which
means that you finally create this beautiful external DSL, but now
you have got to use just a regular text editor to be
able to edit it. You do not get any of the nice symbolic integration
that you get with tools like Eclipse and IntelliJ because you are
ultimately just writing out a plain text file that gets translated into
something else. Of course, you could write your own IntelliJ for your
own language, but now you are talking about a really daunting task of
building not only a language, but also the tools that understand the
language.
Martin Fowler: This was of course less of an issue in UNIX
days because hey! You just make your own emacs major mode and that will
be straightforward, but the bar has gone up now. With more
sophisticated IDEs you expect a lot more in the programming experience.
So, a plain text file becomes much more awkward to use.
Neal Ford: That is what you want to tell your users is, oh well,
here is emacs, just edit it in emacs, that is perfectly fine, which
brings us around to Language Workbenches. Well, Martin and I are not
the only people in the world who are interested in this whole
language-oriented programming idea. In fact, there are at least three
major vendors, actually more than this, but these are three
representative ones who are actively pursuing this as a style of
coding. There is Intentional Software developed by Charles Simonyi. A
lot of you probably recognize his name. He was the guy at Microsoft who
created Word and Excel and at some point, I think he was trying to get
Microsoft to kind of pursue his vision for this Language Workbench kind
of tool and they were not that interested in pursuing it. So, he did
what any self-respecting Microsoft employee would do. He took his
billions of dollars and he stomped out the door and created his own
company called Intentional Software, and they have been working for
years now on this very sophisticated tool that allows you to do
external DSLs. Microsoft has since started embracing this idea and they
are creating what they are calling Software Factories, which is sort of
this Language Workbench idea but it is a little more tilted toward the
modeling world in creating executable models and those sort of things
and finally, Meta Programming System, which is developed by JetBrains,
the makers of IntelliJ. The one we are going to talk a little bit
about here is MPS because it is the only one that you can really
actually touch right now.
Martin Fowler: Yeah, just before we dive into it though, I will
mention briefly, partly to do with the Microsoft stuff. We focused on
textual languages so far in this talk for all sorts of reasons I am
not going to go delving into, but there is also a school that is very
interested in graphical languages, and the Microsoft domain-specific
language stuff is very much focused on that. A lot of people in that
world come out of the CASE tool model-driven development kind of
background and there is a lot of questionable stuff in that area but
there is also some very good stuff as well and there is some definite
interlinks with some of the things that we are talking about and I tend
to see that as within the broad category of domain-specific languages
but it is just outside the bounds of what we have got time to talk
about today.
Neal Ford: So let us do a little bit of background and
talk about Language Workbenches for a second. This is pretty much what
compilation has looked like since the time you wrote your very first
computer program where you have some sort of external representation, a
text file, you hand it to a compiler, which parses it and compiles it
into a syntax tree and then you use some sort of linker or some
other mechanism and create an executable representation of that code
and that is the way compilers have worked for a very, very long time
now since the 1950s at least.
Martin Fowler: You think about this as a sequence of
transformations. You begin with source code, which is one
representation. We transform this into a different representation that
sits inside the compiler’s memory which is the parse tree or the
abstract syntax tree and then we use a code generator to transform it
into a third representation which is the executable representation that
we actually run. In Java’s case, that executable representation is Java
bytecode, which itself then goes through another transformation step
to actually get to machine code that runs on a particular system.
So in many ways, what we are thinking here is a kind of a pipeline of
multiple representations and transformation steps between them, and
when I say that you parse a source file into an abstract syntax tree,
if you look inside that process, again you find a sequence of
transformations inside there.
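The pipeline just described can be sketched in miniature. The following
Java example is an illustration only, assuming a toy language of
integers, + and *: it parses source text into a tree, generates
instructions for a made-up stack machine, and then runs them. All class
names and the instruction set are assumptions, not how any real
compiler is structured.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Representation 2: the abstract syntax tree held in memory.
abstract class ExprNode {
    // Transformation 2: walk the tree and append stack-machine instructions.
    abstract void emit(List<String> code);
}

class NumNode extends ExprNode {
    private final int value;
    NumNode(int value) { this.value = value; }
    @Override void emit(List<String> code) { code.add("PUSH " + value); }
}

class BinOpNode extends ExprNode {
    private final char op;
    private final ExprNode left;
    private final ExprNode right;
    BinOpNode(char op, ExprNode left, ExprNode right) {
        this.op = op; this.left = left; this.right = right;
    }
    @Override void emit(List<String> code) {
        left.emit(code);
        right.emit(code);
        code.add(op == '+' ? "ADD" : "MUL");
    }
}

// Transformation 1: source text (representation 1) -> tree.
// Recursive descent, with '*' binding tighter than '+'.
class ExprParser {
    private final String src;
    private int pos = 0;
    ExprParser(String source) { this.src = source.replace(" ", ""); }

    ExprNode parse() {
        ExprNode tree = parseSum();
        if (pos != src.length()) throw new IllegalArgumentException("unexpected input at " + pos);
        return tree;
    }

    private ExprNode parseSum() {
        ExprNode left = parseProduct();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            left = new BinOpNode('+', left, parseProduct());
        }
        return left;
    }

    private ExprNode parseProduct() {
        ExprNode left = parseNumber();
        while (pos < src.length() && src.charAt(pos) == '*') {
            pos++;
            left = new BinOpNode('*', left, parseNumber());
        }
        return left;
    }

    private ExprNode parseNumber() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
        return new NumNode(Integer.parseInt(src.substring(start, pos)));
    }
}

// Transformation 3: execute representation 3, the instruction list.
class StackMachine {
    static int run(List<String> code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : code) {
            if (instruction.startsWith("PUSH ")) {
                stack.push(Integer.parseInt(instruction.substring(5)));
            } else {
                int right = stack.pop();
                int left = stack.pop();
                stack.push(instruction.equals("ADD") ? left + right : left * right);
            }
        }
        return stack.pop();
    }
}

class PipelineDemo {
    static List<String> compile(String source) {
        List<String> code = new ArrayList<>();
        new ExprParser(source).parse().emit(code);
        return code;
    }

    public static void main(String[] args) {
        List<String> code = compile("2 + 3 * 4");
        System.out.println(code);                   // [PUSH 2, PUSH 3, PUSH 4, MUL, ADD]
        System.out.println(StackMachine.run(code)); // 14
    }
}
```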
Neal Ford: But all of this changed a little bit when
IntelliJ first came out and this is a phrase that Martin coined and I
believe this is yours, wholly yours, post-IntelliJ IDEs because
IntelliJ was really the first tool that let you edit directly against
the abstract syntax tree instead of just the text. That is actually how
refactoring works. They do not do a massive search or replace operation
in your source file. What they do is modify the abstract syntax tree
and then reflect that back in the actual source code for your
application and you will notice that we are getting better and better
at this because the early refactoring tools were very intolerant of
compilation errors. You had to be able to compile the whole thing. Now,
they’ve gotten much better and you could actually edit parts of an
abstract syntax tree even if another part of it does not actually work
because there are syntax errors and other sorts of ambiguities in it.
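A small sketch of why editing the tree beats search and replace when
renaming a variable. The node classes here (SyntaxNode, VarRef,
StringLit, Plus) are hypothetical and far simpler than what a real IDE
keeps in memory.

```java
// A deliberately tiny syntax tree that can print itself as source text.
interface SyntaxNode {
    String toSource();
    SyntaxNode rename(String from, String to);
}

class VarRef implements SyntaxNode {
    private final String name;
    VarRef(String name) { this.name = name; }
    public String toSource() { return name; }
    // Only a variable whose whole name matches is renamed.
    public SyntaxNode rename(String from, String to) {
        return name.equals(from) ? new VarRef(to) : this;
    }
}

class StringLit implements SyntaxNode {
    private final String text;
    StringLit(String text) { this.text = text; }
    public String toSource() { return "\"" + text + "\""; }
    // String literals are never touched by a variable rename.
    public SyntaxNode rename(String from, String to) { return this; }
}

class Plus implements SyntaxNode {
    private final SyntaxNode left;
    private final SyntaxNode right;
    Plus(SyntaxNode left, SyntaxNode right) { this.left = left; this.right = right; }
    public String toSource() { return left.toSource() + " + " + right.toSource(); }
    public SyntaxNode rename(String from, String to) {
        return new Plus(left.rename(from, to), right.rename(from, to));
    }
}

class RenameDemo {
    public static void main(String[] args) {
        SyntaxNode tree = new Plus(
                new Plus(new VarRef("count"), new VarRef("counter")),
                new StringLit("count"));

        // Text replacement also mangles 'counter' and the string literal.
        System.out.println(tree.toSource().replace("count", "total"));
        // prints: total + totaler + "total"

        // The tree edit changes only the variable that was meant.
        System.out.println(tree.rename("count", "total").toSource());
        // prints: total + counter + "count"
    }
}
```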
That leads us around to this idea of a Language Workbench, which
rotates around this relationship with your custom tool from the
traditional world because now you are operating mostly in some sort of
projection from the abstract syntax tree and that could be a plain text
file but it could just as easily be some sort of graphical designer
tool or it could be database schema designer or something like that.
You still have some sort of versioning storage that lets you version
this abstract syntax tree and you have some way to transform it into an
executable representation of some kind, but the transformation process
is actually different here. The general focus of your work in
traditional compilation is the beginning of that sequence, which flows
through to the end. This is more concerned with the middle part, which
is really the important part, which is the abstract syntax tree there
in the middle.
Martin Fowler: In traditional compilation, the key artifact is the
source code which has to contain everything about the program and the
abstract syntax tree is something which is really very transient
because it occurs during compilation although it also occurs during
editing in a post-IntelliJ IDE. With the Language Workbench, the
central thing is the abstract representation and when you edit it, you
edit through this projection, which is how it looks to you, and you do
not have to have everything in the projection. You can just have different
projections with different subsets of information and then the abstract
representation ties everything together. So, it twists around the
traditional way of looking at the relationship between source code and
the other representations.
Neal Ford: The editable representation is a
projection of the abstract representation and the abstract
representation has to be comfortable with errors and ambiguities just
the way refactoring tools are now. Think how annoying a tool would be
that said you have to be able to compile it before we can actually save
your work. We would never want to use a tool like that. So, one of the
challenges of building a tool like this is to build it in such a way
that it is actually comfortable and lets you save that internal
representation even if it is in error or if it is incomplete somehow
which is a nifty trick.
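As a rough sketch of these two ideas, one abstract representation with
several projections, and tolerance for incomplete content, consider the
following Java example. It uses a hypothetical utility rate plan; every
class and field name is an assumption, and it says nothing about how
any actual Language Workbench is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Abstract representation: this is what gets stored and versioned.
class RateRule {
    final String season;       // null while the user has not filled it in
    final Double pricePerUnit; // null while the user has not filled it in

    RateRule(String season, Double pricePerUnit) {
        this.season = season;
        this.pricePerUnit = pricePerUnit;
    }

    boolean isComplete() { return season != null && pricePerUnit != null; }
}

class RatePlanModel {
    final String name;
    final List<RateRule> rules = new ArrayList<>();

    RatePlanModel(String name) { this.name = name; }
}

// Projection 1: a textual view of the whole model, with visible holes
// where information is still missing.
class TextProjection {
    static String render(RatePlanModel plan) {
        StringBuilder out = new StringBuilder("plan " + plan.name + "\n");
        for (RateRule rule : plan.rules) {
            out.append("  in ").append(orHole(rule.season, "<season?>"))
               .append(" charge ").append(orHole(rule.pricePerUnit, "<price?>"))
               .append("\n");
        }
        return out.toString();
    }

    private static String orHole(Object value, String hole) {
        return value == null ? hole : value.toString();
    }
}

// Projection 2: shows only a subset of the information, the open problems.
class ProblemsProjection {
    static List<String> render(RatePlanModel plan) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < plan.rules.size(); i++) {
            if (!plan.rules.get(i).isComplete()) {
                problems.add("rule " + (i + 1) + " is incomplete");
            }
        }
        return problems;
    }
}

class ProjectionDemo {
    public static void main(String[] args) {
        RatePlanModel plan = new RatePlanModel("Residential");
        plan.rules.add(new RateRule("summer", 0.12));
        plan.rules.add(new RateRule(null, 0.08)); // incomplete, still accepted

        System.out.print(TextProjection.render(plan));
        System.out.println(ProblemsProjection.render(plan));
    }
}
```

Nothing in the model rejects the half-finished second rule; the
projections simply show it as a hole or as a reported problem.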
So the examples we are going to show here just very briefly are
screenshots and these are actually fairly old screenshots of MPS. The
idea behind MPS is that instead of building objects and APIs and
hierarchies like you normally do, you build concepts and a concept is
something like a date, a very, very small thing but along with that
concept, you also build an editor that is concept aware. So, this is an
example that Martin actually created in conjunction with the guys at
JetBrains to create a DSL that describes rate plans for electricity
usage, I believe it is a…
Martin Fowler: Hmm…mm.
Neal Ford: …gas utility usage and what you do is build up a date
concept and a numeric financial concept and you build all these
concepts as individual entities and then wire them together to make
bigger and bigger things. So, this is an example and notice you get
syntax highlighting because now the language you have created is aware
of what the pieces are. So, you can do intelligent syntax highlighting.
You can also do intelligent editing so that when you enter a cell, it
knows what format it is expecting. You can also do things like code
insight because the concept is defined as some sort of entity that you
are interested in. You can define in what ways it can be edited
legally. So, one of the things that is created along with this rate
plan DSL is this concept of financial number, which can encompass
dollar figures but it can also encompass Excel-like formulas and other
sort of functionality, and the interesting thing about a Language
Workbench like this is, when you tab into this field, just like in an
IntelliJ, the options that you have there are the options that are in
context for this particular concept that you created, and the
only things that show up there are the things that are pertinent to
that context.
So, one of the common complaints against this style of coding is,
doesn’t this lead to language cacophony? Well, if I have developers out
here creating their own languages, is it not going to be a nightmare,
because some guy develops some sort of language and he will wander
off and now I have got some sort of mess of a whole bunch of different
languages? If you end up in this case, that means your language is
very, very poorly designed. This is exactly the same problem you run
into when you have your developers internally building APIs. If
they do a really terrible job of building the API, it is going to be
hard to maintain too. It is exactly the same with languages, because in
some ways, what we are trying to do is represent the same kind of
concept just in a slightly different way.
Martin Fowler: In many ways, really what is going on with
domain-specific languages is very much the same thing that is going on
with frameworks. In order to take on a project, you have
to not just use your base language. You have to also introduce multiple
APIs, multiple libraries, multiple frameworks as part of this. If you
are going to build a web app these days, you have to decide oh, I need
some kind of something like Spring MVC. I need to use Spring. I need to
use Hibernate or use a whole bunch of these frameworks and that becomes
part of your development environment and it is a whole bunch of things
you have to learn. I do not think actually that domain-specific
languages add very much to that. They are not any harder to learn
than the fundamental frameworks that they cover because all they are
fundamentally is a fairly slim veneer over the actual frameworks
themselves. It is these abstractions and frameworks, which are the hard
things. The domain-specific languages are just merely small things that
make it easier to use them. So I do not think it actually makes a big
difference. In particular, it is important to stress that we are
not talking about multiple general-purpose languages. Each
domain-specific language is usually something very small and limited.
Again, the comparison is to think of something like a Hibernate
configuration file or a Struts configuration file. The only difference
is it is written in something a bit more readable than the XML.
Neal Ford: And in fact, if it is hard to read your DSL, then you
have done a very poor job of creating it, because that was one of the
goals, to create more readable code.
Martin Fowler: Well, keeping an eye on our time, we should probably
think about wrapping up. I know we have got a few more slides but, so
we can…
Neal Ford: Yep. We will just go over these very quickly. Why are
we not already doing this? External DSLs give you the most potential
but at a much higher cost because you have to build your own
parser, generator, etc. There is this COBOL inference which says, well,
once we are able to write our own languages, then we will not need
developers any more and this is another Martin Fowlerism, that most
technologies that are supposed to eliminate professional programmers do
nothing of the sort. Your end users are not interested in writing code.
They are interested in the code doing stuff and they still need
programmers to do that. What we are talking about is creating code that
a business analyst can read, not write, and speaking the same language
as the business people. If you can create really fluent interfaces, it
allows you to communicate with them better which is what I said.
Martin Fowler: And that is a point worth emphasizing. That is
something we have begun to see in some of the projects that we have
done at ThoughtWorks where we have made a deliberate attempt to use
DSLs and
make them readable for business people and the important thing here is
readable and reviewable, not that they are necessarily writable.
Neal Ford: So, the boundary between external DSLs and
general-purpose languages, Martin has already talked about, you do not
want to create a general-purpose language. You want to create a very,
very tightly focused domain-specific language that covers just one
domain. You are better off having a lot of small domain-specific
languages than trying to create a new general-purpose language, which
is...
Martin Fowler: In general, domain-specific languages are not Turing
complete and in general, they do not provide abstraction mechanisms
such as subroutines or objects or things of that kind. It is not 100%
true but it is true most of the time.
Neal Ford: They tend to be very declarative rather than imperative in nature.
Martin Fowler: Yeah.
Neal Ford: So, there is a token type not recognized….
Martin Fowler: Oh! Cool.
Neal Ford: That is interesting. When you
are building DSLs, it is a good idea to start with the end in mind.
There are two different ways to do this. You can take an API and morph
it into a DSL, but the other way to do this, this is a classic picture.
This is the Rake napkin. Jim Weirich, when he created Rake, which is
make for Ruby, was frustrated with the make utility that he was using
and he was at lunch one day and talking to a colleague and he said, “I
am so frustrated with make. I think I am just going to rewrite it in
Ruby”. So they sat down and sketched out on a napkin what it would
look like if it were in Ruby and this is an actual scan of the Rake
napkin. He took it back to his desk and two hours later, he had the
beginnings of the Rake project. So he started with what he wanted to
see and worked toward that. So, internal DSLs and dynamic languages,
Language Workbenches for static languages, we believe that this is a
huge competitive advantage because you have got better abstraction
mechanisms, slightly harder to write but easier to maintain. This may
well be the next big paradigm. This may be the next evolutionary step
beyond object-oriented programming.
Martin Fowler: Yeah. I do not tend to look at it as the next big
thing. I do not necessarily know what the next big thing is but I do
think it is something that is interesting, and the fact that I am
putting the time in…to make this almost certainly my next book is a
sign of how interesting I think something like this might be. But I do
not want to oversell how important it is. It is another useful tool
to consider adding to the toolbox, and it is also one where
some bits of it you can use today, particularly the internal DSL stuff.
Some bits of it are definitely much more on the horizon. The Language
Workbenches, I think are really, really interesting but it is going to
be a few years before most people can think about using them for real
projects.
Neal Ford: So they are running us out of here, so thank you very much
and I hope you enjoy the rest of the conference.
|
|