Designing the next programming language? Understand how people learn!
Somehow it is a recurring theme in computer science: create a “programming” system that is easier to use and learn than the existing programming approaches. I am not just talking about better tools, like IDEs, but also new languages. It seems as if each self-respecting programmer creates his/her own language or tool-set nowadays, right?
Okay, I have to admit that not all efforts are focused on making things easier; often the focus is on productivity. However, if we look at, for example, the “programming” languages created in the Model-Driven Development field, we see quite some focus on involving domain experts in development by creating higher-level, domain-specific, or sometimes even visual languages. Although there are many more reasons to do Model-Driven Development, it is the ease of use and the lower entry barrier that capture the imagination.
There is a fundamental flaw in our thinking around these approaches, though…
Let’s look at the fundamentals of a language before we dive into the details. A language specification consists of three main elements:
- Concrete syntax: the concrete syntax defines the physical appearance of a language. For a textual language this means that it defines how to form sentences. For a graphical language this means that it defines the graphical appearance of the language concepts and how they may be combined into a model. Multiple concrete syntaxes can be defined for the same language. You could, for example, define both a textual and a graphical concrete syntax.
- Abstract syntax: the abstract syntax defines the concepts of a language and their relationship to each other.
- Semantics: the semantics describe the meaning of a sentence or model specified in some language. In the context of programming this means that the semantics of a language describe what the effect is of executing the statements of that language.
All these elements play an important role in how easy it is to learn a language.
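To make the three elements tangible, here is a minimal sketch of a toy arithmetic language. Everything in it (the `Num`, `Add`, and `Mul` concepts, the two notations, the `evaluate` function) is made up for illustration and not taken from any real language: the dataclasses play the role of the abstract syntax, the two printers are two concrete syntaxes for the same model, and the evaluator stands in for the semantics.

```python
from dataclasses import dataclass
from typing import Union


# Abstract syntax: the concepts of the (hypothetical) language and how they relate.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


# Concrete syntax 1: fully parenthesised infix notation.
def to_infix(e: Expr) -> str:
    if isinstance(e, Num):
        return str(e.value)
    op = "+" if isinstance(e, Add) else "*"
    return f"({to_infix(e.left)} {op} {to_infix(e.right)})"


# Concrete syntax 2: function-call style prefix notation for the same concepts.
def to_prefix(e: Expr) -> str:
    if isinstance(e, Num):
        return str(e.value)
    name = "add" if isinstance(e, Add) else "mul"
    return f"{name}({to_prefix(e.left)}, {to_prefix(e.right)})"


# Semantics: what happens when a model expressed in the language is executed.
def evaluate(e: Expr) -> int:
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    return evaluate(e.left) * evaluate(e.right)


program = Add(Num(2), Mul(Num(3), Num(4)))
print(to_infix(program))   # (2 + (3 * 4))
print(to_prefix(program))  # add(2, mul(3, 4))
print(evaluate(program))   # 14
```

The same `program` value is shown in two different notations, yet it has exactly one meaning. That is why discussions about notation and discussions about meaning can be kept apart.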
How do we learn a new programming language?
In general we learn in a number of different ways. Up to 10 different learning styles have been defined, but I think they boil down to 3 main styles:
- Listen: let someone educate you.
- See: read stuff, watch someone else do it.
- Experience: just start and try it yourself, experiment, trial and error.
Most people combine multiple styles when they learn something new.
Let’s go one step further: when do people understand how something works (we are not talking about factual knowledge here)? If they hear, see, or experience cause and effect. That’s when we connect the dots. If you hit a play button and the music starts to play you understand the function of that button, and if you hit different kinds of buttons on different systems that all lead to “the music starts playing” you will probably understand that the triangle icon means “play”.
When we learn how a system works or more specifically when we learn a new programming language, we can have different learning styles, but in the end it is about relating cause and effect. Whether you hear someone explain it, see someone do it, or experience it yourself.
The difference between simple and complicated, and thus between easy and difficult to learn, is due to the “distance” between cause and effect. If there are many steps between cause and effect it can be difficult to connect the two. If a “system” is a black box with multiple inputs and outputs and a complex relation among them, it is difficult to determine cause and effect and hence difficult to understand the system.
If we want to create a new programming language that has “easy to learn and use” as a core design principle, we should aim our efforts at an easy-to-understand cause-effect relationship between language concepts and the actual semantics of the language.
Let’s see how approaching a language from this angle influences concrete syntax, abstract syntax, and semantics.
On concrete syntax: how does it convey meaning?
Most discussions (and/or flame wars) around languages focus on concrete syntax (sometimes combined with discussion about abstract syntax, because the two are closely related in most languages). Syntactic sugar such as white-space and symbols, as well as how concise the language is, are hugely interesting discussion topics.
Our main question around concrete syntax, however, should be: how does the language convey meaning? If we read the language, do we know what the words mean? Do we understand the meaning of the data (inputs, variables)? Can we easily follow the flow? These are the things that matter!
You want examples, you say? Well, this may be the least readable language of all; that’s the goal, at least. The microflow language we use in our platform is a bit easier to read and understand and hence conveys the meaning of the program much better (and it even leads to art).
On abstract syntax: being declarative is not the holy grail
In the world of Model-Driven Development and Domain-Specific Languages the focus tends to be more on the abstract syntax. The core tenets of Model-Driven Development, or Model-Driven Engineering if you will, are abstraction and automation. Normally, if you want to create a language that is easy to use for domain experts (often non-programmers), you focus on creating an abstract syntax that leaves out technical details and is as declarative as possible, right? Do not focus on technical details (abstract them away) and specify the “what”, not the “how” (declare the program).
However, abstracting away technical details does not automatically make a language easier to understand. It all depends on the relation between the language concepts and the actual behaviour of the application you see (remember: cause and effect). A low-level language creates too much distance between cause and effect because, for example, machine-level instructions are not easy to relate to application behaviour. On the opposite side we have languages that are too high-level and thereby create a difficult relation between cause and effect too. If one line of code (or a single activity in your process diagram) leads to the execution of a range of actions that can produce a plethora of results, it can be hard to grasp the impact of changing something.
This leads us to the subject of declarative languages. Declarative languages can be very powerful, as you just declare the result, not how to get it. Example languages are SQL (well-known, more declarative than procedural), Prolog (if you studied computer science you probably know it), and the languages used by most business rule engines. The nice thing about a declarative language is that you abstract away all the “how”; you avoid implementation details. What these languages promise is that programs created with them are easier to understand and maintain, and sometimes that’s completely true.
However, what always bugged me is that users of our App Platform learn to use our procedural DSLs quicker than the more declarative ones. In hindsight that’s maybe not that difficult to explain. It’s probably best understood if you imagine a large system that consists of 1000 rules (or predicates) that are automatically woven into a working system. Do you think a “programmer” can easily connect cause and effect when changing a rule? The lesson: being declarative shouldn’t be a goal in itself; it can even make things more difficult if not applied well. Please note that I focus on the learnability of the language; productivity, conciseness, etc. are different considerations that may influence your choice of declarative languages.
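The 1000-rule scenario can be shrunk to three rules to see the effect. The sketch below is purely illustrative: the rule names and the naive forward-chaining loop are assumptions made for this example, not how our platform or any particular rule engine works. It contrasts rules that are woven together automatically with a procedural version of the same logic.

```python
def forward_chain(facts, rules):
    """Naive forward chaining: apply rules until no new facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# Declarative version: each rule is (required facts, concluded fact).
# All names are hypothetical and only serve the example.
rules = [
    (frozenset({"gold_customer"}), "free_shipping"),
    (frozenset({"order_over_100"}), "gold_customer"),
    (frozenset({"free_shipping", "express"}), "manual_review"),
]
facts = {"order_over_100", "express"}

with_all_rules = forward_chain(facts, rules)
# Remove only the second rule: the effect ripples through the other two.
without_gold_rule = forward_chain(facts, [rules[0], rules[2]])

print(sorted(with_all_rules - facts))
# ['free_shipping', 'gold_customer', 'manual_review']
print(sorted(without_gold_rule - facts))
# []


# Procedural version of the same logic: the chain of consequences is
# visible in the order and nesting of the statements.
def process_order(order_over_100, express):
    outcome = []
    if order_over_100:
        outcome.append("gold_customer")
        outcome.append("free_shipping")
        if express:
            outcome.append("manual_review")
    return outcome


print(process_order(True, True))
# ['gold_customer', 'free_shipping', 'manual_review']
```

In the declarative version nothing in the text of the "gold customer" rule hints that deleting it also switches off the manual review. In the procedural version that dependency can be read straight from the nesting. With three rules this is easy to trace by hand; with a thousand it is not.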
In the end, the abstract syntax of a language should help you reason about the problem you are solving using the concepts of the language. It should be a proper abstraction of the resulting application. This abstraction can be as high-level as possible, as long as it decreases the “distance” between cause and effect. The abstract syntax, the concepts of the language, should also help to decompose a problem into manageable pieces and glue them back together as a complete solution.
On semantics: live programming helps to understand cause and effect
The semantics are the most important aspect for people to learn and understand a language. Semantics connect cause (a certain language construct) and effect (the result of executing it). As explained in the previous sections, the choice of concrete and abstract syntax is important as it defines the “distance” between cause and effect and hence how easy the semantics of the language are to understand. The tools are just as important as the language itself in conveying the semantics of a language.
The environment you use can, for example, greatly enhance the readability of a language by providing mouse-overs, integrated documentation, highlighting, etc. But an environment should do more. It should help you to sculpt your program. It should provide refactoring options that help you to start concrete and refactor to more abstract code when needed. It should provide ways to step forward (and backward!) through the program while inspecting state and behaviour.
A great debugger can really help you to learn and understand the cause-effect relationship between your language and the actual execution. A nice example is the visual debugger we feature in our platform, and you may also know the Eclipse Java debugger (or even the Smalltalk debugger) that allows you to change code during a debugging session (within some limits).
Changing code and directly seeing the effects of it (dubbed “live programming” sometimes) is the ultimate way to closely connect cause and effect. If you combine that with stepping forward and backward through the execution (and thus modify state and code at the same time) you have a powerful environment that really helps a user to relate cause and effect and hence understand the language.
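The combination of stepping in both directions and changing code mid-run can be sketched in a few lines. This is a toy, not a description of any real debugger: the `SteppingInterpreter` class, its method names, and the "program as a list of named steps" format are all assumptions made for the example. The idea is simply to keep a snapshot of the state after every step, so that stepping back is a matter of dropping the latest snapshot.

```python
class SteppingInterpreter:
    """Toy interpreter: a program is a list of (variable, compute) steps.

    history[i] holds the program state after i steps have run, so going
    backward only requires discarding the most recent snapshot.
    """

    def __init__(self, steps):
        self.steps = list(steps)
        self.history = [{}]

    @property
    def position(self):
        return len(self.history) - 1

    @property
    def state(self):
        return self.history[-1]

    def step_forward(self):
        if self.position < len(self.steps):
            name, compute = self.steps[self.position]
            new_state = dict(self.state)  # copy, so older snapshots stay intact
            new_state[name] = compute(new_state)
            self.history.append(new_state)
        return self.state

    def step_backward(self):
        if len(self.history) > 1:
            self.history.pop()
        return self.state

    def replace_step(self, index, name, compute):
        """Live edit: swap a step and discard every state computed from it."""
        self.steps[index] = (name, compute)
        del self.history[index + 1:]


program = [
    ("price", lambda s: 100),
    ("discount", lambda s: s["price"] * 10 // 100),
    ("total", lambda s: s["price"] - s["discount"]),
]

dbg = SteppingInterpreter(program)
for _ in range(3):
    dbg.step_forward()
print(dbg.state)  # {'price': 100, 'discount': 10, 'total': 90}

dbg.step_backward()  # undo the "total" step
# Change the discount rule while the program is paused.
dbg.replace_step(1, "discount", lambda s: s["price"] * 20 // 100)
dbg.step_forward()
final_state = dbg.step_forward()
print(final_state)  # {'price': 100, 'discount': 20, 'total': 80}
```

Changing one step and immediately seeing `total` move from 90 to 80 is the short cause-effect loop this section is about. A real environment would have to deal with side effects and large state, which this sketch ignores.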
Warning: live programming really helps in understanding the language, but not on its own. Please do not forget the things I previously mentioned about concrete and abstract syntax. Cause and effect can be clearly connected, but if you cannot understand that connection when you read the language it will still be hard to understand the language.
If you want to design the next programming language that is easy to learn and use, you should first understand how people learn. The relation between cause and effect plays an important role in the learnability of a language. If you want to clarify the cause-effect relationship you should focus on both the language design and the tools.
If you like this subject you should definitely read this excellent essay that goes to great lengths to explain how we can let people understand programming.