Designing the next programming language? Understand how people learn!

Somehow it is a recurring theme in computer science: create a “programming” system that is easier to use and learn than the existing programming approaches. I am not just talking about better tools, like IDEs, but also new languages. It seems as if each self-respecting programmer creates his/her own language or tool-set nowadays, right?

Okay, I have to admit that not all efforts are focused on making things easier; often the focus is on productivity. However, if we look, for example, at the “programming” languages created in the Model-Driven Development field, we see quite some focus on involving domain experts in development by creating higher-level, domain-specific, or sometimes even visual languages. Although there are many more reasons to do Model-Driven Development, it is the ease of use and the lower entry barrier that capture the imagination.

There is a fundamental flaw in our thinking around these approaches, though…

Language fundamentals

Let’s look at the fundamentals of a language before we dive into the details. A language specification consists of three main elements:

  • Concrete syntax: the concrete syntax defines the physical appearance of a language. For a textual language this means that it defines how to form sentences. For a graphical language this means that it defines the graphical appearance of the language concepts and how they may be combined into a model. Multiple concrete syntaxes can be defined for the same language. You could, for example, define both a textual and a graphical concrete syntax.
  • Abstract syntax: the abstract syntax defines the concepts of a language and their relationship to each other.
  • Semantics: the semantics describe the meaning of a sentence or model specified in some language. In the context of programming this means that the semantics of a language describe what the effect is of executing the statements of that language.
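
To make the three elements a bit more tangible, here is a minimal sketch in Python of a toy addition language. Everything in it is made up for illustration (the names Num, Add, parse_infix, parse_nested, and evaluate are my own assumptions, not part of any real tool): two concrete syntaxes map to the same abstract syntax, and one function supplies the semantics.

```python
from dataclasses import dataclass

# Abstract syntax: the concepts of the toy language and how they relate.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Add:
    left: object
    right: object

# Concrete syntax 1 (hypothetical): infix text such as "1 + 2 + 3".
def parse_infix(text):
    terms = [Num(int(token)) for token in text.split("+")]
    tree = terms[0]
    for term in terms[1:]:
        tree = Add(tree, term)  # left-associative: (1 + 2) + 3
    return tree

# Concrete syntax 2 (hypothetical): nested tuples such as ("add", 1, 2).
def parse_nested(data):
    if isinstance(data, int):
        return Num(data)
    tag, left, right = data
    if tag != "add":
        raise ValueError(f"unknown construct: {tag!r}")
    return Add(parse_nested(left), parse_nested(right))

# Semantics: what it means to execute a sentence of the language.
def evaluate(node):
    if isinstance(node, Num):
        return node.value
    return evaluate(node.left) + evaluate(node.right)

tree_a = parse_infix("1 + 2 + 3")
tree_b = parse_nested(("add", ("add", 1, 2), 3))
print(tree_a == tree_b)   # both notations yield the same abstract syntax
print(evaluate(tree_a))
```

Both notations look different, yet they produce the same tree and therefore the same behaviour when executed.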

All these elements play an important role in how easy it is to learn a language.

How do we learn a new programming language?

In general we learn in a number of different ways. Up to 10 different learning styles have been defined, but I think they can be boiled down to 3 main styles:

  • Listen: let someone educate you.
  • See: read stuff, watch someone else do it.
  • Experience: just start and try it yourself, experiment, trial and error.

Most people combine multiple styles when they learn something new.

Let’s go one step further: when do people understand how something works (we are not talking about factual knowledge here)? If they hear, see, or experience cause and effect. That’s when we connect the dots. If you hit a play button and the music starts to play you understand the function of that button, and if you hit different kinds of buttons on different systems that all lead to “the music starts playing” you will probably understand that the triangle icon means “play”.

When we learn how a system works or more specifically when we learn a new programming language, we can have different learning styles, but in the end it is about relating cause and effect. Whether you hear someone explain it, see someone do it, or experience it yourself.

The difference between simple and complicated, and thus easy or difficult to learn, is due to the “distance” between cause and effect. If there are many steps between cause and effect it can be difficult to connect the two. If a “system” is a black-box with multiple inputs and outputs, with a complex relation among inputs and outputs it is difficult to determine cause and effect and hence it is difficult to understand the system.

If we want to create a new programming language that has “easy to learn and use” as a core design principle, we should aim our efforts on an easy-to-understand cause-effect relationship between language concepts and the actual semantics of the language.

Let’s see how approaching a language from this angle influences concrete syntax, abstract syntax, and semantics.

On concrete syntax: how does it convey meaning?

Most discussions (and/or flame wars) around languages are focused on concrete syntax (sometimes combined with discussion about abstract syntax because there is a close relation between abstract and concrete syntax in most languages). Syntactic sugar like white-space and symbols, as well as how concise the language is, are hugely interesting discussion topics.

Our main question around concrete syntax, however, should be: how does the language convey meaning? If we read the language, do we know what the words mean? Do we understand the meaning of the data (inputs, variables)? Can we easily follow the flow? These are the things that matter!

You want examples, you say? Well, this may be the least readable language of all; that’s the goal at least. The microflow language we use in our platform is a bit easier to read and understand and hence conveys the meaning of the program much better (and it even leads to art).

On abstract syntax: being declarative is not the holy grail

In the world of Model-Driven Development and Domain-Specific Languages the focus tends to be more on the abstract syntax. The core tenets of Model Driven Development, or Model Driven Engineering if you will, are abstraction and automation. Normally if you want to create a language that is easy-to-use by domain experts (often non-programmers) you focus on creating an abstract syntax that leaves out technical details and is as declarative as possible, right? Do not focus on technical details (abstract them away) and specify the “what”, not the “how” (declare the program).

However, abstracting away technical details does not automatically make a language easier to understand. It all depends on what the relation is between the language concepts and the actual behaviour of the application you see (remember: cause and effect). A low-level language creates too much of a distance between cause and effect as, for example, machine-level instructions are not easy to relate to application behaviour. On the opposite side we have languages that are too high-level and thereby create a difficult relation between cause and effect too. If one line of code (or a single activity in your process diagram) leads to the execution of a range of actions that can result in a plethora of results, it can be hard to grasp the impact of changing something.

This leads us to the subject of declarative languages. Declarative languages can be very powerful as you just declare the result, not how to get it. Example languages are SQL (well-known, more declarative than procedural), Prolog (if you studied computer science you probably know it), or the languages used by most business rule engines. The nice thing about declarative languages is that you abstract away all the “how”; you avoid implementation details. What these languages promise is that programs created with them are easier to understand and maintain, and sometimes that’s completely true.
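
As a small illustration of “what” versus “how”, here is a sketch using Python’s built-in sqlite3 module. The employee table and its rows are invented for the example. The declarative version states the desired result; the procedural version spells out every step needed to reach it.

```python
import sqlite3

# Hypothetical data set, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", 3000), ("Bob", 2500), ("Cy", 4000)],
)

# Declarative: declare the result you want, not how to compute it.
declarative_total = conn.execute(
    "SELECT SUM(salary) FROM employee"
).fetchone()[0]

# Procedural: describe the steps, one row at a time.
procedural_total = 0
for (salary,) in conn.execute("SELECT salary FROM employee"):
    procedural_total += salary

print(declarative_total, procedural_total)
```

Both versions give the same total. The difference lies in how much of the “how” the reader has to follow to see why.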

However, what always bugged me is that users of our App Platform learn to use our procedural DSLs quicker than the more declarative ones. In hindsight that’s maybe not that difficult to explain. It’s probably best understood if you imagine a large system that consists of 1000 rules (or predicates) that are automatically woven into a working system. Do you think a “programmer” can easily connect cause and effect when changing a rule? The lesson: being declarative shouldn’t be a goal in itself; it can even make things more difficult if not applied well. Please note that I focus on the learnability of the language; productivity, conciseness, etc. are different considerations that may influence your choice for declarative languages.
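
The problem already shows up with a handful of rules. The following is a minimal sketch of a forward-chaining rule evaluator in Python. It is not how any particular rule engine works, and the rule and fact names are purely hypothetical. Editing a single rule silently changes conclusions that are several steps away from the edit.

```python
def derive(facts, rules):
    """Naive forward chaining: apply rules until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and conditions <= known:
                known.add(conclusion)
                changed = True
    return known

# Each rule: (facts that must hold, fact that is then concluded).
rules = [
    (frozenset({"gold_customer"}), "free_shipping"),
    (frozenset({"order_over_100"}), "free_shipping"),
    (frozenset({"free_shipping", "in_stock"}), "ship_today"),
    (frozenset({"ship_today"}), "notify_warehouse"),
]
facts = {"order_over_100", "in_stock"}

before = derive(facts, rules)

# Change one rule: free shipping now requires an order over 200.
edited = list(rules)
edited[1] = (frozenset({"order_over_200"}), "free_shipping")
after = derive(facts, edited)

# Conclusions that disappeared because of that single edit.
print(sorted(before - after))
```

The edit touched only the shipping threshold, yet the warehouse notification disappears as well. With 1000 rules, tracing such a chain by reading the rules alone gets hard quickly.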

In the end, the abstract syntax of a language should help to reason about the problem you are solving using the concepts of the language. It should be a proper abstraction of the resulting application. This abstraction can be as high-level as possible, as long as it decreases the “distance” between cause and effect. The abstract syntax, the concepts of the language, should also help to decompose a problem into manageable pieces and glue them back together as a complete solution.

On semantics: live programming helps to understand cause and effect

The semantics are the most important aspect for people to learn and understand a language. Semantics connect cause (a certain language construction) and effect (the result of executing that language). As explained in the previous sections, the choice of concrete and abstract syntax is important as it defines the “distance” between cause and effect and hence how easy it is to understand the semantics of the language. In addition to the language itself, the tools are equally important in conveying the semantics of a language.

The environment you use can, for example, greatly enhance the readability of a language by providing mouse-overs, integrated documentation, highlighting, etc. But an environment should do more. It should help you to sculpt your program. It should provide refactoring options that help you to start concrete and refactor to more abstract code when needed. It should provide ways to step forward (and backward!) through the program while inspecting state and behaviour.

A great debugger can really help you to learn and understand the cause-effect relationship of your language and the actual execution. A nice example is the visual debugger we feature in our platform, and you may also know the Eclipse Java debugger (or even the Smalltalk debugger) that allows you to change code during a debugging session (within some limits).

Changing code and directly seeing the effects of it (dubbed “live programming” sometimes) is the ultimate way to closely connect cause and effect. If you combine that with stepping forward and backward through the execution (and thus modify state and code at the same time) you have a powerful environment that really helps a user to relate cause and effect and hence understand the language.
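
To give an idea of what stepping backward and changing code mid-run can look like, here is a minimal sketch in Python. It is a toy under my own assumptions (the SteppingRunner class and the step functions are hypothetical) and not a description of how any real debugger is built. It keeps a snapshot of the state after every step, so you can rewind, swap a step, and replay.

```python
import copy

class SteppingRunner:
    """Runs a list of steps, keeping a state snapshot after each one."""

    def __init__(self, steps, initial_state):
        self.steps = list(steps)
        self.history = [copy.deepcopy(initial_state)]
        self.position = 0  # number of steps executed so far

    @property
    def state(self):
        return self.history[-1]

    def forward(self):
        if self.position >= len(self.steps):
            return False
        new_state = copy.deepcopy(self.state)
        self.steps[self.position](new_state)  # the step mutates the copy
        self.history.append(new_state)
        self.position += 1
        return True

    def backward(self):
        if self.position == 0:
            return False
        self.history.pop()  # drop the latest snapshot
        self.position -= 1
        return True

    def replace_step(self, index, step):
        """Live edit: only steps that have not run yet can be swapped."""
        if index < self.position:
            raise ValueError("step back before editing an executed step")
        self.steps[index] = step

def compute_total(state):
    state["total"] = state["price"] * state["quantity"]

def add_shipping(state):
    state["total"] += 5

def add_free_shipping(state):
    state["total"] += 0

runner = SteppingRunner([compute_total, add_shipping],
                        {"price": 10, "quantity": 3})
runner.forward()
runner.forward()
total_with_shipping = runner.state["total"]

runner.backward()                           # rewind one step
runner.replace_step(1, add_free_shipping)   # change the code
runner.forward()                            # replay with the new code
total_after_edit = runner.state["total"]

print(total_with_shipping, total_after_edit)
```

Snapshots of the full state are the simplest way to support rewinding. They trade memory for simplicity, which is fine for a sketch like this.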

Warning: live programming really helps in understanding the language, but not on its own. Please do not forget the things I previously mentioned about concrete and abstract syntax. Cause and effect can be clearly connected, but if you cannot understand that connection when you read the language it will still be hard to understand the language.

Conclusion

If you want to design the next programming language that is easy to learn and use, you should first understand how people learn. The relation between cause and effect plays an important role in the learnability of a language. If you want to clarify the cause-effect relationship you should focus on both the language design and the tools.

If you like this subject you should definitely read this excellent essay that goes to great lengths to explain how we can let people understand programming.

4 Comments Added

  1. Steven Kelly February 19, 2013

    Good points as always! Let’s pare one of the conclusions down a bit though.
    Declarative languages that declare facts are easier to learn and understand than a procedural language doing the same thing – e.g. an SQL declaration of the columns in a table is easier than a series of Java statements adding the columns.
    For things that are more dynamic in nature, like a query, a declarative language may still be easy – e.g. if it maps well to your human understanding of the task description, as a simple SQL SELECT does, but may also be hard – e.g. a couple of OUTER JOINs. However, the same task in a procedural language may be as hard or harder – a lot of code and a lot of thinking.
    Of course there are also tasks that are easy in a procedural language but hard in a particular declarative language. Admittedly in some cases such tasks aren’t an attempt to solve the original problem, but to implement a solution familiar from a procedural language. E.g. if we want to add up the Salaries of a collection of Employees, we may think we want a FOR loop, and then find that writing such a loop in SQL is hard. I’ve certainly seen server-side code that does things like that by iterating in Java over a full SQL result set.
    So, while it may take longer to learn a declarative language, that doesn’t necessarily mean it’s slower to get from zero to a correct solution to a particular problem. In my experience the risk is more that users give up, so never reach the solution. With a procedural language they seem to have more chance of eventually reaching at least a near-solution, i.e. mostly right but with some problems in corner cases.
    I think declarative languages also tend more to domain-specificity. It may not be because of anything inherent, but simply the way languages spread and are used. In a declarative DSL, users who stray from the domain more quickly run up against cases where the language feels like a poor match. The feeling is exacerbated by the fact that in other parts of the program the language feels so easy. That can of course be seen with a GPL too: if in Smalltalk I need to step down from the beauty of “employees collect: #firstName” to looping through with an index variable, it feels bad – but if I was in straight JavaScript I wouldn’t think twice.

  2. I totally agree that seeing live results of your code can make programming easier, because that’ll free you from thinking/visualizing the result in your brain. That’s why I’m working on my LIVEditor (http://liveditor.com) project. Basically, you can think of it as a combination of a text editor + a browser + a Firebug-like html/css inspector; you can see the real-time result of all your code edits.

  3. John June 2, 2013

    Johan, great site, I immediately subscribed to the feed to keep up to date.
    If you now could do me the favour and replace the light grey font with something that actually can be read, I would be happy.
    Light grey text is great if you are a so-called “web designer” whose only content consists of lorem ipsum. But it is a PITA for text that is indeed intended to be read.
    Thank you.

  4. Mike Davis July 1, 2013

    It is all true, however I believe there is a fundamental assumption which is not valid. That is that the textual languages we use today are the right path. A very viable alternative approach was taken by SmallTalk over 30 years ago and is still alive and more viable today than back in its day. It is hardly a language but rather a complete paradigm and world which offers the best possible solution for any MDx approach.
    Check out the now free for personal use Cincom ObjectStudio 8.4.1. Download and follow the instructions in the Modeler manual for loading the modeler. I think the secret sauce here is that the difference between its high level graphical model and the generated code is directly from graphical representation to object definition (by way of classes, which can remain untouched).
    A bit of investigative effort into this can be a real eye opener!!! IMHO.
    mike
