Day 281 – Nature of Programming


This is a more technical post than the usual 365 fare.
Apologies to the non-programmers.

On the face of it, programming is about creating programs that do something useful. Turning a problem into a solution. And hopefully a working one at that.

To me, programming is about quite a bit more than that; it’s the act of taking an abstract idea and trying to encode all aspects of that idea into a correctly functioning piece of software. I use the word “trying” with purpose, because expressions in real-world programming languages are all just approximations of this ideal. Real-world programs always have some level of defects and “good enough” about them.

In the first instance I’d always strive for the ideal though.

Programming languages provide a great many tools to express meaning: comments, named variables and methods, automated tests, type systems, and code contracts with formal verification.

The strongest expressions of meaning are always preferred; a formally verified code contract beats using parameter types alone. Boxing-in the meaning through parameter types beats well-considered names. And even code comments beat constraints inherent in the idea that are left unexpressed altogether.
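
For illustration, here is one and the same constraint (“this id must be a positive number”) expressed at a few of those strengths. The method names are my own invention, and the last variant assumes the .NET 4 Code Contracts library; a dedicated parameter type (like the Temperature sketch further down) would sit between the last two.

using System.Diagnostics.Contracts;

static class EncodingStrengths
{
    // Weakest: the constraint lives only in a comment.
    public static void ArchiveByComment(int id) // id must be > 0
    {
    }

    // Stronger: a well-chosen name carries the intent, but nothing enforces it.
    public static void ArchiveByName(int positiveCustomerId)
    {
    }

    // Strongest: a contract that tooling can check at every call site.
    public static void ArchiveByContract(int id)
    {
        Contract.Requires(id > 0);
    }
}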

Every piece of meaning from the original idea that is added to a piece of code improves the chances of it getting implemented and maintained correctly. Additional meaning might have the compiler catch you in a lie before it becomes a bug. Additional meaning might help you remember the nit-picky details of how things hang together when you try to work on a piece of code years after you originally wrote it.

When I start with an idea, I look at it from all the angles. I try to work out what the underlying truths of the idea are. And then I try to find a way to translate all of these truths into pieces of the program.

Code contracts can expose code paths that can lead to null pointers where my feeble brain told me it was impossible for them to appear. Exploiting the type system can let the compiler warn me when I’m about to crash the Mars lander by mixing up my feet and metres. Expressively naming my methods and variables can help me spot where I’ve forgotten to sanitise user input.
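
As a small sketch of that middle case, distinct unit types let the compiler refuse the mix-up outright. The Metres and Feet types here are my own illustration, not anything from real lander code.

struct Metres
{
    public readonly double Value;
    public Metres(double value) { Value = value; }
}

struct Feet
{
    public readonly double Value;
    public Feet(double value) { Value = value; }
    public Metres ToMetres() { return new Metres(Value * 0.3048); }
}

static class LanderControl
{
    // Accepting only Metres turns a unit mix-up into a compile error, not a crash.
    public static void SetDescentAltitude(Metres altitude)
    {
    }
}

// LanderControl.SetDescentAltitude(new Feet(500).ToMetres());  // compiles
// LanderControl.SetDescentAltitude(new Feet(500));             // does not compile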

And the trick is to push the encoding of concepts as far as possible into the strongest encodings the language has available.

Sure, I could make all my variables dynamic, then add unit tests to verify appropriate behaviour by variable type… but why not use a strong type system instead? Dynamic variables should really only be used where there is no alternative; maybe because some aspect of the strong type system is too strong to express the flexibility inherent in an implementation. But to use them routinely and then patch the hole with unit tests shows a very profound misunderstanding of the purpose of the type system.
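
To make the contrast concrete, here is a small hypothetical pair of methods; the dynamic version relies on a unit test somewhere else to catch a caller passing the wrong kind of collection, while the typed version makes that mistake impossible to compile.

using System.Collections.Generic;

static class Joining
{
    // Dynamic version: nothing in the signature stops a caller passing a list
    // of ints (or a single DateTime); only tests elsewhere can police that.
    public static string JoinNamesDynamic(dynamic names)
    {
        return string.Join(", ", names);
    }

    // Typed version: the signature itself rules the mistake out at compile time.
    public static string JoinNamesTyped(IEnumerable<string> names)
    {
        return string.Join(", ", names);
    }
}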

Sometimes approaches can be combined to make an even stronger encoding of a concept. Just using Hungarian Notation to differentiate sanitised user input from un-sanitised user input is a good start… but adding a program-wide automated check to the build system, verifying that variables using the sanitised-input naming convention are only ever assigned from sanitised input or from the sanitisation method, reinforces the concept in a way that makes it almost part of the language itself. A visible portion of the code that is verified by almost-the-compiler.
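
Here is a deliberately crude sketch of what such a build check might look like; the “san” prefix, the Sanitise() method name and the regular expression are all assumptions of mine, not a real tool.

using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class SanitisationConventionCheck
{
    // Any assignment to a "san"-prefixed variable must read from Sanitise(...)
    // or from another "san"-prefixed variable.
    private static readonly Regex Assignment =
        new Regex(@"\bsan\w+\s*=(?!=)\s*(?<rhs>[^;]+);", RegexOptions.Compiled);

    public static string[] FindViolations(string sourceRoot)
    {
        var violations =
            from file in Directory.EnumerateFiles(sourceRoot, "*.cs", SearchOption.AllDirectories)
            from line in File.ReadLines(file)
            let match = Assignment.Match(line)
            where match.Success
            let rhs = match.Groups["rhs"].Value.Trim()
            where !rhs.StartsWith("Sanitise(") && !Regex.IsMatch(rhs, @"^san\w+$")
            select string.Format("{0}: {1}", file, line.Trim());

        return violations.ToArray();
    }
}

A unit test that asserts FindViolations returns an empty array then makes the naming convention something the build enforces, rather than something a reviewer has to remember.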

And there will be concepts that are hard or impossible to encode to the ultimate degree.

A variable name can indicate that the assigned value should only ever be a prime number… but there is very little that can be done to guarantee this is true, beyond hoping everyone is careful not to break that promise in the code. There is no way to reasonably implement a strongly typed “PrimeNumber” type.
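
The closest approximation I can think of is a sketch like the following (my own, not a real solution): a wrapper that validates at construction time. The check still only happens at runtime, which is exactly the limitation just described.

using System;

struct PrimeNumber
{
    public readonly int Value;

    public PrimeNumber(int value)
    {
        if (!IsPrime(value))
            throw new ArgumentException("Value is not prime", "value");
        Value = value;
    }

    private static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }
}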

But that doesn’t mean we shouldn’t keep trying.

And sooner or later, the ad-hoc encodings that have broad use and applicability will turn into new programming language paradigms for the next generation of languages. And they will be harder to program in… but only because they won’t allow us to be nearly as imprecise with our “words”.

You can lament the fact that not using dynamic variables means that you need to put in some extra effort… but all I hear is “why can’t you let me have some more hard-to-diagnose bugs?”

Understanding Code

Spot The Invariant

Writing correct code is not, in itself, the most difficult or important thing programmers have to be able to do. By far the more difficult and crucial part of my profession is understanding code.

Understanding code is a factor in all of the following activities:

  • Finding bugs in existing code
  • Enhancing existing code
  • Using libraries from new code
  • Keeping a mental model of a system

And all of these activities are distributed over larger parts of a code base, whereas writing correct code is essentially a very localised activity. (Note that designing the architecture for an application or system can be much more difficult, but it only needs to be done once, and is therefore not as big a part of the programmer’s life as understanding is.)

And understanding isn’t always easy…

static int calculate(int n)
{
    int a = 0, b = 1;
    int x = 0, y = 1;
    while (n != 0)
    {
        if ((n & 1) == 0)
        {
            int temp = a;
            a = a * a + b * b;
            b = temp * b * 2 + b * b;
            n = n / 2;
        }
        else
        {
            int temp = x;
            x = a * x + b * y;
            y = b * temp + a * y + b * y;
            n--;
        }
    }
    return x;
}

In fact, this fragment goes out of its way to give as few clues about what it does as possible, and yet it implements a very simple and well-known calculation.

Two Viewpoints

Every language makes trade-offs in its syntax between being terse and being understandable. Perl[1] is a famously terse language, where randomly mashing the keyboard is almost guaranteed to result in a valid program, whereas Java is well known to be verbose but relatively easy to understand.

Terse syntax has many obvious benefits. The terser the syntax, the more compact the code, the more source will fit on a single screen. And the more source fits on a screen the broader the overview you can get at a single glance over a fragment of code.

This is the rationale behind many modern languages, such as Boo, Python and Ruby.

It seems to me though that the danger of making languages terse is that it optimises for ease of local understanding over ease of remote understanding. No matter how brief the source of a method is, if the method signature doesn’t give (m)any clues about the encapsulated functionality then using the method from another location in the source becomes needlessly difficult.

Clues

Naming is the first line of defence against incomprehensible code, and the only one that is guaranteed to exist in all languages (as far as I am aware). Classes get names, methods get names, often parameters get names. If these names are chosen well, both local and global understanding are improved. This is why naming is such a big deal, and everyone intuitively understands this.

But often, names alone do not make the full constraints on parameters clear. This is where strongly typed languages have a further benefit, and even more so if there is at least optional explicit typing. Every parameter and return value that has a type implicitly tells you something further about the meaning of the method (we’re operating on a collection, it holds strings, the result is a number, etc). I concede that small applications may be able to get away with pervasive dynamic typing, but larger systems often compensate by adding unit tests whose implicit purpose is to make sure methods operate on the right type of arguments and return the right type of results. But really, moving the typing information out of the method signature and into tests is, in my opinion, not progress.
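
As a tiny hypothetical example, consider how much the following signature tells a remote reader before they ever open the body: the method operates on a collection, the collection holds strings, and the result is a number.

using System.Collections.Generic;

static class TextStatistics
{
    public static int CountDistinctWords(IEnumerable<string> words)
    {
        var seen = new HashSet<string>();
        foreach (string word in words)
            seen.Add(word);
        return seen.Count;
    }
}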

Beyond this there are many more mechanisms that are variously employed to make code more understandable (documentation comments, code contracts[2], unit tests, Word documentation), but as you stray further and further away from the actual source code and method signatures themselves, the connection between the code and its constraints becomes ever more tenuous. Often this requires intervention from the IDE, third-party tools and build checks, which move ever further away from the point where a developer is trying to understand the code in front of them.

Remedies

I don’t really have a one-size-fits-all answer or a ready-made recipe to make code better, but here are a few suggestions.

  • Use good names – don’t just look at this from the local perspective; what would a method invocation look like? Is the invocation self-documenting? For methods with multiple parameters, try and use the parameter names at the invocation site if your language will allow you to. Consider using names to compensate for the lack of other mechanisms; if your language of choice has no explicit typing, maybe calling an argument “stringsToEliminate” is a good idea? Maybe calling an argument “positiveInteger” is a good idea? But don’t make anything more specific than it needs to be.
  • Use good types – even when typing parameters, pick the broadest type that will work. In .NET the code analysis rules will keep telling you to use collection interfaces rather than explicit collection types. And sometimes creating a new type just so you can make one or more signatures more explicit may be a good trade-off; if a method should only take a value generated from a specific collection of other methods, then making the type “double” is probably not as good an idea as using a custom “Temperature” (see the sketch after this list).
  • Use good tools – if you use any mechanisms outside the language per se, such as code contracts or source documentation, then you must have supporting tools as well. Source documentation is useless unless you have an IDE or tool that can “transport” it from the method it is written on to the places where that method is invoked. Once you’re forced to go to the source of the implementation anyway, odds are that a terse method body explains what is going on more concisely than the documentation would.
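
Here is a sketch of that last trade-off (the Temperature type and Thermostat class are hypothetical): a dedicated type means the method can no longer be handed a double that came from somewhere else entirely.

struct Temperature
{
    public readonly double Celsius;
    private Temperature(double celsius) { Celsius = celsius; }

    // Only these factory methods can produce a Temperature...
    public static Temperature FromCelsius(double celsius) { return new Temperature(celsius); }
    public static Temperature FromFahrenheit(double fahrenheit) { return new Temperature((fahrenheit - 32) * 5.0 / 9.0); }
}

static class Thermostat
{
    // ...so this signature cannot be called with a raw sensor count, a
    // timestamp, or any other stray double.
    public static void SetTarget(Temperature target)
    {
    }
}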

But first and foremost, pick a language that is suited to the problem at hand. Python is not a bad language, but I’m not sure it’s a good language for systems programming; it probably is better suited to web systems that often have many small self-contained features. Brevity is not a bad thing, as long as it doesn’t force you to work around an inability to use good names, or good types, or good tools.

Back to the Example

So, what of the example at the start of my post? It would have been much less opaque had the signature been along the following lines…

/// <summary>
/// Calculate the n-th Fibonacci number using an
/// O(log N) algorithm.
/// </summary>
static int Fibonacci(int n)
{
    Contract.Requires(n <= 46, "Result will overflow int");
    ...
}

Making the implementation itself easy to understand requires a few pages of explanation, and I’ll leave that as an exercise to the reader.


Footnotes

  1. Apparently, Perl users are unable to write programs more accurately than those using a language designed by chance – source: Lambda-the-Ultimate
  2. As of .NET 4, there’s a very useful Code Contracts mechanism available to C# developers; there are even plug-ins for the VS2010 IDE that can make these contracts visible as an implied part of the pop-up method documentation

Back to Basics

For a while now I have been postponing writing a post about my progress regarding exceptions in software. I have informally formed an outline of an opinion, but I have been looking for a way to build a stronger foundation than “because I think that’s the right way to do it”.

Then, as I started straying further afield with my mind wandering over multi-threaded code, dependency injection, unit testing and mocking as well (and some others that I know I have forgotten), it occurred to me that I really should go back to basics with all this…

  • The most fundamental tool to reason about software correctness is still to think in terms of invariants over state-space and pre-conditions/post-conditions to method invocations (a minimal sketch in Code Contracts terms follows this list).
  • Guides on “good coding practices” abound, but there are certain fundamental truths in most of them that are universal enough to almost be as good as “formal methods” to reason about “good code” beyond merely “correct code”.
  • Both the DRY principle (“don’t repeat yourself”) and a desire to produce self-documenting code further suggest that keeping as many perspectives on a single piece of code as close together as possible is the best way forward. The new .NET 4 Code Contracts already provide some unification between code, documentation and testing, but I think there is more possible that has not been exploited yet in this arena. Some tricks may be needed to keep aspects such as tests and documentation together with the code without overly burdening the generated assemblies with dead weight that does not participate in the execution of the code itself.
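
To keep that first point concrete, here is a minimal sketch of that vocabulary expressed with .NET 4 Code Contracts; the BankAccount class is my own example, chosen only because its invariant is obvious.

using System.Diagnostics.Contracts;

public class BankAccount
{
    private decimal _balance;

    public decimal Balance { get { return _balance; } }

    [ContractInvariantMethod]
    private void ObjectInvariant()
    {
        // Invariant over the state space: the balance never goes negative.
        Contract.Invariant(_balance >= 0);
    }

    public void Withdraw(decimal amount)
    {
        // Pre-conditions on the invocation...
        Contract.Requires(amount > 0);
        Contract.Requires(amount <= Balance);
        // ...and a post-condition relating the new state to the old.
        Contract.Ensures(Balance == Contract.OldValue(Balance) - amount);

        _balance -= amount;
    }
}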

I strongly believe that C# as a language leaves us with too much flexibility in the general case. Every iteration of the language adds more interacting features, and opens up many useful possibilities as well as some that are dangerous or perhaps even plain wrong.

Some code patterns, although allowed by the compiler, just do not make any sense. There are usage patterns of exceptions that *will* compile, but really should be considered an error.

Tools like FxCop try to plug some of those holes by checking for such errors after-the-fact. Unfortunately, custom error conditions are not as easy to express in FxCop as I think they ought to be. But in principle this is definitely a path worth exploring to eliminate options that might be best avoided.

I think the rather nebulous state of this post reflects the fact that my mind hasn’t completely crystallised into a single vision of what combination of tools and paradigms I need to get to more ideal development practices. But I think I am starting to make some progress.