Monthly Archives: June 2015

Some little-known reasons why D makes day-to-day development easier

When discussing programming languages, most people focus on the big, sexy features: native compilation; functional programming; concurrency. And it makes sense: for the most part, these are the features that distinguish programming languages from one another. But the small, little-known features can make regular day-to-day coding a lot easier.

My favourite language is D, as I’ve mentioned before. But languages aren’t like football teams; I’m not a D fan because my dad is or anything like that, I like writing code in the language because I feel I’m more productive. There are several reasons for that, but today I want to talk about the unsexy little things that help me crank out working code faster.

writeln: yes, it’s basically printf. But it’s actually so much more. In most languages, printing out built-in types is easy: “printf(“%d”, intvar)” in C or “cout << intvar” in C++. But if you want to print one of your own types… you have to write code. Lots of it. Not so in D: the defaults just work. The only real exception is classes, for which you need to override toString. But idiomatic D code doesn’t use many of those, preferring structs. The other good thing about types being easy to print is that you can paste the printed value into another D source file and compile it. I’ve had to do that a few times.
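As a minimal sketch of what “the defaults just work” means, the snippet below prints a struct that defines no toString. The types Point and Named are made up for illustration; they are not from any real codebase.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical types, for illustration only.
struct Point {
    int x;
    int y;
}

struct Named {
    string name;
    Point origin;
}

void main() {
    auto n = Named("home", Point(1, 2));

    // No toString anywhere: the default formatting prints the type name
    // followed by the fields, with strings quoted.
    writeln(n);

    // std.conv.to!string produces the same text, which is also a valid
    // D expression that could be pasted back into source code.
    assert(n.to!string == `Named("home", Point(1, 2))`);
}
```

Because the printed form is itself a valid struct literal, it can go straight into a test as expected data.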

enums: other languages have enums, they don’t even look very different in D:

enum MyEnum { foo, bar, baz } // member names are placeholders

So how do they help me more than in other languages? First of all, I’ll refer you back to my writeln point: when you print them out you get their string representation, not a number. Internally they’re really just a number like in C, but you never really have to care or worry.

Secondly, “final switch” makes enums a lot more useful by making sure you deal with every single enumeration. Add another value? Your “final switch” code will break, as it should. For those not following the link, in D a final switch on an enum value means that the code will fail to compile unless all enum values have a case statement.
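Here is a small sketch of both points: printing an enum and switching over it exhaustively. The Colour enum and the describe function are invented for this example.

```d
import std.stdio : writeln;

// Hypothetical enum, for illustration only.
enum Colour { red, green, blue }

string describe(Colour c) {
    // final switch: every member of Colour needs its own case and a default
    // is not allowed. Adding a member to Colour makes this stop compiling
    // until the new member is handled here.
    final switch (c) {
        case Colour.red:   return "warm";
        case Colour.green: return "natural";
        case Colour.blue:  return "cold";
    }
}

void main() {
    writeln(Colour.green);          // prints the member name, not its number
    writeln(describe(Colour.blue));
}
```

Try adding a fourth member to Colour and recompiling: the compiler points at the final switch that needs updating.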

getopt: Defined in std.getopt, it does the very unsexy task of parsing command-line options, but it does it so well. This is the kind of thing that templates allow you to do. In this code:

 import std.getopt;

 // Inside main(string[] args):
 int intopt;
 double doubleopt;
 MyEnum enumopt;
 string[] strings;
 auto optInfo = getopt(
     args, // getopt takes the argument array first and removes the options it parses
     "i|int", "Int option", &intopt,
     "d|double", "Double option", &doubleopt,
     "e|enum", "Enum option", &enumopt,
     "s|strings", "string list option", &strings,
 );
 if(optInfo.helpWanted) {
     defaultGetoptPrinter("myapp", optInfo.options);
 }

Will do what you expect it to do, and maybe more. It automatically converts the strings given to the program at run-time to the types of the variables that store them. Even better, look at the -s option. If the program is then called with -s string1 -s string2, it’ll hold those two string values. It really is as easy as it looks.

struct constructors: or the lack thereof. Basically, if you write this:

struct Foo {
    int i;
    string s;
    void func() {} // just so it isn't a POD
}

auto foo = Foo(4, "foo");

It’ll compile and work. You can define struct constructors if you want, but if all you need is the bog-standard one, you don’t have to.

I’m sure there are other lesser-known features in D that make real life programming easier, but these are the ones I can think of right now.

Valgrind-driven development

At work I’m part of a team that’s responsible for maintaining a C codebase we didn’t write. This is not an uncommon occurrence in real-life software development, and it means that we don’t really know “our” codebase all that well. To make matters worse, there were no unit tests before we took over the project, so knowing what any function is supposed to do is… challenging.

So what’s a coder to do when confronted with a bug? I’ve come to use a technique I call valgrind-driven development. For those not in the know (i.e. not C or C++ programmers), valgrind is a tool that, amongst other things, lets you precisely find out where in the code memory is leaking, conditional jumps are done based on uninitialised values, etc., etc. The usefulness of valgrind and the address sanitizers in clang and gcc cannot be overstated – but what does that have to do with the problem at hand? Say you have this C function:

int do_stuff(struct Foo* foo, int i, struct Bar* bar);

I have no idea how this is supposed to work. The first thing I do? This (using Google Test and writing tests in C++14):

EXPECT_EQ(0, do_stuff(nullptr, 0, nullptr));

99.9% of the time this won’t work. But the reasons why aren’t documented anywhere. Usually passing null pointers is assumed to not happen. So the program blows up, and I add this to the function under test:

int do_stuff(struct Foo* foo, int i, struct Bar* bar) {
    assert(foo != NULL);
    assert(bar != NULL);
    /* ... existing function body ... */
}

At least now the assumption is documented as assertions, and the next developer who comes along will know that it’s a violation of the contract to pass in NULLs. And, of course, that this function is only used internally. Functions at library boundaries have to check their arguments and fail nicely.

After the assertions have been added, the unit test will look like this:

Foo foo{};
Bar bar{};
EXPECT_EQ(0, do_stuff(&foo, 0, &bar));

This will usually blow up spectacularly as well, but now I have valgrind to tell me why. I carry on like this until there are no more valgrind errors, at which point I can actually start testing the functionality at hand. Along the way I’ve built up all the expected dependencies of the function under test, making explicit what was once implicit in the code. More often than not, I find other bugs lurking in the code just by trying to document, via unit tests, what the current behaviour is.

If you have to maintain C/C++ code you didn’t write, give valgrind-driven development a try.

Is there such a thing as too many types?

I’m writing a build system at the moment. As it turns out, even the most mundane builds end up having to specify a lot of options. Imagine I want to make life easy for the end-user (and of course I do) and instead of specifying all the files to be built, we just specify directories instead. In pseudo-code:

myApp = build(["path/to/srcs", "other/path/srcs"])

But, for reasons unbeknownst to me as well as a few good ones, it’s often the case that there are files in those directories that aren’t supposed to be built. Or extra files lying around that aren’t in those directories but that do need to be built. And then there’s compiler flags and include directories, so a C++ build might look like this:

myApp = buildC++("name_of_binary",
                 ["path/to/srcs", "other/path/srcs"],
                 ["extrafile.cpp", "otherextra.cpp"],
                 "-g -O0",
                 ["include_from_here", "other_include_dir", "yet_another"],
                 ["badfile.cpp", "otherbadfile.cpp"])

If you’re anything like me you’ll find this confusing. All the parameters (and there are several) are either strings or lists of strings. The compiler flags were passed in as a string so they stand out a bit, but the API could have chosen yet another list and it’s hard to say what is what. This didn’t seem like an API I wanted to use so I definitely wouldn’t expect anyone else to want to use it either.

Then the mental warping caused by me learning Haskell kicked in. What if… I add types to everything? They won’t even really do anything, they’re just there to tag what’s what and cause a compilation error if the wrong one is used. So I added a bunch of wrapper types instead, and in actual D code it started to look like this:

alias cObjs = cObjects!(SrcDirs(["etc/c/zlib"]),
                        Flags("-m64 -fPIC -O3"),
                        IncludePaths(["dir1", "dir2"]),
                        ExcludeFiles(["badfile.c", "otherbadfile.c"]));

Better? I think so! You can’t confuse what’s what, and even better, neither can the compiler. After I exposed this to other people, it was pointed out to me that there are many ways to select files and that the API as it was would have to have even more parameters to satisfy all needs. It was big enough as it was, so I changed it to something that looks like this now:

alias objs = targetsFromSources!(Sources!(Dirs(["src"]),
                                          Files(["first.d", "second.d"]),
                                          Filter!(a => a != "badfile.d")),
                                 Flags("-g -debug -cov -unittest"),
                                 ImportPaths(["dir1", "dir2"]),
                                 StringImportPaths(["dir1", "dir2"]));

Better? I think so. What do you think?
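For the curious, here is a minimal sketch of the tag-type idea in isolation. It is not the build system’s actual code: describeBuild is a made-up stand-in, and it takes the wrappers as ordinary run-time arguments rather than as template arguments like the snippets above.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical tag types: each one only wraps its payload.
struct SrcDirs      { string[] value; }
struct Flags        { string value; }
struct IncludePaths { string[] value; }
struct ExcludeFiles { string[] value; }

// Hypothetical stand-in for a build function; it only reports what it was given.
string describeBuild(SrcDirs dirs, Flags flags,
                     IncludePaths includes, ExcludeFiles excludes) {
    return format("dirs=%s flags=%s includes=%s excludes=%s",
                  dirs.value, flags.value, includes.value, excludes.value);
}

void main() {
    writeln(describeBuild(SrcDirs(["src"]),
                          Flags("-g -O0"),
                          IncludePaths(["include"]),
                          ExcludeFiles(["badfile.c"])));

    // Swapping two arguments is now a compile-time type error rather than
    // a silent bug: this call does not compile, and the static assert
    // checks exactly that.
    static assert(!__traits(compiles,
        describeBuild(SrcDirs(["src"]), Flags("-g -O0"),
                      ExcludeFiles(["badfile.c"]), IncludePaths(["include"]))));
}
```

The wrappers cost nothing beyond a little typing at the call site, and the static assert shows the compiler rejecting a call with two lists in the wrong order.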

The Loopers and the Algies

In my opinion there are two broad categories of programmers: the Loopers and the Algies.

The Algies hate duplication. Maybe above all else. Boilerplate is anathema to them, common patterns should be refactored so the parts that look almost-the-same-but-not-quite end up calling the same code. When Algies write C++, they use <algorithm>. They use itertools and functools in Python. They tend to like programming languages with first-class functions, that let you abstract, that let you collapse code. They loathe writing the same code over and over again.

Loopers don’t mind repetition as much. Their name comes from the way they write code: for, while, do/while loops and their equivalents everywhere. They end up writing the same loops over and over again, but they don’t mind. It’s not that they’re bad programmers – it’s just that all the things the Algies like? They think it makes their code more complex. Harder to understand. I actually saw one of them say during code review that this:

stuff = []
for x in xs:
    if condition(x):
        stuff.append(x)

Was simpler than this:

    stuff = [ x for x in xs if condition(x) ]

That’s how their brain works. A loop is… simpler. If it’s not evident by now, I’m an Algie through and through. Somebody asked me one day why I didn’t like C and the first thing that came to my mind was “C makes me repeat myself”.

It’s really hard to get people to agree on what good code actually is. I think most of us agree that we want our software to be maintainable. Readable. But we all have different ideas on what each of those words means. Take complexity for instance: we want less of it, right? Well, that’s why the Loopers like for loops: in their opinion that reduces complexity. map, filter and reduce are complicated to them, whilst to me, they’re my bread and butter. Not only that, it’s my honest opinion that using algorithms reduces overall complexity. Fewer moving parts. Fewer things to reason about. Fewer bugs.
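To make the contrast concrete in D, here is a small sketch of the same filtering job written both ways. The function names evensLoop and evensAlgo are invented for the example; filter and array come from the standard library.

```d
import std.algorithm : filter;
import std.array : array;
import std.stdio : writeln;

// Looper style: an explicit loop, a condition and an accumulator.
int[] evensLoop(int[] xs) {
    int[] result;
    foreach (x; xs) {
        if (x % 2 == 0) result ~= x;
    }
    return result;
}

// Algie style: say what is wanted and let std.algorithm do the looping.
int[] evensAlgo(int[] xs) {
    return xs.filter!(x => x % 2 == 0).array;
}

void main() {
    auto xs = [1, 2, 3, 4, 5, 6];
    assert(evensLoop(xs) == evensAlgo(xs));
    writeln(evensAlgo(xs)); // prints [2, 4, 6]
}
```

Both functions return the same result. Which one counts as simpler is exactly the disagreement between the two camps.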

I think many programming language wars are mostly about people from very different philosophies arguing about things that are important to them that the other side just doesn’t understand. Go, for example, is quite clearly a language for Loopers. They don’t need generics, that would complicate the language. The moment I realised Go wasn’t for me was when I realised that without generics, there could be no Go equivalent of <algorithm> and that I’d have to write for loops. And I really don’t like writing for loops. You shouldn’t either. Unless you’re a Looper of course, and to each his own.

DConf 2015 has come and gone

It was a blast, just like last year was. It really is gratifying to spend 3 days discussing one’s favourite programming language with several people who really know their craft. Of course, attending the presentations and being able to ask questions live isn’t a bad deal either.

I think the highlight for me was Andrei Alexandrescu’s provocatively titled “Generic Programming Must Go”, in which he describes a new way of designing code based on compile-time reflection. He apparently came up with it while designing the new allocators for the D standard library, and I highly recommend watching it (and the other videos!) when they’re online.

I’m actually getting to write D code back at work as well, so that’s always good. In my spare time I’m furiously working on a meta build system, which will definitely feature in a future blog post.