DConf 2016 report

DConf 2016 happened last week in Berlin. As usual it was really cool to be able to attend and talk to so many great developers from around the world.

There weren’t really any earth-shattering announcements. The only thing that comes to mind is a for-now anonymous donor offering the D Foundation half a million dollars, as long as there’s a plan for spending that money. That’s great news, and I’m looking forward to hearing what it will be spent on.

The talks were great, of course: they should be available online soon enough, so I won’t rehash the details here. My favourite was probably Don’s talk on floating point numbers. I’ve read the IEEE spec and I still learned a lot.

And there was my lightning talk. I only found out about the schedule an hour before, which was also the amount of time I had to write it. After a stress-filled hour had passed, I learned that they didn’t have a VGA cable, and had another cortisol spike as I tried to figure out some sort of solution. In the end I ssh’ed from a German Mac into my laptop, with sometimes hilarious consequences, ad-libbing with no brain-mouth filter for a few minutes. I’ll have to watch the video once it’s available to find out how well I did…

main is just another function

Last week I talked about code that isn’t unit-testable, at least not by my definition of what a unit test is. In keeping with that, this blog post will talk about testing code that has side-effects.

Recently I’d come to accept a defeatist attitude: I couldn’t think of any way other than end-to-end testing to check that passing certain command-line options to a console binary had a certain effect. I mean, the whole point is to test that running the app differently has different consequences. As a result I ended up only ever doing end-to-end testing. And… that’s simply not where I want to be.

Then it dawned on me: main is just another function. Granted, it has a special status that makes it so you can’t call it directly from a test, but nearly all my main functions lately have looked like this:

import std.stdio; // for stderr

int main(string[] args) {
    try {
        doStuff(args); // all the real work happens in here
        return 0;
    } catch(Exception ex) {
        stderr.writeln(ex.msg);
        return 1;
    }
}

It should be easy enough to translate this to the equivalent C++ in your head. With main so conveniently delegating to a function that does real work, I can now easily write integration tests. After all, is there really any difference between:

doStuff(["myapp", "--option", "arg1", "arg2"]);
// assert stuff happened

And (in, say, a shell script):

./myapp --option arg1 arg2
# assert stuff happened

I’d say no. This way I have one end-to-end test for sanity’s sake, and everything else gets tested from the same test binary by calling the “real” main function, doStuff, directly.

If your main doesn’t look like the one above, and you happen to be writing C or C++, there’s another technique: use the preprocessor to rename main to something else and call it from your integration/component test, as sketched below. And then, as they say, Bob’s your uncle.
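A minimal sketch of that trick, assuming a test build that compiles the production sources with an extra define (TESTING and testable_main are names I’ve made up for illustration):

// app.cpp, compiled with -DTESTING in the test build only
#ifdef TESTING
    #define main testable_main // rename main out of the way
#endif

int main(int argc, char* argv[]) {
    // real work goes here
    return 0;
}

// test.cpp, linked against the test build of app.cpp
#include <cassert>

int testable_main(int argc, char* argv[]);

void testOption() {
    char a0[] = "myapp", a1[] = "--option", a2[] = "arg1", a3[] = "arg2";
    char* args[] = {a0, a1, a2, a3};
    const int ret = testable_main(4, args);
    assert(ret == 0);
    // assert stuff happened
}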

Happy testing!


Some code you just can’t unit test

My definition of unit tests precludes them from communicating with the outside world via the file system or networking. If your definition is different, your mileage on this post may vary.

I’m a big unit test enthusiast who uses TDD for most code I write. I’m also a firm believer in the testing pyramid, so I consider unit tests to be more important than the slower, flakier, more expensive tests. However, I’ve recently come to the conclusion that obsessing over unit tests to the detriment of the ones higher up the pyramid can be harmful. I guess the clue was “obsessing”.

In two recent projects of mine, I’ve written unit tests that I now consider utterly useless. More than that, I think the codebases of those two projects would be better off without them and that I did myself a disservice by writing them in the first place.

What both of these projects have in common is that they generate code for other programs to consume: in one case build systems in GNU Make or Ninja, and in the other D code converted from GNU Make (so, basically the other direction). This means writing to files, which as mentioned above is a unit test no-no as far as I’m concerned. The typical way to get around this is to write a pure function that returns a string, plus a very small wrapper function that calls the pure one and writes the result to a file. Now the pure function can be called from unit tests that check the return value (a sketch of the pattern follows below). Yay unit tests? Nope.
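For illustration, a minimal sketch of that pattern in C++ (the names and the generated contents are made up):

#include <fstream>
#include <string>

// pure: all the generation logic lives here and is trivial to unit test
std::string makefileContents() {
    return "all:\n\tgcc -o app main.c\n";
}

// thin impure wrapper: the only part that touches the file system
void writeMakefile(const std::string& path) {
    std::ofstream{path} << makefileContents();
}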

Add another section to the output? Your tests are broken. Comments? Your tests are broken. Extra newlines? Your tests are broken. In none of these scenarios is the code buggy, and yet, in all of them N tests have to be modified even though the behaviour is the same. In one case I had passing unit tests checking for code generation when the output wouldn’t even compile!

If your program is generating C code, does it really matter in what order the arguments to an equality expression are written? Of course not. So what does matter? That the code compiles and has the intended semantics. It doesn’t matter what the functions and variables you generate are called, only that when you compile and run the output, it does the right thing.

Code that generates output for another program to consume is inherently un-unit-testable. The only way to know it works is to call that program on your output.

In my GNU Make -> D case I was lucky: since I was using D to generate D code, I generated it at compile time and mixed it back in to test it, so I had my unit test cake and ate it too. Compile times suffered, but I didn’t have to compile and link several little executables to test it. In most other languages, the only way forward would be to pay “the linker price”.

Essentially, my tests were bad because they tested implementation instead of behaviour, which is always a bad idea. I’ll write more about that in a future post.

MVC is really lots of mutable state?

I’ve been doing the Rails tutorial recently. Quickly at first and now a bit slower. The book is really well done, but my interest in implementing a Twitter clone is waning, so I’m just trying to do a little bit every day.

I like all the testability available in Rails. It’s saved me from many mistakes I’ve made while writing the code in the book, which is great. That’s the point of tests.

Ruby is a cool little language. I suspect I like it more than Python, but I just haven’t used Ruby enough to see its warts. Once I do I’ll be able to have an educated opinion.

What I’m really disliking so far though is the amount of mutable state that seems to be needed to get anything done in this framework. The Controller part of MVC doesn’t really control so much as it sets instance variables to be picked up by embedded Ruby code in the HTML view template. That makes me feel… dirty. One of my own quotes is “Mutable state is the root of evil”, so there’s that.

The other thing that’s slightly bugging me about Rails right now is the amount of magic that happens behind the scenes. I love me some automagic: I’m a metaprogramming enthusiast because I like my code to write my code for me. But… I’m more comfortable when I know how the magic works and what problems it’s solving. Right now, naming things correctly just seems to connect things to each other; it all works, but I have no idea how or why.

Still, it’s an impressive framework. “rails generate” is awesome. And the number of things web developers need to juggle at the same time is impressive.


Web Dev

I’ve pretty much always been a systems programmer. These days most of what I see on programming blogs and the like are related to web development somehow, and it makes sense. From mobile to actual websites, this is how most things are shipped. People buying software to run on their desktop computers is, like, so 20th century.

I figured this was a gaping hole in my CV, so I’ve been meaning to dip my toes in for quite a while now. Then I unexpectedly “sorta kinda” finished all the personal projects I wanted to work on, found myself girlfriend-less for the weekend, and have now gone through half the chapters of the Ruby on Rails Tutorial book. It’s really well written; I recommend it. I’ve been fascinated by the journey.

There are a lot of moving parts in web development, it turns out. Even though I haven’t written a website from scratch, the sheer number of directories and the hints the book drops about the work Rails does for you are impressive. I know what goes into talking to a database, which makes it all the more incredible how easy Rails makes it.

As soon as I’m done with the tutorial, I just need to think up a cool personal project that a website would be appropriate for. Also, I might finally write enough Ruby code to be able to make an informed comparison with Python. I think I like Ruby better, but I just haven’t written enough code in it yet.


unit-threaded: now an executable library

It’s one of those ideas that seem obvious in retrospect, but it somehow only occurred to me last week. Let me explain.

I wrote a unit testing library in D called unit-threaded. It uses D’s compile-time reflection capabilities so that no test registration is required. You write your tests, they get found automatically and everything is good and nice. Except… you have to list the files you want to reflect on, explicitly. D’s compiler can’t go reading the filesystem for you while it compiles, so a pre-build step of generating the file list was needed. I wrote a program to do it, but for several reasons it wasn’t ideal.

Now, as someone who actually wants people to use my library (and also wants to make things easier for myself), I had to find a way to make opting in to unit-threaded easy. This is especially important since D has built-in unit tests, so the barrier to entry is low (which is a good thing!). While working on a far crazier idea to make using unit-threaded a no-brainer, I stumbled across my current solution: run the library as an executable binary.

The secret sauce that makes this work is dub, D’s package manager. It can download dependencies, compile them and even run them with “dub run”; that way, a user doesn’t even have to download unit-threaded manually. The other dub feature that makes this feasible is its support for “configurations”, in which a package is built differently. Using those, I can have a regular library configuration and an alternative executable one. Since “dub run” can take a configuration as an argument, unit-threaded can now be run as a program with “dub run unit-threaded -c gen_ut_main”. And when it is, it generates the file that’s needed to make it all work.

So now all a user needs to do is add a declaration to their project’s dub.json file and “dub test” works as intended, using unit-threaded underneath, with named unit tests and all of them running in threads by default. Happy days.
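From memory, the declaration looks roughly like this; treat the exact keys, paths and version number as illustrative rather than gospel:

"configurations": [
    {
        "name": "unittest",
        "preBuildCommands": ["dub run unit-threaded -c gen_ut_main -- -f bin/ut.d"],
        "mainSourceFile": "bin/ut.d",
        "dependencies": {
            "unit-threaded": "~>0.6.0"
        }
    }
]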


C++’s killer feature: #include

I don’t think it’s a secret that the main component of C++’s success was that it was (nearly) backwards compatible with C. That made the switch easy, and one could always just extend existing software originally written in C by using C++. It helped that, back then, C++ had a feature set not really matched by any other language. Abstractions at no cost? Fantastic.

It’s 2016 now, however. Many of the tasks that C++ was usually chosen for can now be done in Java, C#, Go, D, Swift, Rust, … the list goes on. Yet C++ endures. For me, it’s no longer my go-to language for pretty much anything. Unless… I have to call C code.

A few months ago at work, I decided to write my own implementation of certain C APIs in an embedded context, in order to make it easy to test the codebase my team is responsible for. I had quite an extensive set of header files, and our C code was calling these APIs. I knew straight away that there was no chance of me picking C for this task, so the question was: which language then? I ended up going with C++14. Why? #include, that’s why.

Every language under the sun has a way to call C code. It’d be silly not to, really. And it all looks straightforward enough: declare the function’s signature in your language’s syntax and tell it it’s got C linkage, and Bob’s your uncle. Except, of course, that all the examples are passing in ints, floats and const char*. And real life APIs don’t look like that at all.

They need pointers to structs, which are defined in a header that includes a header that includes a header that… Then there are usually macros. You don’t pass a regular int to a function call, you pass a macro call to a macro call (defined in a header that…). Then there’s the case in which macros are part of the API itself. It gets hairy pretty fast.
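To make that concrete, here’s an entirely made-up header in the spirit of the real ones (every name in it is hypothetical):

/* api.h */
#include "api_types.h" /* which includes api_base.h, which includes... */

#define API_VERSION 3
#define API_MAKE_ID(x) (((x) << 8) | API_VERSION)

int api_init(struct api_config* cfg, int id);

Declaring api_init with C linkage in your language of choice is the easy part. Faithfully reproducing struct api_config and the macros is where the pain lives: a real call site looks like api_init(&cfg, API_MAKE_ID(42)), not like the FFI tutorials.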

These days libclang has made it possible to write tools that parse headers and generate the bindings for you. There’s also SWIG. But this means complicated build system setups and they’re not foolproof. If you’ve ever used SWIG, you know there’s a lot of manual work to do.

But in C++…

#include "c_api_with_macros_and_stuff.h"

For me, that’s basically the only use case left for C++. To call C and not have to write C.

Emacs as a C++ IDE: headers

So, headers. Because of backwards compatibility and the hardware limitations of the era when C was created, it’s 2016 and we’re still stuck with them. I doubt that modules will make it into C++17, but even if they do, headers aren’t going away any time soon. For one, C++ might still need to call C code, and that’s one of the language’s killer features: no C bindings needed.

If, like me, you’ve created a package to make Emacs a better C++ environment, headers present a challenge. My cmake-ide package actually just organises data to pass to the packages that do the heavy lifting; it’s really just glue code. The data to pass are the compiler flags used for any given file: with those, libclang can find and jump to definitions, get autocomplete information and all that jazz. CMake is kind enough to output a JSON compilation database listing every file in the project and the exact command line used to compile it, so it’s just a question of parsing the JSON and setting the appropriate variables. Easy peasy.
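For reference, an entry in that compilation database (compile_commands.json) looks roughly like this, with made-up paths:

[
  {
    "directory": "/home/user/proj/build",
    "command": "/usr/bin/c++ -I/home/user/proj/include -DNDEBUG -o CMakeFiles/app.dir/foo.cpp.o -c /home/user/proj/src/foo.cpp",
    "file": "/home/user/proj/src/foo.cpp"
  }
]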

But… headers. They don’t show up in the compilation database. They shouldn’t – they’re usually not directly compiled, only as a result of being included elsewhere. But where? Unlike Python, Java, or D, there’s no way to know where the source files that include a particular header are in the filesystem. They might be in the same directory. They might be nowhere near. To complicate matters further, the same header file might be compiled with different flags in different translation units. Fun.

What’s a package maintainer to do? In the beginning I punted and used the union of every file’s compiler flags in the project, deduplicated. The reasoning was that most of the time the compiler flags are the same everywhere anyway. For simple projects that’s true, but I quickly ran into the limitations of this approach at work.

A quick and easy fix is to check if there’s an “other” file in Emacs parlance: essentially, a Foo.cpp file for a Foo.hpp header. If there is, use its compiler flags. This works, but leaves the header files that don’t have a corresponding source file out in the cold. There’s also a runtime cost to pay: if no other file exists, it takes several seconds to make sure of that by scouring the file system.

I then looked at all the source files in the project, sorted by the Levenshtein distance between their directory and the header file’s directory. If any of them directly includes the header, use its flags. Unfortunately, this only works for direct includes; in many cases a header is included by another header, which is included by another header, which…

In the end, I realised the only sure way to go about it is to use compiler-computed dependencies. Unfortunately for me, ninja deletes the .d dependency files when it runs. Fortunately for me, you can ask ninja for all the dependencies in the project, as shown below.
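Assuming a build directory called “build”, that query looks like this:

# the .d files are gone after the build, but ninja's deps log survives
ninja -C build -t deps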

I haven’t written the code for the CMake Makefile generator yet, but I should soon. ninja already works. I’m going to test it myself in “real life” for a week then release it to the world.

The C++ GSL in Practice

At CppCon 2015, we heard about the CppCoreGuidelines and a supporting library for them, the GSL. There were several talks devoted to this, including two of the keynotes, and we were promised a future of zero-cost abstractions that were also safe. What’s not to like?

Me being me, I had to try this out for myself. And what better way than rewriting my C++ implementation of an MQTT broker from scratch? Why from scratch? The version I had didn’t perform well, would have required extensive refactoring to do so, and I’m not crazy enough to post results from C++ that lose by a factor of 3 to any other language.

It was a good fit as well: the equivalent D and Rust code was using slices, so this seemed like the perfect chance to try out gsl::span (née gsl::array_view).

I think I liked it. I say I think because the main benefit it provides (slices in C++!) is something I’m already used to from programming in D, and of course there were a few things that didn’t work out so well, namely:

gsl::cstring_span

First of all, there was this bug I filed. This is a new way to shoot oneself in the foot, and we were not amused. Had I just declared the function as taking const std::string& as usual, I wouldn’t have hit the bug. The price of early adoption, I guess. The worst part is that it failed silently and was hard to detect: the strings printed out the same, but one had a stray terminating null character. I ended up having to declare an overload that took const char* and did the conversion appropriately, as sketched below.
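The workaround looked something along these lines; the function name is made up, and I’m writing gsl::ensure_z from memory of the then-current string_span API:

void subscribe(gsl::cstring_span<> topic); // the "real" overload

void subscribe(const char* topic) {
    subscribe(gsl::ensure_z(topic)); // spans the chars up to, not including, the null
}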

Also, although I know why, it’s still incredibly annoying to have to use empty angle brackets for the default case.

Rvalues need not apply

Without using the GSL, I can do this:

void func(const std::vector<unsigned char>&);
func({2, 3, 4}); //rvalues are nice

With the GSL, it has to be this:

void func(gsl::span<const unsigned char>); // spans are cheap, pass by value
const std::vector<unsigned char> bytes{2, 3, 4};
func(bytes);

It’s cumbersome and I can’t see how it’s protecting me from anything.

Documentation

I had to refer to the unit tests (fortunately included) and Neil MacIntosh’s presentation at CppCon 2015 multiple times to figure out how to use it. It wasn’t always obvious.

Conclusion

I still think this is a good thing for C++, but the value of something like gsl::not_null is… null without the static analysis tool they mentioned. It could be easier to use as well. My other concern is if and how gsl::span will work with the ranges proposal / library.


Rust impressions from a C++/D programmer, part 2

Following up from my first post on Rust, I thought that a week of running a profiler and trying to optimise my code would have given me new insights into working with the language. That’s why I called it “part 1”: I was looking forward to uncovering more warts.

The thing is… I didn’t. It turns out that the first version I cranked out was already optimised. It’s not because I’m a leet coder: I’ve implemented an MQTT broker before and looked at profiler output. I know where the bottlenecks will be for the two benchmarks I use. So my first version was fast enough.

How anti-climactic. The only newsworthy thing that came out of benchmarking it is that the Rust/mio combo is really fast. You’ll have to wait for the official benchmarks comparing all the implementations to know how much, though. I’m currently rewriting my problematic C++ version; I have to if I want to measure it. Either I give it my best shot at making it fast, or the reddit comments will be… interesting, to say the least. It got nasty last time.
