Category Archives: Uncategorized

CppCon 2017 videos so far

I didn’t get to go to CppCon this year (I was there the previous two years), and now that some of the talks are up on youtube I’ve been checking them out to keep up-to-date. It’s been a little disappointing.

I don’t know if it’s because of the types of talk that I like, if I’ve learned a lot over the last few years or if the standards for admission have gone down. What I do know is that just today on my lunch break I dismissed quite a few talks after watching them for 5 minutes. Some might say that’s not enough, but I think I’m pretty good at evaluating a if a talk would be worth my time in that period.

So far, the only talk I’ve liked is the first one about IncludeOS. For me that’s not surprising, I think it’s a really cool project and have been interested in unikernels from the first time I heard about them. It helps that it happened when I was working at Cisco and learned that the project I worked on then was faster on class Cisco IOS (a monolithic kernel) than on the newer operating systems.

I might have to play around with IncludeOS now. I’m just afraid I’ll start getting ideas about writing a unikernel in D or Rust…

Advertisements

Arch Linux – why use a Docker image when you can create your own?

It seems silly in retrospect. I’d never have considered building an Arch Linux based Docker container in any other way but starting with one from the registry and a Dockerfile. But… it’s Arch Linux, you install this distro into a directory and chroot into it. Why settle for someone else’s old installation?

The script used to bootstrap an Arch install, pacstrap, even lets you exclude some packages from the default install or add things you need. So I ended up with a bash script that installs Arch into a directory, chroots into it and runs commands as required, then bundles the whole thing into a docker container. Repeatable, checked in to version control, and not wasting layers of AUFS.

Who needs a Dockerfile?

Dipping my toes in the property based testing pool

I’ve heard a lot about property-based testing in the last 2 years but haven’t really had a chance to try it out. I first heard about it when I was learning Haskell, but at the time I thought regular bog-standard unit testing was a better option. I didn’t want to learn a new language (and one notoriously difficult at that) and a new way of testing at the same time. Since then I haven’t written anything else in Haskell for multiple reasons and it’s always been something I’ve been wanting to try out.

I decided that the best way to do it is just to implement property-based testing myself, and I’ve started with preliminary support in my unit-threaded library for basic types and arrays thereof. The main issue was knowing how to write the new tests. It wasn’t clear at all and if you’re in the same situation I highly recommend this talk on the subject. Fortunately, one of the examples in those slides was serialization, and since I wrote a library for that too, I immediately started transitioning some pre-existing unit tests. I have to say that I believe the new tests are much, much better. Here’s one test “on paper” but that actually runs 100 random examples for each of the 17 types mentioned in the code, checking that serializing then deserializing should yield the same value:

@Types!(bool, byte, ubyte, short, ushort, int, uint, long, ulong,
        float, double,
        char, wchar, dchar,
        ubyte[], ushort[], int[])
void testEncodeDecodeProperty(T)() {
    check!((T val) {
        auto enc = Cerealiser();
        enc ~= val;
        auto dec = Decerealiser(enc.bytes);
        return dec.value!T == val;
    });
}

I think that’s a pretty good test/SLOC ratio. Now I just have to find more applications for this.

Am I a Mockist now?

I’ve always considered myself on the classicist side of using test doubles. It just clicked with me I guess, and it didn’t help that I’ve not had good experiences with using mocking frameworks. The first time I tried it in Python I ended up asserting different functions were being called in such a way that my test started mirroring the production code, and at that point I stopped. It was clearly the wrong direction, but I at least I noticed. I had the displeasure once to review a test like that once and had to convince the authors it was a bad idea.

I recently bought and read Jay Fields’s Working Effectively With Unit Tests and was finally exposed to how a mockist goes about writing their tests. I was a bit of an eye-openener: the examples and explanations actually made sense, and I started thinking that maybe mockist thinking wasn’t so bad. What was really surprising to me was that using mocks was described as testing behaviour instead of state: I’d always thought that the downsides of mocks were that they tested implementation instead of behaviour. I started realising that well-written mockist tests just get to the same destination by a different way. I’d just never seen well written tests with mocks before.

Coincidentally, just after I’d finished that book I listened to a podcast interview with J.B. Rainsberger of Integrated Tests are a Scam fame. I heard him say something I’d never heard before: “Don’t mock types you don’t own”. That was also eye-opening: everything that made mocking bad in my eyes suddenly disappeared. If you’re only ever mocking types/functions under your control, then testing implementation isn’t such a bad thing, you can always refactor the implementation of the code you own without tests breaking. No more brittle tests and no more writing test doubles by hand.

I tried it out recently at work and I was actually pleased with the result. Am I a mockist now? I don’t know. Even though I used mocks I ended up with a hybrid approach for some tests, so I guess I’m just trying to use the right tool for the job. Which is always a good thing, right?

Prototyping is useful after all

Python is said to be a great prototyping language, I guess because types don’t have to be written down explicitly. I’ve never really gotten why someone would want to do that. Even after reading Code Complete I felt like it sounded like a good idea but also maybe… a waste of time?

I think I’ve just changed my mind. I’ve been struggling at work for a couple of weeks now with some new C code. It’s not just that it’s in C, there’s also some infra I’d never used before to work out as well. I felt like my head was clouded by implementation details that I had to consider at every step, so I tried something different: write it all, from scratch, in D.

Why D? Two reasons. First, it’s my favourite language. Second, the syntax is similar enough that a translation of the solution to C should be straightforward. One hour later I’d not only implemented everything I wanted, but now knew exactly what I had to do in C-land. Which, of course, will take much longer than an hour.

Still, I’m now wondering why I took so long to prototype my solution. It could’ve saved me two weeks of mostly unproductive work. The problem is too hard to solve in your current environment? Solve it somewhere else where it’s easy and translate later.

The Reader monad is just an object

I found myself writing some code like this a week or two ago:

string testPath = ...;
func1(testPath, ...);
func2(testPath, ...);
...

I thought it was annoying that I had to keep repeating the same argument to every function call and started thinking about ways that I could automatically “thread” it. By just thinking that word I was taken to the world of Haskell and thought that the Reader monad would be great here. I even thought of implementing it in D, until I realised…  that’s just an immutable object! I don’t need a monad, just a struct or a class:

struct Foo {
    string testPath;
    func1(...) const;
    func2(...) const;
}

In most OOP languages I don’t need to pass anything when func1 or func2 are called, and they both implicitly have access to all the instance variables, which is exactly what I wanted to do anyway. In D I can even get something resembling Haskell do notation by writing the client code like this:

with(immutable Foo()) {
    func1(...);
    func2(...);
}

(as opposed to):

auto foo = immutable Foo();
foo.func1(...);
foo.func2(...);

I can’t believe it took me this long to realise it, but the Reader, Writer and State monads are just… objects. Reader is immutable, but I just showed how to get the same effect in D. It’d be similar in C++ with a const object, and the lack of the with keyword would make it more verbose, but in the end it’s very similar.

DConf2016 report

DConf 2016 happened last week in Berlin. As usual it was really cool to be able to attend and talk to so many great developers from around the world.

There weren’t really any earth-shattering announcements. The only thing that comes to mind is for-now anonymous donor offering the D foundation half a million dollars as long as there’s a plan to spend that money. That’s great news and I’m looking forward to hearing what that money will be spent on.

The talks were great, of course: they should be available soon enough so the details aren’t really needed. My favourite was probably Don’s talk on floating point numbers. I’ve read the IEEE spec and I still learned a lot.

And there was my lightning talk. I only found out about the schedule an hour before, which was the amount of time I had to write it. After a stress-filled hour passed, I learned that they didn’t have a VGA cable and had another cortisol spike as I tried to figure out some sort of solution. In the end I ssh’ed from a German Mac into my laptop, to sometimes hilarious consequences, with me ad-libbing with no brain-mouth filter for a few minutes. I need to see the video once it’s available to see how well I did…

Some code you just can’t unit test

My definition of unit tests precludes them from communicating with the outside world via the file system or networking. If your definition is different, your mileage on this post may vary.

I’m a big unit test enthusiast who uses TDD for most code I write. I’m also a firm believer in the testing pyramid, so I consider unit tests to be more important than the slower, flakier, more expensive tests. However, I’ve recently come to the conclusion that obsessing over unit tests to the detriment of the ones higher up the pyramid can be harmful. I guess the clue was “obsessing”.

In two recent projects of mine, I’ve written unit tests that I now consider utterly useless. More than that, I think the codebases of those two projects would be better off without them and that I did myself a disservice by writing them in the first place.

What both of these projects share in common is that they generate code for other programs to consume. In one case, generating build systems in GNU Make or Ninja, and in the other converting from GNU Make to D (so, basically the other direction). This means writing to files, which as mentioned above is a unit test no-no as far as I’m concerned. The typical way to get around this is to write a pure function that returns a string and a very small wrapper function that calls the pure one to write to a file. Now the pure function can be called from unit tests that check the return value. Yay unit tests? Nope.

Add another section to the output? Your tests are broken. Comments? Your tests are broken. Extra newlines? Your tests are broken. In none of these scenarios is the code buggy, and yet, in all of them N tests have to be modified even though the behaviour is the same. In one case I had passing unit tests checking for code generation when the output wouldn’t even compile!

If your program is generating C code, does it really matter what order the arguments to an equals expression are written in? Of course not. So what does matter? That the code compiles and has the intended semantics. It doesn’t matter what the functions and variables you generate are called. Only that if you compile it and you run it, that it does the right thing.

Code that generates output for another program to consume is inherently un-unit-testable. The only way to know it works is to call that program on your output.

In my GNU Make -> D case I was lucky: since I was using D to generate D code, I generated it at compile-time and  mixed it back in to test it so I had my unit test cake and ate it too. Compile times suffered, but I didn’t have to compile and link several little executables to test it. In most other languages, the only way forward would be to pay “the linker price”.

Essentially, my tests were bad because they tested implementation instead of behaviour, which is always a bad idea. I’ll write more about that in a future post.

C++’s killer feature: #include

I don’t think it’s a secret that the main component to C++’s success was that it was (nearly) backwards compatible with C. That made the switch easy, and one could always just extend existing software originally written in C by using  C++. It helped that, at the time, C++ had a feature set not really matched by any of the languages at the time. Abstractions at no cost? Fantastic.

It’s 2016 now, however. Many of the tasks that C++ was usually chosen for can now be done in Java, C#, Go, D, Swift, Rust, … the list goes on. Yet C++ endures. For me, it’s no longer my go-to language for pretty much anything. Unless… I have to call C code.

A few months ago at work, I decided to write my own implementation of certain C APIs in an embedded context in order to make it easy to test the codebase my team is responsible for. I had a quite extensive set of header files, and our C code was calling these APIs. I knew straight away that there was no chance of me picking C for this task, so the question was: which language then? I ended up going with  C++14. Why? #include, that’s why.

Every language under the sun has a way to call C code. It’d be silly not to, really. And it all looks straightforward enough: declare the function’s signature in your language’s syntax and tell it it’s got C linkage, and Bob’s your uncle. Except, of course, that all the examples are passing in ints, floats and const char*. And real life APIs don’t look like that at all.

They need pointers to structs, which are defined in a header that includes a header that includes a header that… Then there are usually macros. You don’t pass a regular int to a function call, you pass a macro call to a macro call (defined in a header that…). Then there’s the case in which macros are part of the API itself. It gets hairy pretty fast.

These days libclang has made it possible to write tools that parse headers and generate the bindings for you. There’s also SWIG. But this means complicated build system setups and they’re not foolproof. If you’ve ever used SWIG, you know there’s a lot of manual work to do.

But in C++…

#include "c_api_with_macros_and_stuff.h"

For me, that’s basically the only use case left for C++. To call C and not have to write C.

Emacs as a C++ IDE: headers

So, headers. Because of backward compatibility and the hardware limitations of when C was created, it’s 2016 and we’re still stuck with them. I doubt that modules will make it into C++17, but even if they do headers aren’t going away any time soon. For one, C++ might still need to call C code and that’s one of the languages killer features: no C bindings needed.

If, like me, you’ve created a package to make Emacs a better C++ environment, they present a challenge. My cmake-ide package actually just organises data to pass to the packages that do heavy lifting, it’s just glue code really. And the data to pass are the compiler flags used for any given file. That way, using libclang it’s possible to find and jump to definitions, get autocomplete information and all that jazz. CMake is kind enought to output a JSON compilation database with every file in the project and the exact command-line used. So it’s a question of parsing the JSON and setting the appropriate variables. Easy peasy.

But… headers. They don’t show up in the compilation database. They shouldn’t – they’re usually not directly compiled, only as a result of being included elsewhere. But where? Unlike Python, Java, or D, there’s no way to know where the source files that include a particular header are in the filesystem. They might be in the same directory. They might be nowhere near. To complicate matters further, the same header file might be compiled with different flags in different translation units. Fun.

What’s a package maintainer to do? In the beginning I punted and took the set of unique compiler flags taken from every flag in the project. The reasoning is that most of the time the compiler flags are the same everywhere anyway. For simple projects that’s true, but I quickly ran into limitations of this approach at work.

A quick and easy fix is to check if there’s an “other” file in Emacs parlance. Essentially this means a Foo.cpp file for a Foo.hpp header. If there is, use its compiler flags. This works, but leaves out the other header files that don’t have a corresponding source file out in the cold. There’s also a runtime cost to pay – if no other file is found it takes several seconds to make sure of that by scouring the file system.

I then looked at all source files in the project sorted by levenshtein distance of their directories to the directory the header file is in. If any of them directly includes the header, use its flags. Unfortunately, this only works for direct includes. In many cases a header is included by another header, which includes another header which…

In the end, I realised the only sure way to go about it is to use compiler-computed dependencies. Unfortunately for me, ninja deletes the .d dependency files when it runs. Fortunately for me, you can ask ninja for all the dependencies in the project.

I haven’t written the code for the CMake Makefile generator yet, but I should soon. ninja already works. I’m going to test it myself in “real life” for a week then release it to the world.