Monthly Archives: August 2015

C is not magically fast

I always wonder why people think there’s a mystic quality to C that makes it insanely fast. It’s just assumed to be true by many people, and I really can’t understand why in the face of vast amounts of evidence to the contrary and, well, logical reasoning. Just last week the author of a blog post expressed surprise at a benchmark they’d conducted themselves because (paraphrased) “in theory, the C implementation should be faster”.

There are so many problems with this belief. First of all, there’s the fact that Fortran has been beating C since… before C existed, given that Fortran is the oldest language still in use (depending on your definitions of “old” and “in use”). I guess some people don’t know Fortran is still important in HPC today?

Secondly, there are languages that are a superset, or close enough, of C, such as C++ and D. Even if somehow, like red vehicles, C were always faster, all one would have to do is rewrite the bottlenecks in C syntax. The AST would be the same, and if using the same backend (gcc or clang, say), why would the assembly output be any different?

Thirdly, and much more importantly, the belief rarely comes with measurements to back it up, and the measurements that do exist get ignored. What else can you say?
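Measuring doesn’t take much. Here’s a minimal sketch of a timing harness in C, assuming a made-up workload (`sum_squares`) and a made-up helper (`best_time_seconds`); neither comes from any benchmark mentioned here. It runs a function several times and keeps the best CPU time reported by `clock()`, which is crude but better than “in theory”.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload to be timed: sum of squares of an array. */
double sum_squares(const double *xs, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        total += xs[i] * xs[i];
    }
    return total;
}

/* Hypothetical helper: calls fn `runs` times and returns the smallest
   CPU time observed, in seconds. The last computed value is written to
   *result so the work can't be optimised away as unused. */
double best_time_seconds(double (*fn)(const double *, size_t),
                         const double *xs, size_t n,
                         int runs, double *result) {
    assert(runs > 0);
    double best = -1.0;
    for (int i = 0; i < runs; ++i) {
        clock_t start = clock();
        *result = fn(xs, n);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}
```

Taking the minimum over several runs is one common way to cut down on noise from the rest of the system. The same harness can be pointed at the implementation in the other language through whatever C-compatible entry point it exposes.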

Maybe part of it is the fact that writing C is so slow and painful. There has to be a reward for it, right? The truth, however, is likely to be much closer to doing crunches: annoying, time-consuming, and won’t even give you abs.

On debuggers and printfs

I, like pretty much every other human on the planet, like to over-generalize. In keeping with tradition, there seems to be another schism amongst programmers: the ones who use printfs to debug and the ones who use debuggers.

Debuggers are magical things: they basically give you a REPL for natively compiled languages that don’t have one. I think every programmer should get acquainted with them, to at least know what they can do. Write printfs if you want or must, but at least be aware of what’s on offer from “the other side”.

As much as I like debuggers, however, sometimes I find it easier to do log-based debugging. Sometimes it’s even necessary. The last bug I worked on at work was a timing issue; breakpoints aren’t going to help with that, because stopping the program changes the very timing you’re trying to observe.
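For timing issues, the log lines are only useful if they carry timestamps. Here’s a minimal sketch in C, assuming a POSIX system for `clock_gettime`; the helper names `format_log_line` and `log_event` are made up for illustration and aren’t from any particular codebase.

```c
/* clock_gettime is POSIX, not ISO C, so ask for it explicitly. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif

#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: writes "[seconds.nanoseconds] msg" into buf and
   returns what snprintf returns (the length the full line needs). */
int format_log_line(char *buf, size_t size,
                    const struct timespec *ts, const char *msg) {
    return snprintf(buf, size, "[%ld.%09ld] %s",
                    (long)ts->tv_sec, (long)ts->tv_nsec, msg);
}

/* Hypothetical helper: stamps msg with the monotonic clock and writes
   it to stderr, which is unbuffered and so less likely to lose lines. */
void log_event(const char *msg) {
    struct timespec ts;
    char line[256];

    assert(msg != NULL);
    clock_gettime(CLOCK_MONOTONIC, &ts);
    format_log_line(line, sizeof line, &ts, msg);
    fprintf(stderr, "%s\n", line);
}
```

Sprinkling `log_event("rx start")` and `log_event("rx done")` around the suspect code gives you an ordered, timestamped trace to read after the fact. Logging still perturbs timing a little, just far less than a breakpoint does.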

Line coverage isn’t as important as most people think

Controversial, I know. Let me start by saying this: I think line coverage tools are useful and should be used. But I think most people get a false sense of security by shooting for what I think are meaningless metrics such as achieving x% line coverage.

One of the problems is coupling: good tests aren’t coupled to the implementation code and one should be free to change the implementation completely without breaking any tests. But line coverage is a measurement of supposed test quality that is completely dependent on the implementation! If that doesn’t sound alarm bells, it should. I could replace the implementation and the line coverage would probably change. Did the quality of my tests change? Obviously not.

Another problem I’ve seen is that code coverage metrics cause people to write “unit tests” for utility functions that print out data structures. There are no assertions in the “test” at all; all it does is call the code in question to get a better metric. Is that really providing a stronger guarantee that the software works as intended? Beware the cobra effect, and, as nearly always, Dilbert has something to say about the danger of introducing metrics and encouraging engineers to make them better.

Last week at work I encountered yet another real-life example of how pursuing code coverage by itself can be a fruitless endeavour. I wrote a unit test for a C function that was something like this:

int func(struct Foo* foo, struct Bar* bar);

So I started out with my valgrind-driven development and ended up filling up both structs with suitable values, and all the test did was assert on the return value. I looked at the line coverage and the function was nearly 100% covered. Great, right? No. The issue is that, in typical C fashion, the return code in this function in particular wasn’t nearly as interesting as the side effects to the passed-in structs, and I hadn’t checked those at all. Despite not having tested that the function actually did what it was supposed to, I had nearly 100% line coverage. By that metric alone my test was great. By the metric of preventing bugs… not so much.
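To make that concrete, here’s a minimal sketch in C. The struct fields and the body of `func` are invented stand-ins (the real ones aren’t shown here); only the signature matches. Both tests execute exactly the same lines of `func`, so a line coverage report can’t tell them apart, but only the second one would notice if the side effects were wrong.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real structs. */
struct Foo { int input; };
struct Bar { int result; int calls; };

/* Hypothetical implementation: the return code only signals success or
   failure; the interesting work is what gets written into *bar. */
int func(struct Foo *foo, struct Bar *bar) {
    if (foo == NULL || bar == NULL) {
        return -1;
    }
    bar->result = foo->input * 2;
    bar->calls += 1;
    return 0;
}

/* Executes every line of the success path, verifies almost nothing. */
void test_return_code_only(void) {
    struct Foo foo = { 21 };
    struct Bar bar = { 0, 0 };
    assert(func(&foo, &bar) == 0);
}

/* Same lines executed, but the side effects are checked too. */
void test_side_effects(void) {
    struct Foo foo = { 21 };
    struct Bar bar = { 0, 0 };
    assert(func(&foo, &bar) == 0);
    assert(bar.result == 42);
    assert(bar.calls == 1);
}
```

If `func` were changed to write garbage into `bar->result`, the first test would keep passing with identical coverage.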

So what is line coverage good for? In my opinion, identifying the gaps in your testing. Do you really care if no test calls your util_print function? Probably not, so seeing that as not covered is ok. Any if statement (or else clause) that isn’t entered, however… you probably want to take a look at that. I tend to do TDD myself, so my line coverage is high just because lines of code don’t get written unless there’s an accompanying test. Sometimes I forget to test certain inputs, and the line coverage report lets me know I have to write a few more tests. But depending on it as a metric and seeing higher coverage in and of itself as a goal? That’s not something I believe in. At the end of the day, code with 100% coverage still has bugs. The important thing is to identify techniques for reducing the probability of writing buggy code or introducing bugs into code that worked.

If you’re interested in a thoughtful analysis of code coverage, I really enjoyed this article.

Binary serialisation made even easier: no boilerplate with Cerealed

I abhor boilerplate code and duplication. My main governing principle in programming is DRY. I tend not to like languages that make me repeat myself a lot.

So when I wrote a serialisation library, I used it in real-life programs and made sure I didn’t have to repeat myself as I had to when I wrote similar code in C++. The fact that D allows me to get the compiler to write so much code for me, easily, is probably the main reason why it’s my favourite language.

But there were still patterns emerging in the networking code I wrote that started to annoy me. If I’m reaching for nearly identical code for completely different networking protocol packets, then there’s a level of abstraction I should be using but failed to notice beforehand. One pattern is very common in networking: part of the packet contains either the total length in bytes, or the length minus a header size, or the “number of blocks to follow”. This happens even in something as simple as UDP. With the previous version of Cerealed, you’d have to do this:

struct UdpPacket {
    ushort srcPort;
    ushort dstPort;
    ushort length;
    ushort checksum;
    @NoCereal ubyte[] data; //won't be automatically (de)serialised

    //this function is run automatically at the end of (de)serialisation
    void postBlit(C)(ref C cereal) if(isCereal!C) {
        //size in bytes of the four header fields declared above
        int headerSize = srcPort.sizeof + dstPort.sizeof + length.sizeof + checksum.sizeof;
        cereal.grainLengthedArray(data, length - headerSize); //(de)serialise it now
    }
}

Which, quite frankly, is much better than what usually needs to be done in most other languages / frameworks. But it was still too boilerplatey for me and got old fast. So now it’s:

struct UdpPacket {
    //there's an easier way to calculate headerSize but explaining how'd make the blog post longer
    enum headerSize = srcPort.sizeof + dstPort.sizeof + length.sizeof + checksum.sizeof;
    ushort srcPort;
    ushort dstPort;
    ushort length;
    ushort checksum;
    @LengthInBytes("length - headerSize") ubyte[] data;
    //code? We don't need code
}

That “length - headerSize” in @LengthInBytes? That’s not a generic name, that’s a compile-time string that refers to the member variable and manifest constant (enum) declared in the struct. The code to actually do the necessary logic is generated at compile-time from the struct declaration itself.
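For contrast, here’s roughly what the same “length minus header size” logic looks like written out by hand in C. This is only a sketch of the pattern, not what Cerealed generates: the names (`udp_parse`, `read_u16_be`, `UDP_HEADER_SIZE`) are made up, it only handles reading, and it assumes the header fields arrive in network (big-endian) byte order, as UDP’s do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { UDP_HEADER_SIZE = 8 }; /* four 16-bit header fields */

struct UdpPacket {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;      /* header plus payload, in bytes */
    uint16_t checksum;
    const uint8_t *data;  /* points into the caller's buffer */
    size_t data_len;      /* length - UDP_HEADER_SIZE */
};

/* Reads a big-endian 16-bit value from two bytes. */
static uint16_t read_u16_be(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical parser: returns 0 on success, -1 if the buffer is too
   short or the length field is inconsistent with it. */
int udp_parse(const uint8_t *buf, size_t size, struct UdpPacket *out) {
    assert(buf != NULL && out != NULL);
    if (size < UDP_HEADER_SIZE) {
        return -1;
    }
    out->src_port = read_u16_be(buf);
    out->dst_port = read_u16_be(buf + 2);
    out->length   = read_u16_be(buf + 4);
    out->checksum = read_u16_be(buf + 6);
    if (out->length < UDP_HEADER_SIZE || (size_t)out->length > size) {
        return -1;
    }
    out->data = buf + UDP_HEADER_SIZE;
    out->data_len = (size_t)out->length - UDP_HEADER_SIZE;
    return 0;
}
```

And that’s just deserialisation for one packet type; the serialising direction and every other length-prefixed protocol need the same dance again.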

Why write code when I can get the compiler to do it for me? Now try doing that at compile-time in any other language! Well, except Lisp. The answer to “can I do ___ in Lisp” is invariably yes. And yet, not always this easily.

Cerealed is on GitHub and is available as a dub package.