
The main function should be shunned

The main function (in languages that have it) is… special. It’s the entry point of the program by convention, there can only be one of them in all the object files being linked, and you can’t run a program without it. And it’s inflexible.

Its presence means that the final output has to be an executable. It’s likely, however, that the executable contains code that others would rather reuse than rewrite, but they can’t link it into their own executables: there’s already a main function in there. Before clang, nobody seemed to have stumbled on the idea that a compiler as a library would be a great thing to have. And yet…

This is why I’m now advocating for always putting the main function of an executable in its own file, all by itself. And also that it do the least amount of work possible for maximum flexibility. This way, any executable project is one excluded file away in the build system from being used as a library. This is how I’d start a, say, C++ executable project from scratch today:

#include "runtime.hpp"
#include <iostream>
#include <stdexcept>

int main(int argc, const char* argv[]) {
    try {
        run(argc, argv); // "real" main
        return 0;
    } catch(const std::exception& ex) {
        std::cerr << "Oops: " << ex.what() << std::endl; // errors go to stderr
        return 1;
    }
}

In fact, I think I’ll go write an Emacs snippet for that right now.
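
For completeness, here is a rough sketch of what the other side of that split might look like. The post only establishes that runtime.hpp provides something callable as run(argc, argv); the return type, the greeting behaviour and the error message below are all assumptions made up for illustration.

```cpp
// Hypothetical contents of runtime.hpp / runtime.cpp, collapsed into one
// listing. Only the name run and its (argc, argv) parameters come from the
// post; everything else is invented to make the sketch runnable.
#include <stdexcept>
#include <string>
#include <vector>

// The "real" main. It lives in the library part of the project, so a test
// binary or somebody else's executable can link against it and call it.
// Returning the output lines (instead of printing) is an assumed design
// choice that makes the function easy to check from a test.
std::vector<std::string> run(int argc, const char* argv[]) {
    if (argc < 2) throw std::runtime_error("usage: app <name>...");
    std::vector<std::string> greetings;
    for (int i = 1; i < argc; ++i)
        greetings.push_back(std::string("Hello, ") + argv[i]);
    return greetings;
}
```

The file containing main then stays a few lines long, and excluding it from the build leaves a library that exposes run.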


API clarity with types

API design is hard. Really hard. It’s one of the reasons I like TDD – it forces you to use the API as a regular client and it usually comes out all the better for it. At a previous job we’d design APIs as C headers, review them without an implementation and call it done. Every single one of them had to change as soon as we tried implementing it.

The Win32 API is rife with examples of what not to do: functions with 12 parameters aren’t uncommon. Another API no-no is several parameters of the same type – which means which? This is ok:

auto p = Point(2, 3);

It’s obvious that 2 is the x coordinate and 3 is y. But:

foo("foo", "bar", "baz", "quux", true);

Sure, the actual strings passed don’t help – but what does true mean in this context? Languages like Python get around this by naming arguments at the call site, but that’s not a feature of most curly brace/semicolon languages.

I semi-recently forked and extended the D wrapper for nanomsg. The original C API copies the Berkeley sockets API, for reasons I don’t quite understand. That means that a socket must be created, then bound or connected to another socket. In an OOP-ish language we’d like to just have a constructor deal with that for us. Unfortunately, there’s no way to disambiguate whether we want to connect to an address or bind to it – in both cases a string is passed. My first attempt was to follow in Java’s footsteps and use static methods for creation (simplified for the blog post):

struct NanoSocket {
    static NanoSocket createBound(string uri) { /* ... */ }
    static NanoSocket createConnected(string uri) { /* ... */ }
    private this() { /* ... */ } // constructor
}

I never did feel comfortable: object creation shouldn’t look *weird*. But I think Haskell has forever changed my brain, so types to the rescue:

struct NanoSocket {
    this(ConnectTo connectTo) { /* ... */ }
    this(BindTo bindTo) { /* ... */ }
}

struct ConnectTo {
    string uri;
}

struct BindTo {
    string uri;
}

I encountered something similar when I implemented a method on NanoSocket called trySend. It takes two durations: a total time to try for, and an interval to wait to try again. Most people would write it like so:

void trySend(ubyte[] data, 
             Duration totalDuration, 
             Duration retryDuration);

At the call site clients might get confused about which order the durations are in. I think this is much better, since there’s no way to get it wrong:

void trySend(ubyte[] data, 
             TotalDuration totalDuration, 
             RetryDuration retryDuration);

struct TotalDuration {
    Duration duration;
}

struct RetryDuration {
    Duration duration;
}
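
The same trick carries over to other curly brace languages. Below is a small C++ sketch of it, not the actual nanomsg wrapper code: plannedAttempts is a made-up stand-in for trySend that only computes how many attempts would fit in the time budget, so that the example can run without a socket.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Distinct wrapper types: same payload, different meaning.
struct TotalDuration { std::chrono::milliseconds value; };
struct RetryDuration { std::chrono::milliseconds value; };

// Hypothetical stand-in for trySend. It sends nothing; it only reports how
// many attempts fit in the budget (one immediately, then one per interval).
std::size_t plannedAttempts(const std::vector<unsigned char>& data,
                            TotalDuration total,
                            RetryDuration retry) {
    if (data.empty() || retry.value.count() <= 0) return 0;
    return static_cast<std::size_t>(total.value / retry.value) + 1;
}

// Call site, with using namespace std::chrono_literals in scope:
//   plannedAttempts(data, TotalDuration{1s}, RetryDuration{100ms});
// Swapping the two arguments is a compile error rather than a silent bug.
```

The wrappers cost nothing at runtime; the price is a little extra typing at each call site, which doubles as documentation.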

What do you think?


Don’t hoard code

For me, the two most important principles in programming are, in order, DRY and YAGNI. Most of my coding decisions end up respecting one or the other. For some reason YAGNI seems to be less well known. In my experience one tends to get less pushback for DRY – it’s the accepted best practice. But YAGNI seems to need more persuasion, and I’m not entirely sure why.

I’m converted: I love red diffs. I don’t even look at the red sections during code review. Do the tests still pass? Ship it! The thing is that, despite me being a programmer and my “one job” (not really, but you know what I mean) being to write code, I hate code and want the least of it in my project. I mean it.

Code that doesn’t exist is excellent. It doesn’t have to be read, and therefore doesn’t need to be understood, which means it can’t confuse anyone. It doesn’t have bugs. It doesn’t need to be tested. What’s not to like?

And yet, in project after project, one sees code commented out for mostly no good reason. My personal “favourite” (by which I mean I froth at the mouth) is C or C++ code with #if 0 / #endif pairs. In one project there were even several of those, and nested to boot.

Maybe it has to do with not trusting version control. If all you’ve ever used is one of those ancient paid-for systems (not naming any names, but you can guess) and you’ve never felt the bliss that is working with git or Mercurial, then maybe it’s understandable, because it might actually be hard to go through the history and find when or why you deleted something. But these days? No excuse: git log -Sthat_thing_that_I_remember_that_isnt_here_anymore lists the commits that added or removed it.

And never mind that, in my experience at least, the times anybody goes code spelunking for deleted code are so few and far between that the trade-off is obvious. Code that should have been deleted but wasn’t gets in the way. That’s a real cost, paid every day, and for… what? Because someone someday might need that snippet and it takes them an extra minute to find it?

YAGNI. Delete and move on.

C is not magically fast, part 2

I wrote a blog post before about how C is not magically fast, but the sentiment that C has properties lacking in other languages that make it so is still widespread. It was with no surprise at all then that a colleague mentioned something resembling that recently at lunch break, and I attempted to tell him why it wasn’t (at least always) true.

He asked for an example where C++ would be faster, and I resorted to the old sort classic: C++ sort is faster than C’s qsort because of templates and inlining. He then asked me if I’d ever measured it myself, and since I hadn’t, I did just that after lunch. I included D as well because, well, it’s my favourite language. Taking the minimum time after ten runs each to sort a random array of 10M simple structs on my laptop yielded the results below:

  • D: 1.147s
  • C++: 1.723s
  • C: 1.789s

I expected C++ to be faster than C, but I didn’t expect the difference to be so small. I expected D to be the same speed as C++, but for some reason it’s faster. I haven’t investigated why, for lack of interest, but maybe it’s because of how strings are handled?

I used the same compiler backend for all 3 so that it wouldn’t be an influence: LLVM. I also seeded all of them with the same number and used the same random number generator: the awful rand from C’s standard library, seeded with srand. It’s terrible, but it’s the only easy way to do it in standard C and the same function is available from the other two languages. I also only timed the sort, not counting init code.
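
As an aside, the “minimum of ten runs” part can also be done inside a single process. The helper below is a sketch of that idea and not what was used for the numbers above, where each program times one sort per invocation; minSeconds and its parameters are names made up for this example.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>
#include <vector>

// Runs work on a fresh copy of input `runs` times and returns the fastest
// wall-clock time in seconds. Only the call to work is timed; copying the
// input is set-up and stays outside the measurement.
template <typename T, typename Work>
double minSeconds(const std::vector<T>& input, int runs, Work work) {
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        std::vector<T> copy = input;
        const auto start = Clock::now();
        work(copy);
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

Taking the minimum rather than the mean filters out runs that were slowed down by whatever else the machine happened to be doing.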

The code for all 3 implementations:

// sort.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

typedef struct {
    int i;
    char* s;
} Foo;

double get_time() {
    struct timeval t;
    gettimeofday(&t, NULL); // the timezone argument is obsolete; pass NULL
    return t.tv_sec + t.tv_usec*1e-6;
}

int comp(const void* lhs_, const void* rhs_) {
    const Foo *lhs = (const Foo*)lhs_;
    const Foo *rhs = (const Foo*)rhs_;
    if(lhs->i < rhs->i) return -1;
    if(lhs->i > rhs->i) return 1;
    return strcmp(lhs->s, rhs->s);
}

int main(int argc, char* argv[]) {
    if(argc < 2) {
        fprintf(stderr, "Must pass in number of elements\n");
        return 1;
    }

    srand(1337);
    const int size = atoi(argv[1]);
    Foo* foos = malloc(size * sizeof(Foo));
    for(int i = 0; i < size; ++i) {
        foos[i].i = rand() % size;
        foos[i].s = malloc(100);
        sprintf(foos[i].s, "foo%dfoo", foos[i].i);
    }

    const double start = get_time();
    qsort(foos, size, sizeof(Foo), comp);
    const double end = get_time();
    printf("Sort time: %lf\n", end - start);

    for(int i = 0; i < size; ++i) free(foos[i].s); // release each string first
    free(foos);
    return 0;
}


// sort.cpp
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <chrono>
#include <cstdlib> // srand, rand

using namespace std;
using namespace chrono;

struct Foo {
    int i;
    string s;

    bool operator<(const Foo& other) const noexcept {
        if(i < other.i) return true;
        if(i > other.i) return false;
        return s < other.s;
    }

};


template<typename CLOCK, typename START>
static double getElapsedSeconds(CLOCK clock, const START start) {
    //cast to ms first to get fractional amount of seconds
    return duration_cast<milliseconds>(clock.now() - start).count() / 1000.0;
}

int main(int argc, char* argv[]) {
    if(argc < 2) {
        cerr << "Must pass in number of elements" << endl;
        return 1;
    }

    srand(1337);
    const int size = stoi(argv[1]);
    vector<Foo> foos(size);
    for(auto& foo: foos) {
        foo.i = rand() % size;
        foo.s = "foo"s + to_string(foo.i) + "foo"s;
    }

    high_resolution_clock clock;
    const auto start = clock.now();
    sort(foos.begin(), foos.end());
    cout << "Sort time: " << getElapsedSeconds(clock, start) << endl;
}


// sort.d
import std.stdio;
import std.exception;
import std.datetime;
import std.algorithm;
import std.conv;
import core.stdc.stdlib;


struct Foo {
    int i;
    string s;

    int opCmp(ref Foo other) const @safe pure nothrow {
        if(i < other.i) return -1;
        if(i > other.i) return 1;
        return s < other.s
            ? -1
            : (s > other.s ? 1 : 0);
    }
}

void main(string[] args) {
    enforce(args.length > 1, "Must pass in number of elements");
    srand(1337);
    immutable size = args[1].to!int;
    auto foos = new Foo[size];
    foreach(ref foo; foos) {
        foo.i = rand % size;
        foo.s = "foo" ~ foo.i.to!string ~ "foo";
    }

    auto sw = StopWatch();
    sw.start;
    sort(foos);
    sw.stop;
    writeln("Elapsed: ", cast(Duration)sw.peek);
}




Write custom assertions whenever possible

I’ve been very interested in readable tests with great error messages recently. Mostly because they kept failing and I wanted the most information possible in order to quickly identify the cause. This is another reason why I like TDD: you see the test failing first, so if the error message isn’t great you’ll know straight away instead of months later.

The good testing frameworks provide a way of writing your own custom assertions. I’d never really looked into them that much before, but now I realize the error of my ways. Recently I wrote a test that contained this line:

fileName.exists.shouldBeTrue;

Readable, right? The problem is when it fails:

foo.d:42 - Expected: true
foo.d:42 -      Got: false

And now you have to go read the test and figure out what went wrong. It’s a lot better to be told right away that a file was supposed to exist. So I wrote a custom assertion and was then ready to write this:

fileName.shouldExist;

With the corresponding failure message:

foo.d:42 - Expected /tmp/foo.txt to exist but it didn't

Now it’s a lot easier to pinpoint where the problem is. For starters, you would probably want to start checking the contents of the surrounding directory, having saved the time you would have had to spend figuring out what exactly was false.
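
The assertion above is D, but the idea is not tied to any language or framework. As a rough sketch, a C++17 counterpart could look like the following. Throwing std::runtime_error is only the simplest stand-in for however a given test framework reports failures, and this shouldExist is a hypothetical free function, not part of any library.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical C++ counterpart of shouldExist. The point is the message:
// it names the path instead of saying "expected true, got false".
void shouldExist(const std::filesystem::path& path) {
    if (!std::filesystem::exists(path))
        throw std::runtime_error("Expected " + path.string() +
                                 " to exist but it didn't");
}
```

A real framework would also attach the file and line of the failing test, as in the output shown earlier.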


main is just another function

Last week I talked about code that isn’t unit-testable, at least not by my definition of what a unit test is. In keeping with that, this blog post will talk about testing code that has side-effects.

Recently I’d come to accept a defeatist attitude where I couldn’t think of any other way to test that passing certain command-line options to a console binary had a certain effect. I mean, the whole point is to test that running the app differently will have different consequences. As a result I ended up only ever doing end-to-end testing. And… that’s simply not where I want to be.

Then it dawned on me: main is just another function. Granted, it has a special status that makes it so you can’t call it directly from a test, but nearly all my main functions lately have looked like this:

int main(string[] args) {
    try {
        doStuff(args);
        return 0;
    } catch(Exception ex) {
        stderr.writeln(ex.msg);
        return 1;
    }
}

It should be easy enough to translate this to the equivalent C++ in your head. With main so conveniently delegating to a function that does real work, I can now easily write integration tests. After all, is there really any difference between:

doStuff(["myapp", "--option", "arg1", "arg2"]);
// assert stuff happened

And (in, say, a shell script):

./myapp --option arg1 arg2
# assert stuff happened

I’d say no. This way I have one end-to-end test for sanity’s sake, and everything else being tested from the same binary by calling the “real” main function directly.

If your main doesn’t look like the one above, and you happen to be writing C or C++, there’s another technique: use the preprocessor to rename main to something else and call it from your integration/component test. And then, as they say, Bob’s your uncle.
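
Here is a sketch of that renaming trick. The name app_main and the --option behaviour are made up for the example, and in a real project the macro would more likely come from the build system (for instance by compiling the application source with -Dmain=app_main for the test binary only) than from a #define in the file.

```cpp
#include <cstring>

// Rename main for everything that follows. Defined inline here only to keep
// the listing in one piece.
#define main app_main

// --- the application's source, unchanged ---
int main(int argc, char* argv[]) {
    // Hypothetical behaviour: succeed only when --option is passed.
    for (int i = 1; i < argc; ++i)
        if (std::strcmp(argv[i], "--option") == 0) return 0;
    return 1;
}
// --- end of the application's source ---

#undef main

// A test can now call app_main(argc, argv) like any other function.
```

One thing to watch out for: once renamed, the function loses main’s special treatment, so a main that relies on the implicit return 0 at the end needs an explicit return statement.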

Happy testing!


MVC is really lots of mutable state?

I’ve been doing the Rails tutorial recently. Quickly at first and now a bit slower. The book is really well done, but my interest in implementing a Twitter clone is waning, so I’m just trying to do a little bit every day.

I like all the testability available in Rails. It’s saved me from many mistakes I’ve made while writing the code in the book, which is great. That’s the point of tests.

Ruby is a cool little language. I suspect I like it more than Python, but I just haven’t used Ruby enough to see its warts. Once I do I’ll be able to have an educated opinion.

What I’m really disliking so far though is the amount of mutable state that seems to be needed to get anything done in this framework. The Controller part of MVC doesn’t really control so much as it sets instance variables to be picked up by embedded Ruby code in the HTML view template. That makes me feel… dirty. One of my own quotes is “Mutable state is the root of evil”, so there’s that.

The other thing that’s slightly bugging me about Rails right now is the amount of magic that happens behind the scenes. I love me some automagic: I’m a metaprogramming enthusiast because I like my code to write my code for me. But… I’m more comfortable when I know how the magic works and what problems it’s solving. Right now, naming things correctly just seems to connect things to each other and it all works, but I have no idea how or why.

Still, it’s an impressive framework. “rails generate” is awesome. And the number of things web developers need to juggle at the same time is impressive.


Web Dev

I’ve pretty much always been a systems programmer. These days most of what I see on programming blogs and the like is related to web development somehow, and it makes sense. From mobile to actual websites, this is how most things are shipped. People buying software to run on their desktop computers is, like, so 20th century.

I figured this was a gaping hole in my CV so I’ve been meaning to dip my toes in for quite a while now. I unexpectedly “sorta kinda” finished all the personal projects I wanted to work on and found myself girlfriend-less for the weekend, so now I’ve gone through half the chapters of the Ruby on Rails Tutorial book. It’s really well written and I recommend it. I’ve been fascinated by the journey.

There are a lot of moving parts in web development, it turns out. Even though I haven’t written a website from scratch, the sheer number of directories, and of hints the book drops about the work Rails does for you, is amazing. I know what goes into talking to a database – it’s incredible how easy it all is.

As soon as I’m done with the tutorial, I just need to think up a cool personal project that a website would be appropriate for. Also, I might finally write enough Ruby code to be able to make an informed comparison with Python. I think I like Ruby better, but I just haven’t written enough code in it yet.


unit-threaded: now an executable library

It’s one of those ideas that seem obvious in retrospect, but somehow only occurred to me last week. Let me explain.

I wrote a unit testing library in D called unit-threaded. It uses D’s compile-time reflection capabilities so that no test registration is required. You write your tests, they get found automatically and everything is good and nice. Except… you have to list the files you want to reflect on, explicitly. D’s compiler can’t go reading the filesystem for you while it compiles, so a pre-build step of generating the file list was needed. I wrote a program to do it, but for several reasons it wasn’t ideal.

Now, as someone who actually wants people to use my library (and also to make it easier for myself), I had to find a way so that it would be easy to opt-in to unit-threaded. This is especially important since D has built-in unit tests, so the barrier for entry is low (which is a good thing!). While working on a far crazier idea to make it a no-brainer to use unit-threaded, I stumbled across my current solution: run the library as an executable binary.

The secret sauce that makes this work is dub, D’s package manager. It can download dependencies, compile them and even run them with “dub run”. That way, a user doesn’t even have to download it by hand. The other dub feature that makes this feasible is that it supports “configurations” in which a package is built differently. Using those, I can have a regular library configuration and an alternative executable one. Since dub run can take a configuration as an argument, unit-threaded can now be run as a program with “dub run unit-threaded -c gen_ut_main”. And when it is, it generates the file that’s needed to make it all work.

So now all a user needs to do is add a declaration to their project’s dub.json file and “dub test” works as intended, using unit-threaded underneath, with named unit tests and all of them running in threads by default. Happy days.


The C++ GSL in Practice

At CppCon 2015, we heard about the C++ Core Guidelines and a supporting library for them, the GSL. There were several talks devoted to this, including two of the keynotes, and we were promised a future of zero cost abstractions that were also safe. What’s not to like?

Me being me, I had to try this out for myself. And what better way than by rewriting my C++ implementation of an MQTT broker from scratch? Why from scratch? The version I had didn’t perform well and would have required extensive refactoring to do so, and I’m not crazy enough to post results from C++ that lose by a factor of 3 to any other language.

It was a good fit as well: the equivalent D and Rust code was using slices, so this seemed like the perfect chance to try out gsl::span (née gsl::array_view).

I think I liked it. I say I think because the benefits it provided (slices in C++!) are something I’m used to now by programming in D, and of course there were a few things that didn’t work out so well, namely:

gsl::cstring_span

First of all, there was this bug I filed. It’s a new way to shoot oneself in the foot and we were not amused. Had I just declared a function taking const std::string& as usual, I wouldn’t have hit the bug. The price of early adoption, I guess. The worst part is that it failed silently and was hard to detect: the strings printed out the same, but one had a hidden terminating null char. I ended up having to declare an overload that took const char* and did the conversion appropriately.

Also, although I know why, it’s still incredibly annoying to have to use empty angle brackets for the default case.

Rvalues need not apply

Without using the GSL, I can do this:

void func(const std::vector<unsigned char>&);
func({2, 3, 4}); //rvalues are nice

With the GSL, it has to be this:

void func(gsl::span<const unsigned char>); // a span is a cheap view: take it by value
const std::vector<unsigned char> bytes{2, 3, 4};
func(bytes);

It’s cumbersome and I can’t see how it’s protecting me from anything.

Documentation

I had to refer to the unit tests (fortunately included) and Neil MacIntosh’s presentation at CppCon 2015 multiple times to figure out how to use it. It wasn’t always obvious.

Conclusion

I still think this is a good thing for C++, but the value of something like gsl::not_null is… null without the static analysis tool they mentioned. It could be easier to use as well. My other concern is how and if gsl::span will work with the ranges proposal / library.

 
