Category Archives: Programming

Why I find developing on/for Windows exasperating

I ran DOS on my first PC. The natural progession unfolded, with me then running Windows 95, Windows 98, and Windows XP after that (Windows ME, like the Matrix sequels, was a collective bad dream that didn’t really happen). I used Borland’s IDE to write C code, then RHIDE with DJGPP since I couldn’t even imagine using a compiler from the command-line. I say that because I wasn’t “brought up” using *nix at all, and my only exposure was at university. These days however, I do nearly all of my development on Linux. Why? I find it to be a much, much better experience.

Somewhat unfortunately for me, my current job requires me to do Windows development. And every time I boot into Windows or have to fix Windows-specific problems, it makes me want to cry. Why? Let me name some of the reasons why.

Speed, or the lack thereof. I haven’t done a thorough scientific analysis on this, because I don’t think it’d be worth my while to do so. It seems clear to me that NTFS is very very slow. Doing anything on it, from running CMake to compiling to linking, seems to take forever. To the point that it makes me actively wonder how anyone manages to get anything done on Windows. I can rebuild the reference D compiler on my laptop in about 1.6s after modifying one file. On Windows the same build, on the same machine, takes ~1 minute. Given that I find 1.6s infuriatingly slow, you can imagine what sorts of dark swear words I reserve for waiting for a whole minute while what would have been considered a supercomputer a few years ago decides to go get anything done.

Dependencies. Unlike *nix, there is no standard path(s) to look up libraries. Granted, even different Linux distros use different conventions and paths from each other, but libraries are usually installed with a package manager anyway so mostly you don’t care.  And if you did, your linker would find them anyway without the need for extra flags. Need to link to, say, nanomsg on Windows? Good luck with that. Ah, but there’s vcpkg, I hear you say. Apparently Visual Studio auto-magically finds the libraries that vcpkg “installs”. Job done if you’re clicking a button in an IDE, not so much if you’re using a real build system running in CI. It _could_ be just as easy as adding a flag to your linker, but, alas, the .lib files don’t all end up in the same directory. vcpkg allows me to download libraries without having to write Powershell, but then actually linking is, for lack of a better word, “fun”. On Linux? pacman -S nanomsg; ninja

Batch files and/or powershell. I personally find bash horrible to write code in, but then I do Windows work and remember there’s worse. So much worse. Sigh.

Bash. I’ll explain. Git bash is amazing, I remember a time before that existed (I tried, unsucessfully, to compile bash from source for Windows with at least 3 different implementations back in the day). So why am I complaining? First of all, because I use zsh and haven’t seen an easy way to do that yet on Windows. Secondly, because building on Windows from the command-line often requires cmd.exe. Building C++ code? I’m not going to write my own bash version of vcvarsall.bat just to do that. Commands have a habit of spitting out error messages with backslashes (cos, duh, Windows), and good luck copying and pasting that into your bash shell.

Tooling. Want to create a zip? You’ll have to download and install a 3rd party tool. Oh, but the binary doesn’t get added to the PATH, so you’ll have to write out the full path in your batch file and pray one of your machines doesn’t install it to a different location.

Things are better than they used to be on Windows. We now have the Linux subsystem, git bash, and alternatives to the horrible built-in terminal emulator. To me, it just makes things less bad, and the moment I’m back on Arch Linux it feels like coming home from a not particularly good holiday.

Advertisements
Tagged , , ,

Commit failing tests if your framework allows it

In TDD, one is supposed to go through the 3-step cycle of:

  1. Write a failing test
  2. Make it pass
  3. Refactor

The common-sense approach is to not commit the failing test from the first step, since that would thrown a spanner in the works when you inevitably have to bisect your commit DAG trying to figure out where a bug was introduced.

I’ve come to a realisation recently – failing tests should be commited, but only if the testing framework being used allows you to mark failures as successes. For instance, in my D testing framework unit-threaded, I’d commit this silly example:

@ShouldFail("WIP")
unittest {
    1.shouldEqual(2);
}

If you’re not familiar with D, it has built-in unit tests, and unittest is a keyword. @ShouldFail is a User Defined Attribute, part of the library indicating that the unit test it applies to is expected to fail, and allows the user to specify an optional string describing why that’s the case. It could be a bug ID as well.

The test above passes if any of the code in the unittest block throws an exception, i.e. it passes if it fails. This way we can have a single commit of the failing test that motivated the code changes that follow it, and we can’t forget to remove @ShouldFail – in fact, if the programmer implements the feature / fixes the bug correctly, they should expect to see the test suite go red. If that doesn’t happen, either the production code or the test is buggy.

I’m not aware of many frameworks that allow a programmer to do this; pytest has something similar. If yours does, commit your failing tests.

Tagged , ,

On the novelty factor of compile-time duck typing

Or structural type systems for the pendantic, but I think most people know what I mean when I say “compile-time duck typing”.

For one reason or another I’ve read quite a few blog posts about how great the Go programming language is recently. A common refrain is that Go’s interfaces are amazing because you don’t have to declare that a type has to satisfy an interface; it just does if its structure matches (hence structural typing). I’m not sold on how great this actually is – more on that later.

What I don’t understand is how this is presented as novel and never done before. I present to you a language from 1990:

template <typename T>
void fun(const T& animal) {
    cout << "It says: " << animal.say() << endl;
}

struct Dog {
    std::string say() const { return "woof"; }
};

struct Cat {
    std::string say() const { return "meow"; }
};

int main() {
    fun(Dog());
    fun(Cat());
}

Most people would recognise that as being C++. If you didn’t, well… it’s C++. I stayed away from post-C++11 on purpose (i.e. Dog{} instead of Dog()). Look ma, compile-time duck typing in the 1990s! Who’d’ve thunk it?

Is it nicer in Go? In my opinion, yes. Defining an interface and saying a function only takes objects that conform to that interface is a good thing, and a lot better than the situation in C++ (even with std::enable_if and std::void_t). But it’s easy enough to do that in D (template contraints), Haskell (typeclasses), and Rust (traits), to name the languages that do something similar that I’m more familiar with.

But in D and C++, there’s currently no way to state that your type satisfies what you need it to due to an algorithm function requiring it (such as having a member function called “say” in the silly example above) and get compiler errors telling you why it didn’t satisfy it (such as  mispelling “say” as “sey”). C++, at some point in the future, will get concepts exactly to alleviate this. In D, I wrote a library to do it. Traits and typeclasses are definitely better, but in my point of view it’s good to be able to state that a type does indeed “look like” what it needs to do to be used by certain functions. At least in D you can say static assert(isAnimal!MyType); – you just don’t know why that assertion fails when it does. I guess in C++17 one could do something similar using std::void_t. Is there an equivalent for Go? I hope a gopher enlightens me.

All in all I don’t get why this gets touted as something only Go has. It’s a similar story to “you can link statically”. I can do that in other languages as well. Even ones from the 90s.

Tagged , , ,

The main function should be shunned

The main function (in languages that have it) is…. special. It’s the entry point of the program by convention, there can only be one of them in all the object files being linked, and you can’t run a program without it. And it’s inflexible.

Its presence means that the final output has to be an executable. It’s likely however, that the executable in question might have code that others might rather reuse than rewrite, but they won’t be able to use it in their own executables. There’s already a main function in there. Before clang nobody seemed to stumble on the idea that a compiler as a library would be a great idea. And yet…

This is why I’m now advocating for always putting the main function of an executable in its own file, all by itself. And also that it do the least amount of work possible for maximum flexibility. This way, any executable project is one excluded file away in the build system from being used as a library. This is how I’d start a, say, C++ executable project from scratch today:

#include "runtime.hpp"
#include <iostream>
#include <stdexcept>

int main(int argc, const char* argv[]) {
    try {
        run(argc, argv); // "real" main
        return 0;
    } catch(const std::exception& ex) {
        std::cout << "Oops: " << ex.what() << std::endl;
        return 1;
    }
}

In fact, I think I’ll go write an Emacs snippet for that right now.

Tagged ,

API clarity with types

API design is hard. Really hard. It’s one of the reasons I like TDD – it forces you to use the API as a regular client and it usually comes out all the better for it. At a previous job we’d design APIs as C headers, review them without implementation and call it done. Not one of those didn’t have to change as soon as we tried implementing them.

The Win32 API is rife with examples of what not to do: functions with 12 parameters aren’t uncommon. Another API no-no is several parameters of the same type – which means which? This is ok:

auto p = Point(2, 3);

It’s obvious that 2 is the x coordinate and 3 is y. But:

foo("foo", "bar", "baz", "quux", true);

Sure, the actual strings passed don’t help – but what does true mean in this context? Languages like Python get around this by naming arguments at the call site, but that’s not a feature of most curly brace/semicolon languages.

I semi-recently forked and extended the D wrapper for nanomsg. The original C API copies the Berkely sockets API, for reasons I don’t quite understand. That means that a socket must be created, then bound or connect to another socket. In an OOP-ish language we’d like to just have a contructor deal with that for us. Unfortunately, there’s no way to disambiguate if we want to connect to an address or bind to it – in both cases a string is passed. My first attempt was to follow in Java’s footsteps and use static methods for creation (simplified for the blog post):

struct NanoSocket {
    static NanoSocket createBound(string uri) { /* ... */ }
    static NanoSocket createConnected(string uri) { /* ... */ }
    private this() { /* ... */ } // constructor
}

I never did feel comfortable: object creation shouldn’t look *weird*. But I think Haskell has forever changed by brain, so types to the rescue:

struct NanoSocket {
    this(ConnectTo connectTo) { /* ... */ }
    this(BindTo bindTo) { /* ... */ }
}

struct ConnectTo {
    string uri;
}

struct BindTo {
    string uri;
}

I encountered something similar when I implemented a method on NanoSocket called trySend. It takes two durations: a total time to try for, and an interval to wait to try again. Most people would write it like so:

void trySend(ubyte[] data, 
             Duration totalDuration, 
             Duration retryDuration);

At the call site clients might get confused about which order the durations are in. I think this is much better, since there’s no way to get it wrong:

void trySend(ubyte[] data, 
             TotalDuration totalDuration, 
             RetryDuration retryDuration);

struct TotalDuration {
    Duration duration;
}

struct RetryDuration {
    Duration duration;
}

What do you think?

Tagged , , , , , , , ,

Don’t hoard code

For me, the two most important principles in programming are, in order, DRY and YAGNI. Most of my coding decisions ends up respecting one or the other. For some reason YAGNI seems to be less well known. In my experience one tends to get less pushback for DRY – it’s the accepted best practice. But YAGNI seems to need more persuasion, and I’m not entirely sure why.

I’m converted: I love red diffs. I don’t even look at the red sections during code review. Do the tests still pass? Ship it! The thing is that, despite me being a programmer and my “one job” (not really, but you know what I mean) being to write code, I hate code and want the least of it in my project. I mean it.

Code that doesn’t exist is excellent. It doesn’t have to be read, and therefore doesn’t need to be understood, which means it can’t confuse anyone. It doesn’t have bugs. It doesn’t need to be tested. What’s not to like?

And yet, in project after project, one sees code commented out for mostly no good reason. My personal “favourite” (by which I mean I froth at the mouth) is C or  C++ code with #if 0 / #endif pairs. In one project there were even multiple of those, and nested to boot.

Maybe it has to do with not trusting version control. If all you’ve ever used is one of those ancient paid-for systems (not naming any names but you can guess) and have never felt the bliss that is working with git or Mercurial then maybe it’s understandable. Because it might actually be hard to go look at the history and find when you deleted something or why. But these days? No excuse: git grep that_thing_that_I_remember_that_isn’t here_anymore.

And never mind that, in my experience at least, the times anybody goes code spelunking for deleted code are so few and far between that the trade-off is obvious. Code that hasn’t but should be deleted gets in the way. That’s a real cost, paid every day, and for… what? Because someone someday might need that snippet and it takes them an extra minute to find it?

YAGNI. Delete and move on.

C is not magically fast, part 2

I wrote a blog post before about how C is not magically fast, but the sentiment that C has properties lacking in other languages that make it so is still widespread. It was with no surprise at all then that a colleague mentioned something resembling that recently at lunch break, and I attempted to tell him why it wasn’t (at least always) true.

He asked for an example where C++ would be faster, and I resorted to the old sort classic: C++ sort is faster than C’s qsort because of templates and inlining. He then asked me if I’d ever measured it myself, and since I hadn’t, I did just that after lunch. I included D as well because, well, it’s my favourite language. Taking the minimum time after ten runs each to sort a random array of 10M simple structs on my laptop yielded the results below:

  • D: 1.147s
  • C++: 1.723s
  • C: 1.789s

I expected  C++ to be faster than C, I didn’t expect the difference to be so small. I expected D to be the same speed as C++, but for some reason it’s faster. I haven’t investigated the reason why for lack of interest, but maybe because of how strings are handled?

I used the same compiler backend for all 3 so that wouldn’t be an influence: LLVM. I also seeded all of them with the same number and used the same random number generator: the awful srand from C’s standard library. It’s terrible, but it’s the only easy way to do it in standard C and the same function is available from the other two languages. I also only timed the sort, not counting init code.

The code for all 3 implementations:

// sort.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

typedef struct {
    int i;
    char* s;
} Foo;

double get_time() {
    struct timeval t;
    struct timezone tzp;
    gettimeofday(&t, &tzp);
    return t.tv_sec + t.tv_usec*1e-6;
}

int comp(const void* lhs_, const void* rhs_) {
    const Foo *lhs = (const Foo*)lhs_;
    const Foo *rhs = (const Foo*)rhs_;
    if(lhs->i < rhs->i) return -1;
    if(lhs->i > rhs->i) return 1;
    return strcmp(lhs->s, rhs->s);
}

int main(int argc, char* argv[]) {
    if(argc < 2) {
        fprintf(stderr, "Must pass in number of elements\n");
        return 1;
    }

    srand(1337);
    const int size = atoi(argv[1]);
    Foo* foos = malloc(size * sizeof(Foo));
    for(int i = 0; i < size; ++i) {
        foos[i].i = rand() % size;
        foos[i].s = malloc(100);
        sprintf(foos[i].s, "foo%dfoo", foos[i].i);
    }

    const double start = get_time();
    qsort(foos, size, sizeof(Foo), comp);
    const double end = get_time();
    printf("Sort time: %lf\n", end - start);

    free(foos);
    return 0;
}


// sort.cpp
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <chrono>
#include <cstring>

using namespace std;
using namespace chrono;

struct Foo {
    int i;
    string s;

    bool operator<(const Foo& other) const noexcept {
        if(i < other.i) return true;
        if(i > other.i) return false;
        return s < other.s;
    }

};


template<typename CLOCK, typename START>
static double getElapsedSeconds(CLOCK clock, const START start) {
    //cast to ms first to get fractional amount of seconds
    return duration_cast<milliseconds>(clock.now() - start).count() / 1000.0;
}

#include <type_traits>
int main(int argc, char* argv[]) {
    if(argc < 2) {
        cerr << "Must pass in number of elements" << endl;
        return 1;
    }

    srand(1337);
    const int size = stoi(argv[1]);
    vector<Foo> foos(size);
    for(auto& foo: foos) {
        foo.i = rand() % size;
        foo.s = "foo"s + to_string(foo.i) + "foo"s;
    }

    high_resolution_clock clock;
    const auto start = clock.now();
    sort(foos.begin(), foos.end());
    cout << "Sort time: " << getElapsedSeconds(clock, start) << endl;
}


// sort.d
import std.stdio;
import std.exception;
import std.datetime;
import std.algorithm;
import std.conv;
import core.stdc.stdlib;


struct Foo {
    int i;
    string s;

    int opCmp(ref Foo other) const @safe pure nothrow {
        if(i < other.i) return -1;
        if(i > other.i) return 1;
        return s < other.s
            ? -1
            : (s > other.s ? 1 : 0);
    }
}

void main(string[] args) {
    enforce(args.length > 1, "Must pass in number of elements");
    srand(1337);
    immutable size = args[1].to!int;
    auto foos = new Foo[size];
    foreach(ref foo; foos) {
        foo.i = rand % size;
        foo.s = "foo" ~ foo.i.to!string ~ "foo";
    }

    auto sw = StopWatch();
    sw.start;
    sort(foos);
    sw.stop;
    writeln("Elapsed: ", cast(Duration)sw.peek);
}



Tagged , ,

Write custom assertions whenever possible

I’ve been very interested in readable tests with great error messages recently. Mostly because they kept failing and I wanted the most information possible in order to quickly identify the cause. This is another reason why I like TDD: you see the test failing first, so if the error message isn’t great you’ll know straight away instead of months later.

The good testing frameworks provide a way of writing your own custom assertions. I’d never really looked into them that much before, but now I realize the error of my ways. Recently I wrote a test that contained this line:

fileName.exists.shouldBeTrue;

Readable, right? The problem is when it fails:

foo.d:42 - Expected: true
foo.d:42 -      Got: false

And now you have to go read the test and figure out what went wrong. It’s a lot better to get the information that a file was supposed to exist instead right away. So I wrote a custom assertion and was then ready to write this:

fileName.shouldExist;

With the corresponding failure message:

foo.d:42 - Expected /tmp/foo.txt to exist but it didn't

Now it’s a lot easier to pinpoint where the problem is. For starters, you would probably want to start checking the contents of the surrounding directory, having saved the time you would have had to spend figuring out what exactly was false.

Tagged ,

main is just another function

Last week I talked about code that isn’t unit-testable, at least not by my definition of what a unit test is. In keeping with that, this blog post will talk about testing code that has side-effects.

Recently I’d come to accept a defeatist attitude where I couldn’t think of any other way to test that passing certain command-line options to a console binary had a certain effect. I mean, the whole point is to test that running the app differently will have different consequences. As a result I ended up only ever doing end-to-end testing. And… that’s simply not where I want to be.

Then it dawned on me: main is just another function. Granted, it has a special status that makes it so you can’t call it directly from a test, but nearly all my main functions lately have looked like this:

int main(string[] args) {
    try {
        doStuff(args);
        return 0;
    } catch(Exception ex) {
        stderr.writeln(ex.msg);
        return 1;
    }
}

It should be easy enough to translate this to the equivalent C++ in your head. With main so conveniently delegating to a function that does real work, I can now easily write integration tests. After all, is there really any difference between:

doStuff(["myapp", "--option", "arg1", "arg2"]);
// assert stuff happened

And (in, say, a shell script):

./myapp --option arg1 arg2
# assert stuff happened

I’d say no. This way I have one end-to-end test for sanity’s sake, and everything else being tested from the same binary by calling the “real” main function directly.

If your main doesn’t look like the one above, and you happen to be writing C or C++, there’s another technique: use the preprocessor to rename main to something else and call it from your integration/component test. And then, as they say, Bob’s your uncle.

Happy testing!

Tagged , , ,

MVC is really lots of mutable state?

I’ve been doing the Rails tutorial recently. Quickly at first and now a bit slower. The book is really well done, but my interest in implementing a Twitter clone is waning, so I’m just trying to do a little bit every day.

I like all the testability available in Rails. It’s saved me from many mistakes I’ve made while writing the code in the book, which is great. That’s the point of tests.

Ruby is a cool little language. I suspect I like it more than Python, but I just haven’t used Ruby enough to see its warts. Once I do I’ll be able to have an educated opinion.

What I’m really disliking so far though is the amount of mutable state that seems to be needed to get anything done in this framework. The Controller part of MVC doesn’t really control so much as it sets instance variables to be picked up by embedded Ruby code in the HTML view template. That makes me feel… dirty. One of my own quotes is “Mutable state is the root of evil”, so there’s that.

The other thing that’s slightly bugging me about Rails right now is the amount of magic that happens behind the scenes. I love me some automagic: I’m a metaprogramming enthusiast because I like my code to write my code for me. But… I’m more comfortable when knowing how the magic works and what problems it’s solving. Right now, naming things correctly just seem to connect things to each other, it all works, but I have no idea how or why.

Still, it’s an impressive framework. “rails generate” is awesome. And the number of things web developers need to juggle at the same time is impressive.

Tagged , ,