Tag Archives: dlang

API clarity with types

API design is hard. Really hard. It’s one of the reasons I like TDD – it forces you to use the API as a regular client and it usually comes out all the better for it. At a previous job we’d design APIs as C headers, review them without implementation and call it done. Not one of those didn’t have to change as soon as we tried implementing them.

The Win32 API is rife with examples of what not to do: functions with 12 parameters aren’t uncommon. Another API no-no is several parameters of the same type – which means which? This is ok:

auto p = Point(2, 3);

It’s obvious that 2 is the x coordinate and 3 is y. But:

foo("foo", "bar", "baz", "quux", true);

Sure, the actual strings passed don’t help – but what does true mean in this context? Languages like Python get around this by naming arguments at the call site, but that’s not a feature of most curly brace/semicolon languages.

I semi-recently forked and extended the D wrapper for nanomsg. The original C API copies the Berkely sockets API, for reasons I don’t quite understand. That means that a socket must be created, then bound or connect to another socket. In an OOP-ish language we’d like to just have a contructor deal with that for us. Unfortunately, there’s no way to disambiguate if we want to connect to an address or bind to it – in both cases a string is passed. My first attempt was to follow in Java’s footsteps and use static methods for creation (simplified for the blog post):

struct NanoSocket {
    static NanoSocket createBound(string uri) { /* ... */ }
    static NanoSocket createConnected(string uri) { /* ... */ }
    private this() { /* ... */ } // constructor
}

I never did feel comfortable: object creation shouldn’t look *weird*. But I think Haskell has forever changed by brain, so types to the rescue:

struct NanoSocket {
    this(ConnectTo connectTo) { /* ... */ }
    this(BindTo bindTo) { /* ... */ }
}

struct ConnectTo {
    string uri;
}

struct BindTo {
    string uri;
}

I encountered something similar when I implemented a method on NanoSocket called trySend. It takes two durations: a total time to try for, and an interval to wait to try again. Most people would write it like so:

void trySend(ubyte[] data, 
             Duration totalDuration, 
             Duration retryDuration);

At the call site clients might get confused about which order the durations are in. I think this is much better, since there’s no way to get it wrong:

void trySend(ubyte[] data, 
             TotalDuration totalDuration, 
             RetryDuration retryDuration);

struct TotalDuration {
    Duration duration;
}

struct RetryDuration {
    Duration duration;
}

What do you think?

Advertisements
Tagged , , , , , , , ,

C is not magically fast, part 2

I wrote a blog post before about how C is not magically fast, but the sentiment that C has properties lacking in other languages that make it so is still widespread. It was with no surprise at all then that a colleague mentioned something resembling that recently at lunch break, and I attempted to tell him why it wasn’t (at least always) true.

He asked for an example where C++ would be faster, and I resorted to the old sort classic: C++ sort is faster than C’s qsort because of templates and inlining. He then asked me if I’d ever measured it myself, and since I hadn’t, I did just that after lunch. I included D as well because, well, it’s my favourite language. Taking the minimum time after ten runs each to sort a random array of 10M simple structs on my laptop yielded the results below:

  • D: 1.147s
  • C++: 1.723s
  • C: 1.789s

I expected  C++ to be faster than C, I didn’t expect the difference to be so small. I expected D to be the same speed as C++, but for some reason it’s faster. I haven’t investigated the reason why for lack of interest, but maybe because of how strings are handled?

I used the same compiler backend for all 3 so that wouldn’t be an influence: LLVM. I also seeded all of them with the same number and used the same random number generator: the awful srand from C’s standard library. It’s terrible, but it’s the only easy way to do it in standard C and the same function is available from the other two languages. I also only timed the sort, not counting init code.

The code for all 3 implementations:

// sort.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

typedef struct {
    int i;
    char* s;
} Foo;

double get_time() {
    struct timeval t;
    struct timezone tzp;
    gettimeofday(&t, &tzp);
    return t.tv_sec + t.tv_usec*1e-6;
}

int comp(const void* lhs_, const void* rhs_) {
    const Foo *lhs = (const Foo*)lhs_;
    const Foo *rhs = (const Foo*)rhs_;
    if(lhs->i < rhs->i) return -1;
    if(lhs->i > rhs->i) return 1;
    return strcmp(lhs->s, rhs->s);
}

int main(int argc, char* argv[]) {
    if(argc < 2) {
        fprintf(stderr, "Must pass in number of elements\n");
        return 1;
    }

    srand(1337);
    const int size = atoi(argv[1]);
    Foo* foos = malloc(size * sizeof(Foo));
    for(int i = 0; i < size; ++i) {
        foos[i].i = rand() % size;
        foos[i].s = malloc(100);
        sprintf(foos[i].s, "foo%dfoo", foos[i].i);
    }

    const double start = get_time();
    qsort(foos, size, sizeof(Foo), comp);
    const double end = get_time();
    printf("Sort time: %lf\n", end - start);

    free(foos);
    return 0;
}


// sort.cpp
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <chrono>
#include <cstring>

using namespace std;
using namespace chrono;

struct Foo {
    int i;
    string s;

    bool operator<(const Foo& other) const noexcept {
        if(i < other.i) return true;
        if(i > other.i) return false;
        return s < other.s;
    }

};


template<typename CLOCK, typename START>
static double getElapsedSeconds(CLOCK clock, const START start) {
    //cast to ms first to get fractional amount of seconds
    return duration_cast<milliseconds>(clock.now() - start).count() / 1000.0;
}

#include <type_traits>
int main(int argc, char* argv[]) {
    if(argc < 2) {
        cerr << "Must pass in number of elements" << endl;
        return 1;
    }

    srand(1337);
    const int size = stoi(argv[1]);
    vector<Foo> foos(size);
    for(auto& foo: foos) {
        foo.i = rand() % size;
        foo.s = "foo"s + to_string(foo.i) + "foo"s;
    }

    high_resolution_clock clock;
    const auto start = clock.now();
    sort(foos.begin(), foos.end());
    cout << "Sort time: " << getElapsedSeconds(clock, start) << endl;
}


// sort.d
import std.stdio;
import std.exception;
import std.datetime;
import std.algorithm;
import std.conv;
import core.stdc.stdlib;


struct Foo {
    int i;
    string s;

    int opCmp(ref Foo other) const @safe pure nothrow {
        if(i < other.i) return -1;
        if(i > other.i) return 1;
        return s < other.s
            ? -1
            : (s > other.s ? 1 : 0);
    }
}

void main(string[] args) {
    enforce(args.length > 1, "Must pass in number of elements");
    srand(1337);
    immutable size = args[1].to!int;
    auto foos = new Foo[size];
    foreach(ref foo; foos) {
        foo.i = rand % size;
        foo.s = "foo" ~ foo.i.to!string ~ "foo";
    }

    auto sw = StopWatch();
    sw.start;
    sort(foos);
    sw.stop;
    writeln("Elapsed: ", cast(Duration)sw.peek);
}



Tagged , ,

Write custom assertions whenever possible

I’ve been very interested in readable tests with great error messages recently. Mostly because they kept failing and I wanted the most information possible in order to quickly identify the cause. This is another reason why I like TDD: you see the test failing first, so if the error message isn’t great you’ll know straight away instead of months later.

The good testing frameworks provide a way of writing your own custom assertions. I’d never really looked into them that much before, but now I realize the error of my ways. Recently I wrote a test that contained this line:

fileName.exists.shouldBeTrue;

Readable, right? The problem is when it fails:

foo.d:42 - Expected: true
foo.d:42 -      Got: false

And now you have to go read the test and figure out what went wrong. It’s a lot better to get the information that a file was supposed to exist instead right away. So I wrote a custom assertion and was then ready to write this:

fileName.shouldExist;

With the corresponding failure message:

foo.d:42 - Expected /tmp/foo.txt to exist but it didn't

Now it’s a lot easier to pinpoint where the problem is. For starters, you would probably want to start checking the contents of the surrounding directory, having saved the time you would have had to spend figuring out what exactly was false.

Tagged ,

main is just another function

Last week I talked about code that isn’t unit-testable, at least not by my definition of what a unit test is. In keeping with that, this blog post will talk about testing code that has side-effects.

Recently I’d come to accept a defeatist attitude where I couldn’t think of any other way to test that passing certain command-line options to a console binary had a certain effect. I mean, the whole point is to test that running the app differently will have different consequences. As a result I ended up only ever doing end-to-end testing. And… that’s simply not where I want to be.

Then it dawned on me: main is just another function. Granted, it has a special status that makes it so you can’t call it directly from a test, but nearly all my main functions lately have looked like this:

int main(string[] args) {
    try {
        doStuff(args);
        return 0;
    } catch(Exception ex) {
        stderr.writeln(ex.msg);
        return 1;
    }
}

It should be easy enough to translate this to the equivalent C++ in your head. With main so conveniently delegating to a function that does real work, I can now easily write integration tests. After all, is there really any difference between:

doStuff(["myapp", "--option", "arg1", "arg2"]);
// assert stuff happened

And (in, say, a shell script):

./myapp --option arg1 arg2
# assert stuff happened

I’d say no. This way I have one end-to-end test for sanity’s sake, and everything else being tested from the same binary by calling the “real” main function directly.

If your main doesn’t look like the one above, and you happen to be writing C or C++, there’s another technique: use the preprocessor to rename main to something else and call it from your integration/component test. And then, as they say, Bob’s your uncle.

Happy testing!

Tagged , , ,

unit-threaded: now an executable library

It’s one of those ideas that seem obvious in retrospect, but somehow only ocurred to me last week. Let me explain.

I wrote a unit testing library in D called unit-threaded. It uses D’s compile-time reflection capabilities so that no test registration is required. You write your tests, they get found automatically and everything is good and nice. Except… you have to list the files you want to reflect on, explicitly. D’s compiler can’t go reading the filesystem for you while it compiles, so a pre-build step of generating the file list was needed. I wrote a program to do it, but for several reasons it wasn’t ideal.

Now, as someone who actually wants people to use my library (and also to make it easier for myself), I had to find a way so that it would be easy to opt-in to unit-threaded. This is especially important since D has built-in unit tests, so the barrier for entry is low (which is a good thing!). While working on a far crazier idea to make it a no-brainer to use unit-threaded, I stumbled across my current solution: run the library as an executable binary.

The secret sauce that makes this work is dub, D’s package manager. It can download dependencies to compile and even run them with “dub run”. That way, a user need not even have to download it. The other dub feature that makes this feasible is that it supports “configurations” in which a package is built differently. And using those, I can have a regular library configuration and an alternative executable one. Since dub run can take a configuration as an argument, unit-threaded can now be run as a program with “dub run unit-threaded -c gen_ut_main”. And when it is, it generates the file that’s needed to make it all work.

So now all a user need to is add a declaration to their project’s dub.json file and “dub test” works as intended, using unit-threaded underneath, with named unit tests and all of them running in threads by default. Happy days.

Tagged , , , , ,

My first D Improvement Proposal

After over a year of thinking about this (I remember bringing this up at DConf 2014), I finally wrote a DIP about introducing what I call “static inheritance” to D.

The principle is similar to C++ concepts: D already has a way of requiring a certain compile-time interface for templates to be instantiated, the most common being isInputRange. What is lacking right now in my opinion is helpful compiler error messages for when a type was intended to be, e.g. an input range but isn’t due to programmer error.

I tried a library solution first, assuming this would be easier to get accepted than a language change. Since there was nearly 0 interest, I had to write a DIP. Here’s hoping it’s more succesful.

Tagged ,

Haskell actually does change the way you think

Last year I started trying to learn Haskell. There have been many ups and downs, but my only Haskell project so far is on hold while I work on other things. I’m not sure yet if I’d choose to use Haskell in production. The problems I had (and the time it’s taken so far) writing a simple server make me think twice, but that’s a story for another blog post.

The thing is, the whole reason I decided to learn Haskell were the many reports that it made me you think differently. As much as I like D, learning it was easy and essentially I’m using it as a better C++. There are things I routinely do in D that I wouldn’t have thought of or bother in C++ because they’re easier. But it’s not really changed my brain.

I didn’t think Haskell had either, until I started thinking of solutions to problems I was having in D in Haskell ways. I’m currently working on a build system, and since the configuration language is D, it has to be compiled. So I have interesting problems to solve with regards to what runs when: compile-time or run-time. Next thing I know I’m thinking of lazy evaluation, thunks, and the IO monad. Some things aren’t possible to be evaluated at compile-time in D. So I replaced a value with a function that when run (i.e. at run-time) would produce that value. And (modulo current CTFE limitations)… it works! I’m even thinking of making a wrapper type that composes nicely… (sound familiar?)

So, thanks Haskell. You made my head hurt more than anything I’ve tried learning since Physics, but apparently you’ve made me a better programmer.

Tagged , , , , , ,

The craziest code I ever wrote

A few years ago at work my buddy Jeff was as usual trying to do something in Go. I can’t remember why, but he wanted to arrange text strings in memory so that they were all contiguous. I said something about C++ and he remarked that the only thing C++11 could do that Go couldn’t would be perhaps to do this work at compile-time. I hadn’t learned D yet (which would have made the task trivial), so I spent the rest of the day writing the monstrosity below for “teh lulz”. It ended up causing my first ever question on Stackoverflow. “Enjoy” the code:

//Arrange strings contiguously in memory at compile-time from string literals.
//All free functions prefixed with "my" to faciliate grepping the symbol tree
//(none of them should show up).

#include <iostream>

using std::size_t;

//wrapper for const char* to "allocate" space for it at compile-time
template<size_t N>
struct String {
    //C arrays can only be initialised with a comma-delimited list
    //of values in curly braces. Good thing the compiler expands
    //parameter packs into comma-delimited lists. Now we just have
    //to get a parameter pack of char into the constructor.
    template<typename... Args>
    constexpr String(Args... args):_str{ args... } { }
    const char _str[N];
};

//takes variadic number of chars, creates String object from it.
//i.e. myMakeStringFromChars('f', 'o', 'o', '') -> String<4>::_str = "foo"
template<typename... Args>
constexpr auto myMakeStringFromChars(Args... args) -> String<sizeof...(Args)> {
    return String<sizeof...(args)>(args...);
}

//This struct is here just because the iteration is going up instead of
//down. The solution was to mix traditional template metaprogramming
//with constexpr to be able to terminate the recursion since the template
//parameter N is needed in order to return the right-sized String<N>.
//This class exists only to dispatch on the recursion being finished or not.
//The default below continues recursion.
template<bool TERMINATE>
struct RecurseOrStop {
    template<size_t N, size_t I, typename... Args>
    static constexpr String<N> recurseOrStop(const char* str, Args... args);
};

//Specialisation to terminate recursion when all characters have been
//stripped from the string and converted to a variadic template parameter pack.
template<>
struct RecurseOrStop<true> {
    template<size_t N, size_t I, typename... Args>
    static constexpr String<N> recurseOrStop(const char* str, Args... args);
};

//Actual function to recurse over the string and turn it into a variadic
//parameter list of characters.
//Named differently to avoid infinite recursion.
template<size_t N, size_t I = 0, typename... Args>
constexpr String<N> myRecurseOrStop(const char* str, Args... args) {
    //template needed after :: since the compiler needs to distinguish
    //between recurseOrStop being a function template with 2 paramaters
    //or an enum being compared to N (recurseOrStop < N)
    return RecurseOrStop<I == N>::template recurseOrStop<N, I>(str, args...);
}

//implementation of the declaration above
//add a character to the end of the parameter pack and recurse to next character.
template<bool TERMINATE>
template<size_t N, size_t I, typename... Args>
constexpr String<N> RecurseOrStop<TERMINATE>::recurseOrStop(const char* str,
                                                            Args... args) {
    return myRecurseOrStop<N, I + 1>(str, args..., str[I]);
}

//implementation of the declaration above
//terminate recursion and construct string from full list of characters.
template<size_t N, size_t I, typename... Args>
constexpr String<N> RecurseOrStop<true>::recurseOrStop(const char* str,
                                                       Args... args) {
    return myMakeStringFromChars(args...);
}

//takes a compile-time static string literal and returns String<N> from it
//this happens by transforming the string literal into a variadic paramater
//pack of char.
//i.e. myMakeString("foo") -> calls myMakeStringFromChars('f', 'o', 'o', '');
template<size_t N>
constexpr String<N> myMakeString(const char (&str)[N]) {
    return myRecurseOrStop<N>(str);
}

//Simple tuple implementation. The only reason std::tuple isn't being used
//is because its only constexpr constructor is the default constructor.
//We need a constexpr constructor to be able to do compile-time shenanigans,
//and it's easier to roll our own tuple than to edit the standard library code.

//use MyTupleLeaf to construct MyTuple and make sure the order in memory
//is the same as the order of the variadic parameter pack passed to MyTuple.
template<typename T>
struct MyTupleLeaf {
    constexpr MyTupleLeaf(T value):_value(value) { }
    T _value;
};

//Use MyTupleLeaf implementation to define MyTuple.
//Won't work if used with 2 String<> objects of the same size but this
//is just a toy implementation anyway. Multiple inheritance guarantees
//data in the same order in memory as the variadic parameters.
template<typename... Args>
struct MyTuple: public MyTupleLeaf<Args>... {
    constexpr MyTuple(Args... args):MyTupleLeaf<Args>(args)... { }
};

//Helper function akin to std::make_tuple. Needed since functions can deduce
//types from parameter values, but classes can't.
template<typename... Args>
constexpr MyTuple<Args...> myMakeTuple(Args... args) {
    return MyTuple<Args...>(args...);
}

//Takes a variadic list of string literals and returns a tuple of String<> objects.
//These will be contiguous in memory. Trailing '' adds 1 to the size of each string.
//i.e. ("foo", "foobar") -> (const char (&arg1)[4], const char (&arg2)[7]) params ->
//                       ->  MyTuple<String<4>, String<7>> return value
template<size_t... Sizes>
constexpr auto myMakeStrings(const char (&...args)[Sizes]) -> MyTuple<String<Sizes>...> {
    //expands into myMakeTuple(myMakeString(arg1), myMakeString(arg2), ...)
    return myMakeTuple(myMakeString(args)...);
}

//Prints tuple of strings
template<typename T> //just to avoid typing the tuple type of the strings param
void printStrings(const T& strings) {
    //No std::get or any other helpers for MyTuple, so intead just cast it to
    //const char* to explore its layout in memory. We could add iterators to
    //myTuple and do "for(auto data: strings)" for ease of use, but the whole
    //point of this exercise is the memory layout and nothing makes that clearer
    //than the ugly cast below.
    const char* const chars = reinterpret_cast<const char*>(&strings);
    std::cout << "Printing strings of total size " << sizeof(strings);
    std::cout << " bytes:\n";
    std::cout << "-------------------------------\n";

    for(size_t i = 0; i < sizeof(strings); ++i) {
        chars[i] == '' ? std::cout << "\n" : std::cout << chars[i];
    }

    std::cout << "-------------------------------\n";
    std::cout << "\n\n";
}

int main() {
    {
        constexpr auto strings = myMakeStrings("foo", "foobar",
                                               "strings at compile time");
        printStrings(strings);
    }

    {
        constexpr auto strings = myMakeStrings("Some more strings",
                                               "just to show Jeff to not try",
                                               "to challenge C++11 again :P",
                                               "with more",
                                               "to show this is variadic");
        printStrings(strings);
    }

    std::cout << "Running 'objdump -t |grep my' should show that none of the\n";
    std::cout << "functions defined in this file (except printStrings()) are in\n";
    std::cout << "the executable. All computations are done by the compiler at\n";
    std::cout << "compile-time. printStrings() executes at run-time.\n";
}
Tagged , , , , , , ,

Computer languages: ordering my favourites

This isn’t even remotely supposed to be based on facts, evidence, benchmarks or anything like that. You could even disagree with what are “scripting languages” or not. All of the below just reflect my personal preferences. In any case, here’s my list of favourite computer languages, divided into two categories: scripting and, err… I guess “not scripting”.

 

My favourite scripting languages, in order:

  1. Python
  2. Ruby
  3. Emacs Lisp
  4. Lua
  5. Powershell
  6. Perl
  7. bash/zsh
  8. m4
  9. Microsoft batch files
  10. Tcl

 

I haven’t written enough Ruby yet to really know. I suspect I’d like it more than Python but at the moment I just don’t have enough experience with it to know its warts. Even I’m surprised there’s something below Perl here but Tcl really is that bad. If you’re wondering where PHP is, well I don’t know because I’ve never written any but from what I’ve seen and heard I’d expect it to be (in my opinion of course) better than Tcl and worse than Perl. I’m surprised how high Perl is given my extreme dislike for it. When I started thinking about it I realised there’s far far worse.

 

My favourite non-scripting languages, in order:

  1. D
  2. C++
  3. Haskell
  4. Common Lisp
  5. Rust
  6. Java
  7. Go
  8. Objective C
  9. C
  10. Pascal
  11. Fortran
  12. Basic / Visual Basic

I’ve never used Scheme, if that explains where Common Lisp is. I’m still learning Haskell so not too sure there. As for Rust, I’ve never written a line of code in it and yet I think I can confidently place it in the list, especially with respect to Go. It might place higher than C++ but I don’t know yet.

 

Tagged , , , , , , , , , , , , ,

To learn BDD with Cucumber, you must first learn BDD with Cucumber.

So I read about Cucumber a while back and was intrigued, but never had time to properly play with it. While writing my MQTT broker, however, I kept getting annoyed at breaking functionality that wasn’t caught by unit tests. The reason being that the internals were fine, the problems I was creating had to do with the actual business of sending packets. But I was busy so I just dealt with it.

A few weeks ago I read a book about BDD with Cucumber and RSpec but for me it was a bit confusing. The reason being that since the step definitions, unit tests and implementation were all written in Ruby, it was hard for me to distinguish which part was what in the whole BDD/TDD concentric cycles. Even then, I went back to that MQTT project and wrote two Cucumber features (it needs a lot more but since it works I stopped there). These were easy enough to get going: essentially the step definitions run the broker in another process, connect to it over TCP and send packets to it, evaluating if the response was the expected one or not. Pretty cool stuff, and it works! It’s what I should have been doing all along.

So then I started thinking about learning BDD (after all, I wrote the features for MQTT afterwards) by using it on a D project. So I investigated how I could call D code from my step definitions. After spending the better part of an afternoon playing with Thrift and binding Ruby to D, I decided that the best way to go about this was to implement the Cucumber wire protocol. That way a server would listen to JSON requests from Cucumber, call D functions and everything would work. Brilliant.

I was in for a surprise though, me who’s used to implementing protocols after reading an RFC or two. Instead of a usual protocol definition all I had to go on was… Cucumber features! How meta. So I’d use Cucumber to know how to implement my Cucumber server. A word to anyone wanting to do this in another language: there’s hardly any documentation on how to implement the wire protocol. Whenever I got lost and/or confused I just looked at the C++ implementation for guidance. It was there that I found a git submodule with all of Cucumber’s features. Basically, you need to implement all of the “core” features first (therefore ensuring that step definitions actually work), and only then do you get to implement the protocol server itself.

So I wanted to be able to write Cucumber step definitions in D so I could learn and apply BDD to my next project. As it turned out, I learned BDD implementing the wire protocol itself. It took a while to get the hang of transitioning from writing a step definition to unit testing but I think I’m there now. There might be a lot more Cucumber in my future. I might also implement the entirety of Cucumber’s features in D as well, I’m not sure yet.

My implementation is here.

Tagged , , , , , , , , , ,