Tag Archives: Go

Operator overloading is a good thing (TM)

Brains are weird things. I used to be a private maths tutor, and I always found it amazing how a little change in notation could sometimes manage to completely confuse a student. Notation itself seems to me to be a major impediment for the majority of people to like or be good at maths. I had fun sometimes replacing the x in an equation with a drawing of an apple to try and get the point across that the actual name (or shape!) of a variable didn’t matter, that it was just standing in for something else.

Programmers are more often than not mathematically inclined, and yet a similar phenomenon seems to occur with the “shape” of certain functions, i.e. operators. For reasons that make us much sense to me as x confusing maths students, the fact that a function has a name that has non-alphanumeric characters in them make them particularly weird. So weird that programmers shouldn’t be allowed to defined functions with those names, only the language designers. That’s always a problem for me – languages that don’t give you the same power as the designers are Blub as far as I’m concerned. But every now and again I see a blost post touting the advantages of some language or other, listing the lack of operator overloading as a bonus.

I don’t even understand the common arguments against operator overloading. One is that somehow “a + b” is now confusing, because it’s not clear what the code does. How is that different from having to read the documentation/implementation of “a.add(b)”? If it’s C++ and “a + b” shows up, anyone who doesn’t read it as “a.operator+(b)” or “operator+(a, b)” with built-in implementations of operator+ for integers and floating point numbers needs to brush up on their C++. And then there’s the fact that that particular operator is overloaded anyway, even in C – the compiler emits different instructions for floats and integers, and its behaviour even depends on the signedness of ints.

Then there’s the complaint that one could make operator+ do something stupid like subtract. Because, you know, this is totally impossible:

int add(int i, int j) {
    return i - j;}

Some would say that operator overloading is limited in applicability since only numerical objects and matrices really need them. But used with care, it might just make sense:

auto path = "foo" / "bar" / "baz";

Or in the C++ ranges by Eric Niebler:

using namespace ranges;
int sum = accumulate(view::ints(1)
                   | view::transform([](int i){return i*i;})
                   | view::take(10), 0);

I’d say both of those previous examples are not only readable, but more readable due to use of operator overloading. As I’ve learned however, readability is in the eye of the beholder.

All in all, it confuses me when I hear/read that lacking operator overloading makes a language simpler. It’s just allowing functions to have “special” names and special syntax to call them (or in Haskell, not even that). Why would the names of functions make code so hard to read for some people? I guess you’d have to ask my old maths students.

Advertisements
Tagged , , , ,

The craziest code I ever wrote

A few years ago at work my buddy Jeff was as usual trying to do something in Go. I can’t remember why, but he wanted to arrange text strings in memory so that they were all contiguous. I said something about C++ and he remarked that the only thing C++11 could do that Go couldn’t would be perhaps to do this work at compile-time. I hadn’t learned D yet (which would have made the task trivial), so I spent the rest of the day writing the monstrosity below for “teh lulz”. It ended up causing my first ever question on Stackoverflow. “Enjoy” the code:

//Arrange strings contiguously in memory at compile-time from string literals.
//All free functions prefixed with "my" to faciliate grepping the symbol tree
//(none of them should show up).

#include <iostream>

using std::size_t;

//wrapper for const char* to "allocate" space for it at compile-time
template<size_t N>
struct String {
    //C arrays can only be initialised with a comma-delimited list
    //of values in curly braces. Good thing the compiler expands
    //parameter packs into comma-delimited lists. Now we just have
    //to get a parameter pack of char into the constructor.
    template<typename... Args>
    constexpr String(Args... args):_str{ args... } { }
    const char _str[N];
};

//takes variadic number of chars, creates String object from it.
//i.e. myMakeStringFromChars('f', 'o', 'o', '') -> String<4>::_str = "foo"
template<typename... Args>
constexpr auto myMakeStringFromChars(Args... args) -> String<sizeof...(Args)> {
    return String<sizeof...(args)>(args...);
}

//This struct is here just because the iteration is going up instead of
//down. The solution was to mix traditional template metaprogramming
//with constexpr to be able to terminate the recursion since the template
//parameter N is needed in order to return the right-sized String<N>.
//This class exists only to dispatch on the recursion being finished or not.
//The default below continues recursion.
template<bool TERMINATE>
struct RecurseOrStop {
    template<size_t N, size_t I, typename... Args>
    static constexpr String<N> recurseOrStop(const char* str, Args... args);
};

//Specialisation to terminate recursion when all characters have been
//stripped from the string and converted to a variadic template parameter pack.
template<>
struct RecurseOrStop<true> {
    template<size_t N, size_t I, typename... Args>
    static constexpr String<N> recurseOrStop(const char* str, Args... args);
};

//Actual function to recurse over the string and turn it into a variadic
//parameter list of characters.
//Named differently to avoid infinite recursion.
template<size_t N, size_t I = 0, typename... Args>
constexpr String<N> myRecurseOrStop(const char* str, Args... args) {
    //template needed after :: since the compiler needs to distinguish
    //between recurseOrStop being a function template with 2 paramaters
    //or an enum being compared to N (recurseOrStop < N)
    return RecurseOrStop<I == N>::template recurseOrStop<N, I>(str, args...);
}

//implementation of the declaration above
//add a character to the end of the parameter pack and recurse to next character.
template<bool TERMINATE>
template<size_t N, size_t I, typename... Args>
constexpr String<N> RecurseOrStop<TERMINATE>::recurseOrStop(const char* str,
                                                            Args... args) {
    return myRecurseOrStop<N, I + 1>(str, args..., str[I]);
}

//implementation of the declaration above
//terminate recursion and construct string from full list of characters.
template<size_t N, size_t I, typename... Args>
constexpr String<N> RecurseOrStop<true>::recurseOrStop(const char* str,
                                                       Args... args) {
    return myMakeStringFromChars(args...);
}

//takes a compile-time static string literal and returns String<N> from it
//this happens by transforming the string literal into a variadic paramater
//pack of char.
//i.e. myMakeString("foo") -> calls myMakeStringFromChars('f', 'o', 'o', '');
template<size_t N>
constexpr String<N> myMakeString(const char (&str)[N]) {
    return myRecurseOrStop<N>(str);
}

//Simple tuple implementation. The only reason std::tuple isn't being used
//is because its only constexpr constructor is the default constructor.
//We need a constexpr constructor to be able to do compile-time shenanigans,
//and it's easier to roll our own tuple than to edit the standard library code.

//use MyTupleLeaf to construct MyTuple and make sure the order in memory
//is the same as the order of the variadic parameter pack passed to MyTuple.
template<typename T>
struct MyTupleLeaf {
    constexpr MyTupleLeaf(T value):_value(value) { }
    T _value;
};

//Use MyTupleLeaf implementation to define MyTuple.
//Won't work if used with 2 String<> objects of the same size but this
//is just a toy implementation anyway. Multiple inheritance guarantees
//data in the same order in memory as the variadic parameters.
template<typename... Args>
struct MyTuple: public MyTupleLeaf<Args>... {
    constexpr MyTuple(Args... args):MyTupleLeaf<Args>(args)... { }
};

//Helper function akin to std::make_tuple. Needed since functions can deduce
//types from parameter values, but classes can't.
template<typename... Args>
constexpr MyTuple<Args...> myMakeTuple(Args... args) {
    return MyTuple<Args...>(args...);
}

//Takes a variadic list of string literals and returns a tuple of String<> objects.
//These will be contiguous in memory. Trailing '' adds 1 to the size of each string.
//i.e. ("foo", "foobar") -> (const char (&arg1)[4], const char (&arg2)[7]) params ->
//                       ->  MyTuple<String<4>, String<7>> return value
template<size_t... Sizes>
constexpr auto myMakeStrings(const char (&...args)[Sizes]) -> MyTuple<String<Sizes>...> {
    //expands into myMakeTuple(myMakeString(arg1), myMakeString(arg2), ...)
    return myMakeTuple(myMakeString(args)...);
}

//Prints tuple of strings
template<typename T> //just to avoid typing the tuple type of the strings param
void printStrings(const T& strings) {
    //No std::get or any other helpers for MyTuple, so intead just cast it to
    //const char* to explore its layout in memory. We could add iterators to
    //myTuple and do "for(auto data: strings)" for ease of use, but the whole
    //point of this exercise is the memory layout and nothing makes that clearer
    //than the ugly cast below.
    const char* const chars = reinterpret_cast<const char*>(&strings);
    std::cout << "Printing strings of total size " << sizeof(strings);
    std::cout << " bytes:\n";
    std::cout << "-------------------------------\n";

    for(size_t i = 0; i < sizeof(strings); ++i) {
        chars[i] == '' ? std::cout << "\n" : std::cout << chars[i];
    }

    std::cout << "-------------------------------\n";
    std::cout << "\n\n";
}

int main() {
    {
        constexpr auto strings = myMakeStrings("foo", "foobar",
                                               "strings at compile time");
        printStrings(strings);
    }

    {
        constexpr auto strings = myMakeStrings("Some more strings",
                                               "just to show Jeff to not try",
                                               "to challenge C++11 again :P",
                                               "with more",
                                               "to show this is variadic");
        printStrings(strings);
    }

    std::cout << "Running 'objdump -t |grep my' should show that none of the\n";
    std::cout << "functions defined in this file (except printStrings()) are in\n";
    std::cout << "the executable. All computations are done by the compiler at\n";
    std::cout << "compile-time. printStrings() executes at run-time.\n";
}
Tagged , , , , , , ,

Adding Java and C++ to the MQTT benchmarks or: How I Learned to Stop Worrying and Love the Garbage Collector

This is a followup to my first post, where I compared different MQTT broker implementations written in D, C, Erlang and Go. Then my colleague who wrote the Erlang version decided to write a Java version too, and I felt compelled to do a C+11 implementation. This was only supposed to simply add the results of those two to the benchmarks but unfortunately had problems with the C++ version, which led to the title of this blog post. More on that later. Suffice it to say for now that the C++ results should be taken with a large lump of salt. Results:

loadtest (throughput - bigger is better)
Connections:         500            750            1k
D + vibe.d:          166.9 +/- 1.5  171.1 +/- 3.3  167.9 +/- 1.3
C (Mosquitto):       122.4 +/- 0.4   95.2 +/- 1.3   74.7 +/- 0.4
Erlang:              124.2 +/- 5.9  117.6 +/- 4.6  117.7 +/- 3.2
Go:                  100.1 +/- 0.1   99.3 +/- 0.2   98.8 +/- 0.3
Java:                105.1 +/- 0.5  105.8 +/- 0.3  105.8 +/- 0.5
C++11 + boost::asio: 109.6 +/- 2.0  107.8 +/- 1.1  108.5 +/- 2.6

pingtest (throughput constrained by latency - bigger is better)
parameters:          400p 20w       200p 200w      100p 400w
D + vibe.d:          50.9 +/- 0.3   38.3 +/- 0.2   20.1 +/- 0.1
C (Mosquitto):       65.4 +/- 4.4   45.2 +/- 0.2   20.0 +/- 0.0
Erlang:              49.1 +/- 0.8   30.9 +/- 0.3   15.6 +/- 0.1
Go:                  45.2 +/- 0.2   27.5 +/- 0.1   16.0 +/- 0.1
Java:                63.9 +/- 0.8   45.7 +/- 0.9   23.9 +/- 0.5
C++11 + boost::asio: 50.8 +/- 0.9   44.2 +/- 0.2   21.5 +/- 0.4

In loadtest the C++ and Java implementations turned out to be in the middle of the pack with comparable performance between the two. Both of them are slightly worse than Erlang and D is still a good distance ahead. In pingtest it gets more interesting: Java mostly matches the previous winner (the C version) and beats it in the last benchmark, so it’s now the clear winner. The C++ version matches both of those in the middle benchmark, does well in the last one but only performs as well as the D version in the first one. A win for Java.

Now about my C++ woes: I brought it on myself a little bit, but the way I approached it was by trying to minimise the amount of work I had to do. After all, writing C++ takes a long while at the best of times so I went and ported it from my D version by translating it by hand. I gleaned a few insights from doing so:

  • Using C++11 made my life a lot easier since it closes the gap with D considerably.  const and immutable became const auto, auto remained the same except when used as a return value, etc.
  • Having also written both C++ and D versions of the serialisation libraries I used as well as the unit-testing ones made things a lot easier, since I used the same concepts and names.
  • I’m glad I took the time to port the unit tests as well. I ended up introducing several bugs in the manual translation.
  • A lot of those bugs were initialisation errors that simply don’t exist in D. Or Java. Or Go. Sigh.
  • I hate headers with a burning passion. Modules should be the top C++17 priority IMHO since there’s zero chance of them making into C++14.
  • I missed slices. A lot. std::vector and std::deque are poor substitutes.
  • Trying to port code written in a garbage collected language and trying to simply introduce std::unique_ptr and std::shared_ptr where appropriate was a massive PITA. I’m not even sure I got it right, more on that below.

The C++ implementation is incomplete and will continue to be like that, since I’m now bored of it, tired, and just want to move on. It’s also buggy. All of the loadtest benchmarks were done with only 1000 messages instead of the values at the top since it crashes if left to run for long enough. I’m not going to debug it because it’s not going to be any fun and nobody is paying me to do it.

It’s not optimised either. I never even bothered to run a profiler. I was going to do it as soon as I fixed all the bugs but I gave up long before that. I know it’s doing excessive copying because copying vectors of bytes around was the easiest way I could get it to compile after copying the D code using slices. It was on my TODO list to remove and replace with iterators, but, as I mentioned, it’s not going to happen.

I reckon a complete version would probably do as well as Java at pingtest but have a hunch that D would probably still win loadtest. This is, of course, pure speculation. So why did I bother to include the C++ results? I thought it would still be interesting and give a rough idea of how it would compare. I wish I had the energy to finish it, but I just wasn’t having fun anymore and I don’t see the point. Writing it from scratch in C++ would have been a better idea, but it definitely would have taken a longer amount of time. It would’ve looked similar to what I have now anyway (I’d still be the author), but I have the feeling it would have fewer bugs. Thinking about memory management from the start is very different from trying to apply smart pointers to an already existing design that depended on a garbage collector.

My conclusion from all of this is that I really don’t want to write C++ again unless I have to. And that for all the misgivings I had about a garbage collector, it saves me time that I would’ve used tracking down memory leaks, double frees and all of those other “fun” activities. And, at least for this exercise, it doesn’t even seem to make a dent in performance. Java was the pingtest winner after all, but its GC is a lot better than D’s. To add insult to C++’s injury, that Java implementation took Patrick a morning to write from scratch, and an afternoon to profile and optimise. It took me days to port an existing working implementation from the closest language there is to C++ and ended up with a crashing binary. It just wasn’t worth the time and effort, but at least now I know that.

Tagged , , , , , , , , , , , , , , , , ,

Go vs D vs Erlang vs C in real life: MQTT broker implementation shootout.

At work we recently started using the MQTT protocol, which uses a publish / subscribe model. It’s simple in the good way and well thought out. We went with an open source implementation named Mosquitto. A few weeks ago on the way back from lunch break my colleague Jeff told me he was writing an MQTT broker in Go, his new favourite language. We’re using MQTT at work, I guess he was looking for a new project to write in Go and voilà. It should be a good fit, after all, this is the type of application that Go was made for. But hubris caught up to him when he uttered “And, of course, it’ll be super fast. It won’t even be fair to other languages”. I’m paraphrasing, but that’s how I remember it. You can read Jeff’s account here.

I’m not a fan of Go at all. I wasn’t particularly impressed when I first read about it but given how much I keep hearing about it on proggit and from Jeff himself, I gave it a go a few months back writing a genetic algorithm framework in it. I came out of that experience liking it even less. It’s just not for me. Go is an opinionated language, which would be fine if I agreed with its creators’ opinions. The way my brain works is that I’m on the opposite side of nearly all of them. It does have a few things I like. The absence of semicolons and parentheses, for instance. Goroutines and channels are a huge win. I can live without exceptions, even though I’d rather not, but generics? They can pry them away from my cold dead hands.

D, on the other hand… now we’re talking. Everything I like about C++ and more, with none of the warts. Of course, it has its own warts too, but nothing’s perfect. So, as a D fan and not so much of a Go one, I took Jeff’s statement as a gauntlet to the face. I learned of vibe.d watching the dconf2013 videos and really liked its idea of a synchronous API on top of asynchronous IO. I was convinced I could at least match a Go implementation’s performance, if not exceed it. So I wrote enough of an MQTT broker implementation to be able to run Jeff’s Go benchmark and compare performances. I reached a version that was faster than his after about 2 days. He came up with a second benchmark and my implementation performed poorly, so I went back to optimising. Around this time another colleague wanted in on the competition and used it as an excuse to learn Erlang, and wrote his own implementation. A few rounds of optimising later, and the results were in, which I’ve included below. Explanations on methodology follow.

 
loadtest (throughput - bigger is better)
Connections:   100            500            750            1k
D + vibe.d:    121.7 +/- 1.5  166.9 +/- 1.5  171.1 +/- 3.3  167.9 +/- 1.3
C (Mosquitto): 106.1 +/- 0.8  122.4 +/- 0.4   95.2 +/- 1.3   74.7 +/- 0.4
Erlang:        104.1 +/- 2.2  124.2 +/- 5.9  117.6 +/- 4.6  117.7 +/- 3.2
Go:             90.9 +/- 11   100.1 +/- 0.1   99.3 +/- 0.2   98.8 +/- 0.3

pingtest (latency - bigger is better)
parameters:    400p 20w       200p 200w      100p 400w
D + vibe.d:    50.9 +/- 0.3   38.3 +/- 0.2   20.1 +/- 0.1
C (Mosquitto): 65.4 +/- 4.4   45.2 +/- 0.2   20.0 +/- 0.0
Erlang:        49.1 +/- 0.8   30.9 +/- 0.3   15.6 +/- 0.1
Go:            45.2 +/- 0.2   27.5 +/- 0.1   16.0 +/- 0.1

All of the numbers are thousands of messages received by the client application per second. All measurements were done on my laptop, a Lenovo W530 running Arch Linux so all of the TCP connections were on localhost. Each number is the mean of several measurements, and I used the standard deviation as an estimate of the systematic error. All of the MQTT broker implementations run in one system thread. Using multiple threads resulted in no performance benefits for latency and worse performance for throughput.

Mosquitto was compiled with gcc 4.8.2, the Go implementation was executed with go run, the D implementation was compiled with dmd 2.0.64.2 and the Erlang version I’m not sure. I installed the Arch Linux erlang package and used my colleague’s Makefile without looking at it.

The two benchmarks are loadtest and pingtest. The former measures throughput whereas the latter measures latency. In loadtest a few hundred connections are set up to the broker. Half of these subscribe to a topic and the other half publishes to that topic as fast as possible. The benchmark ends when all of the subscribers have received a certain number of messages, determined by a command-line argument. I varied the number of connections to see how that would affect each broker. There was no contest here, the D implementation was by far the fastest. With a 100 connections I think there wasn’t enough work to do so that all implementations ended up waiting on IO. Except for Mosquitto, they all scaled rather nicely. I had problems measuring Jeff’s implementation due to a bug. He knows about the bug but just can’t be bothered fixing it. The numbers were taken from Go 1.1 (the pingtest numbers are Go 1.2). When his implementation works, Go 1.2 produces a binary that performs on the order of 10%-15% faster than the numbers above, which might mean equivalent performance to the Erlang implementation. I even think the bug shows up more often in Go 1.2 exactly because the resulting binary is more performant.

In pingtest Jeff tried to write a better benchmark and it measures latency. The two main command-line arguments are the number of connection pairs and the number of wildcard subscribers. For each pair, one of the connections subscribes to a request topic unique to that pair and the partner connection subscribes to a reply topic. One partner publishes a request and waits for the other connection to publish a reply. The number of messages sent per second now depends on the round-trip time between these two. Additionally, the wildcard subscribers receive both the request and reply messages from the first connection pair. The number before the ‘p’ is the number of connection pairs, and the number before the ‘w’ is the number of wildcard subscriber connections. Here Mosquitto is the fastest, but the performance difference diminishes with more wildcards, being on par with the D implementation in the last column. I’m not sure why it’s the fastest. I think there’s a possibility that vibe.d might be switching to the “wrong” fiber but that’s pure speculation on my part.

What about readability and ease of writing? I can’t read Erlang so I can’t comment on that. Despite my preference for D I think the D and Go implementations are equally readable. Since the Erlang unit tests are in the same files as the implementation, it’s hard to know exactly how many lines long it is. It gets worse since it implements most of MQTT, the D implementation essentially only implements what’s necessary to run the benchmarks. With those caveats (and the fact that dependencies aren’t counted) the 3 implementations clock in at somewhere between 800 and 1000 lines, without filtering out blank lines and comments.

Could they be optimised further? Probably. In the end the choice of algorithm and data structures matter more than the programming language so my personal advice is to choose the language that makes you productive. None of them magically made the implementations performant; we all had to profile, analyse, optimise, try ideas and measure. I loved writing it in D, but then again I’m a convert. I particularly enjoyed using the serialisation library I wrote for it, Cerealed. Much of the typical bit twiddling boilerplate in networking code disappeared, and that was only made possible by D’s compile-time reflection and user-defined attributes.

Source:

D: https://github.com/atilaneves/mqtt
C: https://bitbucket.org/oojah/mosquitto/
Go: https://github.com/jeffallen/mqtt
Erlang: https://bitbucket.org/pvalsecc/erlangmqtt
Tagged , , , , , , , , , , ,