C is not magically fast, part 2

I wrote a blog post before about how C is not magically fast, but the sentiment that C has properties lacking in other languages that make it so is still widespread. It was with no surprise at all then that a colleague mentioned something resembling that recently at lunch break, and I attempted to tell him why it wasn’t (at least always) true.

He asked for an example where C++ would be faster, and I resorted to the old sort classic: C++ sort is faster than C’s qsort because of templates and inlining. He then asked me if I’d ever measured it myself, and since I hadn’t, I did just that after lunch. I included D as well because, well, it’s my favourite language. Taking the minimum time after ten runs each to sort a random array of 10M simple structs on my laptop yielded the results below:

  • D: 1.147s
  • C++: 1.723s
  • C: 1.789s

I expected  C++ to be faster than C, I didn’t expect the difference to be so small. I expected D to be the same speed as C++, but for some reason it’s faster. I haven’t investigated the reason why for lack of interest, but maybe because of how strings are handled?

I used the same compiler backend for all 3 so that wouldn’t be an influence: LLVM. I also seeded all of them with the same number and used the same random number generator: the awful srand from C’s standard library. It’s terrible, but it’s the only easy way to do it in standard C and the same function is available from the other two languages. I also only timed the sort, not counting init code.

The code for all 3 implementations:

// sort.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

typedef struct {
    int i;
    char* s;
} Foo;

double get_time() {
    struct timeval t;
    struct timezone tzp;
    gettimeofday(&t, &tzp);
    return t.tv_sec + t.tv_usec*1e-6;
}

int comp(const void* lhs_, const void* rhs_) {
    const Foo *lhs = (const Foo*)lhs_;
    const Foo *rhs = (const Foo*)rhs_;
    if(lhs->i < rhs->i) return -1;
    if(lhs->i > rhs->i) return 1;
    return strcmp(lhs->s, rhs->s);
}

int main(int argc, char* argv[]) {
    if(argc < 2) {
        fprintf(stderr, "Must pass in number of elements\n");
        return 1;
    }

    srand(1337);
    const int size = atoi(argv[1]);
    Foo* foos = malloc(size * sizeof(Foo));
    for(int i = 0; i < size; ++i) {
        foos[i].i = rand() % size;
        foos[i].s = malloc(100);
        sprintf(foos[i].s, "foo%dfoo", foos[i].i);
    }

    const double start = get_time();
    qsort(foos, size, sizeof(Foo), comp);
    const double end = get_time();
    printf("Sort time: %lf\n", end - start);

    free(foos);
    return 0;
}


// sort.cpp
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <chrono>
#include <cstring>

using namespace std;
using namespace chrono;

struct Foo {
    int i;
    string s;

    bool operator<(const Foo& other) const noexcept {
        if(i < other.i) return true;
        if(i > other.i) return false;
        return s < other.s;
    }

};


template<typename CLOCK, typename START>
static double getElapsedSeconds(CLOCK clock, const START start) {
    //cast to ms first to get fractional amount of seconds
    return duration_cast<milliseconds>(clock.now() - start).count() / 1000.0;
}

#include <type_traits>
int main(int argc, char* argv[]) {
    if(argc < 2) {
        cerr << "Must pass in number of elements" << endl;
        return 1;
    }

    srand(1337);
    const int size = stoi(argv[1]);
    vector<Foo> foos(size);
    for(auto& foo: foos) {
        foo.i = rand() % size;
        foo.s = "foo"s + to_string(foo.i) + "foo"s;
    }

    high_resolution_clock clock;
    const auto start = clock.now();
    sort(foos.begin(), foos.end());
    cout << "Sort time: " << getElapsedSeconds(clock, start) << endl;
}


// sort.d
import std.stdio;
import std.exception;
import std.datetime;
import std.algorithm;
import std.conv;
import core.stdc.stdlib;


struct Foo {
    int i;
    string s;

    int opCmp(ref Foo other) const @safe pure nothrow {
        if(i < other.i) return -1;
        if(i > other.i) return 1;
        return s < other.s
            ? -1
            : (s > other.s ? 1 : 0);
    }
}

void main(string[] args) {
    enforce(args.length > 1, "Must pass in number of elements");
    srand(1337);
    immutable size = args[1].to!int;
    auto foos = new Foo[size];
    foreach(ref foo; foos) {
        foo.i = rand % size;
        foo.s = "foo" ~ foo.i.to!string ~ "foo";
    }

    auto sw = StopWatch();
    sw.start;
    sort(foos);
    sw.stop;
    writeln("Elapsed: ", cast(Duration)sw.peek);
}



Advertisements
Tagged , ,

15 thoughts on “C is not magically fast, part 2

  1. KrzaQ says:

    Why did you use different data types between languages? C’s Foo seems to be more cache-friendly, but on the other hand, swapping its elements is way more expensive. All in all, it feels like comparing apples to pears.

    As far as C++ and D go, I’d wager that it’s because of number of invocations of the comparator (D – and C – use tristate, while C++ needs to call op< twice to check for equality)

    • atilaneves says:

      At 10M elements, the number of cache misses probably makes that irrelevant. I did what was idiomatic for each one, I didn’t really think of the swapping cost, that’s a good point. I tried to avoid allocation and it caused other problems. Let me run this again in a better way.

      • atilaneves says:

        I updated the code and the numbers. C is much closer now, but still the slowest. Thanks for the comment!

      • Denis Koroskin says:

        Haven’t seen the old version, but it’s still comparing performance of c++ string assignment and operator == vs strcmp and NOT sorting.

        I changed C++ to (also) use const char* for the string and it now beats C by a factor of 2 (2.20s vs 1.13s for 10_000_000 elements).

        Same with D: your version takes 1 sec 470 ms vs char*/strcmp taking 451 ms (latest ldc2 alpha)

        All 3 compiled with -03

      • atilaneves says:

        Interesting!

    • Regarding D’s tristate check, sort uses a “less” comparison, which is translated by the compiler to `a.opCmp(b) < 0`, so the equality state is essentially thrown away anyway. (http://dlang.org/phobos/std_algorithm_sorting.html#.sort)

      Besides, I think for sorting equality check is only important if you use stable sort.

  2. I like how you clean up foos in the C version before exiting the program. It must be burned in your brain to do memory cleanup 🙂

  3. jfa says:

    IMHO that’s not C is magically fast, it’s more C++ in a “real” project quikly become bloated (can’t say for D). And another metric that could be interresting is to compare the memory needs.
    That said, i’m surprised by the result, but the speed is not the main reason why we’re using C.

  4. > C++ sort is faster than C’s qsort because of templates and inlining.

    How does C shake out when you don’t kneecap it by preventing optimization across dynamically linked library boundary?

  5. zarazek says:

    C and C++ programs don’t do the same thing. When swapping two Foos in C program, just contents of Foos are swapped, and strings are left in place. In C++ version, not only Foos are swapped, but also strings inside Foos, which may involve copying, alocating and dealocating memory. Or might be just optimized out, depending on your C++ standard library and compile options (you are using C++ 11, aren’t you?). Even in optimistic case sizeof(std::string) > sizeof(char*), so C++ version is moving more data than C one. Another thing is that C++ version does proper deallocation before exit, which includes calling destructors on strings inside all Foos and freeing this memory,, and C version doesn’t do that.

    • atilaneves says:

      C++14 actually. Why would there be any copying? Even if there were, the point remains: C isn’t magically fast.

      • johnnu says:

        Well I went ahead and ran the programs myself, the C++ program ran in 2.885 seconds, while the D program ran in 4.149 seconds for 10M objects. Anyways I don’t think anyone believes C is magically faster, it’s not really something you have to prove. The sort of people that think they need to use C while in a managed language probably don’t comprehend the reasoning behind why one would want to use C. It’s the sort of people that hear a rule from somewhere (like a blog post much like this) or from someone who is actually knowledgeable and then spreads it around and follow it without actually knowing why.

        Anyways small tests like this to try and prove performance are meaningless as is evident with how wildly my results differ from yours. You were trying to show D is faster than C/C++, and probably spent much more time fine tuning D for this experiment than any of the other languages. Maybe you even contributed to that patch to Phobo’s sort function that happened not too long ago, who knows.

      • atilaneves says:

        I’ve met and worked with several people who believe that C is magically fast. I’ve also read many blog posts and comments showing how many people still believe this myth. It exists.

        As for trying to show that D was faster than C/C++, that wasn’t what I tried to do, it just came out that way when I did it. What I wanted to show is that C would be slower than C++ _and_ D, thereby trying to disprove the myth.

        Not that it matters – people aren’t convinced by data when they have strong beliefs. It’s called the backfire effect.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: