Category Archives: Programming

#include C headers in D code

I’ll lead with a file:

// stdlib.dpp
#include <stdio.h>
#include <stdlib.h>

void main() {
    printf("Hello world\n".ptr);

    enum numInts = 4;
    auto ints = cast(int*) malloc(int.sizeof * numInts);
    scope(exit) free(ints);

    foreach(int i; 0 .. numInts) {
        ints[i] = i;
        printf("ints[%d]: %d ".ptr, i, ints[i]);
    }

    printf("\n".ptr);
}

The keen eye will notice that, except for the two include directives, the file is just plain D code. Let’s build and run it:

% d++ stdlib.dpp
% ./stdlib
Hello world
ints[0]: 0 ints[1]: 1 ints[2]: 2 ints[3]: 3

Wait, what just happened?

You just saw a D file directly #include two headers from the C standard library and call functions from both of them; it was then compiled and run. And it worked!

Why? I mean, just… why?

I’ve argued before that #include is C++’s killer feature. Interfacing to existing C or C++ libraries is, for me, C++’s only remaining use case. You include the relevant headers, and off you go. No bindings, no nonsense, it just works. As a D fan, I envied that. So this is my attempt to eliminate that “last” (again, for me, reasonable people may disagree) use case where one would reach for C++ as the weapon of choice.

There’s a reason C++ became popular. Upgrading to it from C was a decision with essentially 0 risk.  I wanted that “just works” feature for D.

How?

d++ is a compiler wrapper. By default it uses dmd to compile the D code, but that’s configurable through the --compiler option. But dmd can’t compile code with #include directives in it (the lexer won’t even like it!), so what gives?

d++ goes through a .dpp file and, upon encountering an #include directive, expands it in place, similarly to what a C or C++ compiler would do. Unlike clang or gcc, however, the header file can’t just be textually inserted, since its declarations are written in a different language. So d++ uses libclang to parse the header and translates all of the declarations on the fly. This is trickier than it sounds, since C and C++ allow things that aren’t valid in D.
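To give a rough idea of what that translation produces (this is a sketch, not d++’s literal output), the functions used in the example above boil down to ordinary extern(C) declarations on the D side:

// A sketch of the kind of extern(C) declarations the translation has to
// produce for the functions used above; actual d++ output will differ.
extern(C) int printf(const(char)* format, ...);
extern(C) void* malloc(size_t size);
extern(C) void free(void* ptr);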

There’s one piece of the usability puzzle that’s missing from that story: the preprocessor. C header files have declarations but also macros, and some of those are necessary to use the library as it was intended. One can try and emulate this with CTFE functions in D, and sometimes it works. But I don’t want “sometimes”, I want guarantees, and the only way to do that is… to use the C preprocessor.

Blasphemy, I know. But since worse is better, d++ redefines all macros in the included header file so they’re available for use by the D program. It then runs the C preprocessor on the result of expanding all the #include directives, and the final result is a regular D file that can be compiled by dmd.
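As a (hypothetical) example of why that matters, imagine a header whose API is partly macros. Because d++ keeps the macros alive and runs the C preprocessor over the expanded file, a .dpp file can use them directly:

// constants.h (hypothetical):
//   #define BUFFER_SIZE 1024
//   #define MAKE_VERSION(major, minor) (((major) << 16) | (minor))

// consumer.dpp – the macros survive, so plain D code can use them:
#include "constants.h"

void main() {
    static assert(BUFFER_SIZE == 1024);
    enum v = MAKE_VERSION(1, 2);
}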

What next?

Bug fixing and C++ support. I won’t be happy until this works:

#include <vector>
void main() {
    auto v = std.vector!int();
    v.push_back(42);
}

Code or it didn’t happen

I almost forgot: https://github.com/atilaneves/dpp.

 


Keep D unittests separated from production code

D has built-in unit tests, and unittest is even a keyword. This has been fantastically successful for the language, since there is no need to use an external framework to write tests – it comes with the compiler. Just as importantly, a unittest after a function can be used as documentation, with the test(s) showing up as “examples”. This is the opposite of Python’s approach of running the code in documentation as tests: here, documentation is generated from the tests instead.

As such, in D (similarly to Rust), it’s usual, idiomatic even, to have the tests written next to the code they’re testing. It’s easy to know where to see examples of the code in action: scroll down a bit and there are the unit tests.

I’m going to argue that this is an anti-pattern.

Let me start by saying that some tests should go along with the production code. Exactly the kind of “example-y” tests that only exercise the happy path. Have them be executable documentation, but only have one of those per function and keep them short. The others? Hide them away as you would in C++. Here’s why I think that’s the case:

They increase build times.

If you edit a test, and that test lives next to production code, then every module that imports that module has to be rebuilt, because there’s currently no good way to figure out whether or not any of the API/ABI of that module has changed. Essentially, every D module is like a C++ header: you go and recompile the world. D compiles a lot faster than C++, but when you’re doing TDD (in my case, pretty much always), every millisecond in build times counts.

If the tests are in their own files, then editing a test means that usually only one file needs to be recompiled. Since test code is code, recompiling production code and its tests takes longer than just compiling production code alone.

I’m currently toying with the idea of trying to compile per package for production code but per module for test code – the test code shouldn’t have any dependencies other than the production code itself. I’ll have to time it to make sure it’s actually faster.

version(unittest) will cause you problems if you write libraries.

Let’s say that you’re writing a library. Let’s also say that to test that library you want to have a dependency on a testing library from http://code.dlang.org/, like unit-threaded. So you add this to your dub.sdl:

configuration "default" {
}
configuration "unittest" {
    dependency "unit-threaded" version="~>0.7.0"
}

Normal build? No dependency. Test build? Link to unit-threaded, but your clients never have the extra dependency. Great, right? So you want to use unit-threaded in your tests, which means an import:

module production_code;
version(unittest) import unit_threaded;

Now someone goes and adds your library as a dependency in their dub.sdl, but they’re not using unit-threaded because they don’t want to. And now they get a compiler error because when they compile their code with -unittest, the compiler will try and import a module/package that doesn’t exist.

So instead, the library has to do this in its dub.sdl:

configuration "unittest" {
    # ...
    versions "TestingMyLibrary"
}

And then:

version(TestingMyLibrary) import unit_threaded;

It might even be worse – your library might have code that should exist for version(unittest) but not for version(TestingMyLibrary) – it’s happened to me. This has even happened in the standard library.
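A hedged sketch of what that separation can end up looking like inside the library:

// Needed whenever -unittest is passed, including when *clients* compile
// their own code with -unittest:
version(unittest) {
    // helpers that must build without any dev-only dependencies
}

// Needed only when testing this library itself, with its dev-only
// dependencies available (TestingMyLibrary is set by the library's own
// "unittest" configuration):
version(TestingMyLibrary) {
    import unit_threaded;
}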

Keep calm and keep your tests separated.

You’ll be happier that way. I am.


DSLs: even more important for tests

Last week I wrote about the benefits of Domain Specific Languages (DSLs). Since then I’ve been thinking and realised that DSLs are even more important when writing tests. It just so happened that I was writing tests in Emacs Lisp for a package I wrote called cmake-ide, and given that Lisp has macros I was trying to leverage them for expressiveness.

Like most other programmers, I’ve been known from time to time to want to raze a codebase to the ground and rewrite it from scratch. The reason I don’t, of course, was aptly put by Joel Spolsky years ago. How could I ensure that nobody’s code would break? How can I know the functionality is the same?

The answer to that is usually “tests”, but if you rewrite from scratch, your old unit tests probably won’t even compile. I asked myself why not, why is it that the tests I wrote weren’t reusable. It dawned on me that the tests are coupled to the production code, which is never a good idea. Brittle tests are often worse than no tests at all (no, really). So how to make them malleable?

What one does is to take a page from Cucumber and write all tests using a DSL, avoiding at all costs specifying how anything is getting done and focussing on what. In Lisp-y syntax, avoid:

(write-to-file "foo.txt" "foobarbaz")
(open-file "foo.txt")
(run-program "theapp" "foo.txt" "out.txt")
(setq result (parse-output "out.txt"))
;; assertion here on result

Instead:

(with-run-on-file "theapp" "foo.txt" "foobarbaz" "out.txt" result
     ;; assertion here on result
     )

 

Less code, easier to read, probably more reusable. There are certainly better examples; I suggest consulting Cucumber best practices on how to write tests.

Not every language will offer the same DSL liberties and so your mileage may vary. Fortunately for me, the two languages I’d been writing tests in were Emacs Lisp and D, and in both of those I can go wild.
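D has no Lisp macros, but the same idea works with a plain higher-order function. A hedged sketch, with hypothetical names mirroring the Lisp example above:

void withRunOnFile(string program, string inputName, string contents,
                   string outputName,
                   void delegate(string result) assertion)
{
    import std.file : write, remove, readText;
    import std.process : execute;

    write(inputName, contents);                 // set up the input file
    scope(exit) remove(inputName);

    execute([program, inputName, outputName]);  // run the program under test
    scope(exit) remove(outputName);

    assertion(readText(outputName));            // hand the output to the test
}

// Usage (assuming "theapp" exists):
// withRunOnFile("theapp", "foo.txt", "foobarbaz", "out.txt",
//               (result) { /* assertion here on result */ });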


In defence of DSLs

I’ve often heard it said that DSLs make a codebase harder to understand, because programmers familiar with the language the codebase is written in now have to learn the DSL as well. Here’s my problem with that argument: every codebase is written in an embedded DSL. More often than not that DSL is ad-hoc, informally specified and bug-ridden, but a DSL all the same.

The syntax may be familiar to anyone who knows the general purpose language it’s written in, but the semantics are just as hard to grasp as any other DSL out there. Usually, harder to grasp, since there’s so much more code to read to understand what it is exactly that’s going on.

I can write C++. Does that mean I can download the source code for Firefox and jump straight in to fixing a bug? Of course not.

I really think that Lisp got it right, and that the next time I write any Emacs Lisp I really ought to think about what language would best express the problem domain, then implement that language. It’s something that feels right but that somehow I’ve never actually done.

It’s true that designing a DSL means designing a language, and that not all programmers are good language designers. But what’s the alternative? Use no abstractions? Let them write a giant mess for others to attempt to navigate?

In the end, isn’t the art of language design a way to state solutions to problems, simply? To capture the essence of what’s trying to be said/programmed elegantly?

I don’t know about you, but to me that just sounds like programming.


On ESR’s thoughts on C and C++

ESR wrote two blog posts about moving on from C recently. As someone who has been advocating for never writing new code in C again unless absolutely necessary, I have my own thoughts on this. I have issues with several things that were stated in the follow-up post.

“C++ as the language to replace C. Which ain’t gonna happen” – except it has. C++ hasn’t completely replaced C, but no language ever will. There’s just too much of it out there. People will be maintaining C code 50 years from now no matter how many better alternatives exist. If even gcc switched to C++…

It’s true that you’re (usually) not supposed to use raw pointers in C++, and also true that you can’t stop another developer in the same project from doing so. I’m not entirely sure how C is better in that regard, given that _all_ developers will be using raw pointers, with everything that entails. And shouldn’t code review prevent the raw pointers from crashing the party?

“if you can mentally model the hardware it’s running on, you can easily see all the way down” – this used to be true, but no longer is. On a typical server/laptop/desktop (i.e. x86-64), the CPU that executes the instructions is far too complicated to model, and doesn’t even execute the actual assembly in your binary (xor rax, rax doesn’t actually xor anything; the CPU recognises it as a zeroing idiom and handles it at register rename). C also doesn’t have the concept of cache lines, which is essential for high-performance computing on any non-trivial CPU.

“One way we can tell that C++ is not sufficient is to imagine an alternate world in which it is. In that world, older C projects would routinely up-migrate to C++.” Like gcc?

“Major OS kernels would be written in C++.” I don’t know about “major”, but there’s BeOS/Haiku and IncludeOS.

“Not only has C++ failed to present enough of a value proposition to keep language designers uninterested in imagining languages like D, Go, and Rust, it has failed to displace its own ancestor.” – I think the problem with this argument is the (for me) implicit assumption that if a language is good enough, “better enough” than C, then logically programmers will switch. Unfortunately, that’s not how humans behave, and as much as some of us would like to pretend otherwise, programmers are still human.

My opinion is that C++ is strictly better than C. I’ve met and worked with many bright people who disagree. There’s nothing that C++ can do to bring them in – they just don’t value the trade-offs that C++ makes/made. Some of them might be tempted by Rust, but my anecdotal experience is that those who tend to favour C over C++ end up liking Go a lot more. I can’t stand Go myself, but the things about Go that I don’t like don’t bother its many fans.

My opinion is also that D is strictly better than C++, and yet I never expect the former to replace the latter. Why that is, I find even harder to explain than why anybody chooses to write C in a 2017 greenfield project.

My advice to everyone is to use whatever tool you can be most productive in. Our brains are all different, we all value completely different trade-offs, so use the tool that agrees with you. Just don’t expect the rest of the world to agree with you.

 


Operator overloading is a good thing (TM)

Brains are weird things. I used to be a private maths tutor, and I always found it amazing how a little change in notation could sometimes manage to completely confuse a student. Notation itself seems to me to be a major impediment for the majority of people to like or be good at maths. I had fun sometimes replacing the x in an equation with a drawing of an apple to try and get the point across that the actual name (or shape!) of a variable didn’t matter, that it was just standing in for something else.

Programmers are more often than not mathematically inclined, and yet a similar phenomenon seems to occur with the “shape” of certain functions, i.e. operators. For reasons that make as much sense to me as x confusing maths students, the fact that a function has a name containing non-alphanumeric characters makes it particularly weird. So weird that programmers shouldn’t be allowed to define functions with those names, only the language designers. That’s always a problem for me – languages that don’t give you the same power as the designers are Blub as far as I’m concerned. But every now and again I see a blog post touting the advantages of some language or other, listing the lack of operator overloading as a bonus.

I don’t even understand the common arguments against operator overloading. One is that somehow “a + b” is now confusing, because it’s not clear what the code does. How is that different from having to read the documentation/implementation of “a.add(b)”? If it’s C++ and “a + b” shows up, anyone who doesn’t read it as “a.operator+(b)” or “operator+(a, b)” with built-in implementations of operator+ for integers and floating point numbers needs to brush up on their C++. And then there’s the fact that that particular operator is overloaded anyway, even in C – the compiler emits different instructions for floats and integers, and its behaviour even depends on the signedness of ints.

Then there’s the complaint that one could make operator+ do something stupid like subtract. Because, you know, this is totally impossible:

int add(int i, int j) {
    return i - j;
}

Some would say that operator overloading is limited in applicability, since only numerical objects and matrices really need it. But used with care, it might just make sense:

auto path = "foo" / "bar" / "baz";

Or in the C++ ranges by Eric Niebler:

using namespace ranges;
int sum = accumulate(view::ints(1)
                   | view::transform([](int i){return i*i;})
                   | view::take(10), 0);

I’d say both of those previous examples are not only readable, but more readable due to use of operator overloading. As I’ve learned however, readability is in the eye of the beholder.
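For what it’s worth, the path example needs very little machinery in D – a hedged sketch with a made-up Path type:

struct Path {
    string value;

    // Overload "/" to join path components.
    Path opBinary(string op : "/")(string rhs) const {
        return Path(value ~ "/" ~ rhs);
    }
}

unittest {
    auto path = Path("foo") / "bar" / "baz";
    assert(path.value == "foo/bar/baz");
}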

All in all, it confuses me when I hear/read that lacking operator overloading makes a language simpler. It’s just allowing functions to have “special” names and special syntax to call them (or in Haskell, not even that). Why would the names of functions make code so hard to read for some people? I guess you’d have to ask my old maths students.


Why I find developing on/for Windows exasperating

I ran DOS on my first PC. The natural progression unfolded, with me then running Windows 95, Windows 98, and Windows XP after that (Windows ME, like the Matrix sequels, was a collective bad dream that didn’t really happen). I used Borland’s IDE to write C code, then RHIDE with DJGPP, since I couldn’t even imagine using a compiler from the command-line. I say that because I wasn’t “brought up” using *nix at all, and my only exposure to it was at university. These days, however, I do nearly all of my development on Linux. Why? I find it to be a much, much better experience.

Somewhat unfortunately for me, my current job requires me to do Windows development. And every time I boot into Windows or have to fix Windows-specific problems, it makes me want to cry. Why? Let me name some of the reasons.

Speed, or the lack thereof. I haven’t done a thorough scientific analysis on this, because I don’t think it’d be worth my while to do so. It seems clear to me that NTFS is very, very slow. Doing anything on it, from running CMake to compiling to linking, seems to take forever – to the point that it makes me actively wonder how anyone manages to get anything done on Windows. I can rebuild the reference D compiler on my laptop in about 1.6s after modifying one file. On Windows the same build, on the same machine, takes ~1 minute. Given that I find 1.6s infuriatingly slow, you can imagine what sort of dark swear words I reserve for waiting a whole minute while what would have been considered a supercomputer a few years ago gets around to doing anything at all.

Dependencies. Unlike *nix, there are no standard paths in which to look up libraries. Granted, even different Linux distros use different conventions and paths from each other, but libraries are usually installed with a package manager anyway, so mostly you don’t care. And if you did, your linker would find them anyway without the need for extra flags. Need to link to, say, nanomsg on Windows? Good luck with that. Ah, but there’s vcpkg, I hear you say. Apparently Visual Studio auto-magically finds the libraries that vcpkg “installs”. Job done if you’re clicking a button in an IDE, not so much if you’re using a real build system running in CI. It _could_ be just as easy as adding a flag to your linker, but, alas, the .lib files don’t all end up in the same directory. vcpkg lets me download libraries without having to write PowerShell, but then actually linking to them is, for lack of a better word, “fun”. On Linux? pacman -S nanomsg; ninja

Batch files and/or PowerShell. I personally find bash horrible to write code in, but then I do Windows work and remember there’s worse. So much worse. Sigh.

Bash. I’ll explain. Git bash is amazing; I remember a time before it existed (I tried, unsuccessfully, to compile bash from source for Windows with at least 3 different implementations back in the day). So why am I complaining? First of all, because I use zsh and haven’t yet seen an easy way to do that on Windows. Secondly, because building on Windows from the command-line often requires cmd.exe. Building C++ code? I’m not going to write my own bash version of vcvarsall.bat just to do that. Commands have a habit of spitting out error messages with backslashes (cos, duh, Windows), and good luck copying and pasting those into your bash shell.

Tooling. Want to create a zip? You’ll have to download and install a 3rd party tool. Oh, but the binary doesn’t get added to the PATH, so you’ll have to write out the full path in your batch file and pray one of your machines doesn’t install it to a different location.

Things are better than they used to be on Windows. We now have the Linux subsystem, git bash, and alternatives to the horrible built-in terminal emulator. To me, it just makes things less bad, and the moment I’m back on Arch Linux it feels like coming home from a not particularly good holiday.


Commit failing tests if your framework allows it

In TDD, one is supposed to go through the 3-step cycle of:

  1. Write a failing test
  2. Make it pass
  3. Refactor

The common-sense approach is to not commit the failing test from the first step, since that would throw a spanner in the works when you inevitably have to bisect your commit DAG trying to figure out where a bug was introduced.

I’ve come to a realisation recently – failing tests should be committed, but only if the testing framework being used allows you to mark failures as successes. For instance, in my D testing framework unit-threaded, I’d commit this silly example:

@ShouldFail("WIP")
unittest {
    1.shouldEqual(2);
}

If you’re not familiar with D, it has built-in unit tests, and unittest is a keyword. @ShouldFail is a User Defined Attribute, part of the library indicating that the unit test it applies to is expected to fail, and allows the user to specify an optional string describing why that’s the case. It could be a bug ID as well.

The test above passes if any of the code in the unittest block throws an exception, i.e. it passes if it fails. This way we can have a single commit of the failing test that motivated the code changes that follow it, and we can’t forget to remove @ShouldFail – in fact, if the programmer implements the feature / fixes the bug correctly, they should expect to see the test suite go red. If that doesn’t happen, either the production code or the test is buggy.

I’m not aware of many frameworks that allow a programmer to do this; pytest has something similar in its xfail marker. If yours does, commit your failing tests.


On the novelty factor of compile-time duck typing

Or structural type systems for the pedantic, but I think most people know what I mean when I say “compile-time duck typing”.

For one reason or another I’ve read quite a few blog posts about how great the Go programming language is recently. A common refrain is that Go’s interfaces are amazing because you don’t have to declare that a type has to satisfy an interface; it just does if its structure matches (hence structural typing). I’m not sold on how great this actually is – more on that later.

What I don’t understand is how this is presented as novel and never done before. I present to you a language from 1990:

#include <iostream>
#include <string>

using namespace std;

template <typename T>
void fun(const T& animal) {
    cout << "It says: " << animal.say() << endl;
}

struct Dog {
    std::string say() const { return "woof"; }
};

struct Cat {
    std::string say() const { return "meow"; }
};

int main() {
    fun(Dog());
    fun(Cat());
}

Most people would recognise that as being C++. If you didn’t, well… it’s C++. I stayed away from post-C++11 on purpose (i.e. Dog{} instead of Dog()). Look ma, compile-time duck typing in the 1990s! Who’d’ve thunk it?

Is it nicer in Go? In my opinion, yes. Defining an interface and saying a function only takes objects that conform to that interface is a good thing, and a lot better than the situation in C++ (even with std::enable_if and std::void_t). But it’s easy enough to do that in D (template constraints), Haskell (typeclasses), and Rust (traits), to name the languages I’m more familiar with that do something similar.

But in D and C++, there’s currently no way to state that your type satisfies the requirements an algorithm places on it (such as having a member function called “say” in the silly example above) and get compiler errors telling you why it doesn’t (such as misspelling “say” as “sey”). C++, at some point in the future, will get concepts exactly to alleviate this. In D, I wrote a library to do it. Traits and typeclasses are definitely better, but in my view it’s good to be able to state that a type does indeed “look like” what it needs to in order to be used by certain functions. At least in D you can say static assert(isAnimal!MyType); – you just don’t know why that assertion fails when it does. I guess in C++17 one could do something similar using std::void_t. Is there an equivalent for Go? I hope a gopher enlightens me.
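For concreteness, here’s a hedged D sketch of that kind of trait and constraint – isAnimal is made up for the animal example above:

import std.stdio : writeln;

// Made-up trait: T "looks like" an animal if calling .say() on it yields a string.
enum isAnimal(T) = is(typeof(T.init.say()) == string);

// Template constraint: fun only accepts types satisfying the trait.
void fun(T)(T animal) if (isAnimal!T) {
    writeln("It says: ", animal.say());
}

struct Dog { string say() { return "woof"; } }
struct Cat { string say() { return "meow"; } }

void main() {
    static assert(isAnimal!Dog);  // can be stated explicitly, as mentioned above
    fun(Dog());
    fun(Cat());
}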

All in all I don’t get why this gets touted as something only Go has. It’s a similar story to “you can link statically”. I can do that in other languages as well. Even ones from the 90s.


The main function should be shunned

The main function (in languages that have it) is… special. It’s the entry point of the program by convention, there can only be one of them in all the object files being linked, and you can’t run a program without it. And it’s inflexible.

Its presence means that the final output has to be an executable. It’s likely, however, that the executable in question has code that others might rather reuse than rewrite, but they won’t be able to use it in their own executables: there’s already a main function in there. Before clang, nobody seemed to have stumbled on the idea that a compiler would be great to have as a library. And yet…

This is why I’m now advocating for always putting the main function of an executable in its own file, all by itself. And also that it do the least amount of work possible for maximum flexibility. This way, any executable project is one excluded file away in the build system from being used as a library. This is how I’d start a, say, C++ executable project from scratch today:

#include "runtime.hpp"
#include <iostream>
#include <stdexcept>

int main(int argc, const char* argv[]) {
    try {
        run(argc, argv); // "real" main
        return 0;
    } catch(const std::exception& ex) {
        std::cout << "Oops: " << ex.what() << std::endl;
        return 1;
    }
}
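The same split in D, as a hedged sketch (module and function names are made up):

// runtime.d – everything of value lives here, reusable as a library:
module runtime;

void run(string[] args) {
    // "real" main: parse arguments, wire things up, do the work
    import std.stdio : writeln;
    writeln("Running with ", args.length - 1, " argument(s)");
}

// main.d – nothing but delegation and top-level error handling:
import runtime : run;

int main(string[] args) {
    import std.stdio : stderr;
    try {
        run(args);
        return 0;
    } catch (Exception ex) {
        stderr.writeln("Oops: ", ex.msg);
        return 1;
    }
}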

In fact, I think I’ll go write an Emacs snippet for that right now.
