Seriously, just use D to call C from Python

Last week I suggested that if you want to call C code from Python, you should use D. I still think that 4 lines of code (two of which are header includes) and a build system is hard to beat if that is indeed your goal. The internet, of course, had different ideas.

I think most of the comments I got can be summarised as:

  • “Why not just use ctypes?”
  • “cffi is much better than ctypes, use that instead.”
  • “Cython makes this easy.”
  • “What about boost::python or pybind11?”

In all cases, the general sentiment was that these were easy options to get the task done. Either they didn’t read the blog post, or they’re nowhere near as lazy as I am.

I described how writing basically no code gave you access to nanomsg from Python, including macros. No glue code, no struct definitions, nothing. “I can haz C headers in Python?” and some boring build system things were the long and short of it. 4 lines of code that scale as O(1) are hard to beat.

Let’s look at the suggestions then, starting with ctypes. It can call symbols in loaded shared libraries, which is great if you’re calling int foo(int, int) but not so great for real code (like, you know, nanomsg). No enums, no macros, and you have to declare C structs yourself. I shudder to think of what I’d have to write to get anything done. Next.

Some acknowledged that ctypes wasn’t all that, and that cffi should be used instead. I just tried using it today, thinking that it’d be amazing if I could just give Python some C headers and magic happened. But I can’t. I can pass in C declarations as a Python string, which I can manually copy from a header. But I can’t just tell it which header I want (or worse, two, in the case of my nanomsg example). I tried reading the headers myself, concatenating the strings and giving that to cdef, but that doesn’t work, because:

  • Real headers have pesky things like #include directives in them, causing ParseErrors to be raised. Oops. So much for valid C syntax.
  • Struct definitions usually have members that are structs defined in another header, which have members that are structs defined in another header, which…
  • Macros

I soon realised how much work calling nanomsg with cffi would be and gave up. I wanted to show how much more code it takes, but as previously mentioned I’m lazy, so no. Just… no. I figured someone would have written a generator for cffi, and I was right. It requires installing something called Irkit. Turns out it’s a parser generator, so cffi-gen doesn’t use an industrial-grade C compiler. Good luck compiling C code in the wild with that.

I’ve used Cython before and it’s even more work than cffi. Cython is the reason that I wrote autowrap in the first place.

Both boost::python and pybind11 are to C++ what pyd is to D. And pyd is another reason I wrote autowrap, because it has the gall to require users to specify one by one all of the things (functions, classes, etc.) they want to make available. With D’s compile-time reflection, I find that requirement quite insulting.
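
For reference, registering things one by one with pyd looks roughly like this (a sketch from memory of pyd’s documented pattern — the example functions are made up, and the details may have drifted, so check pyd’s docs):

import pyd.pyd;

double timesTwo(double x) { return x * 2; }
string greet(string name) { return "Hello, " ~ name; }

extern(C) void PydMain() {
    def!(timesTwo)();  // every function has to be listed by hand...
    def!(greet)();     // ...and again for the next one, and the next
    module_init();
}

Multiply that by every function, class and method in a real codebase and the appeal of compile-time reflection becomes obvious.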

Last week I presented a way of calling C from Python which I find incredibly easy.  I got told there were viable alternatives, and since I didn’t know any better at the time I stroked my beard and thought “iiiinteresting”.

This week I can confidently state my belief that the easiest way to call C code from Python is D. Disagree? Show me how you can call nanomsg from Python your way. Github or it didn’t happen.


Want to call C from Python? Use D!

In my last blog post I wrote about the power of D’s compile-time reflection and string mixins, showing how they could be used to call D from Python so easily it might as well be magic. As amazing as that may be for those of us who have D codebases we want to expose to Python users, this doesn’t help the vastly more numerous programmers who want to call pre-existing C code instead. If C had D’s metaprogramming abilities, imagine seamlessly calling into nanomsg with as much ease as I showed in my previous blog post. Well… about that.

D can easily interoperate with C, with the only requirement being that the function and data structure declarations be translated into D syntax. But once the translation is done, those declarations are now D code that can be reflected on, fed to autowrap, and automagically wrapped for Python consumption. That would be a pretty powerful combo if not for the boring work of translating all needed declarations, macros included. It’s still a lot easier than talking to the Python C API itself of course, but maybe not quite killer feature material.
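
To make the boring work concrete, here is roughly what the manual translation looks like for one nanomsg function and one macro (a hand-written sketch, not dpp output):

// C, in nanomsg/nn.h:
//     int nn_socket(int domain, int protocol);
//     #define AF_SP 1
// Hand-translated into D: ordinary declarations that compile-time reflection can see.
extern(C) int nn_socket(int domain, int protocol);
enum AF_SP = 1;  // the macro's value, copied by hand — and silently wrong if the header ever changes

Now repeat for every function, struct, enum and macro you need.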

However, I wrote a little project called dpp because I’m lazy and don’t want to hand-translate C to D. Envious of C++’s and Objective C’s credible claim to be the only languages that can seamlessly interoperate with C (due to header inclusion and compatible syntax), I tried to replicate the experience in the D world. Using dpp, one can #include C headers in what would otherwise be D code and use it as one would in C++, even going to the point of supporting preprocessor macros. I wrote about the project in a different blog post.

Given this .dpp file:

// nanomsg.dpp
#include "nanomsg/nn.h"
#include "nanomsg/pipeline.h"

And this .d file:

import autowrap;
mixin(
    wrapDlang!(
        LibraryName("nanomsg"), // name of the .so
        Modules(Yes.alwaysExport, "nanomsg") // name of the D module
    )
);

When we build both of those files above into nanomsg.so, we get to write this Python code that actually sends packets:

from nanomsg import (nn_socket, nn_close, nn_bind, nn_connect,
                     nn_send, nn_recv, AF_SP, NN_PUSH, NN_PULL)
import time

uri = "inproc://test"

pull = nn_socket(AF_SP, NN_PULL)
nn_bind(pull, uri)
time.sleep(0.05)  # give it time to set up (awful I know, but meh)

push = nn_socket(AF_SP, NN_PUSH)
nn_connect(push, uri)
msg = b'abc'
nn_send(push, msg, len(msg), 0)

Python, welcome to C, via D, and without even having to write any code to do it. Did I mention that AF_SP, NN_PUSH, and NN_PULL are all C macros? And yet, look at Python importing and using them like a boss.

Want to try it yourself? It’s on github.

If you want to call C from Python, use D.


The power of reflection

When I was at CppCon 2016 I overheard someone ask “Everyone keeps talking about  reflection, but why do we actually need it?”. A few years before that, I also would have had difficulty understanding why it would be useful. After years of writing D, it’s hard to imagine life without it.

Serialisation is an obvious use-case and almost always the first one anyone comes up with if pressed for an example. But there’s a lot more, like Design by Introspection. It allows one to write a mocking framework. You start seeing applications everywhere. My favourite way to use it is to make the compiler write code for me.

Let’s say one wants to write a Python extension in native code. Each top-level Python function must have a corresponding C function (well, C ABI at least) that looks something like this:

PyObject* myfunc(PyObject* self, PyObject* args, PyObject* kwargs) {
    // ...
    return result;
}

There are a lot of details to take care of. There’s error handling, managing ref counts, and in all likelihood conversion from and to Python types, since in most cases one is interested in calling existing pre-written code and making it available to Python. It’s tedious, and I haven’t even shown all the boilerplate to initialise the Python module and register the functions. The code for two simple functions ends up looking like this. Just thinking of clicking that link makes me sigh. Imagine what making calls into a real codebase would look like. We can do better:

import autowrap;
mixin(
    wrapDlang!(
        LibraryName("mylib"),
        Modules(
            Module("mymodule"),
            Module("myothermodule"),
        )
    )
);

The code above, when compiled, will generate a Python extension (shared library) that exposes every D function marked as “export” in the modules “mymodule” and “myothermodule” as Python functions. It’ll even convert their names from camelCase to snake_case. Any D exceptions thrown will become Python exceptions. D structs and classes become Python classes. If the original D functions take a D string, you’ll be able to pass Python strings to them in user code. Modulo bugs, this… works! The code shown above is the only code that needs to be written. Setting up the build system takes more work!
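
To give an idea of what the D side might look like, here’s a hypothetical mymodule (the functions are made up for illustration):

module mymodule;

// Anything marked `export` below gets wrapped automatically:
export double timesTwo(double x) { return x * 2; }             // times_two in Python
export string greet(string name) { return "Hello, " ~ name; }  // callable with a Python str

export struct Point {  // becomes a Python class
    double x, y;
    double normSquared() { return x * x + y * y; }  // norm_squared in Python
}

No wrapping code anywhere in sight.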

“Only” two D features are used here: the ability to do reflection at compile-time (and therefore to know which functions are in those modules and what types they take and return), and being able to mix in strings at compile-time. All the boilerplate is written for the user and inserted inline as if written by hand, but it’s the compiler that’s doing the heavy lifting.
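
To give a flavour of how those two features combine (a toy sketch, nothing like autowrap’s actual implementation):

struct Api {
    static double timesTwo(double x) { return x * 2; }
    static int next(int i) { return i + 1; }
}

// Reflect over Api at compile time and build wrapper code as a string...
string makeForwarders() {
    string code;
    static foreach (name; __traits(allMembers, Api))
        code ~= "auto " ~ name ~ "_wrapped(A...)(A args) { return Api." ~ name ~ "(args); }\n";
    return code;
}

// ...then mix it in: the compiler writes and compiles the boilerplate for us.
mixin(makeForwarders());

void main() {
    assert(timesTwo_wrapped(2.0) == 4.0);
    assert(next_wrapped(41) == 42);
}

autowrap does the same kind of thing, except the generated string is the Python extension boilerplate rather than trivial forwarders.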

Imagine now that your boss, pleased with these results, wants you to also make the same D code available to Excel users. The code changes not one bit: those lines above also work for Excel (the trick is telling the build system to depend on the autowrap:excel dub package instead of autowrap:python). Instead of snake_case functions, one now gets PascalCase, as per Excel convention.

Same API, same functionality, different implementation. And no code to write for the user. The curious can see how it’s done on github.


Type inference debate: a C++ culture phenomenon?

I read two C++ subreddit threads today on using the auto keyword. They’re both questions: the first one asks why certain people seem to dislike using type inference, while the second asks about what commonly taught guidelines should be considered bad practice. A few replies there mention auto.

This confuses me for more than one reason. The first is that I’m a big proponent of type inference. Having to work to tell a computer what it already knows is one of my pet peeves. I also believe that wanting to know the exact type of a variable is a, for lack of a better term, “development smell”, especially in typed languages with generics. I think that the possible operations on a type are what matters, and figuring out the exact type if needed is a tooling problem that is solved almost everywhere. If you don’t at least have “jump to definition” in your development environment, you should.

The other main source of confusion for me is why this debate seems restricted to the C++ community, at least according to my anecdotal evidence. As far as I’m aware, type inference is taken to be a good thing almost universally in Go, D, Rust, Haskell, et al. I don’t have statistics on this at all, but from what I’ve seen so far it seems to be the case. Why is that?

One reason might be change. As I learned reading Peopleware, humans aren’t too fond of it. C++ lacked type inference until C++11, and whoever learned it then might not like using auto now; it’s “weird”, so it must be bad. I’ve heard C programmers who “like” looping over a collection using an integer index and didn’t understand when I commented on their Python code asking them to please not do that. “Like” here means “this is what I’m used to”. Berating people for disliking change is definitely not the way forward, however. One must take their feelings into account if one wants to effect change. Another lesson learned from Peopleware: the problem is hardly ever technical.

Otherwise, I’m not sure. I’m constantly surprised by what other people find to be more or less readable when it comes to code. There are vocal opponents to operator overloading, for instance. I have no idea why.

What do you think? Where do you stand on auto?


Boden, Flutter, React Native?

A while back I had an idea for a mobile app that I still might write. The main problems were finding the time and analysis paralysis: what do I write the project in?

I’ve done some Android development before using Qt, and it was a hassle. I’ve been told that JNI was made painful on purpose so that nobody would try to do it, and I can see why. At least at the time, the Android NDK had modern compilers compared to what my team was allowed to use for the rest of the codebase, and it was my first taste of C++11 in production.

As much as I dislike the Apple ecosystem, it would be foolish to say the least to ignore iOS. Writing Java for the Android UI I can deal with (but wouldn’t be happy about); learning Objective C or Swift? That I’m not sure of. In any case, I don’t want to write UI code twice.

Since then, I’ve been meaning to look at Flutter, which seems to avoid most of the problems I would have to face otherwise. But that means learning Dart, which would be a significant sidetrack. Yesterday however, I learned on cppcast that there’s a C++ alternative called boden. On the one hand, I don’t have to learn a new language. On the other hand I have to write C++, with all the swordfights and seg faults that entails. Which leads me all the way back to analysis paralysis.

There’s also React Native and Xamarin, and probably a few other alternatives, but, for reasons I can’t quite explain, they all give me a bit of an allergic reaction.

I guess I’m going to have to bite the bullet and try one of these frameworks. Or all of them. Now to think of a tiny but not useless test project…

Build systems are stupid

Big complicated projects nearly always need a build system. They might be generating code. They might be compiling different branches depending on the platform or a build-time configuration option. They might turn on/off unity builds for C++.  For most simple projects however, build systems are stupid.

Let’s say you have a single C file. We need to tell the compiler the name of the file we want to compile, and the name of the resulting binary unless we’re happy with it being called a.out, which is unlikely:

gcc -Wall -Wextra -Werror -o foo foo.c

Maybe it’s the first commit in a brand new repository and all you have is foo.c in there. Why am I telling the compiler what to build? What else would it build??

$ ls
foo.c
$ gcc
gcc: fatal error: no input files
compilation terminated.

“Real projects don’t look like that!”. Ok:

$ ls
build include src 
$ ls src
foo.c main.c
$ ls include
foo.h

I doubt any humans who have ever written C before would be confused as to how this project is supposed to be compiled. And yet, this is probably the simplest way I can come up with to tell the computer to do it properly:

CFLAGS ?= -Iinclude -Wall -Wextra -Werror -g

SRCS := $(shell find src -name *.c)
HDRS := $(shell find include -name *.h)
OBJS := $(subst src,build,$(subst .c,.o,$(SRCS)))

build/foo: $(OBJS)
	gcc -o $@ $^

build/%.o: src/%.c $(HDRS)
	gcc $(CFLAGS) -o $@ -c $<

Is there a better way than $(subst)? Probably. I don’t care. I was also too lazy to figure out dependencies, so all source files depend on all headers. That’s a lot of lines of Makefile, and all I wanted to specify was that source files are in src, headers are in include, binaries go in build, and that the compiler flags are -Wall -Wextra -Werror -g. The first three are obvious from the existence and names of those directories, so there are 8 lines of code here to tell a computer what the compiler flags are. Robots: 1 Humans: 0.

CMake isn’t that much better:

cmake_minimum_required (VERSION 2.8.11)
project (foo)

file(GLOB SRC_FILES ${PROJECT_SOURCE_DIR}/src/*.c)
add_executable(foo ${SRC_FILES})
target_include_directories(foo PRIVATE ${PROJECT_SOURCE_DIR}/include)

Robots: 2 Humans: 0.

I’ll probably get told off for using file(GLOB) because what I really should be doing is manually listing source files, because reasons. Why would I have files in src if I don’t want them to be compiled? In what universe does it make sense for a person whose job it is to tell computers what to do so that they don’t have to do it themselves to tell a build system which files to build? In my opinion, every new language should default to doing what Pony does: running the compiler just builds everything in the current directory tree.

I also think that new languages should ship with a compiler-based build system, i.e. the compiler is the build system. The reason for that is that, as a developer, I want to rebuild the minimum possible amount of code required after my edits to run my tests. The CMake solution above fails at this: the compiler calculates dependencies and automatically feeds them into the build system, but those dependencies are too coarse.

If I touch any header, all source files that #include it will be recompiled. This happens even if it’s an additive change, which by definition can’t affect any existing source files. If I change the signature of a function that is only used in foo.c, well, tough: I’ll have to wait until bar.c, baz.c and quux.c all get recompiled for the temerity of depending on different functions from the same header.

This problem is bad enough in C and C++ but actually gets worse for languages with modules: now every file is a “header”. Changes to the implementation trigger recompilations, because our build systems are too stupid to know any better. I can’t see how a good build system can be an external tool unless the compiler is a library, and even then it would be doing the same work twice since dependency calculations are better done while a module/translation unit is being compiled.

Taking it a step further, the whole concept of caring about files and modules is probably outdated anyway. Why not a Smalltalk-like environment? You make changes to the source and your “image” gets rebuilt in the background, one part of the AST at a time, automagically resolving and rebuilding dependencies (at the function level!).

Computers and build systems: what have you done for me lately?


My DConf2019 talk

I’ll be speaking at DConf this year. I haven’t decided how to structure the talk yet, but I’d like to take a page out of Dan Saks’s excellent CppCon 2016 talk that covered technical and psychological aspects of language adoption. In his case, he was addressing a C++ audience on the difficulty of convincing C programmers to adopt C++ (an admirable goal!). In mine, it’s: how do I out-C++ C++?

The genesis for my talk was my decision a few years ago to write tests for a legacy C codebase. I wanted to use D, but even as a fan of the language I chose to write the tests in C++ instead. And if even I wouldn’t have picked D for that task, who would?

I still think it was the right decision to make. The first step in “importing” the legacy project into a C++ test looks like this:

extern "C" {
    #include "legacy.h"
}

There is no second step. From here one #includes a test library such as catch2 and starts writing tests for the C functions in the API.

As far as I know, every language under the sun can interface with C. There’s usually (always?) some way to do FFI, and examples invariably show how to call a function that takes 2 integers, maybe even a const char* parameter. Unfortunately, that’s not what real codebases need to do. They need to call a function that takes a pointer to a structure that’s defined in a different header, which has 3 pointers to structures that are themselves defined in different headers, which…

You get the idea. Writing the definitions by hand is error-prone, boring, and time-consuming. Even if correct, nobody seems to talk about the elephant in the C API room: macros. Not once have I encountered a non-trivial C API that doesn’t require the C preprocessor to use properly. C++ users are at an advantage here: just use the macros. Everyone else has to get by with ad-hoc solutions. And if the function-like macro implementations change, the code will break.
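
As an illustration (every name below is invented), the hand-written declarations for that kind of API end up looking something like this in D:

extern(C):

struct Buffer;  // opaque? or do we need its fields too? another header to read...
struct Config { int timeoutMs; Buffer* scratch; }
struct Connection {
    Config* config;  // defined in one header...
    Buffer* inBuf;   // ...this one in another...
    Buffer* outBuf;
}
int lib_connect(Connection* conn, const(char)* uri);

enum LIB_DEFAULT_PORT = 5555;  // was a #define; copied by hand, wrong the moment the header changes

And all you wanted was to call one function.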

I think that the only way to make it so that C++ isn’t the obvious and/or only choice is that a challenger allows one to write `#include “legacy.h”` and have it work just as easily as it does natively. That’s why I started the dpp project, and I’m looking forward to talking about the technical challenges I’ve encountered on my journey so far.

See you at DConf!


When you can’t (and shouldn’t) unit test

I’m a unit test aficionado and, as such, have attempted to unit test what really shouldn’t be. It’s common to get excited by a new hammer and then see nails everywhere, and unit testing can get out of hand (cough! mocks! cough!).

I still believe that the best tests are free from side-effects, deterministic and fast. What’s important to me isn’t whether or not this fits someone’s definition of what a unit test is, but that these attributes enable the absence of slow and/or flaky tests. There is however another class of tests that are the bane of my existence: brittle tests. These are the ones that break when you change the production code despite your app/library still working as intended. Sometimes, insisting on unit tests means they break for no reason.

Let’s say we’re writing a new build system. Let’s also say that said build system works like CMake does and spits out build files for other build systems such as ninja or make. Our unit test fan comes along and writes a test like this:

assert make_output == "all: foo\nfoo: foo.c\n\tgcc -o foo foo.c"

I believe this to be a bad test, and the reason why is that it’s checking the implementation instead of the behaviour of the production code. Consider what happens when the implementation is changed without affecting behaviour:

all: foo\nfoo: foo.c\n\tgcc -o $@ $<

The behaviour is the same as before: any time `foo.c` is changed, `foo` will get recompiled. The implementation not only isn’t the same, it’s arguably better now, and yet the assertion in the test would fail. I think we can all agree that the ROI for this test is negative if this is all it takes to break it.

The orthodox unit test approach to situations like these is to mock the service in question, except most people don’t get the memo that you should only mock code you own. We don’t control GNU make, so we shouldn’t be doing that. It’s impossible to copy make exactly in a mock/stub/etc. and it’s foolish to even try. We (mostly) don’t care about the string that our code outputs, we care that make interprets that string with the correct semantics.

My conclusion is that I shouldn’t even try to write unit tests for code like this. Integration tests exist for a reason.


Issues DIP1000 can’t (yet) catch

D’s DIP1000 is its attempt to avoid memory safety issues in the language. It introduces the notion of scoped pointers and in principle (modulo bugs) prevents one from writing memory unsafe code.

I’ve been using it wherever I can since it was implemented, and it’s what makes it possible for me to have written a library allowing for fearless concurrency. I’ve also written a D version of C++’s std::vector that leverages it, which I thought was safe but turns out to have had a gaping hole.

I did wonder if Rust’s more complicated system would have advantages over DIP1000, and now it seems that the answer to that is yes. As the linked github issue above shows, there are ways of creating dangling pointers in the same stack frame:

void main() @safe {
    import automem.vector;

    auto vec1 = vector(1, 2, 3);
    int[] slice1 = vec1[];
    vec1.reserve(4096);  // invalidates slice1 here
}

Fortunately this is caught by ASAN if slice1 is used later. I’ve worked around the issue for now by returning a custom range object from the vector that has a pointer back to the container – that way it’s impossible to “invalidate iterators” in C++-speak. It probably costs more in performance though due to chasing an extra pointer. I haven’t measured to confirm the practical implications.
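
A rough sketch of that design (nothing like automem’s actual implementation, just the idea of reading through a back-pointer):

struct MiniVector(E) {
    E[] data;
    size_t length() const { return data.length; }
    ref E opIndex(size_t i) { return data[i]; }
    void reserve(size_t n) { data.reserve(n); }  // may move the payload
}

struct Range(E) {
    MiniVector!E* vector;  // back-pointer: every access goes through the container
    size_t index;

    bool empty() const { return index >= vector.length; }
    ref E front() { return (*vector)[index]; }
    void popFront() { ++index; }
}

void main() {
    auto v = MiniVector!int([1, 2, 3]);
    auto r = Range!int(&v, 0);
    v.reserve(4096);       // even if the payload moves...
    assert(r.front == 1);  // ...the range reads through the container, not through a stale slice
}

The extra pointer chase on every access is the performance cost mentioned above.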

In Rust, the problem just can’t occur. This fails to compile (as it should):

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    let s = v.as_slice();
    println!("org v: {:?}\norg s: {:?}", v, s);
    v.resize(15, 42);
    println!("res: v: {:?}\norg s: {:?}", v, s);
}

I wonder if D can learn/borrow/steal an idea or two about only having one mutable borrow at a time. Food for thought.


The joys of translating C++’s std::function to D

I wrote a program to translate C headers to D. Translating C was actually more challenging than I thought; I even got to learn things I didn’t know about the language, even though I’ve known it for 24 years. The problems that I encountered were all minor though, and to the best of my knowledge have all been resolved (modulo bugs).

C++ is a much larger language, so the effort should be considerably greater. I didn’t expect it to be as hard as it’s been, however, and in this blog post I want to talk about how “interesting” it was to translate C++11’s std::function by hand.

The first issue for most languages would be that it relies on template specialisation:

template<typename>
class function;  // doesn't have a definition anywhere

template<typename R, typename... Args>
class function<R(Args...)> { /* ... */ }

This is a way of constraining the std::function template to only accept function types. Perhaps surprisingly to some, the C++ syntax for the type of a function that takes two ints and returns a double is double(int, int). I doubt most people see this outside of C++ standard library templates. If it’s still confusing, think of double(int, int) as the type that is obtained by dereferencing a pointer of type double(*)(int, int).

D is, as far as I know, the only language other than C++ to support partial template specialisation. There are however two immediate problems:

  • function is a keyword in D
  • There is no D syntax for a function type

I can mitigate the name issue by calling the symbol function_ instead; however, this will affect name mangling, meaning nothing will actually link. D does have pragma(mangle) to tell the compiler how to mangle symbols, but std::function is a template; it doesn’t have any mangling until it’s instantiated. Let’s worry about that later and call the template function_ for now.

The second issue can be worked around:

// C++: `using funPtr = double(*)(int, int);`
alias funPtr = double function(int, int);
// C++: `using funType = double(int, int);`
alias funType = typeof(*funPtr.init);

As in C++, the function type is the type one gets from dereferencing a function pointer. Unlike C++, currently there’s no syntax to write it directly. First attempt:

// helper to automate getting an alias to a function type
template FunctionType(R, Args...) {
    alias ptr = R function(Args);
    alias FunctionType = typeof(*ptr.init);
}

struct function_(T);
struct function_(T: FunctionType!(R, Args), R, Args...) { }

This doesn’t work, probably due to this bug preventing the helper template FunctionType from working as intended. Let’s forget the template constraint:

extern(C++, "std") {
    struct function_(T) {
        import std.traits: ReturnType, Parameters;
        alias R = ReturnType!T;
        alias Args = Parameters!T;
        // In C++: `R operator()(Args) const`;
        R opCall(Args) const;
    }
}

void main() {
    alias funPtr = double function(double);
    alias funType = typeof(*funPtr.init);
    function_!funType f;
    double result = f(3.3);
}

This compiles but it doesn’t link: there’s an undefined reference to std::function_::operator()(double) const. Looking at the symbols in the object files using nm, we see that g++ emitted _ZNKSt8functionIFddEEclEd but dmd is trying to link to _ZNKSt9function_IFddEEclEd. As expected, name mangling issues related to renaming the symbol.

We could manually add a pragma(mangle) to tell D how to mangle the operator for the double(double) template instantiation, but that solution doesn’t scale. CTFE (constexpr if you speak C++ but not D) to the rescue!

// snip - as before
pragma(mangle, opCall.mangleof.fixMangling)
R opCall(Args) const;

// (elsewhere at file scope)
string fixMangling(string str) {
    import std.array: replace;
    return str.replace("9function_", "8function");
}

What’s going on here is an abuse of D’s compile-time power. The .mangleof property is a compile-time string that tells us how a symbol is going to be mangled. We pass this string to the fixMangling function which is evaluated at compile-time and fed back to the compiler telling it what symbol name to actually use. Notice that function_ is still a template, meaning .mangleof has a different value for each instantiation. It’s… almost magical. Hacky, but magical.

The final code compiles and links. Actually creating a valid std::function<double(double)> from D code is left as an exercise to the reader.
