Monthly Archives: October 2015

Abstract Data Types and OOP

I think most of us agree that Abstract Data Types are a good thing. We see the description, we hear about what problems they solve and we nod along in agreement. At least I do. And yet I keep forgetting to actually use them.

Recently I’ve had to write (or forced myself to write) code in imperative and functional styles, in more than one language. As usual, some changes were needed, but what was not so usual is that the refactoring was harder than it needed to be. The reason? I didn’t hide the details of how the data were represented. In one example I was parsing JSON in Emacs Lisp to enable IDE-like features in Emacs for C and C++ development. When I tried using it on a largish project at work, it was too slow to be usable. I was naively using assoc lists for JSON objects, mostly because that’s the default in the library I was using. A quick fix was to use hash tables instead, but it took me far longer to make that fix than it should have. I’d hard-coded the knowledge that the data were in assoc lists all over the place, and even refactoring the unit tests was a pain.

I made a similar mistake again in Haskell. I was writing a networking protocol server and decided that I’d represent replies to requests as a list of (handle, reply) tuples. I haven’t made the refactoring I want to yet, and it won’t be as painful as the Elisp version, but I still hated myself a little bit for encoding that knowledge in the algorithms that were manipulating replies. What if I want (as I need to, for performance) to change it to a tree?
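To make the idea concrete, here is a minimal sketch of the replies example, written in Python rather than Haskell. The names ReplyLog, DictReplyLog, add, for_handle and pending_count are all made up for illustration and are not from my actual server. The point is only that client code talks to two operations and never learns whether a list of tuples or something else sits underneath.

```python
class ReplyLog:
    """Replies to requests, looked up by connection handle.

    Callers only see add() and for_handle(); the storage is private.
    """

    def __init__(self):
        # Representation detail: a list of (handle, reply) tuples.
        self._entries = []

    def add(self, handle, reply):
        self._entries.append((handle, reply))

    def for_handle(self, handle):
        """Return the replies queued for one handle, in insertion order."""
        return [reply for h, reply in self._entries if h == handle]


class DictReplyLog:
    """Same interface, different representation: dict of handle -> replies."""

    def __init__(self):
        self._by_handle = {}

    def add(self, handle, reply):
        self._by_handle.setdefault(handle, []).append(reply)

    def for_handle(self, handle):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._by_handle.get(handle, []))


def pending_count(log, handle):
    # Client code: written against the interface only, so it works
    # unchanged with either representation.
    return len(log.for_handle(handle))
```

Swapping ReplyLog for DictReplyLog (or, eventually, something tree-based) touches no caller and no test that goes through add and for_handle, which is exactly the property I threw away by passing the raw list around.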

When I use OOP, or even when I’m simply attaching methods to structs and classes, this never seems to happen. There was nothing stopping me from doing the right thing when I made the above mistakes; it’s just that boxing up code in a class seems to make my code better. I don’t make this mistake in C++ or D, but just dropping back to C causes me to write absurdities I wouldn’t have otherwise.

I think (and hope) I’ll be able to notice if I ever make this mistake again in other languages now that I’m aware of it. It’s still surprising to me how, after all these years, it’s still possible, and apparently even likely, for me to make mistakes that I’d call basic.

The danger of (over)mocking

I’ve yet to see a mocking framework I actually want to use. Maybe I’ve just not seen enough use cases for one yet, since right now the number of times I would have found one useful is exactly one. Maybe I’m just not a mockist.

The fundamental problem, for me, with how I’ve seen mocking frameworks used in the wild is that it commits one of the gravest sins of software development: it increases coupling. Our tests shouldn’t care how a function/class/method/etc. does its job; they need only care that it does so in a verifiable manner. After all, one of the reasons to have tests in the first place is to be able to confidently refactor, and we can’t do that if the tests break even though the code’s behaviour hasn’t changed. Let me give an example: last year I saw someone write a test like this (in Python for clarity, the original was in C++):

from unittest.mock import MagicMock, call

class Class(object):
    def inc(self): pass
    def dec(self): pass
    def mul(self, x): pass
    def stuff(self, x):
        self.inc()
        self.inc()
        self.mul(x)
        self.dec()

def test_stuff():
    obj = Class()
    mock = MagicMock()
    # Hang each method off one parent mock so the calls are recorded, in order, by name
    obj.inc, obj.dec, obj.mul = mock.inc, mock.dec, mock.mul
    obj.stuff(7)
    assert mock.mock_calls == [call.inc(), call.inc(), call.mul(7), call.dec()]

To keep things simple, the methods here don’t do anything; in real life they were complicated functions. I consider the test above and anything like it to be a complete and utter waste of time and space. This isn’t just an example of increased coupling; it’s literally rewriting the production code in the test. What information hiding? What interface? I really consider asserting that certain calls were made to be an anti-pattern.

So what’s a developer to do? In the case above (and assuming inc, dec and mul do what we expect to self.value), I’d write this instead:

def test_stuff():
    obj = Class(3)
    assert obj.value == 3
    obj.stuff(4)
    assert obj.value == 19
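For completeness, here is a runnable sketch of that state-based test. The constructor and the bodies of inc, dec and mul are assumptions: I’m taking them to add one, subtract one and multiply self.value, which is what the expected result of 19 implies. The Refactored subclass is hypothetical too, and is only there to show that the same assertions survive a change of internals that would break the call-recording test.

```python
class Class(object):
    def __init__(self, value):
        self.value = value

    # Assumed behaviour: each method acts on self.value in the obvious way.
    def inc(self):
        self.value += 1

    def dec(self):
        self.value -= 1

    def mul(self, x):
        self.value *= x

    def stuff(self, x):
        self.inc()
        self.inc()
        self.mul(x)
        self.dec()


class Refactored(Class):
    def stuff(self, x):
        # Same contract, different internals: inc/mul/dec are never called.
        self.value = (self.value + 2) * x - 1


def check_stuff(cls):
    obj = cls(3)
    assert obj.value == 3
    obj.stuff(4)
    assert obj.value == 19  # (3 + 1 + 1) * 4 - 1


def test_stuff():
    check_stuff(Class)


def test_stuff_after_refactoring():
    check_stuff(Refactored)
```

The test that asserted on mock_calls would fail for Refactored even though nothing observable changed; this one doesn’t care.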

I’m well aware that in real life things are rarely this clean-cut and that legacy codebases sometimes offer no way of testing what they do, but that doesn’t excuse introducing coupling into the codebase. It’s hard to present a realistic example in a blog post, but I’ve written many a test for tangled legacy networking code without resorting to the type of testing in the first example. It’s not easy, but it’s definitely doable.

I’m not even interested in mocking code at all except if it’s slow or isn’t absolutely deterministic. The usual culprits are doing anything with the file system, networking or talking to databases. And that’s for unit testing only. So Class.stuff calls some complicated function; that’s its business and I as a unit tester have no right to go poking in its internals. All I care about is the public API and that the behaviour is right. The question should always be: what’s the contract? If you’re asking questions about how... you’re doing it wrong. A good tester is like a mobster’s wife: you don’t want to know.
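As an illustration of the one kind of replacement I do find acceptable, here is a small sketch where the non-deterministic dependency is the clock. RateLimiter and FakeClock are hypothetical names invented for this example. The fake is hand-written, it exists only to make the test fast and repeatable, and the test still asserts on behaviour alone: it never checks how often or in what order the clock was consulted.

```python
import time


class RateLimiter:
    """Allows one action per `interval` seconds (hypothetical example class)."""

    def __init__(self, interval, clock=time.monotonic):
        self._interval = interval
        self._clock = clock  # injected so that tests can control time
        self._last = None

    def allow(self):
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True
        return False


class FakeClock:
    """Hand-written stand-in for the real clock: deterministic and instant."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


def test_rate_limiter():
    clock = FakeClock()
    limiter = RateLimiter(interval=10, clock=clock)
    assert limiter.allow()      # the first call is always allowed
    clock.now = 5
    assert not limiter.allow()  # too soon after the last allowed call
    clock.now = 10
    assert limiter.allow()      # the interval has elapsed
```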

Now back to that one example of when I wanted to use a mock: at DConf 2013 there was a mocking presentation with a test that expected the logger to be called with an error message if a method was called a certain way. That appealed to me because months before I’d fixed a bug that was “no error message logged when …”. I fixed it but without an accompanying test, which left a bad taste in my mouth. I just didn’t know how to write the test I needed. A mocking framework would have let me do it easily.
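Here is roughly what such a test could look like, sketched in Python with unittest.mock. This is not the code from the presentation (which was in D), and Connection, send and the message text are invented for the example. The reason I’m comfortable asserting on a call here is that the logged error is the observable behaviour: “an error is logged when …” is the contract, not an implementation detail.

```python
from unittest.mock import MagicMock


class Connection:
    """Hypothetical class; the logger is passed in so a test can replace it."""

    def __init__(self, logger):
        self._logger = logger

    def send(self, payload):
        if not payload:
            self._logger.error("refusing to send empty payload")
            return False
        return True


def test_error_is_logged_for_empty_payload():
    logger = MagicMock()
    conn = Connection(logger)
    assert conn.send(b"") is False
    # The contract under test: exactly one error, with this message.
    logger.error.assert_called_once_with("refusing to send empty payload")


def test_nothing_is_logged_on_success():
    logger = MagicMock()
    assert Connection(logger).send(b"data") is True
    logger.error.assert_not_called()
```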

C++ can’t get that static analyser from Microsoft soon enough

One of the main announcements at CppCon 2015 was a tool being developed by Microsoft that will help catch bugs at “compile-time”, the idea being that backwards compatibility won’t be touched but we get help from a tool that understands the code. It can’t come soon enough.

I just started writing C++ again after quite a bit of a hiatus and pretty much immediately suffered from bullet-in-the-foot syndrome. In retrospect (as always) I was doing something stupid but the fact is my bug wouldn’t have happened in a garbage collected language or, I guess, Rust. C++ was more than happy to compile, run, and crash. I wrote something like this (not production code, it was a mock implementation so I could more easily test):

struct Outer;
struct Inner { Outer& outer; };
struct Outer {
    Outer():_inner{*this} {} //Inner::outer is a reference, so bind it to *this
    //...
    Inner _inner;
};
void setThings(Outer& outer);
void func(const Outer& outer);
int main() {
    Outer outer;
    setThings(outer);
    func(outer);
}

All that was fine. But the setup I needed to do on the outer struct kept growing, a weekend came in between, and I even completely forgot I had a reference to Outer in there (the commented-out ellipsis was 10-20 lines of code). I thought it’d be much better if instead I returned one of these objects from a function, by value, and passed it to func:

Outer createOuter() {
    Outer outer;
    //several lines of setup on outer
    return outer;
}
int main() {
    func(createOuter()); //moves happen in here
}

Oh look, it crashed. Not only that, but I’d forgotten to use std::move several times along the way. Once that was fixed, the program was still crashing, but (of course) in a different way. A better way, mind: it was dereferencing a null pointer. But crashing anyway.

If you haven’t spotted my stupid mistake by now, I’ll tell you what took me between 30 minutes and an hour to find out: the compiler-generated move constructor (the copy constructor was similarly flawed) copied the Outer pointer from the moved-from object, so the new inner pointed to an object that was about to be destroyed… The solution is to write a boilerplate-heavy move constructor that makes the new inner point to the new outer. Fun! The reason it worked before is that the object I was passing in was still alive and would stay that way until long after func had returned.

When Microsoft’s tool comes out, I really want to try it on the code above and see what happens. Sure, it worked fine for std::unique_ptr in the demo, but let’s see what happens in real life. In the meantime, I wish I were writing in another language.