Code Versus Data — The Art of Machinery

Some years back I wrote a program in C++ using an obvious object-oriented architecture. Then, later, I had to rewrite it in C, and I learned some pretty good lessons about software design.

The Problem

The original program is a little hard to explain, so I’ll invent a new problem to use as an example: making a classic hack-and-slash rogue-like game. The player controls a hero in a dungeon fighting monsters like orcs and dragons. How will you model the monsters in code?

Two Solutions

If you’ve learned about OOP, this is an obvious application of polymorphism. You can create a Monster base class, then use inheritance to create subclasses like Orc and Dragon for each type of monster. Unlike some toy examples of polymorphism, this one is actually 100% valid and works correctly. I’ll call this the code-oriented solution.
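
As a rough sketch, the code-oriented solution might look something like this in C++. The particular attributes (hit points, strength, speed), the numbers, and the total_strength helper are my own assumptions for illustration, not taken from any real game.

```cpp
#include <memory>
#include <string>
#include <vector>

// Base class: anything that varies by monster type is a virtual function.
class Monster {
public:
    explicit Monster(int hit_points) : hit_points_(hit_points) {}
    virtual ~Monster() = default;

    virtual std::string name() const = 0;
    virtual int strength() const = 0;
    virtual int speed() const = 0;

    int hit_points() const { return hit_points_; }
    void take_damage(int amount) { hit_points_ -= amount; }

private:
    int hit_points_;  // per-instance state
};

// Each monster type is a subclass; its attributes live in code.
class Orc : public Monster {
public:
    Orc() : Monster(20) {}
    std::string name() const override { return "orc"; }
    int strength() const override { return 8; }
    int speed() const override { return 5; }
};

class Dragon : public Monster {
public:
    Dragon() : Monster(200) {}
    std::string name() const override { return "dragon"; }
    int strength() const override { return 40; }
    int speed() const override { return 3; }
};

// Game code works through the base class and never needs to know the type.
int total_strength(const std::vector<std::unique_ptr<Monster>>& monsters) {
    int total = 0;
    for (const auto& m : monsters) total += m->strength();
    return total;
}
```

Note that adding a new type of monster here means writing and compiling a new class.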

Once upon a time I would have accepted that as the obviously proper model and not thought about it any more. But here’s another solution: create a single Monster class, and a single MonsterSpec class. All monsters, regardless of type, are straightforward instances of Monster. What makes an orc an orc and a dragon a dragon is a pointer to an immutable MonsterSpec instance. All orcs share a common MonsterSpec instance that specifies orc attributes like strength, speed and aggressiveness, as well as orc animations and sound effects, and similarly there’s another MonsterSpec for every other type of monster. I’ll call this the data-oriented solution.
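
Here's a minimal sketch of what that could look like, again in C++ but using nothing that wouldn't translate directly to C structs. The field names, asset file names, numbers, and the spawn helper are all hypothetical.

```cpp
#include <string>

// Everything that distinguishes one type of monster from another is plain data.
struct MonsterSpec {
    std::string name;
    int max_hit_points;
    int strength;
    int speed;
    int aggressiveness;
    std::string sprite_sheet;  // hypothetical asset names
    std::string attack_sound;
};

// One shared, immutable spec per monster type.
const MonsterSpec kOrcSpec{"orc", 20, 8, 5, 7, "orc.png", "orc_grunt.wav"};
const MonsterSpec kDragonSpec{"dragon", 200, 40, 3, 9, "dragon.png", "dragon_roar.wav"};

// Every monster, whatever its type, is an instance of this one struct.
struct Monster {
    const MonsterSpec* spec;  // what kind of monster this is
    int hit_points;           // per-instance state
    int x;
    int y;
};

// Create a monster of the given type at a map position.
Monster spawn(const MonsterSpec& spec, int x, int y) {
    return Monster{&spec, spec.max_hit_points, x, y};
}
```

Two orcs created with spawn(kOrcSpec, ...) point at the same spec, and telling an orc from a dragon is just a pointer comparison or a look at spec->name.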

Comparison

If all you want to implement is monsters moving around the dungeon and attacking the player, the two solutions each have minor pros and cons but are mostly equivalent. The differences start to appear when you add more features. The code-oriented solution solves the problem using code, so other related problems end up being solved with code as well. Likewise, the data-oriented solution solves the problem using data, so related problems end up being solved with data.

For example, say you want the game to load pre-designed dungeon maps with monsters. The code-oriented OOP solution will have you writing a MonsterFactory with OrcFactory and DragonFactory implementations so that you can create the appropriate monsters specified in the map data. The data-oriented solution just requires a table that can look up MonsterSpec instances by name or some other key.

Now you want to implement saving and loading of games, and hence of monsters. With the code-oriented solution, you’ll end up writing serialisation and deserialisation code for each subclass of Monster. With the data-oriented solution, you just need a way to (de)serialise a Monster instance.

Now (the big one) suppose you want dungeon maps to be able to create new dungeon-specific monsters. The code-oriented solution requires a plugin system, and a way for map data generation to be integrated with the code build system, so that map data can stay binary compatible with the code. The data-oriented solution just needs the specs for the new monster.
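
To make the data-oriented side concrete, here's a minimal sketch of a spec table plus a single save/load routine that covers every monster type. The names SpecTable, find_spec, save_monster and load_monster are hypothetical, and the one-line text format is an assumption that only works if spec names contain no whitespace.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

struct MonsterSpec {
    std::string name;
    int max_hit_points;
    int strength;
};

struct Monster {
    const MonsterSpec* spec;
    int hit_points;
    int x;
    int y;
};

// The spec table. In a real game it would be filled from data files,
// including any dungeon-specific monsters a map brings with it.
using SpecTable = std::map<std::string, MonsterSpec>;

// Look up a spec by name; returns nullptr if there is no such monster type.
const MonsterSpec* find_spec(const SpecTable& table, const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}

// Saving a monster: write the spec's name plus the per-instance state.
void save_monster(std::ostream& out, const Monster& m) {
    out << m.spec->name << ' ' << m.hit_points << ' '
        << m.x << ' ' << m.y << '\n';
}

// Loading a monster: one function for all types.
// Returns false on malformed input or an unknown spec name.
bool load_monster(std::istream& in, const SpecTable& table, Monster& m) {
    std::string name;
    int hit_points, x, y;
    if (!(in >> name >> hit_points >> x >> y)) return false;
    const MonsterSpec* spec = find_spec(table, name);
    if (spec == nullptr) return false;
    m = Monster{spec, hit_points, x, y};
    return true;
}
```

With this shape, a map that introduces a new monster only has to add a row to the table before its monsters are loaded; none of the functions above change.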

So, Data-Oriented Solutions Win?

I deliberately picked this example to highlight the mistake I originally made: jumping to a model that embodied the solution in code (i.e., a class hierarchy) when most of the problems were about data. As a counterpoint, if the game needed each monster to have complex, specialised AI, then that would be a problem of code. (Though it’s popular in real game development to embed a scripting language so that behaviour can be data.) A hybrid approach is possible: if you know how polymorphism is implemented, you might notice that MonsterSpec is just like a vtable that contains data instead of function pointers (and why not have both?).
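
A hybrid might look something like the sketch below, where the spec holds ordinary data alongside a function pointer, much like a single vtable slot. The Action enum, the two decide_* behaviours, and the choice of which monster gets which are all made up for illustration.

```cpp
#include <string>

enum class Action { Approach, Attack, Flee };

struct Monster;  // forward declaration so the function pointer type can mention it

// Signature for per-type behaviour: pick an action given the distance to the hero.
using DecideFn = Action (*)(const Monster& self, int distance_to_hero);

struct MonsterSpec {
    std::string name;
    int max_hit_points;  // data
    int attack_range;    // data
    DecideFn decide;     // code, stored next to the data
};

struct Monster {
    const MonsterSpec* spec;
    int hit_points;
};

// Simple behaviour: close in, then attack once in range.
Action decide_brute(const Monster& self, int distance_to_hero) {
    return distance_to_hero <= self.spec->attack_range ? Action::Attack
                                                       : Action::Approach;
}

// Specialised behaviour: flee below a quarter of full health, else act like a brute.
Action decide_cautious(const Monster& self, int distance_to_hero) {
    if (self.hit_points * 4 < self.spec->max_hit_points) return Action::Flee;
    return decide_brute(self, distance_to_hero);
}

const MonsterSpec kOrcSpec{"orc", 20, 1, decide_brute};
const MonsterSpec kDragonSpec{"dragon", 200, 6, decide_cautious};

// Dispatch goes through the spec, the way a virtual call goes through a vtable.
Action decide(const Monster& m, int distance_to_hero) {
    return m.spec->decide(m, distance_to_hero);
}
```

Most of what defines a monster stays as data, and only the part that really is behaviour is written as code.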

When using a language like C, a drawback of data-oriented architecture is that code can be statically checked but data can’t, so the data language needs to be simple. Not all languages have the same hard dichotomy of code-is-rigid-data-is-flexible. At the extreme is Lisp, where data is code and code is data – with the downside that not even code can be statically checked. Other languages have run-time reflection (e.g., Java) or compile-time reflection (e.g., D). But being conscious of code versus data is important for simple, maintainable software.