Some years back I wrote a program in C++ using an obvious object-oriented architecture. Then, later, I had to rewrite it in C, and I learned some pretty good lessons about software design.
The Problem
The original program is a little hard to explain, so I’ll invent a new problem to use as an example: making a classic hack-and-slash rogue-like game. The player controls a hero in a dungeon fighting monsters like orcs and dragons. How will you model the monsters in code?
Two Solutions
If you’ve learned about OOP, this is an obvious application of polymorphism. You can create a Monster base class, then use inheritance to create subclasses like Orc and Dragon for each type of monster. Unlike some toy examples of polymorphism, this one is actually 100% valid and works correctly. I’ll call this the code-oriented solution.
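To make the code-oriented solution concrete, here is a minimal C++ sketch. The member functions (name, strength), the helper total_strength and the numbers are all made up for illustration; they are not taken from the original program.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class: every monster type overrides the virtual functions.
class Monster {
public:
    virtual ~Monster() = default;
    virtual std::string name() const = 0;
    virtual int strength() const = 0;
};

// Each type of monster is its own subclass, and its attributes live in code.
class Orc : public Monster {
public:
    std::string name() const override { return "orc"; }
    int strength() const override { return 8; }  // arbitrary example value
};

class Dragon : public Monster {
public:
    std::string name() const override { return "dragon"; }
    int strength() const override { return 40; }  // arbitrary example value
};

// Game code works through the base class and never names a concrete type.
int total_strength(const std::vector<std::unique_ptr<Monster>>& monsters) {
    int total = 0;
    for (const auto& m : monsters) {
        assert(m != nullptr);
        total += m->strength();
    }
    return total;
}
```

Note that adding a new type of monster in this model means writing and compiling a new class.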
Once upon a time I would have accepted that as the obviously proper model and not thought about it any more. But here’s another solution: create a single Monster class, and a single MonsterSpec class. All monsters, regardless of type, are straightforward instances of Monster. What makes an orc an orc and a dragon a dragon is a pointer to an immutable MonsterSpec instance. All orcs share a common MonsterSpec instance that specifies orc attributes like strength, speed and aggressiveness, as well as orc animations and sound effects, and similarly there’s another MonsterSpec for every other type of monster. I’ll call this the data-oriented solution.
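A minimal C++ sketch of the data-oriented solution might look like the following. The exact fields, the per-instance state (position and hit points), the helper functions spawn and attack_damage, and the numbers are assumptions for illustration; animations and sound effects are left out to keep it short.

```cpp
#include <cassert>
#include <string>

// Per-type data, shared by every monster of that type.
struct MonsterSpec {
    std::string name;
    int strength;
    int speed;
    int aggressiveness;
};

// One concrete type for all monsters; only per-instance state lives here.
struct Monster {
    const MonsterSpec* spec;  // pointer-to-const: a Monster cannot modify its spec
    int x;
    int y;
    int hit_points;
};

// Arbitrary example values. All orcs point at kOrcSpec, all dragons at kDragonSpec.
const MonsterSpec kOrcSpec{"orc", 8, 5, 7};
const MonsterSpec kDragonSpec{"dragon", 40, 3, 9};

// Hypothetical helper: create a monster of the given type at a position.
Monster spawn(const MonsterSpec& spec, int x, int y, int hit_points) {
    return Monster{&spec, x, y, hit_points};
}

// Behaviour reads type-specific numbers through the spec instead of a virtual call.
int attack_damage(const Monster& m) {
    assert(m.spec != nullptr);
    return m.spec->strength;
}
```

Two orcs created with spawn(kOrcSpec, ...) hold the same spec pointer, so comparing pointers is enough to tell whether two monsters are the same type.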
Comparison
If all you want to implement is monsters moving around the dungeon and attacking the player, the two solutions have their minor pros and cons, but are mostly equivalent. The difference starts to appear when you add more features. The code-oriented solution solves the problem using code, so other related problems end up being solved with code as well. Likewise, the data-oriented solution makes data the solution to problems.
For example, say you want the game to load pre-designed dungeon maps with monsters. The code-oriented OOP solution will have you writing a MonsterFactory with OrcFactory and DragonFactory implementations so that you can create the appropriate monsters specified in the map data. The data-oriented solution just requires a table that can look up MonsterSpec instances by name (or by some other key). Now you want to implement saving and loading of games – and hence monsters. With the code-oriented solution, you’ll end up writing serialisation and deserialisation code for each subclass of Monster. With the data-oriented solution, you just need a way to (de)serialise a Monster instance. Now (the big one) suppose you want dungeon maps to be able to create new dungeon-specific monsters. The code-oriented solution requires a plugin system, and a way for map data generation to be integrated with the code build system, so that map data can stay binary compatible with the code. The data-oriented solution just needs the specs for the new monster.
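Here is a rough C++ sketch of how the data-oriented side of those three features could fit together. Everything specific in it is an assumption for illustration: the SpecTable type, the function names, and a save format of one whitespace-separated text line per monster (which in turn assumes spec names contain no whitespace).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

struct MonsterSpec { std::string name; int strength; int speed; };
struct Monster { const MonsterSpec* spec; int x; int y; int hit_points; };

// Specs keyed by name. std::map keeps element addresses stable as entries
// are added, so Monster::spec pointers stay valid when a map adds new specs.
using SpecTable = std::map<std::string, MonsterSpec>;

// Returns nullptr when no spec with that name has been loaded.
const MonsterSpec* find_spec(const SpecTable& table, const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}

// Save: the spec is written by name, followed by the per-instance state.
std::string save_monster(const Monster& m) {
    assert(m.spec != nullptr);
    std::ostringstream out;
    out << m.spec->name << ' ' << m.x << ' ' << m.y << ' ' << m.hit_points;
    return out.str();
}

// Load: returns false on malformed input or an unknown spec name,
// leaving `out` untouched in that case.
bool load_monster(const SpecTable& table, const std::string& line, Monster& out) {
    std::istringstream in(line);
    std::string name;
    Monster m{};
    if (!(in >> name >> m.x >> m.y >> m.hit_points)) return false;
    m.spec = find_spec(table, name);
    if (m.spec == nullptr) return false;
    out = m;
    return true;
}
```

A dungeon-specific monster is then just one more entry inserted into the table when the map loads; the same save and load functions handle it without any new code.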
So, Data-Oriented Solutions Win?
I deliberately picked this example to highlight the mistake I originally made: jumping to a model that embodied the solution in code (i.e., a class hierarchy) when most of the problems were about data. As a counterpoint, if the game needed each monster to have complex, specialised AI, then that would be a problem of code. (Though it’s popular in real game development to embed a scripting language so that behaviour can be data.) A hybrid approach is possible – if you know how polymorphism is implemented, you might notice that MonsterSpec is just like a vtable that contains data instead of function pointers (and why not have both?).
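As a sketch of the “why not have both?” idea, a spec can carry a function pointer next to its data. The ChooseStep signature, the two behaviours and all the numbers below are invented for illustration.

```cpp
#include <cassert>

struct Monster;

// Hypothetical behaviour hook: given the distance to the hero, return how
// many tiles to move this turn (negative means retreat).
using ChooseStep = int (*)(const Monster& self, int distance_to_hero);

struct MonsterSpec {
    const char* name;
    int speed;
    ChooseStep choose_step;  // code, stored right next to the data
};

struct Monster { const MonsterSpec* spec; int hit_points; };

// Move straight at the hero, limited by speed.
int charge(const Monster& self, int distance) {
    return distance < self.spec->speed ? distance : self.spec->speed;
}

// Retreat at full speed when badly hurt, otherwise charge.
int cautious(const Monster& self, int distance) {
    if (self.hit_points < 10) return -self.spec->speed;
    return charge(self, distance);
}

// Arbitrary example values.
const MonsterSpec kOrcSpec{"orc", 2, charge};
const MonsterSpec kDragonSpec{"dragon", 3, cautious};

// Dispatch goes through the spec, much like a hand-written vtable.
int step(const Monster& m, int distance) {
    assert(m.spec != nullptr && m.spec->choose_step != nullptr);
    return m.spec->choose_step(m, distance);
}
```

This is plain enough that it translates almost line for line into C, and the data half of each spec can still come from a file.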
When using a language like C, a drawback of data-oriented architecture is that code can be statically checked but data can’t, so the data language needs to be simple. Not all languages have the same hard dichotomy of code-is-rigid-data-is-flexible. At the extreme is Lisp, where data is code and code is data – with the downside that not even code can be statically checked. Other languages have run-time reflection (e.g., Java) or compile-time reflection (e.g., D). But being conscious of code versus data is important for simple, maintainable software.