Some years back I wrote a program in C++ using an obvious object-oriented architecture. Then, later, I had to rewrite it in C, and I learned some pretty good lessons about software design.
The original program is a little hard to explain, so I’ll invent a new problem to use as an example: making a classic hack-and-slash rogue-like game. The player controls a hero in a dungeon fighting monsters like orcs and dragons. How will you model the monsters in code?
If you’ve learned about OOP, this is an obvious application of polymorphism. You can create a Monster base class, then use inheritance to create subclasses like Dragon for each type of monster. Unlike some toy examples of polymorphism, this one is actually 100% valid and works correctly. I’ll call this the code-oriented solution.
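A minimal C++ sketch of the code-oriented model follows. Only the Monster and Dragon names come from the text; the Orc subclass, the member functions, and all the numbers are illustrative assumptions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Code-oriented model: every monster type is its own subclass, and
// type-specific attributes are answered by virtual functions.
class Monster {
public:
    virtual ~Monster() = default;
    virtual std::string name() const = 0;
    virtual int strength() const = 0;
    virtual int speed() const = 0;

    int hit_points() const { return hit_points_; }
    void take_damage(int amount) {
        assert(amount >= 0);
        hit_points_ -= amount;
    }

protected:
    explicit Monster(int hit_points) : hit_points_(hit_points) {}

private:
    int hit_points_;  // per-instance state shared by all subclasses
};

// Hypothetical subclass; the attribute values are made up.
class Orc : public Monster {
public:
    Orc() : Monster(20) {}
    std::string name() const override { return "orc"; }
    int strength() const override { return 5; }
    int speed() const override { return 3; }
};

// Hypothetical values again; only the class name comes from the text.
class Dragon : public Monster {
public:
    Dragon() : Monster(200) {}
    std::string name() const override { return "dragon"; }
    int strength() const override { return 30; }
    int speed() const override { return 8; }
};

// Code that works on any monster goes through virtual dispatch.
int total_strength(const std::vector<std::unique_ptr<Monster>>& monsters) {
    int total = 0;
    for (const auto& m : monsters) total += m->strength();
    return total;
}
```

Every new monster type in this model means a new class and a recompile, which is the property the rest of the comparison turns on.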
Once upon a time I would have accepted that as the obviously proper model and not thought about it any more.
But here’s another solution: create a single Monster class, and a single MonsterSpec class. All monsters, regardless of type, are straightforward instances of Monster. What makes an orc an orc and a dragon a dragon is a pointer to an immutable MonsterSpec instance. All orcs share a common MonsterSpec instance that specifies orc attributes like strength, speed and aggressiveness, as well as orc animations and sound effects, and similarly there is a MonsterSpec for every other type of monster. I’ll call this the data-oriented solution.
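A rough C++ sketch of the data-oriented model, for comparison. Monster and MonsterSpec are the names used above; the exact fields, the spawn and attack_damage helpers, the asset file names, and the numbers are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Immutable description shared by every monster of one type.
struct MonsterSpec {
    std::string name;
    int strength;
    int speed;
    int aggressiveness;
    int max_hit_points;
    std::string sound_effect;  // hypothetical asset name
};

// Every monster, whatever its type, is a plain Monster.
struct Monster {
    const MonsterSpec* spec;  // what makes an orc an orc
    int hit_points;           // mutable per-instance state
    int x;
    int y;
};

// Illustrative specs; in a real game these would more likely be loaded from a file.
const MonsterSpec kOrcSpec{"orc", 5, 3, 7, 20, "orc_grunt.wav"};
const MonsterSpec kDragonSpec{"dragon", 30, 8, 9, 200, "dragon_roar.wav"};

// Creating a monster of any type is the same operation with a different spec.
Monster spawn(const MonsterSpec& spec, int x, int y) {
    assert(spec.max_hit_points > 0);
    return Monster{&spec, spec.max_hit_points, x, y};
}

// Type-specific attributes are read from the spec, with no virtual call.
int attack_damage(const Monster& attacker) {
    return attacker.spec->strength;
}
```

Two orcs created with spawn(kOrcSpec, ...) point at the same spec object, so the only memory that differs between them is their own position and hit points.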
If all you want to implement is monsters moving around the dungeon and attacking the player, the two solutions have their minor pros and cons, but are mostly equivalent. The difference starts to appear when you add more features. The code-oriented solution solves the problem using code, so other related problems end up being solved with code as well. Likewise, the data-oriented solution makes data the solution to problems.
For example, say you want the game to load pre-designed dungeon maps with monsters. The code-oriented OOP solution will have you writing factory implementations such as DragonFactory so that you can create the appropriate monsters specified in the map data. The data-oriented solution just requires a table that can look up MonsterSpec instances by name (or some other key). Now you want to implement saving and loading of games – and hence monsters. With the code-oriented solution, you’ll end up writing serialisation and deserialisation code for each subclass of Monster. With the data-oriented solution, you just need a way to serialise and deserialise a Monster instance. Now (the big one) suppose you want dungeon maps to be able to create new dungeon-specific monsters. The code-oriented solution requires a plugin system, and a way for map data generation to be integrated with the code build system, so that map data can stay binary compatible with the code. The data-oriented solution just needs the specs for the new monsters.
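The three features on the data-oriented side can be sketched together in C++. Everything here beyond the Monster and MonsterSpec names is an assumption: the SpecTable class, the one-line text record format, and the rule that spec names contain no whitespace are all invented for the example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

struct MonsterSpec {
    std::string name;  // assumed to contain no whitespace
    int strength;
    int speed;
    int max_hit_points;
};

struct Monster {
    const MonsterSpec* spec;
    int hit_points;
    int x;
    int y;
};

// Table that looks up specs by name. References into a std::map stay
// valid when other entries are inserted, so Monster::spec pointers are safe.
class SpecTable {
public:
    // A map file can call this to register its own dungeon-specific monsters.
    const MonsterSpec& add(MonsterSpec spec) {
        std::string key = spec.name;
        auto result = specs_.emplace(key, std::move(spec));
        if (!result.second) throw std::invalid_argument("duplicate spec: " + key);
        return result.first->second;
    }

    const MonsterSpec& find(const std::string& name) const {
        auto it = specs_.find(name);
        if (it == specs_.end()) throw std::out_of_range("unknown spec: " + name);
        return it->second;
    }

private:
    std::map<std::string, MonsterSpec> specs_;
};

// Hypothetical save format: "<spec name> <hit points> <x> <y>".
// One function covers every monster type.
std::string save_monster(const Monster& m) {
    assert(m.spec != nullptr);
    std::ostringstream out;
    out << m.spec->name << ' ' << m.hit_points << ' ' << m.x << ' ' << m.y;
    return out.str();
}

// Loading resolves the spec by name through the table.
Monster load_monster(const std::string& line, const SpecTable& table) {
    std::istringstream in(line);
    std::string name;
    int hit_points = 0, x = 0, y = 0;
    if (!(in >> name >> hit_points >> x >> y)) {
        throw std::invalid_argument("bad monster record: " + line);
    }
    return Monster{&table.find(name), hit_points, x, y};
}
```

A dungeon map that introduces, say, a "cave_wyrm" only has to supply one more MonsterSpec for the table; save_monster and load_monster handle it without any change.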
So, Data-Oriented Solutions Win?
I deliberately picked this example to highlight the mistake I originally made: jumping to a model that embodied the solution in code (i.e., a class hierarchy) when most of the problems were about data. As a counterpoint, if the game needed each monster to have complex, specialised AI, then that would be a problem of code. (Though it’s popular in real game development to embed a scripting language so that behaviour can be data.) A hybrid approach is possible – if you know how polymorphism is implemented, you might notice that MonsterSpec is just like a vtable that contains data instead of function pointers (and why not have both?).
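One way to picture the "why not have both?" hybrid is a MonsterSpec that carries a function pointer alongside its data. This is only a sketch under assumptions: the choose_step hook, the two behaviours, and the numbers are hypothetical and not taken from any real game.

```cpp
#include <cassert>
#include <string>

struct Monster;

// Like a hand-written vtable, but holding data as well as a function pointer.
struct MonsterSpec {
    std::string name;
    int strength;
    int speed;
    // Hypothetical behaviour hook: how far to move toward the player this
    // turn (negative means retreat).
    int (*choose_step)(const Monster& self, int distance_to_player);
};

struct Monster {
    const MonsterSpec* spec;
    int hit_points;
};

// Move toward the player, limited by the monster's speed.
int charge(const Monster& self, int distance_to_player) {
    int speed = self.spec->speed;
    return distance_to_player < speed ? distance_to_player : speed;
}

// Retreat at full speed when badly hurt, otherwise charge.
int cautious(const Monster& self, int distance_to_player) {
    if (self.hit_points < 10) return -self.spec->speed;
    return charge(self, distance_to_player);
}

// Illustrative specs: shared data plus one behaviour each.
const MonsterSpec kOrcSpec{"orc", 5, 3, charge};
const MonsterSpec kDragonSpec{"dragon", 30, 8, cautious};

// Dispatch goes through the spec, much as a virtual call goes through a vtable.
int next_step(const Monster& m, int distance_to_player) {
    assert(m.spec != nullptr && m.spec->choose_step != nullptr);
    return m.spec->choose_step(m, distance_to_player);
}
```

The behaviours stay in code where the compiler can check them, while which monster gets which behaviour remains a piece of data.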
When using a language like C, a drawback of data-oriented architecture is that code can be statically checked but data can’t, so the data language needs to be simple. Not all languages have the same hard dichotomy of code-is-rigid-data-is-flexible. At the extreme is Lisp, where data is code and code is data – with the downside that not even code can be statically checked. Other languages have run-time reflection (e.g., Java) or compile-time reflection (e.g., D). But being conscious of code versus data is important for simple, maintainable software.