Here’s a quote from Linus Torvalds in 2006:
I’m a huge proponent of designing your code around the data, rather than the other way around, and I think
it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a
bad programmer and a good one is whether he considers his code or his data structures more important. Bad
programmers worry about the code. Good programmers worry about data structures and their relationships.
Which sounds a lot like Eric Raymond’s “Rule of Representation” from 2003:
Fold knowledge into data, so program logic can be stupid and robust.
Which was just his summary of ideas like this one from Rob Pike in 1989:
Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will
almost always be self-evident. Data structures, not algorithms, are central to programming.
Which cites Fred Brooks from 1975:
Representation is the Essence of Programming
Beyond craftsmanship lies invention, and it is here that lean, spare, fast programs are born. Almost always
these are the result of strategic breakthrough rather than tactical cleverness. Sometimes the strategic
breakthrough will be a new algorithm, such as the Cooley-Tukey Fast Fourier Transform or the substitution of an
n log n sort for an n² set of comparisons.
Much more often, strategic breakthrough will come from redoing the representation of the data or tables.
This is where the heart of your program lies. Show me your flowcharts and conceal your tables, and I shall
continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.
So, smart people have been saying this again and again for nearly half a century: focus on the data first. But
sometimes it feels like the most famous piece of smart programming advice that everyone forgets.
Let me give some real examples.