Data Still Dominates

Published

Tags: and

Here’s a quote from Linus Torvalds in 2006:

I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Which sounds a lot like Eric Raymond’s “Rule of Representation” from 2003:

Fold knowledge into data, so program logic can be stupid and robust.

Which was just his summary of ideas like this one from Rob Pike in 1989:

Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Which cites Fred Brooks from 1975:

Representation is the Essence of Programming

Beyond craftmanship lies invention, and it is here that lean, spare, fast programs are born. Almost always these are the result of strategic breakthrough rather than tactical cleverness. Sometimes the strategic breakthrough will be a new algorithm, such as the Cooley-Tukey Fast Fourier Transform or the substitution of an n log n sort for an n2 set of comparisons.

Much more often, strategic breakthrough will come from redoing the representation of the data or tables. This is where the heart of your program lies. Show me your flowcharts and conceal your tables, and I shall be continued to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

So, smart people have been saying this again and again for nearly half a century: focus on the data first. But sometimes it feels like the most famous piece of smart programming advice that everyone forgets.

Let me give some real examples.

Hello World Marketing (or, How I Find Good, Boring Software)

Published

Tags: , , , and

Back in 2001 Joel Spolsky wrote his classic essay “Good Software Takes Ten Years. Get Used To it”. Nothing much has changed since then: software is still taking around a decade of development to get good, and the industry is still getting used to that fact. Unfortunately, the industry has investors who want to see hockey stick growth rates on software that’s a year old or less. The result is an antipattern I like to call “Hello World Marketing”. Once you start to notice it, you see it everywhere, and it’s a huge red flag when choosing software tools.

Unfortunately, Garbage Collection isn't Enough

Published

Tags: , , , and

Here’s a little story of some mysterious server failures I had to debug a year ago. The servers would run okay for a while, then eventually start crashing. After that, trying to run practically anything on the machines failed with “No space left on device” errors, but the filesystem only reported a few gigabytes of files on the ~20GB disks.

Using D Features to Reimplement Inheritance and Polymorphism

Published

Tags: , and

Some months ago I showed how inheritance and polymorphism work in compiled languages by reimplementing them with basic structs and function pointers. I wrote that code in D, but it could be translated directly to plain old C. In this post I’ll show how to take advantage of D’s features to make DIY inheritance a bit more ergonomic to use.

Although I have used these tricks in real code, I’m honestly just writing this because I think it’s neat what D can do, and because it helps explain how high-level features of D can be implemented — using the language itself.

How Inheritance and Polymorphism Work

Published

Tags: , , and

I’ve promised to write a blog post about the DIY polymorphic classes implementation in Xanthe, the experimental game I once wrote for bare-metal X86. But first, I decided to write a precursor post that explains how polymorphism and inheritance work in the first place.

Relational Databases Considered Incredibly Useful

Published

Tags: , , and

A year ago I worked on a web service that had Postgres and Elasticsearch as backends. Postgres was doing most of the work and was the primary source of truth about all data, but some documents were replicated in Elasticsearch for querying. Elasticsearch was easy to get started with, but had an ongoing maintenance cost: it was one more moving part to break down, it occasionally went out of sync with the main database, it was another thing for new developers to install, and it added complexity to the deployment, as well as the integration tests. But most of the features of Elasticsearch weren’t needed because the documents were semi-structured, and the search queries were heavily keyword-based. Dropping Elasticsearch and just using Postgres turned out to work okay. No, I’m not talking about brute-force string matching using LIKE expressions (as implemented in certain popular CMSs); I’m talking about using the featureful text search indexes in good modern databases. Text search with Postgres took more work to implement, and couldn’t do all the things Elasticsearch could, but it was easier to deploy, and since then it’s been zero maintenance. Overall, it’s considered a net win (I talked to some of the developers again just recently).

Busywork

Published

Tags: , , , and

My first small business wasn’t actually in the software industry. Back when I was a student, I did contract office jobs in the summer holidays to pay for things while I was studying. I got a feeling that I’d be happier self-employed than someone else’s employee, so after graduation I experimented with registering an Australian Business Number (ABN) and using it to do maths and sciences tuition. I went back to working for other companies eventually, but I learned a lot from the experience, and that know-how was extremely valuable later when I quit my full-time job to start my own little consulting business.

I might write more about that experience some other time, but for now I want to write about what’s been hardest for me to get used to: when you’re self-employed, no one cares how much work you do.

Compression, Complexity and Software System Design

Published

Tags: and

Information theory gives us a way to understand communication systems, by giving us a way to understand what happens to information as it’s transmitted or re-encoded.

We can also study what happens to complexity in a software system as components depend on or interface each other. Just like we can make rigorous arguments about how much information can be compressed, we can make arguments about how much complexity can be simplified, and use this to make better choices when designing software systems.

If You Need Lifeboats, That Means Your Ship is Sinking

Published

Tags: , and

It’s 1912 and Captain Edward Smith is boarding the RMS Titanic. He sees the lifeboats on deck and shakes his head with a heavy sigh before turning to the crew. “In my experience, I’ve never needed lifeboats. They’re not best practices — if you need lifeboats, that means your ship is sinking!” The crew members are enlightened and eagerly throw all lifeboats overboard. The Titanic begins its voyage to New York.

Update

Published

Tags: , , and

Just a quick update because I’ve been too busy to write much recently.

I’m giving a talk at DConf 2017 in Berlin! D’s been growing strongly in the past five years, and DConf’s been growing dramatically since the first one in 2013, so it’s pretty exciting to get involved. No, really. I often give tech talks at no-name events here in Sydney, but I’m half scared I’ll wet my pants on stage with a lineup like this — in my university days, I used to read all the C++ books by Andrei Alexandrescu and Scott Meyer that I could get my hands on.

If you have a DConf ticket, I look foward to seeing you there. If not, then you can look forward to watching the videos :)

Instead of writing a real blog post, I’m dropping a link to this classic about backwards compatibility nightmares, which you might like if you thought the mess that’s x86 BIOS booting was interesting. It’s a chapter from The Old New Thing, a book by Raymond Chen from Microsoft, based on his blog. Raymond Chen has spent a lot of his career making sure new versions of Windows can still run old software, no matter how badly the old software abused APIs and deserved to crash. Most of the technical details belong to the 90s, but there are plenty of morals for software development in the real world today. If you can read that chapter without ever wanting to weep for the industry, you’re stronger than I am.