How Real-World Apps Lose Data

Published 06 June 2021

A great thing about modern app development is that there are cloud providers to worry about things like hardware failures or how to set up RAID. Decent cloud providers are extremely unlikely to lose your app’s data, so sometimes I get asked what backups are really for these days. Here are some real-world stories that show exactly what.

The Enterprise Content Management System

Published 29 December 2019

Tags: Sociotechnology , Software Engineering , Systems Design and Anecdotes

A few years ago I worked on the version 2 of some big enterprise’s internal website. A smaller company had the contract, and I’d been subcontracted to deal with deployment and any serverside/backend changes.

The enterprise side had a committee to figure out lists of requirements. Committees are famously bad at coming up with simple and clear specs, and prone to bikeshedding. Thankfully, the company I was contracting with had a project manager who had the job of engaging with the committee for hours each day so that the rest of us didn’t have to. However, we still got a constant stream of inane change requests. (One particular feature of the site changed name three times in about two months.)

It was pretty obvious early on what was happening, so I integrated the existing website backend with a content management system (CMS) that had an admin panel with a friendly WYSIWYG editor. New features got implemented as plugins to the CMS, and old features got migrated as needed. We couldn’t make everything customisable, but eventually we managed to push back on several change requests by saying, “You can customise that whenever you want through the admin panel.”

So, we got things done to satisfaction and delivered, but there was one complication: using the admin panel and WYSIWYG editor. The committee members wouldn’t use it because they were ideas people and didn’t implement anything. The company had IT staff who managed things like websites, but they were hired as technical staff, not for editing website content. On the other hand, they had staff hired for writing copy, but they weren’t hired as website administrators.

So here’s how they ended up using the CMS: CMS data would get rendered as HTML by the website backend, which would then be exported to PDF documents by IT staff. The PDF documents would be converted to Word documents and sent to the writers via email. The writers would edit the documents and send them back to the IT staff, who would do a side-by-side comparison with the originals and then manually enter the changes through the graphical editor in the admin panel. All of the stakeholders were delighted to have a shiny version 2 of the website that had a bunch of new features, was highly customisable, integrated well with their existing processes and was all within budget.

Nowadays, when I’m designing something and I think it’s obvious how it will be used, I remind myself about that CMS and its user-friendly, graphical editor.

Data Still Dominates

Published 30 June 2019

Tags: Software Engineering , Systems Design and Anecdotes

Translations: русский

Here’s a quote from Linus Torvalds in 2006:

I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Which sounds a lot like Eric Raymond’s “Rule of Representation” from 2003:

Fold knowledge into data, so program logic can be stupid and robust.

Which was just his summary of ideas like this one from Rob Pike in 1989:

Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Which cites Fred Brooks from 1975:

Representation is the Essence of Programming

Beyond craftmanship lies invention, and it is here that lean, spare, fast programs are born. Almost always these are the result of strategic breakthrough rather than tactical cleverness. Sometimes the strategic breakthrough will be a new algorithm, such as the Cooley-Tukey Fast Fourier Transform or the substitution of an n log n sort for an n² set of comparisons.

Much more often, strategic breakthrough will come from redoing the representation of the data or tables. This is where the heart of your program lies. Show me your flowcharts and conceal your tables, and I shall be continued to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

So, smart people have been saying this again and again for nearly half a century: focus on the data first. But sometimes it feels like the most famous piece of smart programming advice that everyone forgets.

Let me give some real examples.

Unfortunately, Garbage Collection isn't Enough

Published 05 December 2018

Tags: Software Engineering , Programming Languages , Systems Design , Reliability , Ops and Anecdotes

Translations:русский

Here’s a little story of some mysterious server failures I had to debug a year ago. The servers would run okay for a while, then eventually start crashing. After that, trying to run practically anything on the machines failed with “No space left on device” errors, but the filesystem only reported a few gigabytes of files on the ~20GB disks.

The Catch-22 of Risk-Averse Organisations

Published 01 February 2018

Tags: Sociotechnology and Anecdotes

Markets are supposed to make corporations efficient, so if you do consulting, you have to wonder why so many organisations (both private and government) are so absurdly dysfunctional. The way I see it, there’s no real paradox: organisations are pushed by both forces of efficiency (like market forces) and forces of dysfunction (like political drama). Corporations are only efficient if the forces of efficiency are stronger.

There are many forces of dysfunction that can affect an organisation, but there’s one that’s particularly important for risk-averse organisations (like banks and large government departments). Whenever I see people in an organisation doing something that doesn’t make any sense, I always ask if this catch-22 can explain it:

Every risk-averse organisation needs someone to take the initiative to eliminate risks. But in a risk-averse organisation, that’s exactly what no one does.

Busywork

Published 14 September 2017

Tags: Business , Careers , Sociotechnology , Software Engineering and Anecdotes

My first small business wasn’t actually in the software industry. Back when I was a student, I did contract office jobs in the summer holidays to pay for things while I was studying. I got a feeling that I’d be happier self-employed than someone else’s employee, so after graduation I experimented with registering an Australian Business Number (ABN) and using it to do maths and sciences tuition. I went back to working for other companies eventually, but I learned a lot from the experience, and that know-how was extremely valuable later when I quit my full-time job to start my own little consulting business.

I might write more about that experience some other time, but for now I want to write about what’s been hardest for me to get used to: when you’re self-employed, no one cares how much work you do.

The Enterprise Pushbutton

Published 11 May 2016

Tags: Software Engineering , Anecdotes and Low Level

Let’s talk about a hardware driver for a pushbutton. A pushbutton driver isn’t as completely trivial as it might sound because you need debouncing logic to ensure a crisp on/off signal, but it’s hard to imagine how it might need more than about 100 lines of C code.

After working on this particular embedded system, I didn’t need to stress my imagination any more. This pushbutton driver was modelled as an explicit finite state machine, and all the possible states and transitions were specified in a spreadsheet. Then there was a python script that processed this spreadsheet and generated state table data as C code. This was linked to an FSM evaluator in C. The FSM was controlled by a bare-metal driver and triggered callbacks on each state transition.

Most of the callbacks were marked “not yet implemented”. In fact, only two states were even reachable: BUTTON_UP and BUTTON_DOWN. Eventually the entire project was canned, but not because of missing support for BUTTON_TIMEOUT or any of the other states.

The FSM didn’t do any debouncing, so the low-level driver had to do that before passing button up/down events to it.

The Art of Machinery

How Real-World Apps Lose Data

The Enterprise Content Management System

Data Still Dominates

Representation is the Essence of Programming

Unfortunately, Garbage Collection isn't Enough

The Catch-22 of Risk-Averse Organisations

Busywork

The Enterprise Pushbutton

Why Defensive Coding Matters - A War Story