Some Useful Probability Facts for Systems Programming

Probability problems come up a lot in systems programming, and I’m using that term loosely to mean everything from operating systems programming and networking, to building large online services, to creating virtual worlds like in games. Here’s a bunch of rough-and-ready probability rules of thumb that are deeply related and have many practical applications when designing systems.

urllibparse

TL;DR: I’ve translated Python’s urllib.parse to D for parsing, building and transforming URLs. You can get it from GitLab.

URL handling is one of those things that most of the time can be done with a regex that mostly works. But sometimes I want a just-works tool when writing D, so I translated Python’s URL handling library. The API isn’t perfect (e.g., the url_split and url_parse distinction is a bit confusing), but it’s been tested against multiple RFCs and had plenty of real-world battle hardening.
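
As a rough illustration of that distinction, here is a minimal sketch using Python’s original `urllib.parse` functions (`urlparse` and `urlsplit`), not the D translation. The URL is a made-up example; the only difference between the two results is that `urlparse` splits the `;params` part off the last path segment into its own field.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL chosen for illustration; note the ";type=a" path parameter.
url = "https://example.com/docs/page;type=a?q=1#top"

# urlparse returns 6 fields: scheme, netloc, path, params, query, fragment.
parsed = urlparse(url)

# urlsplit returns 5 fields: there is no separate params field,
# so ";type=a" stays attached to the path.
split = urlsplit(url)

print(parsed.path, parsed.params)  # path and params reported separately
print(split.path)                  # path still contains the ";type=a" part
```

Everything else (scheme, host, query, fragment) comes out the same from both functions.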

My translation is meant to give the same output as Python does, so I’ve translated the Python test suite as well. I don’t plan to add any new features that aren’t in Python.

I hope someone else finds it useful.

The Enterprise Content Management System

A few years ago I worked on version 2 of some big enterprise’s internal website. A smaller company had the contract, and I’d been subcontracted to deal with deployment and any server-side/backend changes.

The enterprise side had a committee to figure out lists of requirements. Committees are famously bad at coming up with simple and clear specs, and prone to bikeshedding. Thankfully, the company I was contracting with had a project manager who had the job of engaging with the committee for hours each day so that the rest of us didn’t have to. However, we still got a constant stream of inane change requests. (One particular feature of the site changed name three times in about two months.)

It was pretty obvious early on what was happening, so I integrated the existing website backend with a content management system (CMS) that had an admin panel with a friendly WYSIWYG editor. New features got implemented as plugins to the CMS, and old features got migrated as needed. We couldn’t make everything customisable, but eventually we managed to push back on several change requests by saying, “You can customise that whenever you want through the admin panel.”

So, we got things done to satisfaction and delivered, but there was one complication: using the admin panel and WYSIWYG editor. The committee members wouldn’t use it because they were ideas people and didn’t implement anything. The company had IT staff who managed things like websites, but they were hired as technical staff, not for editing website content. On the other hand, they had staff hired for writing copy, but they weren’t hired as website administrators.

So here’s how they ended up using the CMS: CMS data would get rendered as HTML by the website backend, which would then be exported to PDF documents by IT staff. The PDF documents would be converted to Word documents and sent to the writers via email. The writers would edit the documents and send them back to the IT staff, who would do a side-by-side comparison with the originals and then manually enter the changes through the graphical editor in the admin panel. All of the stakeholders were delighted to have a shiny version 2 of the website that had a bunch of new features, was highly customisable, integrated well with their existing processes and was all within budget.

Nowadays, when I’m designing something and I think it’s obvious how it will be used, I remind myself about that CMS and its user-friendly, graphical editor.

Debugging Software Deployments with strace

Translations: русский

Most of my paid work involves deploying software systems, which means I spend a lot of time trying to answer the following questions:

• This software works on the original developer’s machine, so why doesn’t it work on mine?
• This software worked on my machine yesterday, so why doesn’t it work today?

That’s a kind of debugging, but it’s a different kind of debugging from normal software debugging. Normal debugging is usually about the logic of the code, but deployment debugging is usually about the interaction between the code and its environment. Even when the root cause is a logic bug, the fact that the software apparently worked on another machine means that the environment is usually involved somehow.

So, instead of using normal debugging tools like gdb, I have another toolset for debugging deployments. My favourite tool for “Why isn’t this software working on this machine?” is strace.

Object-Oriented Programming and Essential State

Translations: 中文

Back in 2015, Brian Will wrote a provocative blog post: Object-Oriented Programming: A Disaster Story. He followed it up with a video called Object-Oriented Programming is Bad, which is much more detailed. I recommend taking the time to watch the video, but here’s my one-paragraph summary:

The Platonic ideal of OOP is a sea of decoupled objects that send stateless messages to one another. No one really makes software like that, and Brian points out that it doesn’t even make sense: objects need to know which other objects to send messages to, and that means they need to hold references to one another. Most of the video is about the pain that happens trying to couple objects for control flow, while pretending that they’re decoupled by design.

Overall his ideas resonate with my own experiences of OOP: objects can be okay, but I’ve just never been satisfied with object-orientation for modelling a program’s control flow, and trying to make code “properly” object-oriented always seems to create layers of unnecessary complexity.

There’s one thing I don’t think he explains fully. He says outright that “encapsulation does not work”, but follows it with the footnote “at fine-grained levels of code”, and goes on to acknowledge that objects can sometimes work, and that encapsulation can be okay at the level of, say, a library or file. But he doesn’t explain exactly why it sometimes works and sometimes doesn’t, and how/where to draw the line. Some people might say that makes his “OOP is bad” claim flawed, but I think his point stands, and that the line can be drawn between essential state and accidental state.

Euler's Identity Really is a Miracle, Too

A post about the exponential function being a miracle did the rounds recently, and the Hacker News comment thread brought up some debate about the miracle of Euler’s famous identity:

$e^{\pi i} + 1 = 0$

A while back I used to make a living teaching this stuff to high school students and university undergrads. Let me give my personal take on what’s so special about Euler’s identity.

Why const Doesn't Make C Code Faster

Translations:中文, русский

In a post a few months back I said it’s a popular myth that const is helpful for enabling compiler optimisations in C and C++. I figured I should explain that one, especially because I used to believe it was obviously true myself. I’ll start off with some theory and artificial examples, then I’ll do some experiments and benchmarks on a real codebase: SQLite.

Data Still Dominates

Translations: русский

Here’s a quote from Linus Torvalds in 2006:

I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Which sounds a lot like Eric Raymond’s “Rule of Representation” from 2003:

Fold knowledge into data, so program logic can be stupid and robust.

Which was just his summary of ideas like this one from Rob Pike in 1989:

Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Which cites Fred Brooks from 1975:

Representation is the Essence of Programming

Beyond craftsmanship lies invention, and it is here that lean, spare, fast programs are born. Almost always these are the result of strategic breakthrough rather than tactical cleverness. Sometimes the strategic breakthrough will be a new algorithm, such as the Cooley-Tukey Fast Fourier Transform or the substitution of an $n \log n$ sort for an $n^2$ set of comparisons.

Much more often, strategic breakthrough will come from redoing the representation of the data or tables. This is where the heart of your program lies. Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

So, smart people have been saying this again and again for nearly half a century: focus on the data first. But sometimes it feels like the most famous piece of smart programming advice that everyone forgets.
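
As a toy illustration of “fold knowledge into data” (a hypothetical sketch, not one of the real examples from the post), here is a retry policy expressed as a table. The names `RETRY_POLICY` and `retry_plan` and all the values are invented for this example; the point is only that the function stays trivial because the decisions live in the data.

```python
# Hypothetical table: error kind -> (number of retries, initial delay in seconds).
# All of the policy knowledge lives here, not in branching code.
RETRY_POLICY = {
    "timeout": (3, 2.0),
    "rate_limited": (2, 10.0),
    "bad_request": (0, 0.0),  # retrying will not help
}


def retry_plan(kind):
    """Return the list of delays to wait before each retry, doubling each time."""
    # Unknown error kinds fall back to "do not retry".
    attempts, delay = RETRY_POLICY.get(kind, (0, 0.0))
    return [delay * 2**i for i in range(attempts)]


print(retry_plan("timeout"))       # three delays, doubling from 2.0
print(retry_plan("rate_limited"))  # two delays, doubling from 10.0
print(retry_plan("unknown"))       # empty list: no retries
```

Changing the policy means editing the table, and the logic that reads it never needs to grow another branch.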

Let me give some real examples.

Analysing D Code with KLEE

KLEE is a symbolic execution engine that can rigorously verify or find bugs in software. It’s designed for C and C++, but it’s just an interpreter for LLVM bitcode combined with theorem prover backends, so it can work with bitcode generated by ldc2. One catch is that it needs a compatible bitcode port of the D runtime to run normal D code. I’m still interested in getting KLEE to work with normal D code, but for now I’ve done some experiments with -betterC D.

Profiling D's Garbage Collection with Bpftrace

Recently I’ve been playing around with using bpftrace to trace and profile D’s garbage collector. Here are some examples of the cool stuff that’s possible.