Why Textbook Statistical Methods aren't as Effective in IT

Published 01 December 2021

If you work with tech, there’s a good chance you’ve come across some of the following statistical tools:

Averages
Standard deviations
t-tests
Least-squares line of best fit

These are the most common tools in a kit that’s typically taught in undergraduate statistics classes and widely used in the outside world. However, this toolkit just isn’t that effective in most IT applications (such as analysing performance benchmarks). Fortunately, there are other tools that do work well. They’re normally taught in “advanced” statistics classes, but I think some of them should become the standard toolkit for tech work (and possibly elsewhere).

In this post I want to talk a bit about why the usual toolkit doesn’t work well. First, let me give an example.
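The post's own example isn't part of this excerpt. As a hypothetical illustration of the general problem, here's what the textbook summary does to a skewed set of benchmark timings (the numbers are made up). One slow outlier run drags the mean far away from a typical run, and the standard deviation is so large that "mean minus one standard deviation" comes out negative, which no timing can be. The median is printed only for contrast. It isn't meant to stand in for whatever tools the post goes on to recommend.

```python
import statistics


def textbook_summary(timings):
    """Summarise timings the undergraduate way: mean and sample standard deviation."""
    return statistics.mean(timings), statistics.stdev(timings)


# Made-up benchmark timings in milliseconds: nine typical runs plus one run
# that hit some unrelated slowdown (a GC pause, a noisy neighbour, ...).
timings = [10.0] * 9 + [100.0]

mean, stdev = textbook_summary(timings)
print(f"mean = {mean:.1f} ms, stdev = {stdev:.1f} ms")
print(f"mean - stdev = {mean - stdev:.1f} ms")  # negative, so not a plausible timing
print(f"median = {statistics.median(timings):.1f} ms")  # for contrast only
```

With these numbers, the mean is 19.0 ms even though nine of the ten runs took 10 ms.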

stow: Your Package Manager When You Can't Use Your Package Manager

Published 08 August 2021

Tags: Tools and Ops

GNU stow is an underrated tool. Generically, it helps maintain a unified tree of files that come from different sources. More concretely, I use a bunch of software (D compilers, various tools) that I install manually instead of through my system’s package manager (for various reasons). stow makes that maintainable by letting me cleanly add/remove packages and switch between versions. Here’s how it’s done.
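The core idea behind stow is a "symlink farm". Each package lives in its own directory (conventionally something like /usr/local/stow/<package>). Stowing it creates symlinks in the target tree (such as /usr/local) that point back into that directory, so removing or swapping a package just means removing or swapping its links. The sketch below is a deliberately simplified Python model of that idea, not stow's actual algorithm. It links files one at a time using absolute symlinks. Stow itself creates relative links, can fold a whole directory into a single link, and handles conflicts far more carefully. In practice you'd just run `stow <package>` and `stow -D <package>` from the stow directory.

```python
import os
from pathlib import Path


def stow(package_dir, target_dir):
    """Link every file in package_dir into the same relative path under target_dir."""
    package_dir = Path(package_dir).resolve()
    target_dir = Path(target_dir)
    links = []
    for root, _dirs, files in os.walk(package_dir):
        rel = Path(root).relative_to(package_dir)
        (target_dir / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            link = target_dir / rel / name
            # Simplified conflict check: refuse to clobber anything already there.
            if link.exists() or link.is_symlink():
                raise FileExistsError(f"conflict: {link}")
            link.symlink_to(Path(root) / name)
            links.append(link)
    return links


def unstow(package_dir, target_dir):
    """Remove every symlink under target_dir that points into package_dir."""
    package_dir = Path(package_dir).resolve()
    removed = []
    for link in list(Path(target_dir).rglob("*")):
        if link.is_symlink() and Path(os.readlink(link)).is_relative_to(package_dir):
            link.unlink()
            removed.append(link)
    return removed
```

Switching versions then amounts to `unstow` on the old package directory followed by `stow` on the new one.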

Pricing Yourself as a Contractor 101

Published 04 July 2021

Tags: Careers and Business

I’ve been self-employed for most of my career. Sometimes I talk to other people who are interested in leaving a full-time job to do some kind of contracting or service business. By far, the most common newbie mistake that we all seem to make is in how we price ourselves.

Take this useful blog post that breaks down employee income vs freelancer income in the US. It estimates that you need $140k revenue as a freelancer in the US to have the equivalent of $100k employee compensation. I remember finding calculations like that really useful when I first started a business. However, some people will look at the result and think, “Gee, I have to make 1.4x as much if I’m self-employed. Can I really do that?”

No, no, no. That thinking is backwards.

How Real-World Apps Lose Data

Published 06 June 2021

Tags: Reliability, Ops and Anecdotes

A great thing about modern app development is that there are cloud providers to worry about things like hardware failures or how to set up RAID. Decent cloud providers are extremely unlikely to lose your app’s data, so sometimes I get asked what backups are really for these days. Here are some real-world stories that show exactly what backups are for.

Reverse Engineering a Docker Image

Published 18 March 2021

Tags: Tools, Ops and Low Level

This started with a consulting snafu: Government organisation A got government organisation B to develop a web application. Government organisation B subcontracted part of the work to somebody. Hosting and maintenance of the project was later contracted out to a private-sector company C. Company C discovered that the subcontracted somebody (who was long gone) had built a custom Docker image and made it a dependency of the build system, but without committing the original Dockerfile. That left company C with a contractual obligation to manage a Docker image they had no source code for. Company C calls me in once in a while to do various things, so doing something about this mystery meat Docker image became my job.

Fortunately, the Docker image format is a lot more transparent than it could be. A little detective work is needed, but a lot can be figured out just by pulling apart an image file. As an example, here’s a quick walkthrough of an image for the Prettier code formatter. (In fact, it’s so easy, there’s a tool for it. Thanks Ezequiel Gonzalez Rial.)
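To give a flavour of that detective work: `docker save -o image.tar <image>` writes an image out as a tar archive. Inside, `manifest.json` names each image's config file, and that config's `history` list records the command behind each layer (`created_by`). That is often enough to start sketching a replacement Dockerfile. The Python below is a minimal sketch that reads those two files using only the standard library. It assumes a `docker save` archive and doesn't open the layer tarballs themselves, which is where you'd look next for files brought in by ADD or COPY.

```python
import json
import tarfile


def image_history(tar_path):
    """Return the created_by command of each history entry in a `docker save` archive."""
    commands = []
    with tarfile.open(tar_path) as tar:
        # manifest.json is a list with one entry per saved image.
        manifest = json.load(tar.extractfile("manifest.json"))
        for image in manifest:
            # Each entry's "Config" is the path of that image's config JSON in the archive.
            config = json.load(tar.extractfile(image["Config"]))
            for entry in config.get("history", []):
                commands.append(entry.get("created_by", ""))
    return commands
```

For example, `for cmd in image_history("image.tar"): print(cmd)` lists the build steps, oldest first.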

Extending Looped Music for Fun, Relaxation and Productivity

Published 12 March 2021

Tags: Tools, Computer Science, Mathematics and Releases

Some work (like programming) takes a lot of concentration, and I use noise-cancelling headphones to help me work productively in silence. But for other work (like doing business paperwork), I prefer to have quiet music in the background to help me stay focussed. Quiet background music is good for meditation or dozing, too. If you can’t fall asleep or completely clear your mind, zoning out to some music is the next best thing.

The best music for that is simple and repetitive — something nice enough to listen to, but not distracting, and okay to tune out of when needed. Computer game music is like that, by design, so there’s plenty of good background music out there. The harder problem is finding samples that play for more than a few minutes.

So I made loopx, a tool that takes a sample of music that loops a few times, and repeats the loop to make a long piece of music.

When you’re listening to the same music loop for a long time, even slight distortion becomes distracting. Making quality extended music audio out of real-world samples (and doing it fast enough) takes a bit of maths and computer science. About ten years ago I was doing digital signal processing (DSP) programming for industrial metering equipment, so this side project got me digging up some old theory again.
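This excerpt doesn't describe loopx's actual algorithm, so what follows is only a naive illustration of the basic idea under simplifying assumptions. The signal is mono and given as a list of floats, the loop repeats exactly, and the loop length is found by brute-force search. The search picks the lag at which the signal differs least from a shifted copy of itself (mean squared difference). The audio is then extended by repeating the last full period. That join is seamless because in a periodic signal sample `i` equals sample `i - period`. Real recordings are noisy and long, so a practical tool needs something both more robust and much faster than this O(N × L) search.

```python
def find_loop_period(samples, min_period, max_period):
    """Return the lag in [min_period, max_period] whose shifted copy matches best."""
    if not 0 < min_period <= max_period < len(samples):
        raise ValueError("need 0 < min_period <= max_period < len(samples)")
    best_lag, best_err = None, float("inf")
    for lag in range(min_period, max_period + 1):
        n = len(samples) - lag
        # Mean squared difference between the signal and itself shifted by `lag`.
        err = sum((samples[i] - samples[i + lag]) ** 2 for i in range(n)) / n
        if err < best_err:  # strict, so the shortest of equally good lags wins
            best_lag, best_err = lag, err
    return best_lag


def extend_loop(samples, period, total_length):
    """Extend samples to total_length by repeating the final `period` samples."""
    out = list(samples)
    loop = out[len(out) - period:]
    while len(out) < total_length:
        out.extend(loop)
    return out[:total_length]
```

Using a strict comparison matters here: multiples of the true period also match perfectly, and the shortest one is the loop you want.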

Djinn: A Code Generator and Templating Language Inspired by Jinja2

Published 01 January 2021

Tags: Tools and dlang

Code generators can be useful tools. I sometimes use the command line version of Jinja2 to generate highly redundant config files and other text files, but it’s feature-limited for transforming data. Obviously the author of Jinja2 thinks differently, but I wanted something like list comprehensions or D’s composable range algorithms.

I decided to make a tool that’s like Jinja2, but lets me generate complex files by transforming data with range algorithms. The idea was dead simple: a templating language that gets rewritten directly to D code. That way it supports everything D does, simply because it is D. I wanted a standalone code generator, but thanks to D’s mixin feature, the same templating language works as an embedded templating language (for HTML in a web app, for example). (For more on that trick, see this post about translating Brainfuck to D to machine code all at compile time using mixins.)

As usual, it’s on GitLab. The examples in this post can be found there, too.
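Djinn's own syntax and code generator aren't shown in this excerpt. As a toy analogue of the same trick in Python, this sketch rewrites a template into the source of a Python function and compiles it with `exec`. Whatever goes inside the (hypothetical) `{{ ... }}` delimiters is therefore ordinary host-language code, generator expressions included. As with any template system that generates code, it should only be used with trusted templates.

```python
import re

# Hypothetical delimiters for this toy; not Djinn's real syntax.
_EXPR = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def compile_template(template, params):
    """Rewrite a template into Python source for render(*params), then compile it."""
    src = [f"def render({', '.join(params)}):", "    out = []"]
    pos = 0
    for m in _EXPR.finditer(template):
        # Literal text becomes a string constant...
        src.append(f"    out.append({template[pos:m.start()]!r})")
        # ...and each {{ ... }} is pasted in verbatim as a Python expression.
        src.append(f"    out.append(str({m.group(1).strip()}))")
        pos = m.end()
    src.append(f"    out.append({template[pos:]!r})")
    src.append("    return ''.join(out)")
    source = "\n".join(src)
    namespace = {}
    exec(source, namespace)
    return namespace["render"], source
```

Printing `source` shows the generated function. A mistake in a template expression surfaces as an ordinary Python `SyntaxError` when the template is compiled.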

Woothee (HTTP User Agent Parser)

Published 14 November 2020

Tags: dlang, Computer Science, Performance and Releases

I’ve written a D implementation of the Project Woothee multi-language HTTP user agent parser. Here are some notes about what it’s useful for, and a few things that are special about the D implementation.

Pi from High School Maths

Published 26 October 2020

Tags: Mathematics and Computer Science

Warning: I don’t think the stuff in this post has any direct practical application by itself (unless you’re a nuclear war survivor and need to reconstruct maths from scratch or something). Sometimes I like to go back to basics, though. Here’s a look at $\pi$ and areas of curved shapes without any calculus or transcendental functions.
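The post's own derivation isn't in this excerpt. For a taste of what's possible with nothing more than Pythagoras' theorem and square roots, here is the classic Archimedes-style approach, using perimeters rather than areas. Start with a regular hexagon inscribed in a unit circle, whose sides have length 1, and repeatedly double the number of sides. If the $n$-gon has side $s$, the $2n$-gon has side $s' = \sqrt{2 - \sqrt{4 - s^2}}$. The code uses the algebraically equivalent $s' = s / \sqrt{2 + \sqrt{4 - s^2}}$, which avoids cancellation in floating point. Because the circumference is $2\pi$, the half-perimeter approaches $\pi$ from below.

```python
import math


def pi_from_polygons(doublings):
    """Half-perimeter of a regular polygon inscribed in a unit circle, after doubling sides."""
    sides, side = 6, 1.0  # inscribed hexagon: six equilateral triangles, so side = radius = 1
    for _ in range(doublings):
        # Pythagoras: new side = sqrt(2 - sqrt(4 - side^2)), rewritten to avoid
        # subtracting nearly equal numbers once the side gets small.
        side = side / math.sqrt(2 + math.sqrt(4 - side * side))
        sides *= 2
    return sides * side / 2  # circumference is 2*pi, so half the perimeter approximates pi


for k in range(6):
    print(6 * 2 ** k, pi_from_polygons(k))
```

The hexagon gives exactly 3, and each doubling roughly quarters the remaining error.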

Robust and Race-free Server Logging using Named Pipes

Published 10 October 2020

Tags: Ops, Reliability, Servers, Tools and Systems Design

If you do any server administration work, you’ll have worked with log files. And if your servers need to be reliable, you’ll know that log files are a common source of problems, especially when you need to rotate or ship them (which is practically always). In particular, moving files around causes race conditions.

Thankfully, there are better ways. With named pipes, you can have a simple and robust logging stack, with no race conditions, and without patching your servers to support some network logging protocol.
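Here is a minimal sketch of the idea (POSIX only, with hypothetical file names). The server writes to what it treats as its log file, but that path is actually a FIFO created with `os.mkfifo`. A separate reader owns the real files. The server never renames or reopens anything, so rotation happens entirely inside the reader, and there is no window in which lines can land in a file that has just been moved. This toy version rotates by line count and stops once every writer has closed the pipe. A real shipper would loop to reopen the FIFO. It would also have to deal with writers that block on open, or get errors on write, while no reader is attached.

```python
import os


def ship_logs(fifo_path, out_dir, lines_per_file):
    """Copy lines from a named pipe into numbered files, rotating every lines_per_file lines.

    Returns the paths written. Stops when every writer has closed the pipe.
    """
    written, out, count = [], None, 0
    # Opening a FIFO for reading blocks until some process opens it for writing.
    with open(fifo_path, "r") as fifo:
        try:
            for line in fifo:
                if out is None or count == lines_per_file:
                    # Rotation is private to the reader: the writer never notices.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"app.{len(written)}.log")
                    out = open(path, "w")
                    written.append(path)
                    count = 0
                out.write(line)
                out.flush()
                count += 1
        finally:
            if out is not None:
                out.close()
    return written
```

The server side needs no changes: it is simply configured to log to the FIFO's path as if that were a regular file.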