Why Textbook Statistical Methods aren't as Effective in IT

Published 01 December 2021

If you work with tech, there’s a good chance you’ve come across some of the following statistical tools:

Averages
Standard deviations
t-tests
Least-squares line of best fit

These are the most common tools in a kit that’s typically taught in undergraduate statistics classes and widely used in the outside world. However, this toolkit just isn’t that effective in most IT applications (such as analysing performance benchmarks). Fortunately, there are other tools that do work well. They’re normally taught in “advanced” statistics classes, but I think some of them should become the standard toolkit for tech work (and possibly elsewhere).

In this post I want to talk a bit about why the usual toolkit doesn’t work well. First, let me give an example.
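The author's own example is in the full post. As a separate, made-up illustration of one commonly cited reason the usual toolkit struggles with performance data, here is a minimal Python sketch. The latency numbers are invented, and the skewed shape (mostly fast with an occasional very slow sample) is an assumption about what such data often looks like, not a measurement from the post.

```python
import statistics

# Hypothetical response times in milliseconds (invented for illustration):
# nine fast requests and one very slow outlier.
latencies_ms = [12, 11, 13, 12, 14, 11, 12, 13, 12, 950]

mean = statistics.mean(latencies_ms)      # pulled far upwards by the single outlier
stdev = statistics.stdev(latencies_ms)    # also dominated by the single outlier
median = statistics.median(latencies_ms)  # robust: reflects the typical request

print(f"mean   = {mean:.1f} ms")
print(f"stdev  = {stdev:.1f} ms")
print(f"median = {median:.1f} ms")
```

Here the mean is 106 ms even though nine of the ten requests took 14 ms or less, while the median stays at 12 ms. Summaries built on the mean and standard deviation describe the outlier more than the typical request.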

Woothee (HTTP User Agent Parser)

Published 14 November 2020

I’ve written a D implementation of the Project Woothee multi-language HTTP user agent parser. Here are some notes about what it’s useful for, and a few things special about the D implementation.

Scaling a GraphQL Website

Published 29 June 2020, Updated 09 December 2020

Tags: Performance, Systems Design, Servers and Me

I normally write abstractly about work I’ve done for other people (for obvious reasons), but I’ve been given permission to write about a website, Vocal, that I did some SRE work on last year. I actually gave a presentation at GraphQL Sydney back in February, but this blog post got delayed a bit.

Vocal is a GraphQL-based website that got traction and hit scaling problems, and I was called in to fix them. Here’s what I did. Obviously, you’ll find this post useful if you’re scaling another GraphQL website, but most of it is representative of what you have to deal with when any site first gets enough traffic to cause technical problems. If website scalability is a key interest of yours, you might want to read my recent post about scalability first.

What is a High Traffic Website?

Published 21 April 2020

Tags: Systems Design, Servers and Performance

Terms like “high traffic” are hazardous when designing online services because salespeople, business analysts and engineers all have different perspectives about what they mean. If we’re talking about, say, a high-stakes online poker room, then “high traffic” for the business side will be very low compared to what it is for the technical side. However, all these people will be in a meeting room together making decisions, using the same words to mean different things. It’s obvious how that can lead to bad (and sometimes expensive) choices.

A lot of my day job is talking to business stakeholders and figuring out the technical solutions they need, so this is a problem I have to deal with. To handle it, I’ve got my own purely technical way to think about traffic levels for online services.

Why const Doesn't Make C Code Faster

Published 12 August 2019, Updated 24 August 2019

Tags: Performance, C, C++, Low Level and Programming Languages

Translations: 中文, русский

In a post a few months back I said it’s a popular myth that const is helpful for enabling compiler optimisations in C and C++. I figured I should explain that one, especially because I used to believe it was obviously true myself. I’ll start off with some theory and artificial examples, then I’ll do some experiments and benchmarks on a real codebase: SQLite.

Profiling D's Garbage Collection with Bpftrace

Published 26 April 2019

Tags: dlang, Performance, Tools and Low Level

Recently I’ve been playing around with using bpftrace to trace and profile D’s garbage collector. Here are some examples of the cool stuff that’s possible.

Why Sorting is O(N log N)

Published 05 January 2019

Tags: Mathematics, Computer Science and Performance

Any decent algorithms textbook will explain how fast sorting algorithms like quicksort and heapsort are, but it doesn’t take crazy maths to prove that they’re as asymptotically fast as you can possibly get.
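The standard argument for this bound (which may or may not be the exact one used in the full post) is a counting one: a comparison sort has to tell apart all n! possible orderings of n distinct items, and each comparison has only two outcomes, so the worst case needs at least ceil(log2(n!)) comparisons. The sketch below computes that bound exactly and prints it next to n·log2(n); the function name `min_comparisons` is made up for this illustration.

```python
import math

def min_comparisons(n):
    """Information-theoretic lower bound on worst-case comparisons to sort n distinct items."""
    orderings = math.factorial(n)  # number of possible input orderings
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)) exactly.
    return (orderings - 1).bit_length()

for n in (4, 16, 256, 4096):
    bound = min_comparisons(n)
    print(f"n={n:5d}  lower bound={bound:6d}  n*log2(n)={round(n * math.log2(n)):6d}")
```

The bound grows in step with n·log2(n), which is why no comparison-based sort can be asymptotically faster than O(N log N).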

On Not Optimising for Last Century's Hardware

Published 20 November 2016

Tags: Performance, Software Engineering, dlang and Low Level

Once upon a time I wrote a super-optimised algorithm for rotating data in an array. At least, it was meant to be super-optimised, but its real-world performance turned out to be terrible. That’s because my intuition about performance was stuck in the 20th century:

1. Break a program down into basic operations like multiplication and assignment
2. Give each operation a cost (or just slap on an O(1) if you’re lazy)
3. Add up all the numbers
4. Try to make the total in step #3 small

A lot of textbooks still teach this “classic” thinking, but except for some highly constrained embedded systems, it just doesn’t work that well on modern hardware.

A Tale of Three Server Caching Architectures

Published 30 July 2016

Tags: Systems Design, Servers, Performance and Software Engineering

Exactly where you put caching in a distributed system has a significant impact on its effectiveness, in ways that aren’t always obvious during the design phase of development.

Offline Compression with Nginx

Published 06 June 2016

Tags: Ops, Servers and Performance

There’s a clear tradeoff with compressing HTTP responses on the fly: compress “harder” and you’ll (hopefully) get a smaller file that takes less time to send over the network – but the net benefit might be negative if the extra work takes too much time, or (when under heavy load) too much CPU. A lot of work has been done analysing this tradeoff, but for static content there’s a neat and simple way to avoid the tradeoff completely: compress offline before serving. Nginx supports this using the gzip_static module.
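As a rough sketch of the offline step, the Python below walks a static directory and writes a `.gz` file next to each compressible file, which is the layout Nginx looks for when `gzip_static on;` is set. The function name `precompress`, the extension list and the choice of maximum compression level are assumptions made for this illustration, not details from the post.

```python
import gzip
import os
import shutil
from pathlib import Path

# Assumed set of text-like extensions worth compressing; adjust for your site.
COMPRESSIBLE = {".html", ".css", ".js", ".svg", ".json", ".txt", ".xml"}

def precompress(root, level=9):
    """Write FILE.gz next to each compressible FILE under root; return the paths written."""
    written = []
    # sorted() materialises the listing first, so files created below are not revisited.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in COMPRESSIBLE:
            continue
        gz_path = path.with_name(path.name + ".gz")
        src_stat = path.stat()
        # Skip files whose .gz sibling is already at least as new as the source.
        if gz_path.exists() and gz_path.stat().st_mtime_ns >= src_stat.st_mtime_ns:
            continue
        with open(path, "rb") as src, gzip.open(gz_path, "wb", compresslevel=level) as dst:
            shutil.copyfileobj(src, dst)
        # Copy the source timestamps so both variants report the same modification time.
        os.utime(gz_path, ns=(src_stat.st_atime_ns, src_stat.st_mtime_ns))
        written.append(gz_path)
    return written
```

Because this runs once at build or deploy time rather than per request, the slowest compression level costs nothing at serving time. Running it again only recompresses files that have changed since their `.gz` sibling was written.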