Robust and Race-free Server Logging using Named Pipes

If you do any server administration work, you’ll have worked with log files. And if your servers need to be reliable, you’ll know that log files are common source of problems, especially when you need to rotate or ship them (which is practically always). In particular, moving files around causes race conditions.

Thankfully, there are better ways. With named pipes, you can have a simple and robust logging stack, with no race conditions, and without patching your servers to support some network logging protocol.

The problems with rotating log files

First, let’s talk about the problems. Race conditions are generally a problem with popular file-based logging setups, whether you’re rotating logs into archival storage, or shipping them to a remote log processing stack, or whatever. To keep things concrete, though, let me talk about logrotate, just because it’s a popular tool.

Say you have a log file at /var/log/foo. It gets pretty big, and you want to process the logs periodically and start with a new, empty file. So you (or your distro maintainers) set up logrotate with various rules about when to rotate the file.

By default, logrotate will rename the file (to something like /var/log/foo.1) and create a new /var/log/foo to write to. That (mostly) works for software that runs intermittently (such as a package manager that does software updates). But it won’t do any good if the log file is generated by a long-running server. The server only uses the filename when it opens the file; after that it just keeps writing to its open file descriptor. That means it will keep writing to the old file (now named /var/log/foo.1), and the new /var/log/foo file will stay empty.

To handle this use-case, logrotate supports another mode: copytruncate. In this mode, instead of renaming, logrotate will copy the contents of /var/log/foo to an archival file, and then truncate the original file to zero length. As long as the server has the log file open in append mode, it will automatically write new logs to the start of the file, without needing to detect the truncation and do a file seek (the kernel handles that).

That copytruncate mode creates a race condition, though. Any log lines that are written after the copy but before the truncation will get destroyed. Actually, you tend to get the same race condition even with the default move-and-create mode. That’s because there’s not much point just splitting up the logs into multiple files. Most systems are configured to do something like compress the old log file, but ultimately you need to delete the old, uncompressed data, which creates the same race as truncating. (In practice, this race isn’t so likely for occasional log writers, like package managers, and the delay flag to logrotate makes it rarer, albeit by making the log handling a bit more complicated.)

Some servers, like Nginx, support a modification of the default logrotate mode:

Rename the old file
Create the new file
(New step) notify the server that it needs to reopen its log file.

This works (as long as the logs processor doesn’t delete the old file before the server has finished reopening), but it requires special support from the server, and you’re out of luck with most software. There’s a lot of software out there, and log file handling just isn’t interesting enough to get high on the to-do list. This approach also only works for long-running servers.

I think this is a good point to stop and take a step back. Having multiple processes juggle log files around on disk without any synchronisation is just an inherently painful way to do things. It causes bugs and makes logging stacks complicated (here’s just one of many examples). One alternative is to use some network protocol like MQTT or networked syslog, but, realistically, most servers won’t support the one you want. And they shouldn’t have to — log files are a great interface for log writers.

That’s okay because *nix “everything is a file” lets us easily get a file interface on the writer side, with a streaming interface on the reader side.

Named pipes 101

Maybe you’ve seen pipes in pipelines like this:


$ sort user_log.txt | uniq

The pipe connecting sort and uniq is a temporary, anonymous communication channel that sort writes to and uniq reads from. Named pipes are less common, but they’re also communication channels. The only difference is that they persist on the filesystem as if they were files.

Open up a terminal and cd into some temporary working directory. The following creates a named pipe and uses cat to open a writer:


$ mkfifo p
$ # This cat command will sit waiting for input
$ cat > p

Leave that cat command waiting, and open up another terminal in the same directory. In this terminal, start your reader:


$ # This will sit waiting for data to come over the pipe
$ cat p

Now as you type things into the writer end, you’ll see them appear in the reader end. cat will use line buffering in interactive mode, so data will get transferred every time you start a new line.

cat doesn’t have to know anything about pipes for this to work — the pipe acts like a file as long as you just naïvely read or write to it. But if you check, you’ll see the data isn’t stored anywhere. You can pump gigabytes through a pipe without filling up any disk space. Once the data has been read once, it’s lost. (You can have multiple readers, but only one will receive any buffer-load of data.)

Another thing that makes pipes useful for communication is their buffering and blocking. You can start writing before any readers open the pipe, and data gets temporarily buffered inside the kernel until a reader comes along. If the reader starts first, its read will block, waiting for data from the writer. (The writer will also block if the pipe buffer gets full.) If you try the two-terminal experiment again with a regular file, you’ll see that the reader cat will eagerly read all the data it can and then exit.

An annoying problem and a simple solution

Maybe you’re seeing how named pipes can help with logging: Servers can write to log “files” that are actually named pipes, and a logging stack can read log data directly from the named pipe without letting a single line fall onto the floor. You do whatever you want with the logs, without any racey juggling of files on disk.

There’s one annoying problem: the writer doesn’t need a reader to start writing, but if a reader opens the pipe and then closes it, the writer gets a SIGPIPE (“broken pipe”), which will kill it by default. (Try killing the reader cat while typing things into the writer to see what I mean.) Similarly, a reader can read without a writer, but if a writer opens the pipe and then closes it, that will be treated like an end of file. Although the named pipe persists on disk, it isn’t a stable communication channel if log writers and log readers can restart (as they will on a real server).

There’s a solution that’s a bit weird but very simple. Multiple processes can open the pipe for reading and writing, and the pipe will only close when all readers or all writers close it. All we need for a stable logging pipe is a daemon that holds the named pipe open for both reading and writing, without doing any actual reading or writing. I set this up on my personal server last year, and I wrote a tiny, zero-config program to act as my pipe-holding daemon. It just opens every file in its current working directory for both reading and writing. I run it from a directory that has symbolic links to every named pipe in my logging stack. The program runs in a loop that ends in a wait() for a SIGHUP. If I ever update the symlinks in the directory, I give the daemon a kill -HUP and it reopens them all. Sure, it could do its own directory watching, but the SIGHUP approach is simple and predictable, and the whole thing works reliably. Thanks to the pipe buffer, log writers and log readers can be shut down and restarted independently, any time, without breakage.

My server uses the s6 supervision suite to manage daemons, so I have s6-log reading from each logging pipe. The bottom part of the s6-log documentation page has some good insights into the problems with popular logging systems, and good ideas about better ways to do things.

Imagine: a world without log rotation

Strictly speaking, named pipes aren’t necessary for race-free logs processing. The s6 suite encourages writing logs to some file descriptor (like standard error), and letting the supervision suite make sure those file descriptors point to something useful. However, the named pipe approach adds a few benefits:

It doesn’t require any co-ordination between writer and reader
It integrates nicely with the software we have today
It gives things meaningful names (rather than /dev/fd/4)

I’ve worked with companies that spend about as much on their logging stacks as on their serving infrastructure, and, no, “we do logs processing” isn’t in their business models. Of course, log rotation and log shipping aren’t the only problems to blame, but it feels so wrong that we’ve made logs so complicated. If you work on any logging system, consider if you really need to juggle log files around. You could be helping to make the world a better place.