The D programming language has a number of built-in function attributes, such as pure and nothrow. I wondered how libraries might break if a function's attributes changed between versions, so I ran an experiment.

The Problem

Let’s say I have this library code:

module thing;

// Thing version 0.1

string foo()
{
        return "The answer is";
}

auto bar()
{
        return 42;
}

And I make a shared library out of it:


$ dmd -shared thing.d
$ ls
thing.d  thing.o  thing.so

Then I write a program using this useful library:

import thing;
import io = std.stdio;

void main()
{
        io.writeln(foo());
        io.writeln(bar());
}

And compile it, dynamically linking to the thing library:


$ dmd -L./thing.so app.d
$ ./app
The answer is
42

So far, so good. But now I write a new version of the library that takes advantage of D’s function attributes:

module thing;

// Thing version 0.2

string foo() pure nothrow @nogc
{
        return "The answer is";
}

// TODO: maybe add some attributes here 
auto bar()
{
        return 42;
}

I compile this to make a new version of the shared library:


$ dmd -shared thing.d
$ # app uses dynamic linking, so I shouldn't have to recompile it, right?
$ ./app
./app: symbol lookup error: ./app: undefined symbol: _D5thing3fooFZAya

What Went Wrong?

This is an example of a problem often called ABI instability (“ABI” referring to the Application Binary Interface – the interface one binary file, such as the library, uses to interact with another, such as the executable). It’s a problem for all languages that use dynamic linking or loading (here’s an in-depth guide for C++). It’s well known that changing a library’s API can break third-party code, and this is just the same problem from a lower-level perspective. Just to be clear, changing things that are only used internally by the library wouldn’t cause problems at either level.

Most D projects in 2016 are compiled all at once and use static linking, so this kind of problem doesn’t happen much. However, like some other D programmers, I think dynamic linking and binary compatibility are going to matter. I want to install D libraries as system libraries and use D in very large projects. That’s why I tried this experiment.

Running app again failed with a symbol lookup error. Because app is dynamically linked, it doesn’t contain the definition of functions like foo and bar that it uses from the thing library. The compiler inserts the names (called “symbols”) of the missing things into the executable, so they can be looked up when needed. The symbol name _D5thing3fooFZAya is a little cryptic, but you can probably see that app failed to find foo from the thing library.

Let’s look at what symbols are available from version 0.2 of the thing library:


$ nm thing.so 
0000000000000a60 t 
0000000000201008 d DW.ref.__dmd_personality_v0
0000000000000b60 R _D5thing12__ModuleInfoZ
0000000000000ad8 T _D5thing15__unittest_failFiZv
0000000000000a78 T _D5thing3barFNaNbNiNfZi
0000000000000a60 T _D5thing3fooFNaNbNiZAya
0000000000000a88 T _D5thing7__arrayZ
0000000000000ab0 T _D5thing8__assertFiZv
0000000000200d50 d _DYNAMIC
0000000000200f90 d _GLOBAL_OFFSET_TABLE_
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w _Jv_RegisterClasses
0000000000000b50 r _TMP0
0000000000000b6e r _TMP1
0000000000201018 d __TMC_END__
0000000000201020 B __bss_start
                 w __cxa_finalize@@GLIBC_2.2.5
                 U __dmd_personality_v0
0000000000201000 d __dso_handle
                 w __gmon_start__
0000000000201020 d __start_deh
0000000000201018 d __start_minfo
0000000000201020 d __stop_deh
0000000000201020 d __stop_minfo
                 U _d_arraybounds
                 U _d_assert
                 U _d_dso_registry
                 U _d_unittest
0000000000201020 D _edata
0000000000201030 B _end
0000000000000b40 T _fini
00000000000008c8 T _init

Most of these symbols belong to the D and C runtimes, but the fifth and sixth entries are the symbols for the two functions exported by the thing library. The foo function now has the symbol _D5thing3fooFNaNbNiZAya, which doesn’t match the symbol app was looking for: _D5thing3fooFZAya.

By the way, this weird naming is what’s called “mangling”. In the simpler world of C, a function called foo would get exported as the symbol foo. But in more complex languages like D and C++, it’s possible to have multiple functions called foo as long as they’re in different modules or namespaces, or have different argument types. Mangling is the process of encoding information into the symbol name so that the right implementation can be found.

If you look at the spec for D’s mangling scheme, you might notice that it includes function attributes. The new symbol for foo includes NaNbNi, which encodes pure (Na), nothrow (Nb) and @nogc (Ni). app was compiled against the old version of the library and so asks for the old mangled name of foo, but the new library only exports the new name.
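The attribute encoding can be seen without nm, because D exposes a symbol's mangled name through the .mangleof property. Here is a minimal sketch; the functions plain and strict are made-up stand-ins for the two versions of foo, and the module prefix in the output will depend on whatever the source file is called.

```d
import io = std.stdio;

// Stand-in for foo in version 0.1: no attributes.
string plain()
{
        return "The answer is";
}

// Stand-in for foo in version 0.2: same signature plus attributes.
string strict() pure nothrow @nogc
{
        return "The answer is";
}

void main()
{
        // .mangleof gives the symbol name that the linker sees.
        io.writeln(plain.mangleof);
        io.writeln(strict.mangleof);
}
```

The two printed names should differ only in the function name and in the NaNbNi inserted between the F and the Z of the second one.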

This is a nuisance, but it’s better than the situation in C/C++. For C++ this stuff isn’t standardised, but getting a linker error like with app is the lucky case – otherwise your program will try to run and be potentially buggy/unstable because of incompatibilities. C has no mangling of type information, so changing types of exported symbols is just plain unsafe.

How Can This be Fixed?

So, the new version of the library broke binary compatibility. The simplest fix is to just recompile app. If everything’s compiled together against the same source code, there shouldn’t be any compatibility problems. Obviously this might defeat the purpose of having separate binaries, though.

Another option is to add the missing symbol as an alias of the new symbol when building the shared library:


$ dmd -shared -L--defsym=_D5thing3fooFZAya=_D5thing3fooFNaNbNiZAya thing.d
$ # Look, Ma, no recompilation of app!
$ ./app
The answer is
42

The symbol aliases can be put into a separate file:


$ cat symbols
_D5thing3fooFZAya = _D5thing3fooFNaNbNiZAya;
$ dmd -shared -L--just-symbols=symbols thing.d
$ # Still no recompilation of app!
$ ./app
The answer is
42

It would be nice to have a way to solve the problem inside the library itself, though. D doesn’t have an equivalent to GCC’s alias attribute, but it’s possible to override the name mangling of a symbol:

module thing;

// Thing version 0.2

string foo() pure nothrow @nogc
{
        return "The answer is";
}

pragma(mangle, "_D5thing3fooFZAya")
deprecated("Only for binary compatibility with v0.1")
string foo_0_1()
{
        return foo();
}

// TODO: maybe add some attributes here 
auto bar()
{
        return 42;
}

Now the new library version works with old executables even when compiled normally:


$ dmd -shared thing.d
$ ./app
The answer is
42

The standard library has mangling functions that can help with getting the right mangled names.
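For example, core.demangle in the D runtime provides mangleFunc, which builds a mangled name from a function pointer type and a fully qualified name. The following is only a sketch of how the two names for foo could be generated instead of being copied out of nm output; it assumes the function lives in module thing, as in the examples above.

```d
import core.demangle : mangleFunc;
import io = std.stdio;

void main()
{
        // Signature of foo in version 0.1: string foo()
        auto oldName = mangleFunc!(string function())("thing.foo");

        // Signature of foo in version 0.2: string foo() pure nothrow @nogc
        auto newName =
                mangleFunc!(string function() pure nothrow @nogc)("thing.foo");

        io.writeln(oldName); // _D5thing3fooFZAya
        io.writeln(newName); // _D5thing3fooFNaNbNiZAya
}
```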

Doing it this way adds an extra hop in calling foo for old executables. In theory, a compliant D compiler is allowed to automatically implement the alias, first by inlining the function, then by deduping the identical function bodies. I’m not expecting to rely on that happening any time soon.

A major advantage over the linker hackery is that it maintains D’s type safety. The above code compiles because it’s safe to call a pure function even if you don’t expect it to be pure. If version 0.2 took away attributes, the same trick wouldn’t compile, and that’s a good thing because it wouldn’t be safe.
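To make the asymmetry concrete, here is a small sketch that checks both directions with __traits(compiles). The names strictFoo, looseFoo and looseWrapper are hypothetical stand-ins, not part of the thing library.

```d
import io = std.stdio;

// Stand-in for foo in version 0.2: promises more than version 0.1 did.
string strictFoo() pure nothrow @nogc
{
        return "The answer is";
}

// Stand-in for a hypothetical release that dropped the attributes.
string looseFoo()
{
        return "The answer is";
}

// A wrapper that promises less than its callee delivers is accepted.
string looseWrapper()
{
        return strictFoo();
}

// A wrapper that promises more than its callee delivers is rejected.
enum strictWrapperCompiles =
        __traits(compiles, () pure nothrow @nogc { return looseFoo(); });
static assert(!strictWrapperCompiles);

void main()
{
        io.writeln(looseWrapper());        // The answer is
        io.writeln(strictWrapperCompiles); // false
}
```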

auto can be Hazardous

On that note, what if I refactor bar in version 0.3?

module thing;

// Thing version 0.3

string foo() pure nothrow @nogc
{
        return "The answer is";
}

pragma(mangle, "_D5thing3fooFZAya")
deprecated("Only for binary compatibility with v0.1")
string foo_0_1()
{
        return foo();
}

auto bar()
{
        // Using a global variable now for some reason
        return _the_number;
}

private:

int _the_number = 42;

The answer is that I break binary compatibility of bar:


$ dmd -shared thing.d
$ ./app
./app: symbol lookup error: ./app: undefined symbol: _D5thing3barFNaNbNiNfZi

The gotcha is that functions that return auto have function attributes inferred automatically. The original function was pure, but my (dubious) refactoring to use mutable global state took that away. This time it isn’t just a naming problem. Code that was compiled under the promise of a pure bar isn’t guaranteed to work with the new version of bar, even if the symbols are forced to match. This is why attributes are a part of the mangling spec in the first place.
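The inference can be observed from inside D as well. A pure function is only allowed to call other pure functions, so asking the compiler whether a pure lambda can call each version of bar reveals what was inferred. This is a sketch with made-up names (barConstant, barGlobal, theNumber) standing in for the two versions of bar.

```d
import io = std.stdio;

// Mutable module-level state, like _the_number in version 0.3.
int theNumber = 42;

// Like bar in version 0.1: nothing impure, so pure is inferred.
auto barConstant()
{
        return 42;
}

// Like bar in version 0.3: reading mutable global state rules out pure.
auto barGlobal()
{
        return theNumber;
}

// Each check compiles only if the callee was inferred to be pure.
enum constantIsPure = __traits(compiles, () pure => barConstant());
enum globalIsPure = __traits(compiles, () pure => barGlobal());

static assert(constantIsPure);
static assert(!globalIsPure);

void main()
{
        io.writeln(constantIsPure, " ", globalIsPure); // true false
}
```

Writing the attributes out explicitly on exported functions, instead of relying on auto, would turn a refactoring like this into a compile error in the library rather than a lookup failure in someone else's executable.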

Summary