Completely Ripping the Runtime out of D

Update: a lot of this information is already outdated (good news!). See my latest update, and my second update.

Most high-level languages are built as a layer on top of C. That includes out-of-the-box D, but it doesn’t have to be that way: D is a plausible candidate for a “better C”. I think this is a pretty cool idea, so I’ve been experimenting with it to see what’s possible. The dmd compiler (and very soon the ldc2 compiler) has a -betterC command line flag that’s intended to remove dependencies on the D runtime. Unfortunately, it’s still extremely rudimentary (the docs only promise it “omit[s] generating some runtime information and helper functions”), so in practice it’s hard to write non-trivial D code without getting runtime dependencies, even if you don’t need them in theory.

With a little linker hacking, it’s possible to rip these unnecessary dependencies out of compiled D code. As an example, I’ll completely remove all references to the D runtime from some compiled D code so that it can link directly to some C, as if it were C code to begin with.

Disclaimers

I consider this an experimental hack until there’s more official compiler support. I just hope it helps more D programmers experiment with the better C concept, so that we can develop an even better better C :)

Also, the D runtime appears in recurring flamewars about D, so I’ll have to say it: I think the runtime is okay for most applications. It just might be necessary to remove it when doing certain types of systems programming. Even then, there are useful compromises between the extremes of “no D runtime” and “full D runtime”.

And if you’ve found this page after searching for a linker error you got compiling some normal D code, sorry, this probably won’t be your solution. I recommend asking on the Dlang forums instead.

Finally, I’m doing this with dmd, on a 64-bit GNU/Linux system with PIC and stack canaries. Other systems will be similar because the D ABI is reasonably well specified (especially compared to C++’s ABI) but some things might still not be portable.

What’s Lost

I’m ripping the runtime out hard, here, so I’ll lose a number of D features. PowerNex, a kernel written in D, ports a subset of the D runtime to preserve some functionality, but I won’t do that at all, to get a kind of baseline.

GC is out, of course. This affects some features like dynamic array concatenation and closures.
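To make the concatenation point concrete, here is a minimal sketch of one GC-free substitute: appending into memory the caller already owns. The names appendInto and appendDemo are made up for this example, and it assumes a -release build like the one used later in this post (otherwise array bounds checks may pull in runtime helpers).

```d
@nogc nothrow:

// Under @nogc the compiler rejects GC-backed operations such as
//     int[] c = a ~ b;
// so this hypothetical helper copies into a caller-provided buffer instead.
// It returns the filled part of buffer, or null if src doesn't fit.
int[] appendInto(int[] buffer, size_t used, const(int)[] src)
{
    if (used > buffer.length || src.length > buffer.length - used)
        return null;
    foreach (i, x; src)
        buffer[used + i] = x;
    return buffer[0 .. used + src.length];
}

extern(C) int appendDemo()
{
    int[8] storage;                 // lives on the stack, no GC involved
    int[3] first = [1, 2, 3];
    int[2] second = [4, 5];
    auto filled = appendInto(storage[], 0, first[]);
    filled = appendInto(storage[], filled.length, second[]);
    return cast(int) filled.length; // number of elements stored so far
}
```

The caller decides where the memory comes from (stack, static data, or malloc), which is the same trade-off you would make in C, except that the slice keeps the length attached to the pointer.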

D classes just won’t work without replacing at least some of object.d in the runtime library. That doesn’t bother me much because I don’t rely on classes for programming. (Apparently C++ classes are an alternative.)

D’s runtime type information is based on TypeInfo classes, so that has to go, too. If you’re doing a “better C” coding style, you’re probably not going to miss that, either. Unfortunately, for legacy reasons, the runtime itself is a heavy user of TypeInfo, so the compiler will inject TypeInfo dependencies into code. For example, array comparison is implemented using TypeInfo-based reflection, even when the elements are plain old data and a simple memcmp is enough. This isn’t hard to work around by removing the TypeInfo-based implementations and reimplementing things as needed, but it’s a nuisance. Hopefully this situation will improve relatively quickly because there are performance benefits even for code that isn’t -betterC.
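As a sketch of what “reimplementing things as needed” can look like for array comparison, the helper below compares two slices with memcmp from the C library instead of the built-in == operator. The name sameBytes is hypothetical, and restricting it to integral element types is my own assumption to keep the byte comparison valid.

```d
import core.stdc.string : memcmp;

@nogc nothrow:

// Hypothetical helper. It is limited to integral element types so that equal
// values always have equal bytes (floats and padded structs may not).
bool sameBytes(T)(const(T)[] a, const(T)[] b)
    if (__traits(isIntegral, T))
{
    if (a.length != b.length)
        return false;
    return a.length == 0 || memcmp(a.ptr, b.ptr, a.length * T.sizeof) == 0;
}

extern(C) int sameBytesDemo()
{
    int[3] x = [1, 2, 3];
    int[3] y = [1, 2, 3];
    int[3] z = [1, 2, 4];
    // Non-zero when x matches y and differs from z.
    return sameBytes(x[], y[]) && !sameBytes(x[], z[]);
}
```

Because core.stdc.string only declares C functions, the resulting object should depend on libc’s memcmp rather than on anything in the D runtime.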

Exceptions are classes and also use TypeInfo, so they’re out. Even if you port enough object.d code to support exceptions, the idiomatic usage of exceptions requires GC allocation. Auburn Sounds have documented a workaround. Many D developers are already interested in implementing GC-less exceptions after fixes to @safe, scope and reference counting are finished.

Standard D assertions don’t work out of the box. On the other hand, they’re automatically removed from release builds, so I’ve taken up the pattern of linking my test code to the D runtime and building my runtime-less code with -release.

I haven’t experimented with thread-local storage, but I’m okay with making all global data immutable or shared, anyway.
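For readers coming from C, the reason this matters is that module-level variables in D are thread-local unless marked otherwise. The declarations below are only an illustration (the variable names are invented); the first one is the kind I avoid, and the other three are the alternatives.

```d
int perThread = 1;          // thread-local by default: every thread gets its own copy
immutable int limit = 10;   // immutable: one read-only copy for the whole program
shared int hits;            // shared: one copy visible to every thread
__gshared int rawGlobal;    // also one copy, like a plain C global, but unchecked
```

Of these, shared is what the sample code later in this post uses for its counter.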

Initialisers (for modules and static data) that normally run before D’s main won’t run. I might try getting them to run sometime, but for now they’re out. All data will need to be either purely constructed at compile time, or explicitly constructed at runtime (or left with the default zeroed value).
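Here is a small sketch of both options. The table is computed by the compiler, and the mutable setting gets an explicit init function that the C side would have to call itself. All the names (squares, verbosity, count_init, square_of) are hypothetical.

```d
@nogc nothrow:

// Option 1: built entirely at compile time (CTFE), so no module
// constructor is needed to fill it in.
immutable int[8] squares = makeSquares();

private int[8] makeSquares() pure
{
    int[8] t;
    foreach (i, ref x; t)
        x = cast(int)(i * i);
    return t;
}

// Option 2: state that can't be known at compile time stays zeroed until
// the C caller explicitly runs the init function.
__gshared int verbosity;

extern(C) void count_init(int level)
{
    verbosity = level;
}

extern(C) int square_of(int i)
{
    return squares[i & 7];   // mask keeps the index inside the table
}
```

The C program would declare count_init and square_of itself and call count_init once at the top of main, taking over the job a module constructor would normally do.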

The Phobos standard library is a bit tricky. Some of it is usable, but a lot isn’t. Exceptions are a major blocker.

In case this all sounds too depressing, here are some things that we still have compared to C, even with 100% of the D runtime removed:

A better, stronger type system. C’s type system has a few dark corners, especially around pointers and arrays. C’s enumerated type values fill a global namespace and have no type safety.
Slices. These still work without GC (except for things like concatenation), and they’re a much less error-prone way to handle chunks of memory than plain pointers.
Simple delegates. (It’s only full-featured closures that don’t work.)
Compile-time reflection and metaprogramming.
(Surprisingly) more low-level control without vendor-specific pragmas. (Standard D offers ways to specify data alignment, for example.)
Modules.
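A few of these can be shown in one short sketch. It isn’t taken from any real project, the names are invented, and it assumes a -release build so that bounds checks don’t drag in runtime helpers.

```d
@nogc nothrow:

// Enum members are scoped: you write Colour.red, and nothing leaks into
// the surrounding namespace.
enum Colour : ubyte { red, green, blue }

// Alignment is part of the language rather than a vendor pragma.
align(16) struct Vec4 { float[4] v; }
static assert(Vec4.alignof == 16);

// Compile-time reflection: count the fields of any struct.
enum fieldCount(T) = T.tupleof.length;
static assert(fieldCount!Vec4 == 1);

// A slice carries its own length, so no separate count parameter is needed.
int sum(const(int)[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

// "scope" promises the delegate doesn't escape, so no GC closure is needed.
void each(const(int)[] xs, scope void delegate(int) @nogc nothrow visit)
{
    foreach (x; xs)
        visit(x);
}

extern(C) int featuresDemo()
{
    int[4] data = [1, 2, 3, 4];
    int evens = 0;
    each(data[], (int x) { if (x % 2 == 0) evens += x; });
    return sum(data[1 .. 3]) + evens;   // (2 + 3) + (2 + 4)
}
```

The delegate here captures a local variable but never outlives the call, which is the “simple delegate” case; returning it or storing it somewhere would need a real closure and therefore the GC.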

Doing the Surgery

Here’s some horrible, over-engineered sample code. It has one public function, count(), which returns a number that goes up by one every time it’s called. That’s the only thing that’s extern(C); everything else is used internally and is normal D. The unit test doesn’t work when linked to bare C, but there’s nothing stopping us putting it in the code anyway and running it in a test build.

module count;

@nogc:
nothrow:

import core.atomic : atomicOp, atomicLoad;

extern(C)
{
        int count()
        {
                scope(exit) counter.addOne();
                return counter.getValue();
        }
}

private:

shared struct AtomicCounter(T)
{
        void addOne() pure
        {
                atomicOp!"+="(_v, 1);
        }

        T getValue() const pure
        {
                return atomicLoad(_v);
        }

        private:
        T _v;
}

unittest
{
        shared test_counter = AtomicCounter!int(42);
        assert (test_counter.getValue() == 42);
        test_counter.addOne();
        assert (test_counter.getValue() == 43);
}

shared counter = AtomicCounter!int(1);

Here’s some simple C code that’ll use this awesome functionality to count to 10:

#include <stdio.h>

int count(void);  // From the D code

int main(void)
{
    int j;
    for (j = 0; j < 10; j++)
    {
        printf("%d\n", count());
    }
    return 0;
}

First, let’s compile the D code to an object file, and then try naïvely linking it with the C code:


$ dmd -w -betterC -release -c count.d
$ gcc -Wall program.c count.o
count.o:(.data.DW.ref.__dmd_personality_v0+0x0): undefined reference to `__dmd_personality_v0'
count.o:(.data._D11TypeInfo_Oi6__initZ+0x0): undefined reference to `_D15TypeInfo_Shared6__vtblZ'
count.o:(.data._D11TypeInfo_Oi6__initZ+0x10): undefined reference to `_D10TypeInfo_i6__initZ'
count.o:(.data._D54TypeInfo_S5count21__T13AtomicCounterTiZ13AtomicCounter6__initZ+0x0): undefined reference to `_D15TypeInfo_Struct6__vtblZ'
count.o:(.text.d_dso_init[.data.d_dso_rec]+0x32): undefined reference to `_d_dso_registry'
collect2: error: ld returned 1 exit status

Okay, that’s a bunch of errors from the linker trying to find things from the D runtime. The usual way to fix this would be to do the compilation the other way around (compile C code to object files with gcc first, then let dmd put everything together and link in the D runtime). Of course, I’m not going to do that because I don’t want the runtime. Let’s take a closer look at what linker symbols are inside count.o:


$ nm count.o
0000000000000000 t 
0000000000000000 V DW.ref.__dmd_personality_v0
                 U _D10TypeInfo_i6__initZ
0000000000000000 V _D11TypeInfo_Oi6__initZ
                 U _D15TypeInfo_Shared6__vtblZ
                 U _D15TypeInfo_Struct6__vtblZ
0000000000000000 W _D4core6atomic24__T14atomicFetchAddTiTiZ14atomicFetchAddFNaNbNiKOiiZi
0000000000000000 W _D4core6atomic28__T8atomicOpVAyaa2_2b3dTiTiZ8atomicOpFNaNbNiKOiiZi
0000000000000000 W _D4core6atomic36__T28atomicValueIsProperlyAlignedTiZ28atomicValueIsProperlyAlignedFNaNbNiNfmZb
0000000000000000 W _D4core6atomic47__T10atomicLoadVE4core6atomic11MemoryOrderi3TiZ10atomicLoadFNaNbNiKOxiZi
0000000000000000 V _D54TypeInfo_S5count21__T13AtomicCounterTiZ13AtomicCounter6__initZ
0000000000000000 V _D5count21__T13AtomicCounterTiZ13AtomicCounter6__initZ
0000000000000000 W _D5count21__T13AtomicCounterTiZ13AtomicCounter6addOneMOFNaNbNiZv
0000000000000000 W _D5count21__T13AtomicCounterTiZ13AtomicCounter8getValueMOxFNaNbNiZi
0000000000000000 D _D5count7counterOS5count21__T13AtomicCounterTiZ13AtomicCounter
                 U _GLOBAL_OFFSET_TABLE_
                 U _Unwind_Resume
                 U __dmd_personality_v0
                 U __start_deh
                 U __start_minfo
                 U __stop_deh
                 U __stop_minfo
                 U _d_dso_registry
0000000000000000 T count

The symbols marked U are undefined in this object file and need to be resolved externally at link time. The global offset table is for PIC and is handled by the linker, and _Unwind_Resume is provided by libgcc, which gcc links in, but the other things are from the D runtime, and we need to get rid of these dependencies. (Web search engines and the D runtime source code are good for identifying these symbols. I also found this list of runtime functions recently.) We can’t just remove the symbols, of course; we need to remove the things that depend on those symbols, i.e., the relocations. Let’s take a look at them:


$ objdump -r count.o

count.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]: (none)

RELOCATION RECORDS FOR [.data]: (none)

RELOCATION RECORDS FOR [.eh_frame]:
OFFSET           TYPE              VALUE 
0000000000000013 R_X86_64_PC32     DW.ref.__dmd_personality_v0
0000000000000028 R_X86_64_PC32     .text.count
0000000000000031 R_X86_64_PC32     .gcc_except_table
0000000000000048 R_X86_64_PC32     .text._D5count21__T13AtomicCounterTiZ13AtomicCounter6addOneMOFNaNbNiZv
0000000000000051 R_X86_64_PC32     .gcc_except_table+0x0000000000000010
0000000000000068 R_X86_64_PC32     .text._D5count21__T13AtomicCounterTiZ13AtomicCounter8getValueMOxFNaNbNiZi
0000000000000071 R_X86_64_PC32     .gcc_except_table+0x000000000000001c
0000000000000088 R_X86_64_PC32     .text._D4core6atomic28__T8atomicOpVAyaa2_2b3dTiTiZ8atomicOpFNaNbNiKOiiZi
0000000000000091 R_X86_64_PC32     .gcc_except_table+0x0000000000000028
00000000000000a8 R_X86_64_PC32     .text._D4core6atomic24__T14atomicFetchAddTiTiZ14atomicFetchAddFNaNbNiKOiiZi
00000000000000b1 R_X86_64_PC32     .gcc_except_table+0x0000000000000034
00000000000000c8 R_X86_64_PC32     .text._D4core6atomic36__T28atomicValueIsProperlyAlignedTiZ28atomicValueIsProperlyAlignedFNaNbNiNfmZb
00000000000000d1 R_X86_64_PC32     .gcc_except_table+0x0000000000000040
00000000000000e8 R_X86_64_PC32     .text._D4core6atomic47__T10atomicLoadVE4core6atomic11MemoryOrderi3TiZ10atomicLoadFNaNbNiKOxiZi
00000000000000f1 R_X86_64_PC32     .gcc_except_table+0x000000000000004c


RELOCATION RECORDS FOR [.data.DW.ref.__dmd_personality_v0]:
OFFSET           TYPE              VALUE 
0000000000000000 R_X86_64_64       __dmd_personality_v0


RELOCATION RECORDS FOR [.text.count]:
OFFSET           TYPE              VALUE 
000000000000000b R_X86_64_GOTPCREL  _D5count7counterOS5count21__T13AtomicCounterTiZ13AtomicCounter-0x0000000000000004
0000000000000010 R_X86_64_PLT32    _D5count21__T13AtomicCounterTiZ13AtomicCounter8getValueMOxFNaNbNiZi-0x0000000000000004
0000000000000030 R_X86_64_GOTPCREL  _D5count7counterOS5count21__T13AtomicCounterTiZ13AtomicCounter-0x0000000000000004
0000000000000035 R_X86_64_PLT32    _D5count21__T13AtomicCounterTiZ13AtomicCounter6addOneMOFNaNbNiZv-0x0000000000000004
0000000000000044 R_X86_64_PLT32    _Unwind_Resume-0x0000000000000004


RELOCATION RECORDS FOR [.data._D11TypeInfo_Oi6__initZ]:
OFFSET           TYPE              VALUE 
0000000000000000 R_X86_64_64       _D15TypeInfo_Shared6__vtblZ
0000000000000010 R_X86_64_64       _D10TypeInfo_i6__initZ


RELOCATION RECORDS FOR [.data._D54TypeInfo_S5count21__T13AtomicCounterTiZ13AtomicCounter6__initZ]:
OFFSET           TYPE              VALUE 
0000000000000000 R_X86_64_64       _D15TypeInfo_Struct6__vtblZ
0000000000000018 R_X86_64_64       _D54TypeInfo_S5count21__T13AtomicCounterTiZ13AtomicCounter6__initZ+0x0000000000000088
0000000000000070 R_X86_64_64       _D11TypeInfo_Oi6__initZ


RELOCATION RECORDS FOR [.text._D5count21__T13AtomicCounterTiZ13AtomicCounter6addOneMOFNaNbNiZv]:
OFFSET           TYPE              VALUE 
000000000000000d R_X86_64_PLT32    _D4core6atomic28__T8atomicOpVAyaa2_2b3dTiTiZ8atomicOpFNaNbNiKOiiZi-0x0000000000000004


RELOCATION RECORDS FOR [.text._D5count21__T13AtomicCounterTiZ13AtomicCounter8getValueMOxFNaNbNiZi]:
OFFSET           TYPE              VALUE 
0000000000000005 R_X86_64_PLT32    _D4core6atomic47__T10atomicLoadVE4core6atomic11MemoryOrderi3TiZ10atomicLoadFNaNbNiKOxiZi-0x0000000000000004


RELOCATION RECORDS FOR [.text._D4core6atomic28__T8atomicOpVAyaa2_2b3dTiTiZ8atomicOpFNaNbNiKOiiZi]:
OFFSET           TYPE              VALUE 
000000000000000c R_X86_64_PLT32    _D4core6atomic24__T14atomicFetchAddTiTiZ14atomicFetchAddFNaNbNiKOiiZi-0x0000000000000004


RELOCATION RECORDS FOR [.text.d_dso_init]:
OFFSET           TYPE              VALUE 
0000000000000007 R_X86_64_PC32     __stop_deh-0x0000000000000004
000000000000000f R_X86_64_PC32     __start_deh-0x0000000000000004
0000000000000017 R_X86_64_PC32     __stop_minfo-0x0000000000000004
000000000000001f R_X86_64_PC32     __start_minfo-0x0000000000000004
0000000000000027 R_X86_64_PC32     .data.d_dso_rec-0x0000000000000004
0000000000000032 R_X86_64_PLT32    _d_dso_registry-0x0000000000000004


RELOCATION RECORDS FOR [.dtors.d_dso_dtor]:
OFFSET           TYPE              VALUE 
0000000000000000 R_X86_64_64       .text.d_dso_init


RELOCATION RECORDS FOR [.ctors.d_dso_ctor]:
OFFSET           TYPE              VALUE 
0000000000000000 R_X86_64_64       .text.d_dso_init

Each relocation record lists symbols needed for a section (a named chunk) of the binary object file. So, _d_dso_registry that the linker complained about is needed by the section .text.d_dso_init. It turns out this section is for handling dynamic loading/unloading of D code, and making sure module constructors/destructors are called. I can cut it out. Removing stuff is mostly safe because the linker will complain if we remove something we depend on. I say “mostly” because removing static constructor code will obviously break things that assume static constructors run on startup (the solution for now being to not assume that).

For completeness, here’s a list of all the sections:


$ objdump -h count.o

count.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000008  0000000000000000  0000000000000000  00000040  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000050  2**4
                  ALLOC
  3 .rodata       00000000  0000000000000000  0000000000000000  00000050  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment      00000000  0000000000000000  0000000000000000  00000050  2**0
                  CONTENTS, READONLY
  5 .note         00000000  0000000000000000  0000000000000000  00000000  2**0
                  CONTENTS, READONLY
  6 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000000  2**0
                  CONTENTS, READONLY
  7 .data.rel.ro  00000000  0000000000000000  0000000000000000  00000050  2**4
                  CONTENTS, ALLOC, LOAD, DATA
  8 .gcc_except_table 00000058  0000000000000000  0000000000000000  00000050  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .eh_frame     00000100  0000000000000000  0000000000000000  000000a8  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
 10 .data.DW.ref.__dmd_personality_v0 00000008  0000000000000000  0000000000000000  000001a8  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
 11 .text.count   00000060  0000000000000000  0000000000000000  000001b0  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 12 .data._D11TypeInfo_Oi6__initZ 00000020  0000000000000000  0000000000000000  00000210  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
 13 .data._D54TypeInfo_S5count21__T13AtomicCounterTiZ13AtomicCounter6__initZ 000000b0  0000000000000000  0000000000000000  00000230  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
 14 .data._D5count21__T13AtomicCounterTiZ13AtomicCounter6__initZ 00000010  0000000000000000  0000000000000000  000002e0  2**4
                  CONTENTS, ALLOC, LOAD, DATA
 15 .text._D5count21__T13AtomicCounterTiZ13AtomicCounter6addOneMOFNaNbNiZv 00000018  0000000000000000  0000000000000000  000002f0  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 16 .text._D5count21__T13AtomicCounterTiZ13AtomicCounter8getValueMOxFNaNbNiZi 00000010  0000000000000000  0000000000000000  00000308  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 17 .text._D4core6atomic28__T8atomicOpVAyaa2_2b3dTiTiZ8atomicOpFNaNbNiKOiiZi 00000018  0000000000000000  0000000000000000  00000318  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 18 .text._D4core6atomic24__T14atomicFetchAddTiTiZ14atomicFetchAddFNaNbNiKOiiZi 00000030  0000000000000000  0000000000000000  00000330  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 19 .text._D4core6atomic36__T28atomicValueIsProperlyAlignedTiZ28atomicValueIsProperlyAlignedFNaNbNiNfmZb 00000020  0000000000000000  0000000000000000  00000360  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 20 .text._D4core6atomic47__T10atomicLoadVE4core6atomic11MemoryOrderi3TiZ10atomicLoadFNaNbNiKOxiZi 00000020  0000000000000000  0000000000000000  00000380  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 21 deh           00000000  0000000000000000  0000000000000000  000003a0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 22 minfo         00000000  0000000000000000  0000000000000000  000003a0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 23 .group.d_dso  00000014  0000000000000000  0000000000000000  000003a0  2**0
                  CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
 24 .data.d_dso_rec 00000008  0000000000000000  0000000000000000  000003b8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 25 .text.d_dso_init 00000038  0000000000000000  0000000000000000  000003c0  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 26 .dtors.d_dso_dtor 00000008  0000000000000000  0000000000000000  000003f8  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
 27 .ctors.d_dso_ctor 00000008  0000000000000000  0000000000000000  00000400  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA

The other sections I’ll get rid of are .eh_frame (used for DWARF exception handling), minfo (D module info), deh (some D-specific exception handling stuff), .data.DW.ref.__dmd_personality_v0 (more DWARF exception handling), the static constructors/destructors, and anything to do with TypeInfo and DSO. After that, I’ll also need to clean up some unused symbols.


$ objcopy -R '.data.*[0-9]TypeInfo_*' -R '.[cd]tors.*' -R .text.d_dso_init -R .data.d_dso_rec -R minfo -R .eh_frame -R deh -R .data.DW.ref.__dmd_personality_v0 --strip-unneeded count.o
$ nm count.o
0000000000000000 W _D4core6atomic24__T14atomicFetchAddTiTiZ14atomicFetchAddFNaNbNiKOiiZi
0000000000000000 W _D4core6atomic28__T8atomicOpVAyaa2_2b3dTiTiZ8atomicOpFNaNbNiKOiiZi
0000000000000000 W _D4core6atomic36__T28atomicValueIsProperlyAlignedTiZ28atomicValueIsProperlyAlignedFNaNbNiNfmZb
0000000000000000 W _D4core6atomic47__T10atomicLoadVE4core6atomic11MemoryOrderi3TiZ10atomicLoadFNaNbNiKOxiZi
0000000000000000 V _D5count21__T13AtomicCounterTiZ13AtomicCounter6__initZ
0000000000000000 W _D5count21__T13AtomicCounterTiZ13AtomicCounter6addOneMOFNaNbNiZv
0000000000000000 W _D5count21__T13AtomicCounterTiZ13AtomicCounter8getValueMOxFNaNbNiZi
0000000000000000 D _D5count7counterOS5count21__T13AtomicCounterTiZ13AtomicCounter
                 U _Unwind_Resume
0000000000000000 T count

That’s much better. Now we can use the compiled D code just like a compiled C object file:


$ gcc -Wall program.c count.o
$ ./a.out
1
2
3
4
5
6
7
8
9
10