Update: a lot of this information is already outdated (good news!). See my latest update, and my second update.
Most high-level languages are built as a layer on top of C. That includes out-of-the-box D, but it doesn’t have to
be that way: D is a plausible candidate for a “better C”. I think this is a pretty cool idea, so I’ve been
experimenting with it to see what’s possible. The dmd
compiler (and very soon the ldc2 compiler) has a -betterC command-line flag that’s intended to remove dependencies on the
D runtime. Unfortunately, it’s still extremely rudimentary. The docs only promise that it “omit[s] generating some
runtime information and helper functions”, so in practice it’s hard to write non-trivial D code without getting
runtime dependencies, even if you don’t need them in theory.
With a little linker hacking, it’s possible to rip these unnecessary dependencies out of compiled D code. As an
example, I’ll completely remove all references to the D runtime from some compiled D code so that it can link
directly to some C code, as if it had been C code to begin with.
Disclaimers
I consider this an experimental hack until there’s more official compiler support. I just hope it helps more D
programmers experiment with the better C concept, so that we can develop an even better better C :)
Also, the D runtime appears in recurring flamewars about D, so I’ll have to say it: I think the runtime is okay for
most applications. It just might be necessary to remove it when doing certain types of systems programming. Even then,
there are useful compromises between the extremes of “no D runtime” and “full D runtime”.
And if you’ve found this page after searching for a linker error you got compiling some normal D code, sorry, this
probably won’t be your solution. I recommend asking on the Dlang
forums instead.
Finally, I’m doing this with dmd on a 64-bit GNU/Linux
system with PIC and stack canaries. Other systems will be similar because the D ABI is reasonably well specified
(especially compared to C++’s ABI), but some things might still not be portable.
What’s Lost
I’m ripping the runtime out hard here, so I’ll lose a number of D features. PowerNex, a kernel written in D, ports a subset of the D runtime to preserve
some functionality, but I won’t do that at all, so that I get a baseline.
GC is out, of course. This affects some features like dynamic array concatenation and closures.
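Without a GC, concatenation turns into explicit allocation with explicit ownership. Here is a rough C sketch of what stands in for D’s a ~ b once the GC is gone. The IntSlice type and the concat_ints name are hypothetical, and malloc is just my choice of allocator; any allocator would do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C view of a D int[] slice: length first, then pointer. */
typedef struct {
    size_t length;
    int *ptr;
} IntSlice;

/* Manual replacement for GC-backed concatenation: the result lives in
   malloc'd memory that the caller must release with free(result.ptr).
   Returns an empty slice if both inputs are empty, the size overflows,
   or allocation fails. */
static IntSlice concat_ints(IntSlice a, IntSlice b)
{
    IntSlice out = {0, NULL};
    assert(a.length == 0 || a.ptr != NULL);
    assert(b.length == 0 || b.ptr != NULL);

    size_t n = a.length + b.length;
    if (n == 0 || n < a.length || n > SIZE_MAX / sizeof(int))
        return out; /* nothing to copy, or size overflow */

    int *p = malloc(n * sizeof(int));
    if (p == NULL)
        return out;
    if (a.length > 0)
        memcpy(p, a.ptr, a.length * sizeof(int));
    if (b.length > 0)
        memcpy(p + a.length, b.ptr, b.length * sizeof(int));

    out.length = n;
    out.ptr = p;
    return out;
}
```

The free(result.ptr) that the caller now owes is exactly the bookkeeping the GC normally hides.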
D classes just won’t work without replacing at least some of object.d in the runtime library. That doesn’t bother me much because I
don’t rely on classes for programming. (Apparently C++ classes are an alternative.)
D’s runtime type information is based on TypeInfo classes,
so that has to go, too. If you’re doing a “better C” coding style, you’re probably not going to miss that, either.
Unfortunately, for legacy reasons, the runtime itself is a heavy user of TypeInfo, so the compiler will inject TypeInfo dependencies into code. For example, array comparison is
implemented using TypeInfo-based reflection, even when the
elements are plain old data and a simple memcmp is enough.
This isn’t hard to work around by removing the TypeInfo-based
implementations and reimplementing things as needed, but it’s a nuisance. Hopefully this situation will improve
quickly, because fixing it has performance benefits even for code that isn’t -betterC.
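For plain-old-data elements, equality reduces to comparing lengths and then raw bytes, with no reflection involved. The following is a minimal C sketch of that idea. The pod_array_equal name and signature are hypothetical, not part of any runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compare two arrays of plain-old-data elements: equal lengths and
   identical bytes. Only valid for element types without padding bytes
   and without floating-point members (a byte comparison would call two
   identical NaNs equal and -0.0 unequal to 0.0, unlike ==). */
static bool pod_array_equal(const void *a, size_t a_len,
                            const void *b, size_t b_len,
                            size_t elem_size)
{
    if (a_len != b_len)
        return false;
    if (a_len == 0)
        return true; /* avoid calling memcmp with possibly-null pointers */
    assert(a != NULL && b != NULL);
    return memcmp(a, b, a_len * elem_size) == 0;
}
```

The padding and floating-point caveats are presumably why a general implementation can’t always take this shortcut, but for arrays of integers or packed structs it’s all that’s needed.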
Exceptions are classes and also use TypeInfo, so they’re
out. Even if you port enough object.d code to support
exceptions, the idiomatic usage of exceptions requires GC allocation. Auburn Sounds have documented a workaround. Many D
developers are already interested in implementing GC-less exceptions after fixes to @safe, scope
and reference counting are finished.
Standard D assertions don’t work out of the box. On the other hand, they’re automatically removed from release
builds, so I’ve taken up the pattern of linking my test code to the D runtime and building my runtime-less code with
-release.
I haven’t experimented with thread-local storage, but I’m okay with making all global data immutable or shared, anyway.
Initialisers (for modules and static data) that normally run before D’s main won’t run. I might try getting them to run sometime, but for
now they’re out. All data will need to be either purely constructed at compile time, or explicitly constructed at
runtime (or left with the default zeroed value).
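In practice, “explicitly constructed at runtime” means an init function that the caller has to remember to invoke. Here is a small C sketch of that discipline, assuming a single-threaded program. The table, its size, and the function names are all hypothetical stand-ins for something a D module would otherwise fill in a module constructor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table a D module might otherwise fill in a static
   constructor. Nothing runs that constructor now, so it starts zeroed. */
static int squares[16];
static bool squares_ready = false;

/* Explicit runtime construction: call once before first use.
   Not thread-safe; assumes a single-threaded caller. */
static void init_squares(void)
{
    for (int i = 0; i < 16; i++)
        squares[i] = i * i;
    squares_ready = true;
}

static int square_of(int i)
{
    assert(squares_ready); /* catches a forgotten init in debug builds */
    assert(i >= 0 && i < 16);
    return squares[i];
}
```

If the values are known ahead of time, a constant initialiser avoids the init call altogether, which is the compile-time option from the paragraph above.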
The Phobos standard library is a bit tricky. Some of it is usable, but a lot isn’t. Exceptions are a major blocker.
In case this all sounds too depressing, here are some things that we still have compared to C, even with 100% of the
D runtime removed:
A better, stronger type system. C’s type system has a few dark corners, especially around pointers and arrays. C’s
enumerated type values fill a global namespace and have no type safety.
Slices. These still work without GC (except for things like concatenation), and they’re a much less error-prone way
to handle chunks of memory than plain pointers.
Simple delegates. (It’s only full-featured closures that don’t work.)
Compile-time reflection and metaprogramming.
(Surprisingly) more low-level control without vendor-specific pragmas. (Standard D offers ways to specify data
alignment, for example.)
Modules.
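Slices and simple delegates survive because, at the ABI level, both are just two-word “fat pointers”. According to the D ABI, a slice is (length, pointer) and a delegate is (context pointer, function pointer). Here is a rough C mirror of both. The type and function names are hypothetical, and the nested-function part only approximates what the compiler actually generates.

```c
#include <assert.h>
#include <stddef.h>

/* C mirrors of two D fat pointers; neither needs a runtime. */
typedef struct {
    size_t length;
    int *ptr;
} IntSlice;

typedef struct {
    void *ctx;
    int (*fn)(void *ctx, int arg);
} IntDelegate;

/* Like D's s[lo .. hi]: a view into existing memory, no allocation. */
static IntSlice slice(IntSlice s, size_t lo, size_t hi)
{
    assert(lo <= hi && hi <= s.length); /* D bounds-checks this by default */
    IntSlice r = { hi - lo, lo ? s.ptr + lo : s.ptr }; /* avoid NULL + 0 */
    return r;
}

/* Apply a delegate to each element in place; the length travels with
   the pointer, so no separate size argument is needed. */
static void map_in_place(IntSlice s, IntDelegate d)
{
    assert(d.fn != NULL);
    for (size_t i = 0; i < s.length; i++)
        s.ptr[i] = d.fn(d.ctx, s.ptr[i]);
}

/* Roughly what a non-escaping nested function amounts to: the context is
   the enclosing stack frame, so no GC allocation is needed. */
struct adder_frame { int offset; };

static int add_offset(void *ctx, int arg)
{
    const struct adder_frame *frame = ctx;
    return arg + frame->offset;
}
```

A closure becomes a problem only when the delegate outlives the frame it points to. Keeping that frame alive is the job the GC would normally do.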
Doing the Surgery
Here’s some horrible, over-engineered sample code. It has one public function, count(), which returns a number that goes up by one every time it’s
called. That’s the only thing that’s extern(C); everything
else is used internally and is normal D. The unit test doesn’t work when linked to bare C, but there’s nothing stopping
us from putting it in the code anyway and running it in a test build.
Here’s some simple C code that’ll use this awesome functionality to count to 10:
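The original listing isn’t reproduced here. As a hedged sketch, assuming count() takes no arguments and returns an int (the real signature may differ), a caller could look like the following. The count() body is a hypothetical C stand-in so that the sketch builds without count.o. Delete it when linking against the D object file. The count_to helper name is also hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed prototype for the extern(C) function in the D object file.
   When linking against count.o, keep only this declaration. */
int count(void);

/* Hypothetical stand-in so this sketch builds on its own. */
int count(void)
{
    static int n = 0;
    return ++n;
}

/* Print limit successive values from the counter, one per line. */
static void count_to(int limit)
{
    assert(limit >= 0);
    for (int i = 0; i < limit; i++)
        printf("%d\n", count());
}
```

Calling count_to(10) from main gives the count to 10. Because the D side is declared extern(C), C sees count as an ordinary C symbol and needs no special glue.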
First, let’s compile the D code to an object file, and then try naïvely linking it with the C code:
Okay, that’s a bunch of errors from the linker trying to find things from the D runtime. The usual way to fix this
would be to do the compilation the other way around (compile C code to object files with gcc first, then let dmd put everything together and link in the D runtime). Of course, I’m
not going to do that because I don’t want the runtime. Let’s take a closer look at what linker symbols are inside
count.o:
The symbols marked U are things that are missing and need
to be pulled in externally at link time. The global offset table is for PIC and is recognised by gcc, and _Unwind_Resume is also recognised by gcc, but the other things are from the D runtime, and we need to get rid
of these dependencies. (Web search engines and the D runtime source code are good for identifying these symbols. I also
recently found this list of runtime functions.) We can’t just
remove the symbols, of course; we need to remove the things that depend on those symbols, i.e., the
relocations. Let’s take a look at them:
Each relocation record lists symbols needed for a section (a named chunk) of the binary object file. So,
_d_dso_registry that the linker complained about is needed by
the section .text.d_dso_init. It turns out this section is
for handling dynamic loading/unloading of D code, and making sure module constructors/destructors are called. I can cut
it out. Removing stuff is mostly safe because the linker will complain if we remove something we depend on. I say
“mostly” because removing static constructor code will obviously break things that assume static constructors run on
startup (the solution for now being to not assume that).
For completeness, here’s a list of all the sections:
The other sections I’ll get rid of are .eh_frame (used for
DWARF exception handling), minfo (D module info),
deh (some D-specific exception handling stuff), .data.DW.ref.__dmd_personality_v0 (more DWARF exception handling), the
static constructors/destructors, and anything to do with TypeInfo and DSO. After that, I’ll also need to clean up some unused
symbols.
That’s much better. Now we can use the compiled D code just like a compiled C object file: