This is part 1 of possibly a 1-part series. We'll see.
Has this ever happened to you? You have some code like this:
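Something along these lines, say. This is a minimal sketch rather than the exact listing: the function name greet, the string, and the buffer handling are all made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical example: copy a constant string into a caller-supplied
 * buffer (which must hold at least 14 bytes), then print it. */
void greet(char *buf)
{
    strcpy(buf, "Hello, world!");  /* constant source string of known length */
    puts(buf);
}
```

Call greet() from main() with a small stack buffer (char buf[32], for instance) and you have a tiny program of the kind the rest of this post pokes at.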
You compile it, run it under a debugger like GDB (or ltrace?), and set a breakpoint for the call to strcpy(), only to be told it isn't there. Wait, isn't strcpy() defined? We're definitely calling it in that tiny program, right? So what gives? We double-check the binary's symbols:
puts() is there but strcpy() is not! It turns out that GCC has built-in implementations of many string functions. Emphasis mine:
GCC provides a large number of built-in functions other than the ones mentioned above. Some of these are for internal use in the processing of exceptions or variable-length argument lists and are not documented here because they may change from time to time; we do not recommend general use of these functions.
The remaining functions are provided for optimization purposes.
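To make "built-in" a bit more concrete: GCC also exposes these functions under a __builtin_ prefix, and that spelling is handled by the compiler itself. A small sketch, assuming GCC or a compatible compiler such as Clang; the function names literal_length and copy_literal are invented for this example.

```c
#include <stddef.h>

/* With a string literal argument the compiler already knows the length,
 * so it can typically fold the whole call into a constant. */
size_t literal_length(void)
{
    return __builtin_strlen("Hello, world!");
}

/* Same semantics as strcpy(). The compiler may expand this inline or
 * fall back to calling the library strcpy(). dst needs 14 bytes. */
void copy_literal(char *dst)
{
    __builtin_strcpy(dst, "Hello, world!");
}
```

Whether either function ends up containing a call instruction depends on the compiler version and optimization level, so treat this as something to experiment with rather than a guarantee.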
The generated code on 32-bit x86 (below) is pretty neat; the call to strcpy() simply becomes a sequence of immediate-to-memory mov instructions.
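In C terms, the effect is roughly as if the copy had been written by hand as a few fixed-width constant stores. A sketch, assuming a little-endian target and the illustrative string "Hello, world!" (not necessarily the string from the listing); copy_by_immediates is a made-up name.

```c
#include <stdint.h>
#include <string.h>

/* Hand-written equivalent of an inlined strcpy(dst, "Hello, world!").
 * Assumes a little-endian target (as x86 is); dst needs 14 bytes.
 * Each memcpy() of a constant stands in for one immediate-to-memory mov. */
void copy_by_immediates(char *dst)
{
    const uint32_t w0 = 0x6c6c6548;  /* "Hell" */
    const uint32_t w1 = 0x77202c6f;  /* "o, w" */
    const uint32_t w2 = 0x646c726f;  /* "orld" */
    const uint16_t w3 = 0x0021;      /* "!" plus the terminating NUL */

    memcpy(dst + 0,  &w0, sizeof w0);
    memcpy(dst + 4,  &w1, sizeof w1);
    memcpy(dst + 8,  &w2, sizeof w2);
    memcpy(dst + 12, &w3, sizeof w3);
}
```

The string never exists as a separate blob of data here; its bytes live only inside the constants, which is the point of the trick.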
I suppose this has a few performance benefits:
- One fewer symbol for the dynamic linker (ld.so) to resolve at run time
- No calls, so no stack manipulation (GCC on 32-bit x86 uses the "cdecl" calling convention by default, where all arguments are passed on the stack)
- The string is embedded in the mov instructions themselves, so there are no separate data-cache or TLB misses for fetching the source string!
It's a little more convoluted on amd64. As far as I can tell from page 218 of the AMD64 Architecture Programmer's Manual, the only mov that takes a 64-bit immediate writes to a register, not to memory!
The generated code on amd64 reflects this, using two moves (immediate to register, then register to memory) for every 8 bytes of the string:
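The two-step dance can be mimicked in C. This is a sketch, assuming a little-endian target; pack_le64 and store_chunk are illustrative names, and the real compiler works out the constant at compile time rather than in a loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Step 1 (immediate to register): build the 64-bit constant whose
 * little-endian byte layout spells out up to 8 bytes of s. */
uint64_t pack_le64(const char *s, size_t n)
{
    uint64_t imm = 0;
    for (size_t i = 0; i < n && i < 8; i++)
        imm |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return imm;
}

/* Step 2 (register to memory): write all 8 bytes to the destination. */
void store_chunk(char *dst, uint64_t imm)
{
    memcpy(dst, &imm, sizeof imm);
}
```

For example, pack_le64("Hello, w", 8) gives 0x77202c6f6c6c6548: the first character ends up in the lowest byte, so the value reads backwards when written out in hex.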
There's a code size overhead here: 3 bytes of instruction overhead for every 4 bytes of string. As this is a significant overhead, GCC disables the optimization at -Os. If you absolutely need to set a breakpoint on strcpy(), strcat(), or the other GCC built-ins, compile with -fno-builtin to turn this behaviour off.
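If rebuilding with different flags isn't convenient, a source-level alternative is to call through a volatile function pointer, so the compiler can't tell which function is being called and has to emit a genuine call. This is offered as a sketch rather than a recommendation; the names real_strcpy and copy_via_call are made up.

```c
#include <string.h>

/* Volatile pointer: the compiler must load the target at run time, so it
 * cannot substitute its built-in expansion for the call. */
static char *(*volatile real_strcpy)(char *, const char *) = strcpy;

char *copy_via_call(char *dst, const char *src)
{
    return real_strcpy(dst, src);  /* indirect call into the C library */
}
```

With this in place the binary references strcpy() again, so a breakpoint on it should have something to hit, at the cost of giving up the inline expansion for that call.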