This is part 1 of possibly a 1-part series. We’ll see.
Has this ever happened to you? You have some code like this:
#include <string.h>
#include <stdio.h>
int main()
{
    /* 32 bytes, but the string below needs 35 (34 characters plus NUL) */
    char buf[32] = {0};
    strcpy(buf, "Hello world, this is a long string");
    puts(buf);
}
You compile it, run it under a debugger like GDB (or ltrace?), and set a breakpoint on the call to strcpy():
$ gcc test.c -o test
$ gdb ./test
GNU gdb (Debian N.NN-N+NN) N.NN
Reading symbols from ./test...(no debugging symbols found)...done.
(gdb) break strcpy
Function "strcpy" not defined.
Make breakpoint pending on future shared library load? (y or [n])
Why isn’t strcpy() defined? We’re definitely calling it in that tiny program, right? So what gives? We double-check with nm:
$ nm -D test
w __gmon_start__
U __libc_start_main
U puts
puts() is there but strcpy() is not! It turns out that GCC has built-in implementations of many string functions. Emphasis mine:
GCC provides a large number of built-in functions other than the ones mentioned above. Some of these are for internal use in the processing of exceptions or variable-length argument lists and are not documented here because they may change from time to time; we do not recommend general use of these functions.
The remaining functions are provided for optimization purposes.
The generated code on 32-bit x86 (below) is pretty neat; the call to strcpy() simply becomes a sequence of immediate-to-memory movs!
0x0804843a <+46>: lea eax,[esp+0x10]
0x0804843e <+50>: mov DWORD PTR [eax],0x6c6c6548
0x08048444 <+56>: mov DWORD PTR [eax+0x4],0x6f77206f
0x0804844b <+63>: mov DWORD PTR [eax+0x8],0x2c646c72
0x08048452 <+70>: mov DWORD PTR [eax+0xc],0x69687420
0x08048459 <+77>: mov DWORD PTR [eax+0x10],0x73692073
0x08048460 <+84>: mov DWORD PTR [eax+0x14],0x6c206120
0x08048467 <+91>: mov DWORD PTR [eax+0x18],0x20676e6f
0x0804846e <+98>: mov DWORD PTR [eax+0x1c],0x69727473
0x08048475 <+105>: mov WORD PTR [eax+0x20],0x676e
0x0804847b <+111>: mov BYTE PTR [eax+0x22],0x0
I suppose this will have a few performance benefits:
- One less symbol for the dynamic linker (ld.so) to resolve at process startup
- No calls, so no stack manipulations (GCC on 32-bit x86 uses the “cdecl” calling convention, where all arguments are passed on the stack)
- The string is in the mov instructions, so no separate TLB or data-cache misses for the source string!
It’s a little more convoluted on amd64. As far as I can tell from page 218 of the AMD64 Architecture Programmer’s Manual, the only 64-bit immediate mov is to a register, not to memory!
The generated code on amd64 reflects this, using two moves (immediate to register, then register to memory) for every 8 bytes of the string:
0x000000000040050e <+40>: lea rax,[rbp-0x20]
0x0000000000400512 <+44>: movabs rdx,0x6f77206f6c6c6548
0x000000000040051c <+54>: mov QWORD PTR [rax],rdx
0x000000000040051f <+57>: movabs rcx,0x696874202c646c72
0x0000000000400529 <+67>: mov QWORD PTR [rax+0x8],rcx
0x000000000040052d <+71>: movabs rsi,0x6c20612073692073
0x0000000000400537 <+81>: mov QWORD PTR [rax+0x10],rsi
0x000000000040053b <+85>: movabs rdi,0x6972747320676e6f
0x0000000000400545 <+95>: mov QWORD PTR [rax+0x18],rdi
0x0000000000400549 <+99>: mov WORD PTR [rax+0x20],0x676e
0x000000000040054f <+105>: mov BYTE PTR [rax+0x22],0x0
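There’s a code size overhead here: 3 bytes of instruction encoding for every 4 bytes of string on 32-bit x86 (each dword mov in that listing is about 7 bytes long and carries 4 bytes of string). As this is a significant overhead, it’s disabled by gcc -Os. If you absolutely need to set a breakpoint on strcpy(), strcat(), or the other GCC built-ins, compile with -fno-builtin (or -fno-builtin-strcpy for just that one function) to turn this behaviour off.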