Alex's blog


Compilers are amazing #1: GCC's built-in strcpy() implementation

06 May 2016

This is part 1 of possibly a 1-part series. We’ll see.

Has this ever happened to you? You have some code like this:

#include <string.h>
#include <stdio.h>

int main(void)
{
  /* The literal is 34 characters plus the terminating NUL, so it needs 35 bytes. */
  char buf[64] = {0};
  strcpy(buf, "Hello world, this is a long string");
  puts(buf);
  return 0;
}

You compile it, run it under a debugger like GDB (or trace it with ltrace), and set a breakpoint on the call to strcpy():

$ gcc test.c -o test
$ gdb ./test 
GNU gdb (Debian N.NN-N+NN) N.NN
Reading symbols from ./test...(no debugging symbols found)...done.
(gdb) break strcpy
Function "strcpy" not defined.
Make breakpoint pending on future shared library load? (y or [n])

Why isn’t strcpy() defined? We’re definitely calling it in that tiny program, right? So what gives? We double-check with nm:

$ nm -D test
   w __gmon_start__
   U __libc_start_main
   U puts

puts() is there but strcpy() is not! It turns out that GCC has built-in implementations of many string functions. From the GCC documentation (the last sentence is the important one):

GCC provides a large number of built-in functions other than the ones mentioned above. Some of these are for internal use in the processing of exceptions or variable-length argument lists and are not documented here because they may change from time to time; we do not recommend general use of these functions.

The remaining functions are provided for optimization purposes.

The generated code on 32-bit x86 (below) is pretty neat; the call to strcpy() simply becomes a sequence of immediate-to-memory movs!

0x0804843a <+46>:    lea    eax,[esp+0x10]
0x0804843e <+50>:    mov    DWORD PTR [eax],0x6c6c6548
0x08048444 <+56>:    mov    DWORD PTR [eax+0x4],0x6f77206f
0x0804844b <+63>:    mov    DWORD PTR [eax+0x8],0x2c646c72
0x08048452 <+70>:    mov    DWORD PTR [eax+0xc],0x69687420
0x08048459 <+77>:    mov    DWORD PTR [eax+0x10],0x73692073
0x08048460 <+84>:    mov    DWORD PTR [eax+0x14],0x6c206120
0x08048467 <+91>:    mov    DWORD PTR [eax+0x18],0x20676e6f
0x0804846e <+98>:    mov    DWORD PTR [eax+0x1c],0x69727473
0x08048475 <+105>:   mov    WORD PTR [eax+0x20],0x676e
0x0804847b <+111>:   mov    BYTE PTR [eax+0x22],0x0

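I suppose this has a couple of performance benefits: there is no call into libc, and no loop reading the source string byte by byte in search of the terminating NUL.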

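It’s a little more convoluted on amd64. As far as I can tell from page 218 of the AMD64 Architecture Programmer’s Manual, the only mov that takes a full 64-bit immediate writes to a register, not to memory! (A mov to a 64-bit memory operand accepts only a 32-bit immediate, which is sign-extended.)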

Snippet of AMD64 Architecture Programmer's Manual rev 3.22

The generated code on amd64 reflects this, using two moves (immediate to register, then register to memory) for every 8 bytes of the string:

0x000000000040050e <+40>:    lea    rax,[rbp-0x20]
0x0000000000400512 <+44>:    movabs rdx,0x6f77206f6c6c6548
0x000000000040051c <+54>:    mov    QWORD PTR [rax],rdx
0x000000000040051f <+57>:    movabs rcx,0x696874202c646c72
0x0000000000400529 <+67>:    mov    QWORD PTR [rax+0x8],rcx
0x000000000040052d <+71>:    movabs rsi,0x6c20612073692073
0x0000000000400537 <+81>:    mov    QWORD PTR [rax+0x10],rsi
0x000000000040053b <+85>:    movabs rdi,0x6972747320676e6f
0x0000000000400545 <+95>:    mov    QWORD PTR [rax+0x18],rdi
0x0000000000400549 <+99>:    mov    WORD PTR [rax+0x20],0x676e
0x000000000040054f <+105>:   mov    BYTE PTR [rax+0x22],0x0

There’s a code size overhead here: on 32-bit x86, each 7-byte mov carries only 4 bytes of string, so that’s 3 bytes of instruction overhead for every 4 bytes of string. As this is a significant overhead, it’s disabled by gcc -Os. If you absolutely need to set a breakpoint on strcpy(), strcat(), or the other GCC built-ins, compile with -fno-builtin to turn this behaviour off.