Tearing apart printf()

Thom Holwerda 2019-11-12 General Development 11 Comments

If ‘Hello World’ is the first program for C students, then printf() is probably the first function. I’ve had to answer questions about printf() many times over the years, so I’ve finally set aside time for an informal writeup. The common questions fit roughly into two forms:
Easy: How does printf mechanically solve the format problem?
Complex: How does printf actually display text on my console?
My usual answer? “Just open up stdio.h and track it down”
This wild goose chase is not only a great learning experience, but also an interesting test for the dedicated beginner. Will they come back with an answer? If so, how detailed is it? What IS a good answer?

This is incredibly detailed and definitely over my head, but I’m sure many of you will enjoy this one greatly.

11 Comments

2019-11-13 3:31 am

Erez
Detailed is one word for it. Why is it that everyone thinks they have to start their articles as if you had just discovered computers? You have to scroll halfway down the article before he stops talking about everything other than “tearing apart” and starts digging into some low-level stuff. Why does he think that he must first explain that a C function is something you type, compile, link and execute? It’s one thing to say “tearing down ls” and explain that ls is a C program, but explaining what a C function is? Who is the target audience? Does he actually think anyone programs in C, even a “hello world”, without knowing those basics? “Step one, Include the C header”. Is this a joke?

If anyone is interested, search for “Step 5 – Loader prepares the run-time”. Anything above that is pure, slightly insulting, drivel.
2019-11-13 3:53 am

sj87
The “usual answer” of telling to open up a C header file is actually, in general, really poor, because the header file isn’t guaranteed to contain 1) any useful code 2) any hints where the actual code is located on the disk. Therefore the students could only receive a big “fuck you” sign upon spending some time to locate the header file in the first place…
2019-11-13 11:42 am

malxau
The final line “Last (and definitely least): Untangling the mess inside vfprintf” is what I assumed this whole article was about. It spent the whole time building up to something and didn’t get to its destination.

Fwiw, I wrote one of these and I’d encourage anyone who wants to understand it to do the same. varargs is a bit goofy, and has some strange quirks like arguments always get pushed onto the stack as a multiple of machine word size, so printf has to pop off a machine word even when printing a single char, etc.
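In C terms, the quirk with single chars comes from the default argument promotions: a char passed through `...` arrives as an int, so the callee has to fetch it as an int. Here is a minimal sketch of that, assuming a made-up function name `tiny_format` and support for only %c, %d, %s and %%. It leans on snprintf for the digits, so it is not a from-scratch printf and not the code from the article.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* tiny_format is a hypothetical name used only for this sketch.
 * It handles %c, %d, %s and %% and nothing else. It stores at most
 * cap-1 characters plus a terminating NUL and returns the number of
 * characters stored. */
size_t tiny_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;

    if (cap == 0)
        return 0;

    va_start(ap, fmt);
    for (; *fmt != '\0' && n + 1 < cap; fmt++) {
        if (*fmt != '%') {
            out[n++] = *fmt;
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break;                      /* lone '%' at the end */
        switch (*fmt) {
        case 'c':
            /* The caller's char was promoted to int, so it must be
             * fetched as int; va_arg(ap, char) would be undefined. */
            out[n++] = (char)va_arg(ap, int);
            break;
        case 'd': {
            char tmp[32];
            /* Digits are delegated to snprintf to keep this short. */
            int len = snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int));
            for (int i = 0; i < len && n + 1 < cap; i++)
                out[n++] = tmp[i];
            break;
        }
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0' && n + 1 < cap)
                out[n++] = *s++;
            break;
        }
        default:                        /* "%%" and unknown conversions */
            out[n++] = *fmt;
            break;
        }
    }
    va_end(ap);
    out[n] = '\0';
    return n;
}
```

Calling `tiny_format(buf, sizeof buf, "%c=%d", 'x', 42)` passes `'x'` as an int whether or not the caller thinks of it as a character, which is why the `'c'` case reads an int and narrows it afterwards.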

One (of the many) unfortunate things about the whole printf design is that, because the format string is evaluated at runtime, the linker ends up having to throw in all the code to format every format type even if the program doesn’t use them. I was actually impressed, reading the article, that gcc is smart enough to remove printf calls when no format specifiers are present – that’s pretty cute, although confusing as all get out, clearly blurring the architectural line between the compiler and libraries.
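A small sketch of the kind of call this is about. The function name `greet` is made up, and what the compiler actually emits depends on the GCC version and flags, so the comments describe commonly observed behaviour rather than a guarantee. Compiling with `gcc -S` and reading the output is the way to check on a given setup.

```c
#include <stdio.h>

/* greet is a hypothetical name used only for this sketch. */
void greet(void)
{
    /* No conversions and a trailing newline: GCC commonly emits a
     * call to puts("hello") here instead of printf. */
    printf("hello\n");

    /* "%s\n" with a string argument is commonly turned into puts too. */
    printf("%s\n", "hello");

    /* This one needs real formatting, so it stays a printf call. */
    printf("%d\n", 42);
}
```

The visible output is the same either way; only the symbol that ends up being called changes.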

2019-11-13 4:53 pm

Wootery
> clearly blurring the architectural line between the compiler and libraries.

Indeed. Not the only such crossover. I think GCC provides printf even if you omit the #include directive, which I suppose is a violation of the C standard.

If I understand correctly, the memcpy and memmove functions are ‘magic’ in C regarding C’s ‘effective type’ rules. [0] Also, in the case of memcpy, an optimising compiler may do the opposite of what you describe with printf: it may synthesise an invocation of memcpy rather than compile your loop the usual way (under the assumption that memcpy is better optimised).
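For illustration, this is the sort of loop an optimising compiler may recognise as a copy and replace with a call to memcpy. The name `copy_bytes` is made up for this sketch. Whether the substitution happens depends on the compiler, version and optimisation level; on GCC the pass involved can be turned off with `-fno-tree-loop-distribute-patterns`.

```c
#include <stddef.h>

/* copy_bytes is a hypothetical name used only for this sketch.
 * The restrict qualifiers promise that the buffers do not overlap,
 * which is the same precondition memcpy has. */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src,
                size_t n)
{
    /* A plain byte-by-byte loop. At higher optimisation levels the
     * compiler may emit "call memcpy" for this instead of a loop. */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

This is also why writing your own memcpy this way inside a C library can end up calling itself unless the substitution is disabled for that file.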

An optimising compiler might also be ‘aware’ of the semantics of concurrency primitives in the language’s standard library.

Another example might be type-traits in the C++ standard library, which, I believe, cannot be implemented in pure C++.

[0] https://en.cppreference.com/w/c/language/object#Effective_type

2019-11-14 12:29 am

malxau
> Also, in the case of memcpy, an optimising compiler may do the opposite of what you describe with printf: it may synthesise an invocation of memcpy rather than compile your loop the usual way (under the assumption that memcpy is better optimised).

Yes – when I was writing my C library the compiler replaced my implementation of memcpy with a call to memcpy, which obviously didn’t work too well ;). I was targeting Visual C++ for it, and there’s an /Oi switch that controls this behavior. It also replaces things like static initializers that zero buffers with calls to memcpy.

The other thing they do is replace what appears to be arithmetic operations with library calls. For Visual C++, the 32 bit compiler pretends it can do 64 bit math, but it only natively does addition and subtraction, and passes multiplication, division and shifts to the library. These library functions don’t even use a standard calling convention; they place parameters in specific registers which are architecture optimized, so they can only really be implemented in assembly.

2019-11-14 3:34 am

Alfman
malxau,

> Yes – when I was writing my C library the compiler replaced my implementation of memcpy with a call to memcpy, which obviously didn’t work too well ;). I was targeting Visual C++ for it, and there’s an /Oi switch that controls this behavior. It also replaces things like static initializers that zero buffers with calls to memcpy.

Compilers can do the funniest things, haha.

> The other thing they do is replace what appears to be arithmetic operations with library calls. For Visual C++, the 32 bit compiler pretends it can do 64 bit math, but it only natively does addition and subtraction, and passes multiplication, division and shifts to the library. These library functions don’t even use a standard calling convention; they place parameters in specific registers which are architecture optimized, so they can only really be implemented in assembly.

I’ve seen that as well when I was dealing with my arbitrary precision math library. Hmm, I’m not sure why they didn’t have a native inline implementation for shifts? Unless they just didn’t get around to optimizing it, there’s no need for that to be translated into a function call, that’s disappointing.

I had to use inline assembly in my library. C and C++ have a shortcoming for multi-word math because they don’t expose CPU flags. I fought the C compiler quite a bit to coerce it to translate the C code into more optimal assembly code (I didn’t want to have a non-portable assembly dependency), but in some cases I failed to get the C version to match the assembly version’s performance.

2019-11-14 6:32 pm

acobar
OK, the last time I had to resort to x86 asm was a long, long time ago, but I remember that there were cases where what you might think were function calls were macros. It is actually a common misconception people used to have when asking the compiler to generate assembly code.

As a simple test I would generate the object file and then use a disassembler to see what actually is generated.
2019-11-15 12:28 am

Alfman
acobar

> OK, the last time I had to resort to x86 asm was a long, long time ago, but I remember that there were cases where what you might think were function calls were macros. It is actually a common misconception people used to have when asking the compiler to generate assembly code.

> As a simple test I would generate the object file and then use a disassembler to see what actually is generated.

We are talking about the generated assembly. I have not independently confirmed malxau’s results on Visual Studio, although I have no reason to doubt him. Here are my results under 32-bit GCC, both with and without optimization (no difference); the comments show the relevant assembly output.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t a, b;

    // uint64_t is unsigned long long on 32-bit GCC, so %llu matches
    if (scanf("%llu %llu", &a, &b) != 2)
        return 1;

    // print the two inputs
    printf("a=%llu b=%llu\n", a, b);

    // a/b gets converted into a function call, which is expected
    printf("a/b=%llu\n", a/b);
    // call __udivdi3@PLT

    // a+b is inlined, which is perfect
    printf("a+b=%llu\n", a+b);
    // add eax, esi
    // adc edx, edi

    // a*b is inlined by GCC by doing mul repeatedly
    printf("a*b=%llu\n", a*b);
    // mov ecx, edi
    // imul ecx, eax
    // mov DWORD PTR -44[ebp], ecx
    // mov ecx, edx
    // imul ecx, esi
    // add ecx, DWORD PTR -44[ebp]
    // mul esi
    // add ecx, edx

    // a>>1 is inlined
    printf("a>>1=%llu\n", a>>1);
    // shrd eax, edx, 1
    // shr edx

    // a>>b is inlined
    printf("a>>b=%llu\n", a>>b);
    // shrd eax, edx, cl
    // shr edx, cl
}

Under GCC all of the test cases got inlined except div, which I consider fair. A 64bit mul function could have made sense too, but it looks like the GCC guys considered it short enough to inline. Having 64bit shift operations call a function in visual studio is lazy though considering how easy it is to inline, haha.

For better or worse, in my library I was unable to get optimal division results under C without using assembly because I could not get GCC to issue a naked div instruction (such as “DIV EBX” = EDX:EAX / EBX).
Ironically optimizing the C code was a bit tricky because C lacks CPU arithmetic flags for overflow/carry. However as I recall, I was able to express these as (unwanted) double-word operations that GCC was able to optimize away with typecasting (aka bitmasking) in certain spots. Back then people here on osnews were skeptical that hand tuned assembly could outperform GCC, haha. Who knows, things may have gotten better since then.
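The double-word trick mentioned above can be sketched like this: do each limb addition in a type twice as wide, then split the result into the low half (the limb) and the high half (the carry). The name `limbs_add` and the 32-bit little-endian limb layout are assumptions made for this example, not the commenter's actual library.

```c
#include <stddef.h>
#include <stdint.h>

/* limbs_add is a hypothetical name used only for this sketch.
 * Computes r = a + b for two n-limb numbers stored least significant
 * limb first, and returns the carry out of the top limb (0 or 1). */
uint32_t limbs_add(uint32_t *r, const uint32_t *a, const uint32_t *b,
                   size_t n)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* Widen to 64 bits so the carry lands in bit 32 instead of
         * being lost; C has no direct access to the carry flag. */
        uint64_t t = (uint64_t)a[i] + b[i] + carry;

        r[i]  = (uint32_t)t;          /* low 32 bits: the result limb */
        carry = (uint32_t)(t >> 32);  /* high bits: carry into next limb */
    }
    return carry;
}
```

Whether a compiler reduces this to an add followed by a chain of adc instructions is not guaranteed, so checking the generated assembly is still worthwhile.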

I actually miss working on assembly algorithms, it’s a struggle to keep interested in the website work I do for a living.
2019-11-15 1:32 am

acobar
> I actually miss working on assembly algorithms, it’s a struggle to keep interested in the website work I do for a living.

Tell me about being bored! Since early in my education I used to figure out algebraic solutions for many problems, and independently “rediscovered” many famous ones. Now I use packages or follow standardized rules to calculate things, which is like following cooking recipes, with the disadvantage that I can’t eat the “things” in the end. Boooring!!

2019-11-13 11:42 am

teco.sb
This article doesn’t really say much about how printf() works. A better title would be how any C function in the standard library is accessed by the program. As someone else said, he spends too much time on trivialities, then goes on to explain how the loader works, and what other functions are called, by using strace.

The printf() family of functions is actually fairly straightforward. I’ve implemented a simplified version (without %e/%f/%g) and ended up learning quite a bit from that experience. I had hoped this article was going to go into how to handle the different specifiers, flags, precision, etc., because I thought all that was very interesting. But instead, he only looked at the chain of functions called. The thing is that what he described only works for GLIBC on Linux. Using any other C library (like musl, for example) will yield different results, as will running strace on a different operating system.
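To give a flavour of the specifier handling mentioned above, here is a minimal sketch of parsing one conversion specification into flags, width, precision and conversion character. The names `fmt_spec` and `parse_spec` are made up for this example. Length modifiers (h, l, ll, ...), `*` widths and overflow checks are left out, so it is far from what a real C library does.

```c
#include <stdbool.h>
#include <stddef.h>

/* fmt_spec and parse_spec are hypothetical names for this sketch. */
struct fmt_spec {
    bool left;       /* '-' flag */
    bool plus;       /* '+' flag */
    bool zero;       /* '0' flag */
    bool space;      /* ' ' flag */
    bool alt;        /* '#' flag */
    int  width;      /* -1 when absent */
    int  precision;  /* -1 when absent */
    char conv;       /* conversion character, e.g. 'd' or 's' */
};

/* Parses one specification starting just after the '%'.
 * Returns a pointer past the conversion character, or NULL if the
 * string ends before one is found. */
const char *parse_spec(const char *p, struct fmt_spec *s)
{
    *s = (struct fmt_spec){ .width = -1, .precision = -1 };

    /* Flags may appear in any order and may repeat. */
    for (;; p++) {
        if      (*p == '-') s->left  = true;
        else if (*p == '+') s->plus  = true;
        else if (*p == '0') s->zero  = true;
        else if (*p == ' ') s->space = true;
        else if (*p == '#') s->alt   = true;
        else break;
    }

    /* Optional decimal field width. */
    if (*p >= '0' && *p <= '9') {
        s->width = 0;
        while (*p >= '0' && *p <= '9')
            s->width = s->width * 10 + (*p++ - '0');
    }

    /* Optional precision: '.' followed by zero or more digits. */
    if (*p == '.') {
        p++;
        s->precision = 0;
        while (*p >= '0' && *p <= '9')
            s->precision = s->precision * 10 + (*p++ - '0');
    }

    if (*p == '\0')
        return NULL;
    s->conv = *p;
    return p + 1;
}
```

For the text after the `%` in `"%-08.3d"`, this fills in left and zero flags, a width of 8, a precision of 3 and the conversion `d`. Deciding what those fields mean for each conversion is where the remaining work of a printf implementation lies.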
2019-11-14 11:08 am

FunkyELF
I remember digging down deep into fprintf once. If I remember correctly it wasn’t defined as a C function but involved a bunch of macros.
We had a system that would output files in a single system of units, some applications were metric, some imperial.
We wanted to provide a convenience library which would output them in both. This would minimize source modifications. To achieve this I believe our version took a separate file pointer and even introduced our own specifiers.
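One way to sketch a wrapper that takes a second file pointer is to forward the same arguments to vfprintf twice, using va_copy because a va_list cannot be reused once it has been consumed. The name `dual_fprintf` is made up, and the unit conversion and custom specifiers described above are not attempted here, so this shows only the two-stream part and is not the library from the comment.

```c
#include <stdarg.h>
#include <stdio.h>

/* dual_fprintf is a hypothetical name used only for this sketch.
 * Writes the same formatted text to two streams. Returns the number
 * of characters written to the first stream, or -1 if either write
 * failed. */
int dual_fprintf(FILE *first, FILE *second, const char *fmt, ...)
{
    va_list ap, ap2;
    int r1, r2;

    va_start(ap, fmt);
    va_copy(ap2, ap);                 /* second, independent cursor */

    r1 = vfprintf(first, fmt, ap);    /* consumes ap */
    r2 = vfprintf(second, fmt, ap2);  /* consumes the copy */

    va_end(ap2);
    va_end(ap);

    return (r1 < 0 || r2 < 0) ? -1 : r1;
}
```

A version that converts units would have to walk the format string itself and rewrite the values before formatting, which is a much bigger job than this wrapper.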

It was definitely a learning experience, and quite possibly the wrong thing to do