Memory Corruption using Format Specifiers

Format string vulnerabilities are a class of bug that affects different programming languages in different ways, because each language implements string formatting differently. This post mainly covers format string vulnerabilities in C.

So what exactly is string formatting?

Most programming languages implement some way to embed variables into strings. In a nutshell, string formatting is just a way for the language to dynamically create strings using variable values. Here's a simple example of a C program that uses a string format specifier:


  #include <stdio.h>

  int main(void) {
      char greeting[] = "Hello";
      /* %s is replaced by the string passed as the second argument */
      printf("%s, World!\n", greeting);
      return 0;
  }
                    

This program creates a variable named "greeting" that contains the value "Hello".

The 'f' in "printf" stands for "formatted". In most C libraries, printf is a thin wrapper around a lower-level formatting routine. The string passed as the first argument to printf (the format string) is handed to that routine, which walks the string and uses the other arguments passed to printf to fill in each format specifier.

When run, the output will be "Hello, World!", because the '%s' in the first argument specifies a string replacement and is replaced with the string provided in the second argument, the 'greeting' variable.

This is all well and good, but printf can also be called with only the format string. So what happens if a programmer calls printf with a string that contains a format specifier, but supplies no second argument?

C is not a "memory safe" language, and if a negligent developer calls a function like printf with fewer arguments supplied than format specifiers in the initial string, it is possible to read values from the stack, or write to arbitrary locations in memory.

To understand how this is possible, we need to look into how the printf function actually works.

When a function is called, the arguments supplied to it are stored in registers, on the stack, or both, depending on the calling convention. printf has no way of knowing how many arguments were actually passed. For each format specifier it encounters, it simply fetches the next argument slot, expecting the caller to have placed a value there. Calling printf with a format specifier like '%s' when no second argument is provided therefore makes the program treat whatever happens to be in that slot as a pointer to a string, which may crash or print unrelated memory. Using specifiers such as '%x', it is possible to leak stack data this way, which is a security concern if sensitive data is stored on the stack.

Even more troublesome, what if the first argument passed to printf were a user-supplied string?

A malicious user could place a series of format specifiers in a user-supplied string that is passed to printf as the format string, abusing this vulnerability to leak stack data intentionally. Note that the bug does not require the second argument to be missing entirely: the number of format specifiers just needs to be greater than the number of arguments provided after the string for printf to start leaking memory.

Here's an example:


  #include <stdio.h>

  int main(void) {
      /* Deliberately vulnerable: six specifiers, but no matching arguments */
      printf("%x - %x - %x - %x - %x - %x\n");
      return 0;
  }
                    

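Compile with gcc using 'gcc formatstr.c -o formatstr'. When run, this will print six values that printf reads from wherever it expects its arguments to be. On 32-bit x86 these come straight from the stack; on x86-64 the first few come from registers before stack values start to appear.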

Earlier I mentioned it was possible to write data to memory using format specifiers. This is because of the '%n' specifier. Instead of printing anything, '%n' takes the next argument as a pointer to an integer and stores there the number of characters that have been output so far, up to the point where the '%n' appears.

If we were to supply the input "TESTING%n" to a vulnerable printf call, it would write 7 (the length of 'TESTING') to whatever address happens to sit in the next argument slot, because our payload has not yet arranged for a specific address to be there. To choose which argument slot supplies the address, we can use the syntax "%<num>$n", which writes to the address held in argument number '<num>'. An attacker typically places the target address inside the input itself and picks '<num>' so that it refers to that part of the stack. We can combine this with another technique called padding to write chosen values to specific addresses. Padding can be done with "%<num>x", where '<num>' is a minimum field width (the specific format specifier doesn't really matter, we're just using 'x' here). This inflates the number of characters output, so the '%n' write is able to store larger values at the specified address.

Without padding, the largest number we can write is limited by the maximum length of the string buffer a user can provide. Most user-supplied input buffers in C are capped to avoid buffer overflows, so padding allows us to work around this limit.

Because format specifiers are implemented differently in different programming languages, other languages are open to different kinds of format string attacks than C is. One example might be a denial of service against Java programs. In Java, strings are objects allocated on the heap and created dynamically at runtime (unlike C, where 'strings' are typically fixed-size buffers of characters), so a format string with a very large amount of padding could potentially force the program to allocate a huge string and exhaust heap memory.

For a write-up discussing the crafting of format string payloads in more detail, I suggest checking out this article. It provides a more practical explanation of how to abuse these bugs in the wild.
