Chapter 5. Determining Interesting Functions
Clearly without source code, we can't possibly hope to understand all of sections of an entire program. So we have to use various methods and guess work to narrow down our search to a couple of key functions.
Reconstructing function & control information
The problem is that first, we must determine what portions of the code are actually functions. This can be difficult without debugging symbols. Fortunately, there are a couple of utilities that make our lives easier.
Objdump's most useful purpose is to disassemble a program with the -d switch. Lacking symbols, this output is a bit more cryptic. The -j option is used to specify a segment to disassemble. Most likely we will want .text, which is where all the program code lies.
Note that the leftmost column of objdump contains a hex number. This is in fact the actual address in memory where that instruction is located. Its binary value is given in the next column, followed by its mnemonic.
objdump -T will give us a listing of all library functions this program calls.
Steve Barker wrote a neat little perl script that makes objdump much more legible in the event that symbols are not included. The script has since been extended and improved by myself and Nasko Oskov. It now makes 3 passes through the output. The first pass builds a symbol table of called and jumped-to locations. The second pass finds areas between two rets, and inserts them into the symbol table as "unused" functions. The third pass prints out the nicely labeled output, and prints out a function call tree. Usage:
./disasm /path/to/binary > binary.asminfo
There are/will be few command line options to the utility. Now --graph is supported. It will generate a file called call_graph that contains definition that can be used with a program called dot to generate visual representation of the call graph.
Note: Unused functions just mean that that function wasn't called DIRECTLY. It is still possible that a function was called through a function pointer (ie, main is called this way)
Ok, so now we're getting ready to get really down and dirty. The first step to finding what you are looking for is to know what you are looking for. Which functions are 'interesting' is entirely dependent on your point of view. Are you looking for copy protection? How do you suspect it is done. When in the program execution does it show up? Are you looking to do a security audit of the program? Is there any sloppy string usage? Which functions use strcmp, sprintf, etc? Which use malloc? Is there a possibility of improper memory allocation?
If we can narrow down our search to just a few functions that are relevant to our objective, our lives should be much easier.
Regardless of our objective, it is almost always helpful to know where main() lies. Unfortunately, when debugging symbols are removed, this is not always easy.
In Linux, program execution actually begins at the location defined by the _start symbol, which is provided by gcc in the crt0 libraries (check gcc -v for location). Execution then continues to __libc_start_main(), which calls _init() for each library in the program space. Each _init() then calls any global constructors you may have in that particular library. Global constructors can be created by making global instances of C++ classes with a constructor, or by specifying __attribute__((constructor)) after a function prototype. After this, execution is finally transferred to main through an indirect call off of the base register ebp.
The easiest technique is to try to use our friends ltrace and gdb (FIXME: the debugging chapter has been moved to after this one..) together with our disassembled output. Checking the return address of the first few functions of ltrace -i, and cross referencing that to our assembly output and function call tree should give us a pretty good idea where main is. We may have to try to trick the program into exiting early, or printout out an error message before it gets too deep into its call stack.
Other techniques exist. For example, we can LD_PRELOAD a .c file with a constructor function in it. We can then set a breakpoint to a libc function that it calls that is also in the main executable, and finish and stepi until we are satisfied that we have found main.
Even better, we could just set a breakpoint in the function __libc_start_main (which is a libc function, and thus we will always have a symbol for it), and do the same technique of finishing and stepping until we reach what looks like main to us.
At worst, even without a frame pointer, we should be able to get the address of a function early enough in the execution chain for us to consider it to be main.
Finding other interesting functions
Its probably a good idea to make a list of all functions that call exit. These may be of use to us. Other techniques for tracking down interesting functions include:
-
Checking for which functions call obscure gui construction widgets used in a dialog box asking for a product serial number
-
Checking the string references to find out which functions reference strings that we are interested in. For example, if a program outputs the text "Already registered." knowing what function outputs this string is helpful in figuring out the protection this particular program uses.
-
Running a program in gdb, then hitting control C when it begins to perform some interesting operation. using stepi N should slow things down and allow you to be more accurate. Sometimes this is too slow however. Find a commonly called function, set a breakpoint, and try doing cont N.
-
Checking which functions call functions in the BSD socket layer
Plotting out program flow
Plot out execution paths into a tree from main, especially to your function(s) of interest. You can use disasm.pl to generate call graphs with the --graph option. Using it enables the script to generate file called call_graph. It contains definition of the call graph in a format used by a popular graphing tool called dot. Feeding this definition file in dot will give you a nice (probably pretty huge) graphics file with visual representation of the call graph. It is pretty amazing. Definitely try it with some small program.
Further analysis will have to hold off until we understand some assembly.
Chapter 6. Understanding Assembly
Since the output of all of these tools is in AT&T syntax, those of you who know Intel/MASM syntax have a bit of re-learning to do.
Assembly language is one step closer to the hardware than high level languages like C and C++. So to understand assembly, you have to understand how the hardware works. Lets start with a set of memory locations known as the CPU registers.
Registers are like the local variables of the CPU, except there are a fixed number of them. For the ix86 CPU, there are only 4 main registers for doing integer calculations: A, B, C, and D. Each of these 4 registers can be accessed 4 different ways: as a 32 bit value (%eax), as a 16 bit value (%ax), and as a low and a high 8 bit value (%al and %ah). There are five more registers that you will see used occasionally - namely SI, DI, SP and BP. SI and DI are around from the DOS days when people used 64k segmented addressing, and as it turns out, may be used as integer like normal registers now. SP and BP are two special registers used to handle an area of memory called the stack. There is one last register, the instruction pointer IP that you may not modify directly, but is changed through jmps and calls. Its value is the address of the next instruction to execute. (FIXME: Check this)
Note: If gcc was called with the -fomit-frame-pointer, the BP register is freed up to be used as an extra integer register.
A stack is what is called a Last In, First Out data structure or LIFO. Think of it as a stack of plates. The most recent (last) plate pushed on top of the stack is the first one to be removed. This allows us to manage the stack with only one register if need be, namely the stack pointer or SP register.
The stack is a region of memory that is present throughout the entire lifetime of a program. It is where local variables are stored, and it is also how function call arguments are passed.
On almost all modern computers, the stack is said to grow down, that is, as elements are pushed on to it, the SP register is decremented by the size of the element pushed. From our earlier analogy, its as if the stack of plates where hung from the ceiling, new plates were inserted at the bottom, and the whole stack some sort of catch to stop them all from dumping out. That catch would be the SP register.
So the stack starts from a high memory address, and works down to a lower address. This is because another section of memory called the heap grows up, and its handy to have the two of them grow towards eachother to fill in a single empty hole in the program address space.
Note: It is easy to become confused when dealing with the stack. Remember that while it may grow down, variables are still addressed sequentially upwards. So an array of char b[4] at esp of 80 will have b[0] at 80 (right at the stack pointer), b[1] above that at 81, b[2] at 82, and b[3] at 83, which is where the stack pointer was before the push. The next push will then place the stack pointer at 76.
There are two instructions that deal with the stack directly: push and pop. Each take a register or value as an argument. Push will place its argument onto the stack, and then decrement the SP by the size of its argument (4 for pushl, 2 for pushw, 1 for pushb). //FIXME (What is pushl and push b) Pop copies the value on the top of the stack into its argument, then increments SP. Pusha and popa push and pop all the registers with one instruction. Because of speed considerations, the value is not touched, just the SP register is changed to point to the next location ot the stack. So SP is always pointing to the top value of the stack and not at invalid memory.
Normal arithmetic expressions can also be used to modify SP to make space for working directly with stack memory with other instructions.
How gcc works with the stack
Right before a function is called, its arguments are pushed onto the stack in reverse order. Then the call instruction pushes the address of the next instruction (ie the value of IP after call) onto the stack, and then the CPU begins executing the address of the call by copying that value into the invisible instruction pointer (IP) register.
The called function then starts with what is known as the function prolog, which pushes the current base pointer onto the stack, and then copies the current stack pointer to the base pointer, and then subtracts from SP enough space to hold all local variables (and then some!). The base pointer is then used to reference variables and parameters during function execution, since its value is not affected by pushes and pops. Thus, parameters all have fixed positive offsets from the BP, where as local variables all have fixed negative offsets from the BP.
At the end of function execution, the base pointer is copied to the stack pointer during ret, and the return address is popped off the stack and placed into the invisible IP register to return to the caller function.
Note: Unless -fomit-frame-pointer is specified, gcc always generates code that references local variables by negative offsets from the BP instead of positive offsets from the SP.
Two's complement is specific way signed integers are represented in pretty much all modern computers. This is due to the fact that two's complement form has several advantages:
-
The same rules for addition apply, no extra work is required to compute the sum of negative integers.
-
Easy to negate a number.
-
The most significant bit tells you the sign: 0 is positive, 1 is negative.
It should be noted that when using signed values the ranges of number that can be represented by a specific number of bits is less than the usual. The range is -(2n-1) to +(2n-1)-1
There are several ways to convert any unsigned binary number into signed two's complement form. The most intuitive and easy to remember is the following Complement each bit of the number and add one. Let's find how -13 is represented, so we convert it into its binary form:
0000 1101
Then invert all the bits.
1111 0010
Now add one to it.
1111 0011
So 1111 0011 is -13 in two's complement.
Second method is to complement all the bits to the left of the rightmost 1 bit, but not including it (but not the rightmost bit, for example 0001 0100). It sounds a bit complicated, but is easier once you figure out how it is done. Let's get back to the example of -13.
0000 1101
^
Invert the bits to the left of the rightmost one.
1111 0011
There you go. We get the number without second step of adding one. It can be proven why this method works, but we are not in class. Yet a third method is to subtract the number from 2n. Here is how it works.
1000 0000
-
0000 1101
---------
1111 0011
There may be other ways of doing it, but if you master those, you will not need to remember any more. To convert a negative number in two's complement, you apply the exact same procedure as described and you get back the positive value for the number.
From reverse engineering angle
Now that we know what two's complement is let's look at some examples of this type of representation in reverse engineering process. Using one of the tools discussed earlier, objdump and the wrapper disasm.pl, let's look at the ls command binary. If you look at function7 (which starts at address 80495a8), lines like the following appear frequently:
80495be: 83 c4 f8 add $0xfffffff8,%esp
What does this instruction do? It just adds some constant to the stack pointer register (%esp). There are two ways you can look at this constant. It is either a huge unsigned number or two's complement negative number. Since we just add to the stack pointer, it does not make sense to be big number, so let's find what is the value of this number.
f f f f f f f 8
1111 1111 1111 1111 1111 1111 1111 1000
0000 0000 0000 0000 0000 0000 0000 1000
0 0 0 0 0 0 0 8
Now we can see that this is just the negative of 0x00000008 or just plain -8 in decimal. If you think about this, what this line does is decrement the stack pointer by 8 bytes (allocate more space).
One common difficulty in working on multiple platforms is that different platforms use different byte orders. Byte ordering refers to the physical layout of integer data in memory. There are two different orderings - little endian and big endian. When a data structure or data type is represented by more than one byte, the ordering of bytes matters. For example if we consider a long (4 bytes) let's label the least significant byte 0 and the most significant one 3. If we are on little endian machine the long will be represented in memory like this (yeah, some machines do not allow addressable bytes, but let's forget about this):
0x040 0
0x041 1
0x042 2
0x043 3
On a big endian machine on the other hand, the long will be layed out like that:
0x040 3
0x041 2
0x042 1
0x043 0
Now let's look at an example. The easiest way to see the difference in byte ordering is to look at how a long is stored in memory on different architectures. Here is an example program that will demonstrate it.
#include <stdio.h>
int main() {
long longval = 123456789;
printf("%s\n", test);
}
After compiling it with debugging info, let's run it and see what will be the result. The first run is on Intel-based machine.
bash$ uname -a
Linux slack 2.4.20 #5 Tue Dec 31 00:01:00 CST 2002 i686 unknown
bash$ gdb ./a.out
GNU gdb 5.2.1
This GDB was configured as "i386-slackware-linux"...
(gdb) break main
Breakpoint 1 at 0x8048338: file test.c, line 5.
(gdb) run
Breakpoint 1, main () at test.c:5
5 long longval = 123456789;
(gdb) stepi
8 printf("value is %d\n", longval);
Let's get the address of longval in memory
(gdb) print &longval
$2 = (long int *) 0xbffff234
Let's print the contents of longval as a word
(gdb) x/w 0xbffff234
0xbffff234: 0x075bcd15
Let's print the contents of longval as 4 consecutive bytes
(gdb) x/4b 0xbffff234
0xbffff234: 0x15 0xcd 0x5b 0x07
(gdb) quit
The second run was on Sparc machine running Solaris.
remsun1> uname -a
SunOS remsun1 5.8 Generic_108528-16 sun4u sparc SUNW,Sun-Fire-280R
remsun1> gdb ./a.out
GNU gdb 4.18
This GDB was configured as "sparc-sun-solaris2.7"...
(gdb) break main
Breakpoint 1 at 0x10564: file test.c, line 5.
(gdb) run
Breakpoint 1, main () at test.c:5
5 long longval = 123456789;
(gdb) stepi
0x10568 5 long longval = 123456789;
Let's get the address of longval in memory
(gdb) print &longval
$1 = (long int *) 0xffbefaec
Let's print the contents of longval as a word
(gdb) x/1w 0xffbefaec
0xffbefaec: 0x075bcd15
Let's print the contents of longval as 4 consecutive bytes
(gdb) x/4b 0xffbefaec
0xffbefaec: 0x07 0x5b 0xcd 0x15
(gdb)
One can clearly see how on the Sparc machine the individual bytes are in the same order as in the printed word, whereas the Intel machine has it reverse.
This is the difference in byte ordering. In order for different hosts on the same network to be able to communicate and the exchanged data to make sense, they agree on common byte ordering. In modern networking the data is transmitted in big endian byte ordering i.e. most significant byte comes first. On the i80x86 the host byte order is Least Significant Byte first, whereas the network byte order, as used on the Internet, is Most Significant Byte first.
Keep track of the stack and registers
The secret to understanding assembly code is to always work with a sheet of paper and a pencil. When you first sit down, draw out a table for all 6 registers A, B, C, D, SI, and DI. Keep track of the high and low portions as well. Each new line of this table should represent a modification of a register, so the last value in each register column is the current value of that register.
Next, draw out a long column for the stack, and leave space on the sides to place the BP and SP registers as they move down. Be sure to write all values into the stack as they are placed there, including ret and the stored BP.
In AT&T syntax, all instructions are of the form:
mnemonic src, dest
Standalone numerical constants are prepended with a $. Hexadecimal numbers always start with 0x (as opposed to ending in h). Registers are specified with a % sign, ie %eax.
Dereferencing or pointer representation is of the form disp(%base, %index, scale), where the resulting address is disp + %base + %index*scale. disp and scale are constants (no $), and %base and %index are registers. Any of these 4 may be omitted, leaving either blank space and then a comma, or simply leaving off the argument, and all remaining arguments. For example, 4(%eax) means memory address 4+%eax, where as (,%eax,4) means %eax*4. This compact notation makes array indexing easy.
From here, it is simply a matter of understanding what each assembly mnemonic does. Most common mnemonics are obvious, but you can find a complete description of all the Intel instructions (in agonizing detail) at Intel's Developer Site. Volume 2 contains the instruction list. Keep in mind that in Intel syntax, operands are in the reverse order of AT&T syntax (ie, mnemonic dest,src).
In order to learn to read assembly effectively, you really have to know what type of code your compiler likes to generate in certain situations. If you learn to recognize what a while loop, a for loop, an if-else statement all look like in assembly, you can learn to get a general feel for code more quickly. There are also a few tricks that GCC performs that may seem unintuitive at first to the neophyte reverse engineer, even if they already know how to forward-engineer in assembly.
In assembly, the only flow control mechanisms are branching and calling. So every control structure is built up from a combination of goto's and conditional branches. Lets look at some specific examples.
So we've mentioned that function calls use the stack to pass arguments. But where does that leave return values? And what about local variables?
Local variables are also on the stack, just below the base pointer instead of above. But if you thought that a return value was a pop off of the stack, you were wrong! GCC places the return value of a particular function into the eax register at the end of that function. Upon calling a function with a return value, it knows to copy the eax register into whatever variable will store that return value.
So lets see some gcc output for function calls. Get your paper ready, we're going to need to draw our stack and register table to follow these. Yeah yeah, it seems like a hassle, and you're sure you can do without it. We know, we know. But humor us. If you at least practice the methodical way a few times, doing things in your head will become easier later.
gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer To get the most out of these examples, start at main, and trace execution throughout the executable. Do the low optimization first, and then move up to higher levels. The comments assume you are progressing in that order. FIXME: We may want to split these out into several simpler example files, to avoid overwhelming people all at once.
gcc 3.3.2 The same files are also compiled with gcc version 3.3.2 and the corresponding files are functions.c, no optimization, -O3 -fomit-frame-pointer.
The if statement is represented in assembly as a test followed by a jump. The thing to notice is that sometimes the body of the if statement is what is jumped to, as opposed to being jumped over as your C code may specify. This means that the condition for the jump will often be the negation of the condition for your if statement.
gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer
gcc 3.3.2 The same files are also compiled with gcc version 3.3.2 and the corresponding files are if.c, no optimization, -O3 -fomit-frame-pointer.
Complicated if statements
Of course, if statements can get much more complicated than the above examples. They can contain boolean short-circuits, function calls, nested-ifs, etc.
Arrays on the stack are just memory regions that we access with variations on the disp(%base, %index, scale) idea presented earlier. So lets start with a warm-up consisting of a simple char array where we let libc do all the work.
gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer
gcc 3.3.2 array-stack-char.c, no optimization, -O3 -fomit-frame-pointer
So lets do another example where we do all the work. One dimensional arrays are the easiest, as they are simply a chunk of memory that is the number of elements times the size of each element.
gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer
gcc 3.3.2 array-stack-int1D.c, no optimization, -O3 -fomit-frame-pointer
Two dimensional arrays are actually just an abstraction that makes working with memory easier in C. A 2D array on the stack is just one long 1D array that the C compiler divides for us to make it manageable. To parameterize things, an array declared as: type array[dim2][dim1]; is really a 1D array of length dim2*dim1*type. The C compiler handles array indexing as follows: array[i][j] is the memory location array + i*dim1*type + j*type. So it divides our 1D array into dim2 sections, each dim1*type long.
FIXME: Graphics to illustrate this.
gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer
gcc 3.3.2 array-stack-int2D.c, no optimization, -O3 -fomit-frame-pointer
As I tell my introductory computer science students, the best way to think of higher dimensional arrays is to think of a set of arrays of the next lower dimension. So the best way to think about how a 3D array can be jammed into a 1D array is to think about how a set of 2D arrays would be jammed into a 1D array: one right after another. So for array declared as type array[dim3][dim2][dim1];, array[i][j][k] means array + i*dim2*dim1*type + j*dim1*type + k*type. So this means just by looking at the assembly multiplications of the indexing variables, we should be able to determine n-1 dimensions of any n dimensional array. The remaining dimension can be determined from the total size, or the bounds of some initialization loop.
FIXME: Diagram/graphics to show this
gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer
gcc 3.3.2 array-stack-int3D.c, no optimization, -O3 -fomit-frame-pointer
Structures (structs) are a convenient way of managing related variables without having to write a class to encapsulate all of them. A structure is essentially a class without any member functions. Structures are used VERY often in C in order to avoid passing several variables back and forth between functions. Instead of passing all the variables, a common practice is to encapsulate all of them in a struct and just pass the location of the struct in memory to the function that needs access to those variables. Structures in C++ are declared like this:
struct a
{
int first;
float second;
char *third;
};
Don't forget that ; after the last brace. Structs can store any type of variable that you would normally be able to declare anywhere in your program. To access a variable in a struct you use the dot (.) operator. For example, to assign 5 to the variable first in the struct a, do
a.first = 5;
Arrays of structs are created just as you would create an array of any other variable. Using the declaration of a above, an array of a structs of size 10 would be declared like this:
struct a stuctarray[10];
Note the use of the struct keyword, followed by the name of the struct declared, followed by the name of the array.
The code above declares a static array of structs. This means that space will be allocated for this array during load time (FIXME: Check this). Struct arrays can also be declared as pointers so that space for individual elements can be allocated at run time as it is needed. (FIXME: Um...how is this done?...time to brush up on C).
GCC handles structs a bit oddly. When you have a function that returns a struct, what gcc does is actually push the address of the struct onto the stack just before calling the function (as if the first argument to the function was a pointer to the struct that will contain the return i value). Then, inside the function, code is generated to modify the struct through this address. At the end of the function, the value of %eax contains a pointer to the struct that was passed on to the stack. So instead of the normal convention of having %eax store the return value, %eax stores a pointer to the return value, and the return value is modified directly inside of the function.
gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer
gcc 3.3.2 struct.c, no optimization, -O3 -fomit-frame-pointer
These examples were all compiled using GCC 2.95.4 under Debian 3.0/Testing. A good exercise would be to go compile some of these examples with GCC 3.0 under high optimizations, changing some things around and viewing the resulting asm to get a feel for that new compiler and how it does things, as code it generates will begin to become more ubiquitous as time goes on. It was still considered rather unstable as of this writing, so we opted for the older GCC for all these examples for that reason.
|