Monday 20 July 2015

Concept of function in C - Hacker section!

In the previous tutorial, Concept of function in C - Programmer section, we discussed how functions work in C. For a programmer that was enough, but for a hacker this is not the end! A programmer is concerned only with the source code. A hacker is someone who is curious about how things (like computers) actually work, and who realizes that the compiled program is what actually gets executed in the real world. That is why a hacker still has more to learn. So far we have seen how functions work in the high-level language C; now we will see what happens behind the scenes, that is, how function calls are represented in memory.

Prerequisites: Concept of function in C - Programmer section!
               Basic Assembly Programming Survival Skills for Hackers!
               Essential GDB debugger skills for hacker to analyze the code!

There are two major syntaxes for x86 assembly: Intel and AT&T. AT&T syntax is the default in the GNU tools used on Linux (gcc, as, gdb), while Intel syntax is used by many Windows assemblers and debuggers. The two formats yield exactly the same machine code; they differ only in style and format:
Intel syntax:
mov dest, source   ; copies data from the source to the destination
mov ebp, esp       ; copies the value of the source register "esp" into the destination register "ebp"
AT&T syntax:
mov source, dest   # copies data from the source to the destination
mov %esp, %ebp     # copies the value of the source register "esp" into the destination register "ebp"

AT&T format uses a % before registers while Intel does not. AT&T format uses a $ before literal values while Intel does not.
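To make the operand-order and prefix differences concrete, here is a small C sketch that rewrites a simple AT&T line into Intel form. It is our own illustration, not a real disassembler: the name att_to_intel is hypothetical, and it assumes one mnemonic followed by two operands separated by a comma with no spaces around it, using only register (%) or immediate ($) operands. Memory operands such as 0x1c(%esp) and size suffixes such as movl are deliberately not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy one operand (len bytes) into dst, dropping a leading AT&T sigil:
 * '%' marks a register, '$' an immediate. dst must hold len + 1 bytes. */
static void strip_sigil(const char *src, size_t len, char *dst)
{
    if (len > 0 && (src[0] == '%' || src[0] == '$')) {
        src++;
        len--;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Hypothetical helper: "mov %esp,%ebp" -> "mov ebp,esp".
 * Assumes "mnemonic src,dst" with no spaces around the comma and only
 * register or immediate operands. Returns 0 on success, -1 otherwise. */
int att_to_intel(const char *att, char *out, size_t outsz)
{
    char mnemonic[16], src[16], dst[16];
    const char *space, *ops, *comma;
    size_t mlen, slen, dlen;
    int n;

    assert(att != NULL && out != NULL);
    space = strchr(att, ' ');
    if (space == NULL)
        return -1;                   /* e.g. "leave": no operands */
    ops = space;
    while (*ops == ' ')
        ops++;                       /* skip gdb-style padding after the mnemonic */
    comma = strchr(ops, ',');
    if (comma == NULL)
        return -1;                   /* not a two-operand instruction */

    mlen = (size_t)(space - att);
    slen = (size_t)(comma - ops);
    dlen = strlen(comma + 1);
    if (mlen >= sizeof mnemonic || slen >= sizeof src || dlen >= sizeof dst)
        return -1;

    memcpy(mnemonic, att, mlen);
    mnemonic[mlen] = '\0';
    strip_sigil(ops, slen, src);
    strip_sigil(comma + 1, dlen, dst);

    /* AT&T order is "op source,dest"; Intel order is "op dest,source". */
    n = snprintf(out, outsz, "%s %s,%s", mnemonic, dst, src);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return 0;
}
```

For example, "mov %esp,%ebp" becomes "mov ebp,esp", and "and $0xfffffff0,%esp" becomes "and esp,0xfffffff0".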
All the code in this post was compiled on x86 Kali Linux 1.1 (Debian-based) using the gcc 4.7 compiler.

In assembly, a function is usually called with the instruction 'CALL address', where address is the place where the function's code lies. In code compiled like ours (gcc, no optimization), every function starts with the same prologue (the beginning of the function code) and ends with the same epilogue (the end of the function).

Prologue: the prologue looks like this:
push  %ebp
mov   %esp,%ebp

Epilogue: the epilogue looks like this:
leave 
ret
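To check that the prologue and the leave half of the epilogue undo each other, here is a toy model in C. Everything in it is an assumption for illustration: the stack is an array of 4-byte words whose addresses start just below 0x1000 (the same starting address the walkthrough below assumes), addresses are plain integers, and the struct and function names are ours.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack (assumption): plain integer addresses, 4-byte slots. */
#define STACK_TOP  0x1000u                    /* first push lands at 0xffc */
#define WORDS      64
#define STACK_BASE (STACK_TOP - 4u * WORDS)   /* lowest modelled address   */

struct cpu {
    uint32_t esp, ebp;
    uint32_t mem[WORDS];
};

/* Map an address to its word in the toy memory. */
static uint32_t *slot(struct cpu *c, uint32_t addr)
{
    assert(addr >= STACK_BASE && addr < STACK_TOP && addr % 4 == 0);
    return &c->mem[(addr - STACK_BASE) / 4];
}

static void push(struct cpu *c, uint32_t value)   /* ESP -= 4; [ESP] = value */
{
    c->esp -= 4;
    *slot(c, c->esp) = value;
}

static uint32_t pop(struct cpu *c)                /* value = [ESP]; ESP += 4 */
{
    uint32_t value = *slot(c, c->esp);
    c->esp += 4;
    return value;
}

/* Prologue: push %ebp ; mov %esp,%ebp */
void prologue(struct cpu *c)
{
    push(c, c->ebp);
    c->ebp = c->esp;
}

/* leave is equivalent to: mov %ebp,%esp ; pop %ebp */
void leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}
```

After prologue, any amount of local allocation (sub $N,%esp) and leave, ESP and EBP hold their values from before the prologue again; ret then pops the return address.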

Now let's write a program which simply adds two numbers:
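#include<stdio.h>
#include<stdlib.h>

int sum(int a);              /* prototype: sum is defined after main */

int main()
{
    int x, y;
    x = 10;
    y = sum(x);              /* x is passed to sum; its result comes back in y */
    printf("The sum is: %d", y);
}

int sum(int a)
{
    int b=3,c;
    c= b+a;                  /* c = 3 + a */
    return c;
}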

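Save the program as sum.c, then compile it and load it into gdb (on a 64-bit system, add -m32 to the gcc command to get 32-bit code like the listings below):
root@Kali:~# gcc -g sum.c
root@Kali:~# gdb ./a.out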

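We now disassemble the binary using gdb. Two registers matter here: EBP points to the current stack frame (it is the frame pointer), and ESP points to the top of the stack and changes every time something is pushed or popped. More detail here: basic-assembly-programming-survival-skills-for-hackers. Note that gdb prints AT&T syntax by default; the command set disassembly-flavor intel switches it to Intel syntax.
First, let's disassemble the main function: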
(gdb) disas main
Dump of assembler code for function main:
   0x0804841c <+0>:    push   %ebp
   0x0804841d <+1>:    mov    %esp,%ebp
   0x0804841f <+3>:    and    $0xfffffff0,%esp
   0x08048422 <+6>:    sub    $0x20,%esp
   0x08048425 <+9>:    movl   $0xa,0x1c(%esp)
   0x0804842d <+17>:    mov    0x1c(%esp),%eax
   0x08048431 <+21>:    mov    %eax,(%esp)
   0x08048434 <+24>:    call   0x8048453 <sum>
   0x08048439 <+29>:    mov    %eax,0x18(%esp)
   0x0804843d <+33>:    mov    0x18(%esp),%eax
   0x08048441 <+37>:    mov    %eax,0x4(%esp)
   0x08048445 <+41>:    movl   $0x8048500,(%esp)
   0x0804844c <+48>:    call   0x8048300 <printf@plt>
   0x08048451 <+53>:    leave 
   0x08048452 <+54>:    ret   
End of assembler dump.

Now, disassemble the sum function:
(gdb) disas sum
Dump of assembler code for function sum:
   0x08048453 <+0>:    push   %ebp
   0x08048454 <+1>:    mov    %esp,%ebp
   0x08048456 <+3>:    sub    $0x10,%esp
   0x08048459 <+6>:    movl   $0x3,-0x4(%ebp)
   0x08048460 <+13>:    mov    0x8(%ebp),%eax
   0x08048463 <+16>:    mov    -0x4(%ebp),%edx
   0x08048466 <+19>:    add    %edx,%eax
   0x08048468 <+21>:    mov    %eax,-0x8(%ebp)
   0x0804846b <+24>:    mov    -0x8(%ebp),%eax
   0x0804846e <+27>:    leave 
   0x0804846f <+28>:    ret   
End of assembler dump.

So, let's look at what's going on. The stack section of memory keeps track of function calls (including nested and recursive ones), and on x86, as on most systems, it grows from higher memory addresses toward lower ones. Local variables live in the stack section.
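You can watch the stack grow downward on a real machine by recording the address of a local variable at each level of a recursive call. This is a sketch assuming GCC or Clang (for the noinline attribute) on x86 or x86-64 Linux; the C standard itself does not say which way the stack grows, so the ordering shows what such a platform does, not a language guarantee. The names descend and local_addr are ours.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 4

/* Address of one local variable at each call depth (filled by descend). */
static uintptr_t local_addr[DEPTH];

/* noinline (a GCC/Clang attribute) keeps every level a real call with its
 * own stack frame; volatile and the escaping address keep 'local' in memory. */
__attribute__((noinline)) int descend(int depth)
{
    volatile int local = depth;

    assert(depth >= 0 && depth < DEPTH);
    local_addr[depth] = (uintptr_t)&local;
    if (depth == DEPTH - 1)
        return local;
    /* Not a tail call: this frame is still needed after the call returns. */
    return descend(depth + 1) + local;
}
```

After descend(0) returns (0 + 1 + 2 + 3 = 6), each local_addr[k] is expected to be lower than local_addr[k - 1]: deeper calls sit at lower addresses.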

Now let's look at main's assembler code. The first two lines are the function prologue:

0x0804841c <+0>:    push   %ebp
0x0804841d <+1>:    mov    %esp,%ebp

The next two lines also belong to main's prologue:
 0x0804841f <+3>:    and    $0xfffffff0,%esp
 0x08048422 <+6>:    sub    $0x20,%esp
The first two lines (push/mov) are the standard frame-pointer prologue that most x86 compilers emit, on Windows as well as on Linux. The other two depend on the compiler and the function: the and instruction is a stack realignment that gcc adds to main here, and sub reserves space for local variables, with a size that varies from function to function (sum below uses sub $0x10,%esp).

The first instruction:
push   %ebp
saves the caller's value of EBP (the base pointer) on the stack; ESP is decreased by 4 so that it points to the saved value. For the sake of simplicity, let's assume the stack starts at address 0x1000.

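The second instruction:
mov    %esp,%ebp
copies the current value of ESP into EBP. EBP now points to the top of the stack, where the old EBP was just saved, and it stays fixed for the rest of the function, so locals and arguments can be reached at constant offsets from it.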

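The third instruction:
and    $0xfffffff0,%esp
aligns the stack to a 16-byte boundary by forcing the last 4 bits of ESP to 0 (0xfffffff0 has its low 4 bits clear). This rounds ESP down, which is safe because the stack grows toward lower addresses, so nothing already on the stack is disturbed.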

Finally, the program allocates 32 bytes of stack (local) storage.

The fourth instruction:
 0x08048422 <+6>:    sub    $0x20,%esp
allocates 32 (0x20 = 32 in decimal) bytes of space on the stack. Remember that the stack grows toward lower memory addresses, so allocating space means subtracting from ESP: sub moves ESP 32 bytes lower, and the 32 bytes between the new and the old ESP now belong to main. Since ESP was already 16-byte aligned and 32 is a multiple of 16, ESP now points to 32 bytes of 16-byte-aligned memory.
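The ESP arithmetic of main's prologue is easy to replay in C. This sketch tracks only the value of ESP, starting from the address 0x1000 assumed above; it models the arithmetic, not real memory, and the function names are ours.

```c
#include <assert.h>
#include <stdint.h>

/* and $0xfffffff0,%esp: clear the low 4 bits of ESP, which rounds it DOWN
 * to the nearest multiple of 16 (it never moves ESP upward). */
uint32_t align_down16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}

/* Replays main's prologue on a bare ESP value (a model, not real memory):
 * push %ebp ; mov %esp,%ebp ; and $0xfffffff0,%esp ; sub $0x20,%esp */
uint32_t main_esp_after_prologue(uint32_t esp_at_entry)
{
    uint32_t esp = esp_at_entry - 4;   /* push %ebp (mov %esp,%ebp leaves ESP alone) */
    esp = align_down16(esp);           /* and $0xfffffff0,%esp                       */
    esp -= 0x20;                       /* sub $0x20,%esp                             */
    assert(esp % 16 == 0);             /* 32 is a multiple of 16: alignment survives */
    return esp;
}
```

Starting from ESP = 0x1000, push leaves 0xffc, the and rounds it down to 0xff0, and sub $0x20 leaves ESP = 0xfd0, still a multiple of 16.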

The fifth instruction:
0x08048425 <+9>:    movl   $0xa,0x1c(%esp)
moves our variable's value 10 (0xa in hexadecimal) into the location esp+28; the l suffix of movl marks a 4-byte operand. In AT&T syntax, 0x1c(%esp) means the address ESP + 0x1c, i.e. ESP + 28 (0x1c in hexadecimal = 28 in decimal). So the value is stored 28 bytes above where ESP currently points (toward higher addresses, inside the 32 bytes just reserved), not below it. This slot holds x.
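Here is a small C sketch of what an operand like 0x1c(%esp) means: an address computed by adding a displacement to a base register. It assumes main's ESP is 0xfd0, which is where it would be after the prologue if the stack started at 0x1000 as assumed above. The function and enum names are hypothetical; the offsets themselves come from the disassembly of main.

```c
#include <assert.h>
#include <stdint.h>

/* AT&T "disp(%base)" denotes the address base + disp. disp may be
 * negative, as in -0x4(%ebp); unsigned arithmetic wraps just like the
 * 32-bit address computation does. */
uint32_t effective_address(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}

/* Offsets main uses relative to ESP after "sub $0x20,%esp", read off the
 * disassembly. The enum names are ours. */
enum {
    ARG0_OFF   = 0x00,   /* (%esp)      first outgoing argument  */
    ARG1_OFF   = 0x04,   /* 0x4(%esp)   second outgoing argument */
    Y_OFF      = 0x18,   /* 0x18(%esp)  y                        */
    X_OFF      = 0x1c,   /* 0x1c(%esp)  x                        */
    LOCALS_LEN = 0x20    /* bytes reserved by sub $0x20,%esp     */
};

/* Nonzero if the 4-byte slot at addr lies inside main's reserved area. */
int in_main_locals(uint32_t esp, uint32_t addr)
{
    return addr >= esp && addr + 4 <= esp + LOCALS_LEN;
}
```

With ESP = 0xfd0, x lives at 0xfec (28 bytes above ESP), y at 0xfe8, and both fall inside the 32 reserved bytes.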

The next instruction:
0x0804842d <+17>:    mov    0x1c(%esp),%eax
copies the data (10, the value of x) from the location esp+28 into the EAX register.

The next instruction:
0x08048431 <+21>:    mov    %eax,(%esp)
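moves EAX (which contains 10) into the location ESP currently points to. This is how the argument is passed: in gcc's 32-bit calling convention (cdecl), arguments are placed on the stack, and the first argument sits at the top of the stack when call executes.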

The next instruction:
0x08048434 <+24>:     call   0x8048453 <sum>
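simply calls the sum function. As soon as a function is called, the return address is saved on the stack: call pushes the address of the next instruction (0x08048439) and then jumps to 0x8048453, the start of sum.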
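The behaviour of call and ret can be modelled in a few lines of C. This is a toy model with hypothetical names: a small word-addressed stack below 0x1000 plus an instruction pointer. The instruction length 5 is that of call with a 32-bit relative target (one opcode byte plus four displacement bytes), which matches the dump: 0x08048439 - 0x08048434 = 5.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): word-addressed stack below 0x1000 plus EIP. */
#define STACK_TOP  0x1000u
#define WORDS      16
#define STACK_BASE (STACK_TOP - 4u * WORDS)

struct cpu {
    uint32_t eip, esp;
    uint32_t mem[WORDS];
};

static uint32_t *slot(struct cpu *c, uint32_t addr)
{
    assert(addr >= STACK_BASE && addr < STACK_TOP && addr % 4 == 0);
    return &c->mem[(addr - STACK_BASE) / 4];
}

/* call target: push the address of the next instruction, then jump.
 * insn_len is the length of the call instruction itself. */
void do_call(struct cpu *c, uint32_t target, uint32_t insn_len)
{
    c->esp -= 4;
    *slot(c, c->esp) = c->eip + insn_len;   /* saved return address */
    c->eip = target;
}

/* ret: pop the saved return address into EIP. */
void do_ret(struct cpu *c)
{
    c->eip = *slot(c, c->esp);
    c->esp += 4;
}
```

Calling from 0x08048434 to 0x08048453 leaves 0x08048439 on top of the stack, and do_ret brings EIP back to exactly that address.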

Hackers and exploit writers love this return address. As discussed above, before the processor jumps to the function code (sum here), the address of the instruction following the call is saved on the stack as the return address; it is loaded into EIP when the called function (sum) finishes its job. Buffer overflow attacks exploit this to take control of execution by overwriting the saved return address. We will discuss this attack technique in detail in later tutorials.
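To see why the saved return address is such a tempting target, here is a harmless simulation in C. It does not overflow real memory: it uses a hypothetical 16-byte frame stored in an array, laid out like an i386 frame (an 8-byte local buffer, then the saved EBP, then the return address, at increasing addresses, little-endian), and a copy routine that trusts the input length instead of the buffer size. The layout and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout, lowest address first. */
enum { BUF_OFF = 0, BUF_LEN = 8, SAVED_EBP_OFF = 8, RET_OFF = 12, FRAME_LEN = 16 };

static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Models a strcpy/gets-style copy: it trusts n instead of BUF_LEN.
 * n is clamped to the toy frame so the simulation itself stays valid C. */
void unchecked_copy(unsigned char frame[FRAME_LEN],
                    const unsigned char *input, size_t n)
{
    if (n > FRAME_LEN - BUF_OFF)
        n = FRAME_LEN - BUF_OFF;
    memcpy(frame + BUF_OFF, input, n);
}

uint32_t saved_return_address(const unsigned char frame[FRAME_LEN])
{
    return load_le32(frame + RET_OFF);
}
```

Copying 8 bytes leaves the return address intact; copying 16 bytes fills the buffer, clobbers the saved EBP with 'A's, and replaces the return address with 0xdeadbeef, which ret would then load into EIP.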

Let's go back to the code and look at the assembler code of the sum function.
The first two instructions are the same as in main:
0x08048453 <+0>:    push   %ebp
0x08048454 <+1>:    mov    %esp,%ebp

The next instruction:
0x08048456 <+3>:    sub    $0x10,%esp
allocates 16 (0x10 in hexadecimal = 16 in decimal) bytes of space on the stack.

The next instruction:
0x08048459 <+6>:    movl   $0x3,-0x4(%ebp)
moves 3 (the value of sum's local variable b) into the location ebp-4 (-0x4(%ebp) means EBP - 4). Don't confuse ESP and EBP here: the offset is taken from EBP, not ESP. So ebp-4 means 4 bytes below EBP, toward lower addresses.

The next instruction:
0x08048460 <+13>:    mov    0x8(%ebp),%eax
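moves the data at the location ebp+8 into the EAX register. ebp+8 holds the argument a, the value 10 that main stored at its top of stack just before the call: [ebp] is the saved EBP, [ebp+4] is the return address pushed by call, and [ebp+8] is the first argument. So ebp+8 means 8 bytes above EBP, toward higher addresses.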
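The three slots above EBP can be checked with a toy stack model in C (our own illustration; addresses are plain integers and the names are hypothetical). The sketch replays main's mov %eax,(%esp), the call, and sum's prologue, starting from ESP = 0xfd0 and EBP = 0xffc, which is where main's registers would be if the stack started at 0x1000 as assumed above, and then reads [ebp], [ebp+4] and [ebp+8].

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): word-addressed stack below 0x1000. */
#define STACK_TOP  0x1000u
#define WORDS      32
#define STACK_BASE (STACK_TOP - 4u * WORDS)

struct cpu {
    uint32_t esp, ebp;
    uint32_t mem[WORDS];
};

static uint32_t *slot(struct cpu *c, uint32_t addr)
{
    assert(addr >= STACK_BASE && addr < STACK_TOP && addr % 4 == 0);
    return &c->mem[(addr - STACK_BASE) / 4];
}

static void push(struct cpu *c, uint32_t value)
{
    c->esp -= 4;
    *slot(c, c->esp) = value;
}

/* Replays, on the toy stack:
 *   main: mov %eax,(%esp)   argument at main's top of stack
 *         call sum          push the return address
 *   sum:  push %ebp ; mov %esp,%ebp
 * and returns sum's EBP. */
uint32_t enter_sum(struct cpu *c, uint32_t arg, uint32_t return_address)
{
    *slot(c, c->esp) = arg;
    push(c, return_address);
    push(c, c->ebp);
    c->ebp = c->esp;
    return c->ebp;
}
```

With these starting values sum's EBP ends up at 0xfc8, so [ebp] = 0xffc (main's EBP), [ebp+4] = 0x08048439 (the return address) and [ebp+8] = 10 (a).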

The next instruction:
0x08048463 <+16>:    mov    -0x4(%ebp),%edx
moves the data at the location ebp-4 (the value 3, i.e. b) into the EDX register.

The next instruction:
 0x08048466 <+19>:    add    %edx,%eax
simply adds EDX (3) to EAX (10) and stores the result in the EAX register.

The next instruction:
0x08048468 <+21>:    mov    %eax,-0x8(%ebp)
moves the value in EAX (13) into the local variable c, at the memory location EBP-8. This is not the top of the stack: after sub $0x10,%esp, ESP points to EBP-16.

The next instruction:
0x0804846b <+24>:    mov    -0x8(%ebp),%eax
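moves the data at the location ebp-8 (13, the value of c) back into the EAX register. By convention, a function's integer return value is returned in EAX. Storing the value and immediately loading it back looks redundant; that is typical of code compiled without optimization.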
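Putting sum's body together, here is a hand translation of its unoptimized instructions back into C, one statement per instruction. It is our own illustration: the array frame stands in for the 16 bytes reserved by sub $0x10,%esp, with frame[0] playing -0x4(%ebp) (b) and frame[1] playing -0x8(%ebp) (c). It models the data flow, not the exact memory layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hand translation (ours, for illustration) of sum's -O0 body.
 * frame[0] stands for -0x4(%ebp) (b), frame[1] for -0x8(%ebp) (c). */
int32_t sum_as_compiled(int32_t a)      /* a lives at 0x8(%ebp) */
{
    int32_t frame[4] = {0};             /* sub  $0x10,%esp : 16 bytes   */
    int32_t eax, edx;

    /* Signed overflow is undefined in C (the real add simply wraps). */
    assert(a <= INT32_MAX - 3);

    frame[0] = 3;                       /* movl $0x3,-0x4(%ebp)         */
    eax = a;                            /* mov  0x8(%ebp),%eax          */
    edx = frame[0];                     /* mov  -0x4(%ebp),%edx         */
    eax = eax + edx;                    /* add  %edx,%eax               */
    frame[1] = eax;                     /* mov  %eax,-0x8(%ebp)         */
    eax = frame[1];                     /* mov  -0x8(%ebp),%eax         */
    return eax;                         /* leave ; ret : result in EAX  */
}
```

sum_as_compiled(10) returns 13, the same value main later receives in EAX.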

The next instructions:
0x0804846e <+27>:    leave 
0x0804846f <+28>:    ret   
leave copies the base pointer into the stack pointer (mov %ebp,%esp), then pops the old base pointer from the stack (pop %ebp). This discards sum's local variables and restores main's frame. The next instruction, ret, pops the return address from the stack and loads it into EIP, so execution returns to where we left main above.
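The whole round trip can be checked on a toy stack: after sum's prologue and sub $0x10,%esp, leave and ret should hand main back exactly the ESP, EBP and EIP it had before the call. This is our own model with hypothetical names; the starting values (ESP 0xfd0, EBP 0xffc) are assumptions consistent with a stack starting at 0x1000, while the addresses come from the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): word-addressed stack below 0x1000. */
#define STACK_TOP  0x1000u
#define WORDS      32
#define STACK_BASE (STACK_TOP - 4u * WORDS)

struct cpu {
    uint32_t eip, esp, ebp;
    uint32_t mem[WORDS];
};

static uint32_t *slot(struct cpu *c, uint32_t addr)
{
    assert(addr >= STACK_BASE && addr < STACK_TOP && addr % 4 == 0);
    return &c->mem[(addr - STACK_BASE) / 4];
}

static void push(struct cpu *c, uint32_t v) { c->esp -= 4; *slot(c, c->esp) = v; }

static uint32_t pop(struct cpu *c)
{
    uint32_t v = *slot(c, c->esp);
    c->esp += 4;
    return v;
}

/* leave: mov %ebp,%esp (drop locals) ; pop %ebp (restore caller's EBP) */
void leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* ret: pop the return address into EIP */
void ret(struct cpu *c)
{
    c->eip = pop(c);
}

/* Runs main's "call sum", then sum's prologue, sub, leave and ret. */
void call_and_return(struct cpu *c, uint32_t target, uint32_t return_address)
{
    push(c, return_address);   /* call: save the return address ... */
    c->eip = target;           /* ... and jump to sum               */
    push(c, c->ebp);           /* push %ebp                         */
    c->ebp = c->esp;           /* mov  %esp,%ebp                    */
    c->esp -= 0x10;            /* sub  $0x10,%esp                   */
    leave(c);
    ret(c);
}
```

After call_and_return, ESP is 0xfd0 and EBP is 0xffc again, and EIP is 0x08048439, the instruction right after call in main.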

Now the sum function has transferred control back to main, so let's go back to where we left off.
In main, after the instruction:
0x08048434 <+24>:    call   0x8048453 <sum>

The next instruction is:
 0x08048439 <+29>:    mov    %eax,0x18(%esp)
moves the value of EAX (13, sum's return value) into the location esp+24 (0x18 = 24 in decimal, so 24 bytes above ESP). This slot holds y.

The next instruction is:
0x0804843d <+33>:    mov    0x18(%esp),%eax
loads that value from the location esp+24 back into the EAX register. Again, this reload is redundant because the code was compiled without optimization.

The next instruction is:
0x08048441 <+37>:    mov    %eax,0x4(%esp)
moves the value (13) from the EAX register to the location esp+4, where printf expects its second argument, y.

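The next instruction is:
0x08048445 <+41>:    movl   $0x8048500,(%esp)
stores the constant 0x8048500 at the location ESP points to. The $ marks an immediate value, so what gets stored is the number 0x8048500 itself: the address of the format string, not the string. It becomes printf's first argument. Let's check what is at 0x8048500: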
(gdb) x/1s 0x8048500
0x8048500:     "The sum is: %d"
(gdb)
As expected, it is our format string "The sum is: %d".
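In C terms, that instruction passes a pointer. The sketch below (names ours) mimics both halves of what we just did: string_at plays the role of gdb's x/1s by reading the bytes at an address as a NUL-terminated string, and format_sum hands the format string's address to a printf-family function, the way main hands over 0x8048500. The addresses on your machine will differ from the ones in the listing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The format string lives in read-only data; callers only pass its address. */
static const char fmt[] = "The sum is: %d";

/* Like gdb's "x/1s ADDR": read the bytes at addr as a C string.
 * addr must have come from a valid char pointer. */
const char *string_at(uintptr_t addr)
{
    return (const char *)addr;
}

/* fmt's address as a plain number: the C analogue of $0x8048500. */
uintptr_t fmt_address(void)
{
    return (uintptr_t)fmt;
}

/* Format the message into out. In main's printf call the format address
 * sits at (%esp) and y at 0x4(%esp); here they are snprintf's third and
 * fourth arguments. Returns the length snprintf reports. */
int format_sum(char *out, size_t outsz, int y)
{
    assert(out != NULL);
    return snprintf(out, outsz, fmt, y);
}
```

string_at(fmt_address()) yields "The sum is: %d", and format_sum(buf, sizeof buf, 13) writes "The sum is: 13".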

The next instruction is:
0x0804844c <+48>:    call   0x8048300 <printf@plt>
simply calls the printf function. The remaining two lines, leave and ret, are the epilogue we already walked through in sum, and the program prints our result, 13. In the high-level language C it is very easy to add two numbers, but in assembly it takes considerably more work. Again, a programmer is concerned only with the source code, but a hacker realizes that the compiled program is what actually gets executed in the real world. This is the first step towards understanding and exploiting buffer overflow vulnerabilities.

I have tried to KISS (Keep It Simple, Stupid). If you like this post or have any questions, please feel free to comment!
