
Monday, 20 July 2015

Concept of function in C - Hacker section!

In the previous tutorial, Concept of function in C - Programmer section, we discussed the concept of functions in C. For a programmer that was enough, but for a hacker this is not the end! A programmer is someone who is concerned only with source code. A hacker is someone who is curious about how something (like a computer) works. A hacker realizes that the compiled program is what actually gets executed out in the real world, which is why a hacker still has more to learn. So far we have discussed how functions work in the high-level language C. Now we will see what happens behind the scenes: how function calls are represented in memory.

prerequisite : Concept of function in C - Programmer section!
                     Basic Assembly Programming Survival Skills for Hackers!
                     Essential GDB debugger skills for hacker to analyze the code !

There are two major syntaxes for x86 assembly: Intel and AT&T. AT&T syntax is the default for the GNU tools on Linux (gcc, as, gdb, objdump), while Intel syntax is used by many Windows assemblers and debuggers. The two formats yield exactly the same machine language; however, there are a few differences in style and format:
Intel syntax:
mov dest, source ; copies data from the source to the destination
mov ebp, esp ; copies data from the source "esp" register to the destination "ebp" register
AT&T syntax:
mov source, dest ; copies data from the source to the destination
mov %esp, %ebp ; copies data from the source "esp" register to the destination "ebp" register

AT&T format uses a % before registers while Intel does not. AT&T format uses a $ before literal values while Intel does not.
All the code in this post was compiled on 32-bit x86 Kali Linux 1.1 (Debian-based) with the gcc 4.7 compiler.

In assembly, a function is usually called with the instruction 'call address', where address is the location of the function's code. Almost every function starts with the same prologue (the beginning of the function code) and ends with the same epilogue (the end of the function code).

Prologue: The prologue looks like this (AT&T syntax):
push  %ebp
mov   %esp,%ebp

Epilogue: The epilogue looks like this:
leave
ret

Now let's write a program which simply adds two numbers:
#include<stdio.h>
#include<stdlib.h>

int sum(int a);
int main()
{
    int x, y;   
    x = 10;
    y = sum(x);   
    printf("The sum is: %d", y);
}

int sum(int a)
{
    int b=3,c; 
    c= b+a;
    return c;
}

Now compile the program and load it into gdb:
root@Kali:~# gcc -g sum.c
root@Kali:~# gdb -q ./a.out

We now disassemble the binary using gdb. Two registers matter here: EBP is the frame pointer and points to the base of the current stack frame, and ESP is the stack pointer and always points to the top of the stack, so it changes with every push and pop. More detail here: basic-assembly-programming-survival-skills-for-hackers.
First, let's disassemble the main function:
(gdb) disas main
Dump of assembler code for function main:
   0x0804841c <+0>:    push   %ebp
   0x0804841d <+1>:    mov    %esp,%ebp
   0x0804841f <+3>:    and    $0xfffffff0,%esp
   0x08048422 <+6>:    sub    $0x20,%esp
   0x08048425 <+9>:    movl   $0xa,0x1c(%esp)
   0x0804842d <+17>:    mov    0x1c(%esp),%eax
   0x08048431 <+21>:    mov    %eax,(%esp)
   0x08048434 <+24>:    call   0x8048453 <sum>
   0x08048439 <+29>:    mov    %eax,0x18(%esp)
   0x0804843d <+33>:    mov    0x18(%esp),%eax
   0x08048441 <+37>:    mov    %eax,0x4(%esp)
   0x08048445 <+41>:    movl   $0x8048500,(%esp)
   0x0804844c <+48>:    call   0x8048300 <printf@plt>
   0x08048451 <+53>:    leave 
   0x08048452 <+54>:    ret   
End of assembler dump.

Now, disassemble the sum function:
(gdb) disas sum
Dump of assembler code for function sum:
   0x08048453 <+0>:    push   %ebp
   0x08048454 <+1>:    mov    %esp,%ebp
   0x08048456 <+3>:    sub    $0x10,%esp
   0x08048459 <+6>:    movl   $0x3,-0x4(%ebp)
   0x08048460 <+13>:    mov    0x8(%ebp),%eax
   0x08048463 <+16>:    mov    -0x4(%ebp),%edx
   0x08048466 <+19>:    add    %edx,%eax
   0x08048468 <+21>:    mov    %eax,-0x8(%ebp)
   0x0804846b <+24>:    mov    -0x8(%ebp),%eax
   0x0804846e <+27>:    leave 
   0x0804846f <+28>:    ret   
End of assembler dump.

                                        


So, let's look at what's going on. The stack section is used to keep track of function calls (including recursive ones) and, on most systems, grows from higher memory addresses toward lower memory addresses. Local variables live in the stack section.

                                             


Now let's look at the assembler code of the main function. The first two lines are the function prologue:

0x0804841c <+0>:    push   %ebp
0x0804841d <+1>:    mov    %esp,%ebp

The next two lines are also part of the prologue of main:
 0x0804841c <+0>:    push   %ebp
 0x0804841d <+1>:    mov    %esp,%ebp
 0x0804841f <+3>:    and    $0xfffffff0,%esp
 0x08048422 <+6>:    sub    $0x20,%esp
The first two lines of the prologue are common to almost all x86 code, on both Windows and Linux. The other two lines are generated by the compiler: gcc on Linux typically aligns the stack in main and then reserves space for local variables.

The first instruction:
push   %ebp
pushes the old value of EBP, the base pointer, onto the stack, and ESP is decremented by 4. For the sake of simplicity, let's assume the stack starts at address 0x1000.

                                      


The second instruction:
mov    %esp,%ebp
copies the current value of esp into ebp. This means ebp now points to the top of the stack, which becomes the base of the new stack frame.
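To make these first two steps concrete, here is a minimal C sketch of the prologue. It is only a toy model: the toy_cpu struct, the toy_push and toy_prologue functions, and the 0x1000 starting address (the value assumed above) are illustrative names and assumptions, not anything gcc or gdb provides.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP 0x1000u            /* assumed starting value of esp */

/* Hypothetical model of two registers and a small stack region. */
typedef struct {
    uint32_t esp;                    /* stack pointer */
    uint32_t ebp;                    /* frame (base) pointer */
    uint32_t mem[STACK_TOP / 4];     /* one 32-bit slot per 4 bytes of stack */
} toy_cpu;

/* push: move esp down by 4, then store the value at the new top. */
void toy_push(toy_cpu *cpu, uint32_t value)
{
    assert(cpu->esp >= 4 && cpu->esp <= STACK_TOP);
    cpu->esp -= 4;
    cpu->mem[cpu->esp / 4] = value;
}

/* The standard prologue: push %ebp ; mov %esp,%ebp */
void toy_prologue(toy_cpu *cpu)
{
    toy_push(cpu, cpu->ebp);         /* save the caller's frame pointer */
    cpu->ebp = cpu->esp;             /* start a new frame at the current top */
}
```

Starting with esp = 0x1000, a call to toy_prologue leaves both esp and ebp at 0xffc, with the caller's old ebp stored in that slot.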

                                            


The third instruction:
and    $0xfffffff0,%esp
aligns the stack to a 16-byte boundary by forcing the last 4 bits of esp to 0.
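The effect of the mask is easy to check in C. The sketch below assumes the 0x1000 starting address used above, so esp is 0xffc after the push; align_down_16 is a made-up helper name, not a library function.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the low four bits, as `and $0xfffffff0,%esp` does. */
uint32_t align_down_16(uint32_t esp)
{
    uint32_t aligned = esp & 0xfffffff0u;
    assert(aligned % 16 == 0);       /* the result is always a multiple of 16 */
    return aligned;
}
```

align_down_16(0xffc) gives 0xff0: the low four bits are cleared, so the result is a multiple of 16 and never higher than the original value.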

                                                           


And Finally, the program allocates 32 bytes of stack (local) storage.

The fourth instruction :
 0x08048422 <+6>:    sub    $0x20,%esp
allocates 32 bytes (0x20 = 32 in decimal) of space on the stack. Remember that the stack grows toward lower memory addresses, so allocations use 'sub': subtracting 32 from esp adds 32 bytes of space to the stack. At this point, the program can be sure there are 32 bytes of 16-byte aligned memory pointed to by esp.
                                                         

The fifth instruction:
0x08048425 <+9>:    movl   $0xa,0x1c(%esp)
moves the value of our variable x, 10 (0xa in hex), into the location at esp+28. The operand 0x1c(%esp) simply means 0x1c+esp, i.e. esp+28 (0x1c in hex = 28 in decimal). So the value is stored 28 bytes above the address esp currently points to, inside the 32 bytes that were just allocated.
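An operand of the form disp(%reg) is just an addition, which a few lines of C can show. This is a sketch: effective_address is a hypothetical helper, and the esp value 0xfd0 follows from the assumed 0x1000 start (0xffc after the push, 0xff0 after the and, 0xfd0 after the sub).

```c
#include <stdint.h>

/* Effective address of an AT&T operand of the form disp(%reg). */
uint32_t effective_address(uint32_t reg, int32_t disp)
{
    /* Unsigned wraparound makes negative displacements work too. */
    return reg + (uint32_t)disp;
}
```

With esp = 0xfd0, effective_address(0xfd0, 0x1c) is 0xfec, which is above esp. A negative displacement such as -0x4 gives an address below the register instead.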
                                               


The next instruction:
0x0804842d <+17>:    mov    0x1c(%esp),%eax
copies the data (i.e. 10, the value of our variable) from the location at esp+28 into the eax register.

                                   


The next instruction:
0x08048431 <+21>:    mov    %eax,(%esp)
moves the value in eax (10) into the location esp currently points to. This is how main passes the argument to sum.

                                         


The next instruction:
0x08048434 <+24>:     call   0x8048453 <sum>
simply calls the sum function. As soon as a function is called, the return address is saved onto the stack.

                                               


Every hacker and exploit writer has a crush on this return address. As already discussed, before the processor jumps to the function code (the sum function here), the address of the instruction following the call is saved on the stack as the return address, which will be loaded into EIP when the called function (i.e. sum) finishes its job. Buffer overflow attacks exploit this by overwriting the saved return address to take control of execution. We will discuss this attack technique in detail in later tutorials.
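The bookkeeping done by call and ret can be sketched in C as follows. This is a toy model, not real CPU behavior in full: toy_cpu, toy_call and toy_ret are made-up names, esp = 0xfd0 assumes the 0x1000 starting address used earlier, and the 5-byte instruction length is read off the dump (the call sits at 0x08048434 and the next instruction at 0x08048439).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP 0x1000u            /* assumed starting value of esp */

/* Hypothetical model: instruction pointer, stack pointer, stack memory. */
typedef struct {
    uint32_t eip;
    uint32_t esp;
    uint32_t mem[STACK_TOP / 4];
} toy_cpu;

/* call: push the address of the next instruction, then jump to target. */
void toy_call(toy_cpu *cpu, uint32_t target, uint32_t insn_len)
{
    assert(cpu->esp >= 4 && cpu->esp <= STACK_TOP);
    cpu->esp -= 4;
    cpu->mem[cpu->esp / 4] = cpu->eip + insn_len;   /* saved return address */
    cpu->eip = target;
}

/* ret: pop the saved return address into eip. */
void toy_ret(toy_cpu *cpu)
{
    assert(cpu->esp <= STACK_TOP - 4);
    cpu->eip = cpu->mem[cpu->esp / 4];
    cpu->esp += 4;
}
```

Calling toy_call with eip = 0x08048434, target 0x08048453 and length 5 stores 0x08048439 on the toy stack; toy_ret then brings eip back to 0x08048439. Whatever value sits in that stack slot when ret runs is where execution continues, which is why the slot matters so much.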

Let's go back to the code. Now we will look at the assembler code for the sum function.
The first two instructions are the same as in the main function.
0x08048453 <+0>:    push   %ebp
0x08048454 <+1>:    mov    %esp,%ebp

                                       

The next instruction :
0x08048456 <+3>:    sub    $0x10,%esp
allocates 16 (0x10 in hexa =16 in decimal) bytes of space on the stack.

                                                


The next instruction:
0x08048459 <+6>:    movl   $0x3,-0x4(%ebp)
moves 3 (the value of the local variable b in sum) into the location at ebp-4 (-0x4(%ebp) = ebp-4). Don't confuse esp and ebp here: the offset is taken from ebp, not esp. So ebp-4 means 4 bytes below ebp, toward lower addresses.

                                                   


The next instruction:
0x08048460 <+13>:    mov    0x8(%ebp),%eax
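moves the data from the location at ebp+8 into the eax register. The slot at ebp+8 holds the argument a (10) that main stored at the top of its stack before the call; ebp+0 holds the saved ebp and ebp+4 holds the return address. So ebp+8 means 8 bytes above ebp, toward higher addresses.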

The next instruction:
0x08048463 <+16>:    mov    -0x4(%ebp),%edx
moves the data from the location pointed by ebp-4(ebp-4 contains value 3) into edx register.



The next instruction:
 0x08048466 <+19>:    add    %edx,%eax
simply adds edx (i.e. 3) to eax (i.e. 10) and stores the result in the eax register.

The next instruction:
0x08048468 <+21>:    mov    %eax,-0x8(%ebp)
moves the value in eax (i.e. 13) into the location at ebp-8, which is the stack slot of the local variable c.

                                               


The next instruction:
0x0804846b <+24>:    mov    -0x8(%ebp),%eax
moves the data from the location at ebp-8 (which contains 13) back into the eax register. By convention, eax carries the function's return value.
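The body of sum can be replayed line by line in C against a toy stack. This is only a sketch: stack_mem, SLOT and toy_sum_body are illustrative names, and the frame pointer value 0xfc8 used in the note below assumes the 0x1000 starting address from earlier (esp is 0xfd0 in main, the call pushes the return address at 0xfcc, and push %ebp lands at 0xfc8).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP 0x1000u                 /* assumed starting value of esp */

/* Toy stack: SLOT(addr) is the 32-bit cell at byte address addr. */
uint32_t stack_mem[STACK_TOP / 4];
#define SLOT(addr) (stack_mem[(addr) / 4])

/* Replays the six body instructions of sum(); ebp is the frame pointer
   after the prologue. Returns what would be left in eax. */
uint32_t toy_sum_body(uint32_t ebp)
{
    uint32_t eax, edx;

    assert(ebp >= 8 && ebp + 8 < STACK_TOP);

    SLOT(ebp - 4) = 3;        /* movl $0x3,-0x4(%ebp)  : b = 3              */
    eax = SLOT(ebp + 8);      /* mov  0x8(%ebp),%eax   : eax = argument a   */
    edx = SLOT(ebp - 4);      /* mov  -0x4(%ebp),%edx  : edx = b            */
    eax += edx;               /* add  %edx,%eax        : eax = a + b        */
    SLOT(ebp - 8) = eax;      /* mov  %eax,-0x8(%ebp)  : c = a + b          */
    eax = SLOT(ebp - 8);      /* mov  -0x8(%ebp),%eax  : return value = c   */
    return eax;
}
```

If the caller first stores 10 in SLOT(0xfc8 + 8), then toy_sum_body(0xfc8) returns 13 and leaves 3 at ebp-4 and 13 at ebp-8, matching the walk-through.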

                                         


The next instructions:
0x0804846e <+27>:    leave 
0x0804846f <+28>:    ret   
The leave instruction copies the base pointer into the stack pointer and then pops the old base pointer from the stack. The next instruction, 'ret', pops the return address from the stack and loads it into EIP. That causes execution to return to where we left the main function above.
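Here is a small C sketch of what leave does to the two registers. The toy_cpu struct and toy_leave function are hypothetical names, and the concrete addresses in the note below assume the 0x1000 starting address used earlier.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP 0x1000u                 /* assumed starting value of esp */

/* Hypothetical model of the two registers and the stack memory. */
typedef struct {
    uint32_t esp;
    uint32_t ebp;
    uint32_t mem[STACK_TOP / 4];
} toy_cpu;

/* leave is equivalent to: mov %ebp,%esp ; pop %ebp */
void toy_leave(toy_cpu *cpu)
{
    assert(cpu->ebp <= STACK_TOP - 4);
    cpu->esp = cpu->ebp;                  /* discard the local variables      */
    cpu->ebp = cpu->mem[cpu->esp / 4];    /* restore the caller's frame ptr   */
    cpu->esp += 4;                        /* esp now points at the return addr */
}
```

In sum, ebp is 0xfc8 and esp is 0xfb8 before leave. Afterwards ebp holds main's saved frame pointer and esp is 0xfcc, the slot holding the return address that ret is about to pop.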

                                   


Now the sum function transfers control back to main. So let's go back to where we left off in main.
In the main function, after the instruction:
0x08048434 <+24>:    call   0x8048453 <sum>

The next instruction is :
 0x08048439 <+29>:    mov    %eax,0x18(%esp)
moves the value of eax (i.e. 13) into the location at esp+24 (0x18 = 24 bytes above esp). This is the variable y.

                                        


The next instruction is :
0x0804843d <+33>:    mov    0x18(%esp),%eax
loads that value from the location at esp+24 back into the EAX register.

The next instruction is :
0x08048441 <+37>:    mov    %eax,0x4(%esp)
moves the value (i.e. 13) from the eax register to the location at esp+4. This is the second argument to printf.

                                     


The next instruction is :
0x08048445 <+41>:    movl   $0x8048500,(%esp)
moves the address 0x8048500 into the location esp points to. This is the first argument to printf. Let's check what is at 0x8048500:
(gdb) x/1s 0x8048500
0x8048500:     "The sum is: %d"
(gdb)
As expected, this is the format string "The sum is: %d".

                                             

The next instruction is :
0x0804844c <+48>:    call   0x8048300 <printf@plt>
simply calls the printf function. The remaining two lines are the epilogue we have already seen, and thus we get our result, 13. In the high-level language C it is very easy to add two numbers, but in assembly the same task takes many more steps. Again, a programmer is someone who is concerned only with source code, but a hacker realizes that the compiled program is what actually gets executed out in the real world. This is the first step toward understanding and exploiting buffer overflow vulnerabilities.

I have tried to KISS (Keep It Simple, Stupid). If you like this post or have any questions, please feel free to comment!

Essential GDB debugger skills for hacker to analyze the code !

An application or computer program shipped without any debug information may still be vulnerable to security holes such as buffer overflows, memory leaks, and format string bugs. Today's operating systems and applications keep growing in lines of code (LOC). Windows operating systems have approximately 40 million LOC. Unix and Linux operating systems have much less, usually around 2 million LOC. A common estimate used in the industry is that there are between 5 and 50 bugs per 1,000 lines of code. So a middle-of-the-road estimate would be that Windows 7 has approximately 1,200,000 bugs. If software did not contain 5 to 50 exploitable bugs in every 1,000 lines of code, we would not have to build the fortresses we are constructing today.

Many companies (for example Microsoft) ship products that hide what appears to be an almost infinite number of break-in vulnerabilities. They try to hide these problems by keeping their source code secret. This does make your job harder, but not impossible. Analyzing machine code isn't so complex as to stop hackers for long. All you need is to think from a hacker's perspective. But remember, stealing code or breaking code without the permission of its owner is illegal. An ethical hacker is one who breaks code under controlled circumstances. If you find vulnerabilities in software, just report them to the vendor. I have no intention of providing illegal techniques. This article is strictly for educational purposes.
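As a quick check of the bug estimate quoted in this section, the arithmetic can be written as a one-line C function. This is only a sketch of the calculation; estimate_bugs is a made-up name, and the inputs are the figures stated in the text (40 million lines, 5 to 50 bugs per 1,000 lines).

```c
/* Estimated bug count for a code base, given a bugs-per-1,000-lines rate. */
unsigned long estimate_bugs(unsigned long lines_of_code,
                            unsigned long bugs_per_kloc)
{
    return lines_of_code / 1000UL * bugs_per_kloc;
}
```

For 40,000,000 lines the range is 200,000 bugs (at 5 per 1,000) to 2,000,000 bugs (at 50 per 1,000). The 1,200,000 figure corresponds to a rate of 30 bugs per 1,000 lines.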

The best strategy is to write code that has as few bugs as possible. This can be achieved by using pseudo-code and verifying the logic of the pseudo-code even before you attempt to translate it into an assembly language program. To isolate a bug, program execution should be observed in slow motion. Most debuggers provide a command to execute a program in single-step mode. Debuggers provide commands to set up breakpoints. The program execution stops at breakpoints, giving us a chance to look at the state of the program. Another helpful feature that most debuggers provide is the watch facility.
GDB, the GNU Project debugger, allows you to see what is going on `inside' another program while it executes -- or what another program was doing at the moment it crashed.

GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:
Start your program, specifying anything that might affect its behavior.
Make your program stop on specified conditions.
Examine what has happened, when your program has stopped.
Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.


Now let's discuss some useful GDB operations for debugging a program.
1. To load a program into GDB:
gdb file_name
For example, to debug the HelloWorld program:
gdb HelloWorld

2. If arguments have to be passed to the program being loaded into GDB, use the --args option:
gdb --args YourProgramName arg1 arg2 arg3 … argN

Displaying Source Code:
list                        Lists the source code of the loaded executable, a default number of lines at a time.

Displaying Register Contents:
info registers              Displays the contents of the general-purpose registers.
info all-registers          Displays the contents of all registers, including floating-point and vector registers.
info registers reg ...      Displays the contents of the specified registers.
For example,
info registers eax ecx edx ; to check the contents of the eax, ecx, and edx registers.

Memory display commands:
x address                   Displays the contents of memory at address (uses defaults).
x/nfu address               Displays n units of size u at address, in format f.

Or,
(gdb) x/FMT &Label_name     To see the value of a variable (useful in case of integers).
(gdb) x/1s &Label_name      To see the whole string in a single shot (useful in case of strings).
(gdb) x/1s $register        To see the whole string located at the address stored in a register.
(gdb) x/1s 0x080000         To see the whole string at a particular address.

Break point Commands :
Use the "break" or "b" command at the gdb prompt to specify a location, which can be a function name, a line number, or a source file and line number.
For example, the following commands insert breakpoints at line 20 and at the function sum, assumed to be on line 32, of a program:
(gdb) b 20
Breakpoint 1 at 0x80560a0: file HelloWorld.asm, line 20.
(gdb) b sum
Breakpoint 2 at 0x80560j5: file HelloWorld.asm, line 32.
(gdb)

Note:
* We can use info breakpoints (or simply info b) to get a summary of breakpoints and their status.
* We can use the enable and disable commands to enable or disable the breakpoints.

More breakpoint commands:
break main          Sets a breakpoint at the function "main".
break 5             Sets a breakpoint at source line number 5.
break function      Sets a breakpoint at entry to the specified function.
break *_start+1     Sets a breakpoint at the address _start+1; in an assembly program, put a "nop" right after the _start label so that there is an instruction to break on.
delete              Deletes all breakpoints.

Program execution commands :
run             Executes the program under GDB
continue      Continues execution from where the program has last stopped (e.g., due to a breakpoint).
step                  Single-steps execution of the program (i.e., one source line at a time).

More examine commands:
print variable_name         To see the value of a variable in decimal
print /x variable_name      To see the value of a variable in hex
print /c variable_name      To see the value of a variable as an ASCII character
print &Label_name           To see the address of Label_name
print /x &Label_name        To see the address of Label_name in hex
print /c $eax               To see the value in a register as an ASCII character
print /d $eax               To see the value in a register in decimal
print /x $eax               To see the value in a register in hex

Now let's use these features of GDB in the real world. First we will write a simple C program:
#include <stdio.h>
int main()
{
    printf("Hello World");
}
Now load this program in GDB:

root@r00t:~/Desktop/c_programming/blog_tutorial# gcc -g simple.c
root@r00t:~/Desktop/c_programming/blog_tutorial# gdb -q ./a.out
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/Desktop/c_programming/blog_tutorial/a.out...done.
(gdb) list
1    #include<stdio.h>
2   
3    int main()
4    {
5        printf("Hello World");
6   
7    }
(gdb) disassemble main
Dump of assembler code for function main: 
0x0804841c <+0>:    push   %ebp 
0x0804841d <+1>:    mov    %esp,%ebp 
0x0804841f <+3>:    and    $0xfffffff0,%esp  
0x08048422 <+6>:    sub    $0x10,%esp
0x08048425 <+9>:    movl   $0x80484d0,(%esp) 
0x0804842c <+16>:    call   0x8048300 <printf@plt>
0x08048431 <+21>:    leave  
0x08048432 <+22>:    ret   
End of assembler dump.
(gdb) break main
Breakpoint 1 at 0x8048425: file simple.c, line 5.
(gdb) run
Starting program: /root/Desktop/c_programming/blog_tutorial/a.out
Breakpoint 1, main () at simple.c:5
5    printf("Hello World");
(gdb) info register eip
eip            0x8048425    0x8048425 <main+9> 

               
     
                            
First we use the list command to list the source code of the executable. Then the disassembly of the main() function is displayed. Then we use the break main command to set a breakpoint at the start of main(), and the program is run. As we have already discussed, the break command simply tells the debugger to pause the execution of the program when it gets to that point. GDB places the breakpoint just after the function prologue, so the program pauses at main+9, before executing the first statement in main() (here printf("Hello World")).
Then we use the info register eip command, which displays the contents of the specified register, so the value of EIP (the instruction pointer) is displayed. Of all the registers, EIP (Extended Instruction Pointer) is the one to concentrate on. This register holds the address of the next instruction to be executed. Thus, if by any means we can control the value in EIP, we control what the CPU of the victim machine executes. If EIP is made to point to a buffer that we control and have filled with machine code, the processor is derailed from its normal execution and executes the code we supplied. This is how a buffer overflow attack works.

The GDB debugger provides a direct method to examine memory, using the command x, which is short for examine. Examining memory is a critical skill for any hacker. With a debugger like GDB, every aspect of a program's execution can be deterministically examined, paused, stepped through, and repeated as often as needed. Since a running program is mostly just a processor and segments of memory, examining memory is the first way to look at what's really going on.

We have already discussed the memory display commands, which let us look at a given memory address in a variety of ways. Now we will use them in the real world. In GDB, memory can be displayed in many formats: octal, binary, hexadecimal, standard base-10, etc.
Some common format letters are as follows:
o ; to display in octal.
x ; to display in hexadecimal.
u ; to display in unsigned, standard base-10 decimal.
t ; to display in binary.

(gdb) x/o 0x8048425
0x8048425 <main+9>:    032011002307
(gdb) x/x $eip
0x8048425 <main+9>:    0xd02404c7
(gdb) x/u $eip
0x8048425 <main+9>:    3492021447
(gdb) x/t $eip
0x8048425 <main+9>:    11010000001001000000010011000111

We use these commands to examine the memory at the address held in the EIP register in various formats. The value 032011002307 in octal is the same as 0xd02404c7 in hexadecimal, which is the same as 3492021447 in base-10 decimal, which in turn is the same as 11010000001001000000010011000111 in binary.
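The same equivalence can be reproduced outside the debugger with a few lines of C. This is a sketch with made-up function names (to_binary, print_all_bases); printf has no standard binary conversion in older C standards, so the binary digits are built by hand.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write the 32 binary digits of v, most significant first, into out. */
void to_binary(uint32_t v, char out[33])
{
    for (int i = 0; i < 32; i++)
        out[i] = ((v >> (31 - i)) & 1u) ? '1' : '0';
    out[32] = '\0';
}

/* Print one 32-bit word in the four bases used by x/o, x/x, x/u and x/t. */
void print_all_bases(uint32_t v)
{
    char bits[33];

    to_binary(v, bits);
    printf("octal   %#" PRIo32 "\n", v);   /* leading 0 marks octal */
    printf("hex     %#" PRIx32 "\n", v);   /* leading 0x marks hex  */
    printf("decimal %" PRIu32 "\n", v);
    printf("binary  %s\n", bits);
}
```

Calling print_all_bases(0xd02404c7) prints 032011002307, 0xd02404c7, 3492021447 and 11010000001001000000010011000111, the same four values GDB showed.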

A number can also be prepended to the format of the examine command to examine multiple units at the target address:
(gdb) x/2x $eip
0x8048425 <main+9>:    0xd02404c7    0xe8080484
(gdb) x/5x $eip
0x8048425 <main+9>:    0xd02404c7    0xe8080484    0xfffffecf    0x9090c3c9
0x8048435:    0x90909090

The default size of a single unit is a four-byte unit called a word. The size of the display units for the examine command can be changed by adding a size letter to the end of the format letter.
Some size letters are as follows:
b; A single byte
h; A halfword, which is two bytes in size
w; A word, which is four bytes in size
g; A giant, which is eight bytes in size
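The size letters above translate directly into byte counts, which a short C sketch can tabulate. The function names unit_size and examined_bytes are hypothetical helpers for illustration, not part of GDB.

```c
#include <stddef.h>

/* Size in bytes of one unit for the x command's size letters; 0 if unknown. */
size_t unit_size(char letter)
{
    switch (letter) {
    case 'b': return 1;   /* byte */
    case 'h': return 2;   /* halfword */
    case 'w': return 4;   /* word */
    case 'g': return 8;   /* giant */
    default:  return 0;
    }
}

/* Total number of bytes covered by x/<count><format><letter>. */
size_t examined_bytes(size_t count, char letter)
{
    return count * unit_size(letter);
}
```

For example, a command of the form x/8xb covers 8 bytes, x/8xh covers 16 bytes, and x/8xw covers 32 bytes.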

Now let's use them in our program:
(gdb) x/8xb $eip
0x8048425 <main+9>:    0xc7    0x04    0x24    0xd0    0x84    0x04    0x08    0xe8
(gdb) x/8xh $eip
0x8048425 <main+9>:    0x04c7    0xd024    0x0484    0xe808    0xfecf    0xffff    0xc3c9    0x9090
(gdb) x/8xw $eip
0x8048425 <main+9>:    0xd02404c7    0xe8080484    0xfffffecf    0x9090c3c9
0x8048435:            0x90909090    0x90909090    0x55909090    0xc35de589

The first examine shows the first two bytes to be 0xc7 and 0x04, but when a halfword is examined at the exact same memory address, the value 0x04c7 is shown, with the bytes reversed. The same byte-reversal effect can be seen when a full four-byte word is shown as 0xd02404c7, but when the first four bytes are shown byte by byte, they are in the order 0xc7, 0x04, 0x24, and 0xd0. Why? Hint: the x86 processor is little-endian, so the least significant byte is stored first.
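The byte reversal can be reproduced with a small C sketch that assembles bytes the way a little-endian processor interprets them. The helper names le16 and le32 are made up for this example; the byte values are the ones from the x/8xb output.

```c
#include <stdint.h>

/* Combine two bytes stored in memory order into a little-endian halfword. */
uint16_t le16(const uint8_t b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));     /* first byte is the low byte */
}

/* Combine four bytes stored in memory order into a little-endian word. */
uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);             /* last byte is the high byte */
}
```

With the bytes 0xc7, 0x04, 0x24, 0xd0 in memory order, le16 gives 0x04c7 and le32 gives 0xd02404c7, exactly the halfword and word values GDB displayed.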

The nexti command executes the current instruction, i.e. the one EIP points to. The processor reads the instruction at EIP, executes it, and advances EIP to the next instruction. Let's see:
(gdb) nexti
0x0804842c    5        printf("Hello World");
(gdb) x/i $eip
=> 0x804842c <main+16>:    call   0x8048300 <printf@plt>

The c format letter can be used to automatically look up a byte on the ASCII table, and the s format letter will display an entire string of character data.
(gdb) x/xw $esp
0xbffff490:    0x080484d0
(gdb) x/6cb 0x080484d0
0x80484d0:    72 'H'    101 'e'    108 'l'    108 'l'    111 'o'    32 ' '
(gdb)  x/s 0x080484d0
0x80484d0:     "Hello World"
These commands reveal that the string "Hello World" is stored at memory address 0x080484d0.

Looking at the full disassembly again, you should be able to tell which parts of the C code have been compiled into which machine instructions.
(gdb) disassemble main
Dump of assembler code for function main:
0x0804841c <+0>:    push   %ebp
0x0804841d <+1>:    mov    %esp,%ebp
0x0804841f <+3>:    and    $0xfffffff0,%esp
0x08048422 <+6>:    sub    $0x10,%esp
0x08048425 <+9>:    movl   $0x80484d0,(%esp)
0x0804842c <+16>:    call   0x8048300 <printf@plt>
0x08048431 <+21>:    leave 
0x08048432 <+22>:    ret
End of assembler dump.
(gdb) list
1    #include<stdio.h>
2   
3    int main()
4    {
5        printf("Hello World");
6   
7    }
(gdb)

I have discussed the gdb commands that help hackers examine a binary program. If you like this post or have any questions, please feel free to comment!

Reference Material:
1. Debugging with GDB
2. GDB Documentation
3. Hacking: The Art of Exploitation