Tuesday 21 July 2015

Understanding assembly instructions!

There are two main syntaxes for x86 assembly instructions: AT&T and Intel. The AT&T format is used on UNIX-like operating systems such as BSD and Linux, and consequently in the assembler listings that gdb generates by default. It is quite different from the Intel format that you have likely learned. The Intel format is the one commonly used on Windows operating systems.

Among the most noticeable differences between the AT&T and Intel formats is the way they order the source and destination operands within an instruction. Under the Intel format, the destination operand appears to the left of the comma that separates the operands, and the source operand appears to the right.
Under the AT&T format, these roles are reversed: source operands appear on the left and destination operands appear on the right. For example, look at how the following instructions are written under the two formats:

Intel (NASM) syntax          AT&T syntax
push eax                     pushl %eax
mov eax, ebx                 movl %ebx, %eax

Another difference between AT&T and Intel is register naming. All CPU register names in the AT&T format are prefixed with the percent ("%") character, and immediate (constant) operands are prefixed with the dollar ("$") character.

The AT&T and Intel syntaxes also differ in instruction naming. The AT&T format uses slightly different names for 80386 instructions than the Intel format. In keeping with VAX and Motorola traditions, AT&T instruction names include a suffix that describes the size of the data they operate on. Under the Intel format, the data size is normally given with the 'BYTE PTR', 'WORD PTR', and 'DWORD PTR' prefix phrases on a memory operand.

Intel                        AT&T
add eax, 0x30                addl $0x30, %eax

Note: Instruction suffixes in AT&T syntax are "b" for byte-size operations (8 bits), "w" for word-size operations (16 bits), and "l" for double-word operations (32 bits).
For more differences between AT&T and Intel syntax, see the reference.

Note: All the code in this post and on this blog was compiled on a 32-bit x86 CPU running Kali Linux, where the tools use AT&T syntax by default. The disassembly can be shown in Intel syntax instead by passing an additional command-line option, -M intel, to objdump (for more, see the reference). But why use Intel syntax? People who have a strong opinion almost always prefer Intel style, finding it more readable and easier to understand. Another problem with AT&T syntax is Googleability: Intel-syntax instructions are easier to find with Google than their AT&T equivalents.

Assembly language instructions: Machine instructions generally fall into three categories (data movement, arithmetic/logic, and control flow), but we can subdivide them into the following categories:
* Transfer instructions (like MOV)
* Loading instructions (like LODS, LEA)
* Stack instructions (like PUSH, POP)
* Logic instructions (like AND, NOT)
* Arithmetic instructions (like ADD, SUB)
* Jump instructions (like JMP, JAE)
* Loop instructions (like LOOP)
* Counting instructions (like DEC, INC)
* Comparison instructions (like CMP)
* Flag instructions (like CLC, CLD)

Data Movement Instructions: They are mostly used for moving data from one place to another.
Note: The following instructions are shown in AT&T syntax. In Intel syntax the operand order is reversed.

1. movl (Move): The movl instruction copies the data item referred to by its first operand (i.e. register contents, memory contents, or a constant value) into the location referred to by its second operand (i.e. a register or memory).
movl %eax, %ebx    # copies the double-word value (4 bytes) in register eax into register ebx; the value in eax remains the same
Register-to-register moves are possible, but movl cannot move directly from memory to memory. Where a memory-to-memory transfer is desired, the source memory contents must first be loaded into a register and then stored to the destination memory address.

The movl instruction is useful for transferring data along any of the following paths: 
* To a register from memory
* To memory from a register 
* Between general registers
* Immediate data to a register
* Immediate data to memory
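
The rule that a memory-to-memory copy has to go through a register can be illustrated with a small Python sketch. This is only a toy model, not real assembly: the dictionaries regs and mem, the addresses 0x1000 and 0x2000, and the function name copy_mem_to_mem are all made up for the illustration.

```python
def copy_mem_to_mem(regs, mem, src_addr, dst_addr):
    """Model a memory-to-memory copy as two moves through ecx."""
    regs["ecx"] = mem[src_addr]   # movl src_addr, %ecx   (memory -> register)
    mem[dst_addr] = regs["ecx"]   # movl %ecx, dst_addr   (register -> memory)

# Hypothetical register and memory contents.
regs = {"eax": 0x1234, "ebx": 0, "ecx": 0}
mem = {0x1000: 0xCAFE, 0x2000: 0}

# movl %eax, %ebx : register-to-register copy; eax keeps its value.
regs["ebx"] = regs["eax"]

copy_mem_to_mem(regs, mem, 0x1000, 0x2000)
print(hex(regs["eax"]), hex(regs["ebx"]), hex(mem[0x2000]))
```

The source of each move is left unchanged, which is why "copy" describes the instruction better than "move".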

2. pushl (Push onto stack): The pushl instruction places its operand onto the top of the hardware-supported stack in memory. Specifically, pushl first decrements ESP by 4, then stores its operand in the 32-bit location at address (%esp). This is equivalent to performing subl $4, %esp followed by movl operand, (%esp). ESP (the stack pointer) is decremented by the push because the x86 stack grows down, i.e. from high addresses to lower addresses.
Syntax:
pushl %register         # push the register's value onto the stack
pushl memory_address    # push the 4 bytes at address memory_address onto the stack

3. popl (Pop from stack): The popl instruction removes the 4-byte data element from the top of the hardware-supported stack and stores it in the specified operand (i.e. a register or memory location). It first moves the 4 bytes located at memory location (%esp) into the specified register or memory location, and then increments ESP by 4. For a register destination, this is equivalent to performing movl (%esp), %register followed by addl $4, %esp.
popl %edi      # pop the top element of the stack into EDI
popl (%ebx)    # pop the top element of the stack into the four bytes of memory starting at the address held in EBX
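
The push and pop steps can be sketched in Python as follows. This is a minimal model under stated assumptions, not how the hardware is implemented: the class name Stack32, the starting ESP value 0x1000, and the use of a dictionary as memory are all hypothetical.

```python
MASK32 = 0xFFFFFFFF

class Stack32:
    """Toy model of the x86 stack; memory is a dict keyed by address."""

    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        # pushl: subl $4, %esp  then  movl value, (%esp)
        self.esp = (self.esp - 4) & MASK32
        self.mem[self.esp] = value & MASK32

    def pop(self):
        # popl: movl (%esp), dest  then  addl $4, %esp
        value = self.mem[self.esp]
        self.esp = (self.esp + 4) & MASK32
        return value

s = Stack32()
s.push(0x11)
s.push(0x22)
print(hex(s.esp))    # 0xff8: two pushes moved ESP down by 8
print(hex(s.pop()))  # 0x22: the last value pushed comes off first
```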

4. leal (Load effective address): The leal instruction places the address specified by its first operand into the register specified by its second operand. Note that the contents of the memory location are not loaded; only the effective address is computed and placed into the register. This is useful for obtaining a pointer into a memory region.
Syntax:
leal memory_location, %register
For example, leal 5(%ebp,%ecx,1), %eax computes the address 5 + %ebp + 1 * %ecx and stores it in %eax.
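
The address arithmetic in that example can be written out as a short Python sketch. The function name effective_address and the register values used below (ebp = 0x1000, ecx = 3) are assumptions chosen only for illustration.

```python
def effective_address(disp, base, index, scale):
    """Address of the AT&T operand disp(base,index,scale), wrapped to 32 bits."""
    return (disp + base + index * scale) & 0xFFFFFFFF

# leal 5(%ebp,%ecx,1), %eax with assumed values ebp = 0x1000 and ecx = 3
eax = effective_address(5, 0x1000, 3, 1)
print(hex(eax))  # 0x1008
```

No memory is read here: only the address is computed, which is what distinguishes leal from movl with the same operand.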

Arithmetic and Logic Instructions:
1. addl (Integer addition) and subl (Integer subtraction): The addl instruction adds the first operand to the second, storing the result in the second operand. Note that, whereas both operands may be registers, at most one operand may be a memory location.
Syntax:
addl I/R/M, R/M
where I is an immediate value, R is a register, and M is a memory address.
If the unsigned result does not fit in the destination, the carry flag is set; if the signed result does not fit, the overflow flag is set.

The subl instruction subtracts the first operand from the second and stores the result in the second operand.
Syntax:
subl I/R/M, R/M
This instruction can be used on both signed and unsigned numbers.
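
To make the carry and overflow flags concrete, here is a hedged Python sketch of 32-bit addition and subtraction. The function names addl and subl only mirror the mnemonics; the returned tuple (result, carry, overflow) is a convention invented for this example, and the other flags are left out.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000

def addl(src, dst):
    """Return (result, carry, overflow) for dst + src on 32-bit values."""
    src &= MASK32
    dst &= MASK32
    full = dst + src
    result = full & MASK32
    carry = full > MASK32  # the unsigned sum did not fit in 32 bits
    # Signed overflow: both inputs share a sign that the result does not.
    overflow = bool(~(src ^ dst) & (src ^ result) & SIGN)
    return result, carry, overflow

def subl(src, dst):
    """Return (result, carry, overflow) for dst - src on 32-bit values."""
    src &= MASK32
    dst &= MASK32
    result = (dst - src) & MASK32
    carry = src > dst  # a borrow was needed
    # Signed overflow: inputs differ in sign and the result's sign left dst's.
    overflow = bool((dst ^ src) & (dst ^ result) & SIGN)
    return result, carry, overflow

print(addl(1, 0xFFFFFFFF))  # (0, True, False): unsigned wrap-around
print(addl(1, 0x7FFFFFFF))  # (2147483648, False, True): signed overflow
print(subl(1, 0))           # (4294967295, True, False): borrow
```

The same bits are produced whether the operands are read as signed or unsigned; only the flag you check afterwards differs.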

2. divl (Division) and mull (Multiplication): divl performs unsigned division. It divides the 64-bit value held in the combined %edx:%eax registers by the value in the specified register or memory location.
Syntax:
divl R/M
The %eax register receives the quotient and the %edx register receives the remainder. If the quotient is too large to fit in %eax, or the divisor is zero, the processor raises a divide error (interrupt 0).
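
The way divl combines %edx and %eax can be modelled with a few lines of Python. This is a sketch under assumptions: the function name divl and its return order (quotient for eax, remainder for edx) are chosen for the example, and Python exceptions stand in for the processor's divide error.

```python
MASK32 = 0xFFFFFFFF

def divl(edx, eax, divisor):
    """Unsigned divide of the 64-bit value edx:eax; returns (eax, edx)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("divide error (interrupt 0)")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        # The CPU raises the same divide error when the quotient does
        # not fit in eax; OverflowError stands in for it here.
        raise OverflowError("divide error (interrupt 0): quotient too large")
    return quotient, remainder

print(divl(0, 100, 7))  # (14, 2)
```

This is why code that divides a plain 32-bit value usually clears %edx first: whatever is left in %edx becomes the high half of the dividend.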

mull performs unsigned multiplication. It takes a single operand, multiplies %eax by it, and stores the 64-bit result in %edx:%eax. The syntax is:
mull R/M
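
As a rough illustration, mull takes a single register or memory operand, multiplies it by %eax, and splits the 64-bit product across %edx (high half) and %eax (low half). The Python function below is a toy model; its name and its (edx, eax) return order are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def mull(eax, operand):
    """Unsigned eax * operand; returns (edx, eax) as high and low halves."""
    product = (eax & MASK32) * (operand & MASK32)
    return (product >> 32) & MASK32, product & MASK32

edx, eax = mull(0xFFFFFFFF, 2)
print(hex(edx), hex(eax))  # 0x1 0xfffffffe
```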

3. incl and decl (Increment and decrement): incl increments the given register or memory location by one, while decl decrements it by one.
Syntax:
incl R/M
decl R/M

4. imull and idivl (Signed multiplication and signed division): imull performs signed multiplication and stores the result in the second operand. If the second operand is left out, it is assumed to be %eax, the remaining operand must be a register or memory location, and the full 64-bit result is stored in %edx:%eax.
Syntax:
imull R/M/I, R

idivl performs signed division of %edx:%eax by its operand, leaving the quotient in %eax and the remainder in %edx. The syntax is:
idivl R/M
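
One detail worth seeing in code is that idivl truncates the quotient toward zero and gives the remainder the sign of the dividend, which differs from Python's own // and % operators. The sketch below is a simplified model: the function name idivl is borrowed from the mnemonic, the dividend is passed as one signed Python integer instead of an %edx:%eax pair, and exceptions stand in for the processor's divide error.

```python
def idivl(dividend, divisor):
    """Signed divide truncating toward zero; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error (interrupt 0)")
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    if not -2**31 <= quotient <= 2**31 - 1:
        # Stand-in for the divide error raised when eax cannot hold the quotient.
        raise OverflowError("divide error (interrupt 0): quotient too large")
    return quotient, remainder

print(idivl(-7, 2))    # (-3, -1)
print(-7 // 2, -7 % 2) # -4 1 : Python floors instead of truncating
```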

Control Flow Instructions: These instructions may alter the flow of the program. The x86 processor maintains an instruction pointer (EIP) register, a 32-bit value indicating the location in memory of the instruction to execute. Normally, after an instruction executes, EIP is advanced to point to the start of the next instruction in memory. The EIP register cannot be manipulated directly, but it is updated implicitly by the control flow instructions.

1. jmp (Jump): Transfers program control flow to the instruction at the memory location indicated by the operand. This is an unconditional jump.
Syntax:
jmp destination_address
This instruction diverts the flow of a program regardless of the current state of the flags or of the data.

There are also various conditional jumps, which are taken only when the flags are in a particular state. For example:
ja     # after a comparison, jumps if above (that is, if not below or equal)
jae    # jumps if above or equal (that is, if not below)
"Above" and "below" refer to unsigned comparisons. For more, see the reference.

2. cmpl (Compare): cmpl compares two integers by subtracting the first operand from the second. It discards the result but sets the flags accordingly. It is usually used before a conditional jump.
Syntax:
cmpl I/R/M, R/M
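
How cmpl and the conditional jumps ja and jae work together can be sketched in Python. This is a simplified model that tracks only the zero flag (ZF) and the carry flag (CF); the function names and the dictionary used for the flags are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def cmpl(src, dst):
    """Flags from dst - src (AT&T order: cmpl src, dst); result discarded."""
    src &= MASK32
    dst &= MASK32
    return {"ZF": dst == src, "CF": dst < src}

def ja(flags):
    """Jump if above: taken when CF = 0 and ZF = 0."""
    return not flags["CF"] and not flags["ZF"]

def jae(flags):
    """Jump if above or equal: taken when CF = 0."""
    return not flags["CF"]

# cmpl $5, %eax for a few assumed values of eax
for eax in (7, 5, 3, 0xFFFFFFFF):
    flags = cmpl(5, eax)
    print(hex(eax), "ja taken:", ja(flags), "jae taken:", jae(flags))
```

Because the comparison is unsigned, 0xFFFFFFFF counts as above 5 here, even though the same bits mean -1 when read as a signed number.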

3. call (Subroutine call): The call instruction first pushes the return address (the address of the instruction that follows the call) onto the hardware-supported stack in memory, and then performs an unconditional jump to the code location indicated by the operand. Unlike the simple jump instructions, the call instruction saves the location to return to when the subroutine completes.
Syntax:
call destination_address
Alternatively, the destination address can be an asterisk followed by a register for an indirect function call. For example, call *%eax will call the function at the address in %eax.

4. ret (Return): The ret instruction implements a subroutine return mechanism. This instruction first pops a code location off the hardware-supported in-memory stack. It then performs an unconditional jump to the retrieved code location.
Syntax:
call <label>
ret
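
The interplay of call and ret can be illustrated with a tiny Python interpreter. This is a sketch under assumptions, not a model of the real instruction encoding: list indexes stand in for code addresses, and the "note" and "halt" operations are made up so that the example has something to record and a way to stop.

```python
def run(program):
    """Execute a list of (op, arg) tuples; list indexes act as addresses."""
    stack, trace, eip = [], [], 0
    while True:
        op, *args = program[eip]
        next_eip = eip + 1            # address of the following instruction
        if op == "call":
            stack.append(next_eip)    # push the return address
            eip = args[0]             # then jump to the subroutine
        elif op == "ret":
            eip = stack.pop()         # pop the return address and jump to it
        elif op == "note":
            trace.append(args[0])     # hypothetical op: record a message
            eip = next_eip
        elif op == "halt":
            return trace              # hypothetical op: stop the program
        else:
            raise ValueError(f"unknown operation: {op}")

program = [
    ("note", "main: before call"),  # address 0
    ("call", 4),                    # address 1
    ("note", "main: after call"),   # address 2
    ("halt",),                      # address 3
    ("note", "in subroutine"),      # address 4
    ("ret",),                       # address 5
]
print(run(program))
# ['main: before call', 'in subroutine', 'main: after call']
```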

Let's take a real-world example. In this example, we have disassembled the main function:

[Image: disassembly of the main function]

By now you should be able to understand this assembly code.

Reference:
1. Differences between AT&T and Intel syntax
2. AT&T syntax can be shown in Intel syntax: http://programmingethicalhackerway.blogspot.in/2015/07/basic-assembly-programming-survival.html
3. Programming from the Ground Up

If you like this post or have any questions, please feel free to comment!
