Buffer Overflow: Reversing Assembly
Welcome to the second part of the Buffer Overflow series. In this post, we will explore the basics of Assembly.
Today we won’t delve too deeply into assembly language, as it can quickly become overwhelming. However, we’ll provide enough insight to understand what happens when we run our example program. If you want a more comprehensive understanding of assembly, we encourage you to explore the many resources and blog posts already available online on the subject :)
So, to begin, what exactly is Assembly language?
Assembly language is a low-level programming language that exhibits a strong correspondence between its instructions and the machine code instructions of the computer’s architecture. It serves as the intermediary between software programs (as demonstrated in our example using C) and the underlying hardware of your computer (components such as RAM and the CPU for our example).
Why do I need to know assembly language?
Most of the time, you won’t have access to the source code of the binary you’re analyzing or exploiting. Assembly language provides a means to decipher what a program does, regardless of the high-level programming language used to generate the binary. Combined with tools like GDB, reading assembly is one of the most effective approaches for reverse engineering a program.
With that in mind, let’s delve into the subject.
Example code
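As a reminder, our example is the small C program from the first part of this series: it declares a 500-byte buffer (char buffer[500];) and copies the first command line argument (argv[1]) into it with strcpy.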
Assembly
We can use GDB to disassemble the code. Here I am using the Intel syntax flavor:
[hg8@archbook ~]$ gdb vuln
Well, it doesn’t look like much fun to read. Let’s break down what every line does to get a better understanding of the program:
Prologue
0x08049176 <+0>: push ebp
This section is the prologue of our program. In assembly language, a prologue is a sequence of instructions executed at the beginning of a function to set up a new stack frame.
The function starts with a push ebp instruction, which pushes the value of the base pointer onto the stack. The base pointer, ebp, is a register that points to the base of the current stack frame.
This is followed by a mov ebp, esp instruction, which copies the current value of the stack pointer (esp) into the base pointer (ebp). The stack pointer always points to the top of the stack, meaning the most recently pushed value. A program has one stack, but each function call gets its own stack frame on it, and ebp marks where the frame of the current function begins.
The next instruction, push ebx, saves the value of the ebx register on the stack. The ebx register is a general-purpose register that the calling convention expects a function to preserve, so main saves it here before using it and restores it in the epilogue.
The instruction sub esp, 0x1f4 subtracts a value from the stack pointer. This reserves 0x1f4 (500 in decimal) bytes of space on the stack for local variables. This corresponds to the space reserved for our buffer variable (char buffer[500];).
Remember, the stack grows from higher memory addresses to lower memory addresses, hence the subtraction operation.
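To make the pointer arithmetic of the prologue concrete, here is a minimal sketch in C that replays the four instructions on a toy pair of registers. The PrologueRegs type and the function names are made up for illustration, and the starting value of esp is an arbitrary assumption; only the offsets come from the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the two registers the prologue manipulates.
 * Type and function names are illustrative only. */
typedef struct {
    uint32_t esp; /* stack pointer: address of the top of the stack   */
    uint32_t ebp; /* base pointer: fixed reference inside the frame   */
} PrologueRegs;

/* push: the stack grows downward, so esp decreases by 4 per value. */
static void push32(PrologueRegs *r) { r->esp -= 4; }

/* Replays the prologue and returns the lowest address of the
 * reserved area, which is where buffer starts. */
uint32_t run_prologue(PrologueRegs *r) {
    assert(r != NULL);
    assert(r->esp >= 0x1fc); /* room for 4 + 4 + 500 bytes */

    push32(r);        /* push ebp       : save the caller's base pointer */
    r->ebp = r->esp;  /* mov ebp, esp   : start a new frame              */
    push32(r);        /* push ebx       : saved at [ebp-0x4]             */
    r->esp -= 0x1f4;  /* sub esp, 0x1f4 : reserve 500 bytes              */
    return r->esp;
}
```

Starting from any esp, the returned address is always ebp - 0x1f8: 4 bytes for the saved ebx plus the 500 bytes of buffer.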
Command line argument handling
0x08049180 <+10>: call 0x80491ae <__x86.get_pc_thunk.ax>
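The instruction call 0x80491ae transfers control to the function at the specified address. This helper function, __x86.get_pc_thunk.ax, returns the address of the instruction following the call in the eax register. Position-independent code uses this trick because 32-bit x86 has no instruction that reads the instruction pointer directly.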
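The body of the thunk is not part of the listing, so the sketch below assumes the form GCC usually emits (mov eax, [esp] followed by ret). It models the two facts involved: call pushes the address of the next instruction, and the thunk reads that value from the top of the stack. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A near `call rel32` on x86 is 5 bytes long: opcode 0xE8 plus a
 * 4-byte relative offset. */
enum { CALL_REL32_LEN = 5 };

/* `call` pushes the address of the instruction that follows it. */
uint32_t return_address_of_call(uint32_t call_addr) {
    return call_addr + CALL_REL32_LEN;
}

/* Model of the assumed thunk body (`mov eax, [esp]` then `ret`):
 * on entry the top of the stack holds the return address, and the
 * thunk copies it into eax (here: the return value). */
uint32_t get_pc_thunk_ax(const uint32_t *top_of_stack) {
    assert(top_of_stack != NULL);
    return *top_of_stack;
}
```

With the call located at 0x08049180 (offset +10), the pushed return address is 0x08049185, which matches the offset +15 of the next instruction in the disassembly.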
0x08049185 <+15>: add eax,0x2053
This set of instructions prepares the arguments for copying the command line argument into our buffer variable.
The instruction add eax, 0x2053 adds the value 0x2053 (8275 in decimal) to the eax register. Since eax holds the address of this add instruction (0x08049185), the result is 0x0804b1d8. In position-independent code this is typically the address of the global offset table (GOT), which the program uses to locate external functions such as strcpy.
mov edx, [ebp+0xc] loads a value from memory into the edx register. The value is stored at an offset of 0xc (12 in decimal) bytes from the base pointer, which is where argv, the second argument of main, is located.
add edx, 0x4 adds 4 to the edx register. Since a pointer is 4 bytes wide on 32-bit x86, edx now points to the second entry of the argv array.
mov edx, [edx] loads the value from the memory location pointed to by edx into edx. The register now holds argv[1], a pointer to our command line argument.
push edx pushes the value of the edx register onto the stack. The pointer to argv[1] is now on the stack as the second argument (the source) of the upcoming strcpy call.
lea edx, [ebp-0x1f8] calculates the address ebp-0x1f8 and stores the result in the edx register, without reading memory. This address lies 0x1f8 (504 in decimal) bytes below the base pointer: 500 bytes for buffer plus the 4 bytes of the saved ebx. edx now holds the address of the start of buffer.
push edx pushes this address onto the stack as the first argument (the destination) of strcpy.
mov ebx, eax moves the value in the eax register into the ebx register. In 32-bit position-independent code, the convention is to keep the GOT address in ebx when calling an external function.
call 0x8049050 transfers control to the function at the specified address. This function is strcpy, which copies a null-terminated string from one location to another. This is where our command line argument (argv[1]) gets copied into the buffer[500] variable. Since strcpy does not check the size of the destination, this is also where the buffer overflow will occur.
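As a rough C-level picture of this sequence, the sketch below mirrors the three loads that fetch argv[1], then computes how many bytes strcpy would write past a destination of a given size. Both function names are hypothetical helpers written for this illustration; they are not part of the example program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the three instructions that fetch argv[1]. */
const char *fetch_argv1(char **argv) {
    assert(argv != NULL);
    char **p = argv; /* mov edx, [ebp+0xc] : edx = argv                   */
    p = p + 1;       /* add edx, 0x4       : next pointer (4 bytes on a
                                             32-bit target)               */
    return *p;       /* mov edx, [edx]     : edx = argv[1]                */
}

/* Number of bytes strcpy(dst, src) would write past a destination of
 * dst_size bytes. strcpy copies strlen(src) characters plus the
 * terminating null byte and never looks at the size of dst. */
size_t bytes_past_buffer(const char *src, size_t dst_size) {
    assert(src != NULL);
    size_t written = strlen(src) + 1;
    return written > dst_size ? written - dst_size : 0;
}
```

For a 500-byte destination, an argument of up to 499 characters fits. Anything longer spills into whatever sits above buffer on the stack.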
Main function exit
0x080491a1 <+43>: add esp,0x8
The instruction add esp, 0x8 adjusts the stack pointer by adding 8 to it. This discards the two 4-byte arguments that were pushed onto the stack before the call to strcpy.
0x080491a4 <+46>: mov eax,0x0
The instruction mov eax, 0x0 copies the value 0x0 (0 in decimal) into the eax register. This value is the return value of the main function, which is set to zero.
Epilogue
0x080491a9 <+51>: mov ebx,DWORD PTR [ebp-0x4]
This section is the epilogue of our program. In assembly language, an epilogue is a sequence of instructions executed at the end of a function to restore the state of the previous stack frame and return control to the calling function.
The instruction mov ebx, DWORD PTR [ebp-0x4] loads a value from memory into the ebx register. The value is stored at an offset of -0x4 (-4 in decimal) bytes from the base pointer. This restores the original value of the ebx register that was saved on the stack earlier in the function.
leave tears down the stack frame: it copies the base pointer into the stack pointer (mov esp, ebp), which releases the local variables, and then pops the caller's saved base pointer from the stack back into ebp (pop ebp). This undoes the setup performed by the prologue.
The instruction ret returns control to the caller of the function by popping the return address off the stack and transferring control to that address.
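Putting the offsets from the disassembly side by side gives a picture of the frame that the epilogue unwinds. The sketch below is a toy model under the assumption of a standard frame (saved ebp at [ebp], return address at [ebp+0x4]); the type, the function names and the memory values used to exercise it are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets relative to ebp, taken from the disassembly. */
enum {
    BUFFER_OFFSET    = -0x1f8, /* lea edx, [ebp-0x1f8] : start of buffer */
    SAVED_EBX_OFFSET = -0x4,   /* mov ebx, [ebp-0x4]   : saved ebx       */
    SAVED_EBP_OFFSET = 0x0,    /* push ebp             : saved ebp       */
    RET_ADDR_OFFSET  = 0x4     /* pushed by the call that entered main   */
};

/* Distance in bytes from the first byte of buffer to the return address. */
int32_t bytes_to_return_address(void) {
    return RET_ADDR_OFFSET - BUFFER_OFFSET;
}

typedef struct { uint32_t esp, ebp, eip; } EpilogueRegs;

/* Model of `leave; ret`. `mem` is a toy stack made of 4-byte slots and
 * `base` is the address assigned to mem[0]. */
void leave_and_ret(EpilogueRegs *r, const uint32_t *mem, uint32_t base) {
    assert(r != NULL && mem != NULL && r->ebp >= base);
    r->esp = r->ebp;                   /* leave, step 1: mov esp, ebp */
    r->ebp = mem[(r->esp - base) / 4]; /* leave, step 2: pop ebp      */
    r->esp += 4;
    r->eip = mem[(r->esp - base) / 4]; /* ret: pop the return address */
    r->esp += 4;
}
```

Under that assumption, the return address sits 508 bytes after the start of buffer, and ret jumps to whatever value is stored in that slot when the function ends.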
Alright, that was quite a journey! Well done if you’ve managed to stay with us this far! Don’t worry if you haven’t understood every detail yet. In our next post we will use what we learned from the disassembly to exploit the buffer overflow. As you progress, the process should become easier to follow, and you can always revisit this post to review what a particular line of assembly does.
Now it’s time for the fun part: exploitation. See you there: Buffer Overflow: Command Execution By Shellcode Injection.
References
- What is the difference between ESP and EBP?
- What’s the purpose of the LEA instruction?
- What is the function of the push / pop instructions used on registers?
If you want to go deeper into Assembly language: