Welcome to the second part of the Buffer Overflow series. In this post, we will explore the basics of Assembly.
Today we won’t delve too deeply into assembly language, as it can quickly become overwhelming. However, we’ll provide enough insight to understand what is happening when we run our example program. If you want a more comprehensive understanding of assembly, we encourage you to explore the many resources and blog posts already available online on the subject :)
So, to begin, what exactly is Assembly language?
Assembly language is a low-level programming language that exhibits a strong correspondence between its instructions and the machine code instructions of the computer’s architecture. It serves as the intermediary between software programs (as demonstrated in our example using C) and the underlying hardware of your computer (components such as RAM and the CPU for our example).
Why do I need to know assembly language?
Most of the time, you won’t have access to the source code of the binary you’re analyzing or exploiting. Assembly language provides a means to decipher what a program is executing, regardless of the high-level programming language used to generate the binary. Alongside tools like GDB, it stands as one of the most effective approaches for reverse engineering any given program.
With that in mind, let’s delve into the subject.
As a reminder, here is our code example in C:
We can use GDB to disassemble the code. Here I am using the Intel flavor:
[hg8@archbook ~]$ gdb vuln
Well, that doesn’t look like much fun to read. Let’s break down what each line does to get a better understanding of the program:
0x08049176 <+0>: push ebp
This section is the prologue of our program. In assembly language, a prologue is a sequence of instructions executed at the beginning of a function to set up a new stack frame.
- push ebp pushes the value of the base pointer onto the stack. The base pointer, ebp, is a register that points to the base of the current stack frame, so this saves the caller’s frame before we create our own.
- mov ebp, esp copies the current value of the stack pointer (esp) into the base pointer (ebp). The stack pointer always points to the top of the stack, that is, the most recently pushed value. There is a single stack, but each function call gets its own stack frame on it, and ebp now marks where the frame of the current function begins.
- push ebx saves the value of the ebx register on the stack. ebx is a general-purpose register; it is saved here because the function overwrites it later (mov ebx, eax) and has to restore it before returning.
- sub esp, 0x1f4 subtracts a value from the stack pointer. This effectively reserves 0x1f4 (500 in decimal) bytes of space on the stack for local variables. This corresponds to the space reserved for our buffer variable (char buffer[500];).
Remember, the stack grows from higher memory addresses to lower memory addresses, hence the subtraction operation.
0x08049180 <+10>: call 0x80491ae <__x86.get_pc_thunk.ax>
- The instruction call 0x80491ae transfers control to the function at the specified address. This function is likely a helper function that returns the address of the instruction following the call in the eax register.
0x08049185 <+15>: add eax,0x2053
This set of instructions prepares the command line argument and copies it into our buffer variable.
- add eax, 0x2053 adds the value 0x2053 (8275 in decimal) to the eax register. The helper function returned the address of the instruction following the call; adding this constant turns it into the address of a fixed reference point in the binary. In position-independent code this is typically the global offset table (GOT), which is used to locate functions such as strcpy.
- mov edx, [ebp+0xc] loads a value from memory into the edx register. The value is stored at an offset of 0xc (12 in decimal) bytes above the base pointer, which is where argv, the second argument of main, lives.
- add edx, 0x4 adds 4 to the edx register. A pointer is 4 bytes wide on a 32-bit system, so edx now points to the second entry of argv.
- mov edx, [edx] loads the value at the memory location pointed to by edx into edx. edx now holds argv[1], a pointer to our command line argument.
- push edx pushes the value of the edx register onto the stack. The pointer to argv[1] is now on the stack as the second argument (the source) for strcpy.
- lea edx, [ebp-0x1f8] calculates the address ebp-0x1f8 and stores the result in the edx register, without reading memory. This address lies 0x1f8 (504 in decimal) bytes below the base pointer: 4 bytes for the saved ebx plus the 500 bytes reserved earlier. edx now holds the lowest address of the reserved space, which is the start of our buffer variable.
- push edx pushes the value of the edx register (the calculated address) onto the stack as the first argument (the destination) for strcpy.
- mov ebx, eax copies the value in the eax register into the ebx register.
- call 0x8049050 transfers control to the function at the specified address. This function is the strcpy function, which copies a null-terminated string from one location to another. This is where our command line argument (argv[1]) gets copied into the buffer variable. Since strcpy does not check the size of the destination, this is also where the buffer overflow will occur.
0x080491a1 <+43>: add esp,0x8
- The instruction add adjusts the stack pointer by adding 8 to it. This discards the two 4-byte arguments that were pushed onto the stack before the call to strcpy.
0x080491a4 <+46>: mov eax,0x0
- The instruction mov copies the value 0x0 (0 in decimal) into the eax register. This value is the return value of the main function, which is set to zero.
0x080491a9 <+51>: mov ebx,DWORD PTR [ebp-0x4]
This section is the epilogue of our program. In assembly language, an epilogue is a sequence of instructions executed at the end of a function to restore the state of the previous stack frame and return control to the calling function.
- mov ebx, DWORD PTR [ebp-0x4] loads a value from memory into the ebx register. The value is stored at an offset of -0x4 (-4 in decimal) bytes from the base pointer. This restores the original value of the ebx register that was saved on the stack earlier in the function.
- leave tears down the stack frame in two steps: it copies the base pointer into the stack pointer, which releases the local variables, and then pops the caller’s base pointer (saved on the stack by the prologue) back into ebp. This effectively undoes the setup of the stack frame at the beginning of the function.
- ret returns control to the caller of the function by popping the return address off the stack and transferring control to that address.
Alright, that was quite a journey! Well done if you’ve managed to stay with us this far! Don’t worry if you haven’t understood every detail yet. In our next post we will use the knowledge learned from the disassembly to exploit the buffer overflow. As you progress, you might find the process easier to comprehend, and you can always revisit this post to review what a particular assembly instruction does.
Now it’s time for the fun part: exploitation, see you there! Buffer Overflow: Command Execution By Shellcode Injection.
- What is the difference between ESP and EBP?
- What’s the purpose of the LEA instruction?
- What is the function of the push / pop instructions used on registers?
If you want to go deeper on learning Assembly language: