Buffer Overflow: Reversing Assembly

12-09-2023

— Written by hg8 — 6 min read

Welcome to the second part of the Buffer Overflow series. In this post, we will explore the basics of Assembly.

Today we won’t delve too deeply into assembly language, as it can quickly become overwhelming. However, we’ll provide enough insight to understand what happens when we run our example program. If you want a more comprehensive understanding of assembly, we encourage you to explore the many resources and blog posts already available online on the subject :)

So, to begin, what exactly is Assembly language?

Assembly language is a low-level programming language that exhibits a strong correspondence between its instructions and the machine code instructions of the computer’s architecture. It serves as the intermediary between software programs (as demonstrated in our example using C) and the underlying hardware of your computer (components such as RAM and the CPU for our example).

Why do I need to know assembly language?

Most of the time, you won’t have access to the source code of the binary you’re analyzing or exploiting. Reading assembly lets you work out what a program does, regardless of the high-level language it was written in. Combined with a debugger like GDB, it is one of the most effective approaches to reverse engineering a program.

With that in mind, let’s delve into the subject.

Example code

As a reminder, here is our code example in C:

#include <stdio.h>
#include <string.h>

int main (int argc, char** argv) {
  char buffer [500];
  strcpy(buffer, argv[1]);
  return 0;
}

Assembly

We can use GDB to disassemble the code. Here I am using the Intel syntax flavor (set disassembly-flavor intel in GDB):

[hg8@archbook ~]$ gdb vuln
Reading symbols from vuln...
(gdb) disassemble main
Dump of assembler code for function main:
   0x08049176 <+0>:     push   ebp
   0x08049177 <+1>:     mov    ebp,esp
   0x08049179 <+3>:     push   ebx
   0x0804917a <+4>:     sub    esp,0x1f4
   0x08049180 <+10>:    call   0x80491ae <__x86.get_pc_thunk.ax>
   0x08049185 <+15>:    add    eax,0x2053
   0x0804918a <+20>:    mov    edx,DWORD PTR [ebp+0xc]
   0x0804918d <+23>:    add    edx,0x4
   0x08049190 <+26>:    mov    edx,DWORD PTR [edx]
   0x08049192 <+28>:    push   edx
   0x08049193 <+29>:    lea    edx,[ebp-0x1f8]
   0x08049199 <+35>:    push   edx
   0x0804919a <+36>:    mov    ebx,eax
   0x0804919c <+38>:    call   0x8049050 <strcpy@plt>
   0x080491a1 <+43>:    add    esp,0x8
   0x080491a4 <+46>:    mov    eax,0x0
   0x080491a9 <+51>:    mov    ebx,DWORD PTR [ebp-0x4]
   0x080491ac <+54>:    leave
   0x080491ad <+55>:    ret
End of assembler dump.

Well, it doesn’t look like much fun to read. Let’s break down what each line does to get a better understanding of the program:

Prologue

0x08049176 <+0>:     push   ebp
0x08049177 <+1>:     mov    ebp,esp
0x08049179 <+3>:     push   ebx
0x0804917a <+4>:     sub    esp,0x1f4

This section is the prologue of our program. In assembly language, a prologue is a sequence of instructions that is executed at the beginning of a function to set up a new stack frame.

The function starts with push ebp, which saves the caller’s base pointer on the stack. The base pointer, ebp, is a register that points to the base of the current stack frame.

This is followed by mov ebp,esp, which copies the current value of the stack pointer (esp) into the base pointer (ebp). The stack pointer always points to the top of the stack, meaning the most recently pushed value. Each function call gets its own stack frame, and ebp now marks the base of the frame for main.

The next instruction, push ebx, saves the value of the ebx register on the stack. ebx is a general-purpose register that a function must preserve for its caller. main overwrites it later (mov ebx,eax), so the original value is saved here and restored in the epilogue.

sub esp,0x1f4 subtracts 0x1f4 (500 in decimal) from the stack pointer. This reserves 500 bytes on the stack for local variables, which corresponds to the space reserved for our buffer variable (char buffer [500];).

Remember, the stack grows from higher memory addresses to lower memory addresses, hence the subtraction.

Command line argument handling

0x08049180 <+10>:    call   0x80491ae <__x86.get_pc_thunk.ax>

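call 0x80491ae transfers control to the function at that address, after pushing the address of the next instruction (0x08049185) onto the stack as the return address. __x86.get_pc_thunk.ax is a small compiler-generated helper that copies that return address into the eax register. This is how 32-bit position-independent code finds out where it is located in memory.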

0x08049185 <+15>:    add    eax,0x2053
0x0804918a <+20>:    mov    edx,DWORD PTR [ebp+0xc]
0x0804918d <+23>:    add    edx,0x4
0x08049190 <+26>:    mov    edx,DWORD PTR [edx]
0x08049192 <+28>:    push   edx
0x08049193 <+29>:    lea    edx,[ebp-0x1f8]
0x08049199 <+35>:    push   edx
0x0804919a <+36>:    mov    ebx,eax
0x0804919c <+38>:    call   0x8049050 <strcpy@plt>

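This set of instructions prepares the two arguments of strcpy and then calls it to copy the command line argument into our buffer variable.

add eax,0x2053 adds 0x2053 (8275 in decimal) to the eax register. Since eax held 0x08049185, the address of this very add instruction, it now holds 0x0804b1d8. In 32-bit position-independent code this is conventionally the address of the Global Offset Table (GOT), which the program uses to locate library functions such as strcpy.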

mov edx,DWORD PTR [ebp+0xc] loads a 4-byte value from memory into the edx register. The value sits 0xc (12 in decimal) bytes above the base pointer, which is where the second argument of main, argv, is stored (argc sits at [ebp+0x8]).

add edx,0x4 adds 4 to the edx register. Pointers are 4 bytes wide on 32-bit x86, so edx moves from the address of argv[0] to the address of argv[1].

mov edx,DWORD PTR [edx] loads the value stored at the address held in edx back into edx. edx now contains argv[1], a pointer to our command line argument.

push edx pushes the value of the edx register onto the stack. The pointer to argv[1] becomes the second argument of strcpy (the source string). Arguments are pushed from right to left, so the source goes first.

lea edx,[ebp-0x1f8] computes the address ebp-0x1f8 and stores it in the edx register, without reading memory. This address lies 0x1f8 (504 in decimal) bytes below the base pointer: 4 bytes for the saved ebx plus 500 bytes for buffer. edx now holds the address of the start of buffer, which is where esp pointed right after the prologue.

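push edx pushes that address onto the stack as the first argument of strcpy (the destination).

mov ebx,eax copies the address computed earlier from eax into ebx. The 32-bit calling convention for position-independent code expects ebx to hold the address of the GOT (Global Offset Table) when a function is called through the PLT (Procedure Linkage Table).

call 0x8049050 transfers control to strcpy through its PLT entry. strcpy copies a null-terminated string from one location to another without checking the size of the destination. This is where our command line argument (argv[1]) gets copied into buffer, and where the overflow happens if the argument does not fit in the 500 bytes of buffer.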

Main function exit

0x080491a1 <+43>:    add    esp,0x8

add esp,0x8 adds 8 to the stack pointer. This discards the two 4-byte arguments that were pushed for strcpy: with this calling convention, the caller cleans up the stack after the call.

0x080491a4 <+46>:    mov    eax,0x0

The instruction mov copies the value 0x0 (0 in decimal) into the eax register. eax holds the return value of the main function, which is set to zero (return 0; in our C code).

Epilogue

0x080491a9 <+51>:    mov    ebx,DWORD PTR [ebp-0x4]
0x080491ac <+54>:    leave
0x080491ad <+55>:    ret

This section is the epilogue of our program. In assembly language, an epilogue is a sequence of instructions executed at the end of a function to restore the state of the previous stack frame and return control to the calling function.

The instruction mov loads a value from memory into the ebx register. The value is stored at an offset of -0x4 (-4 in decimal) bytes from the base pointer. This restores the original value of the ebx register that was saved on the stack earlier in the function.

leave tears down the stack frame. It first copies the base pointer into the stack pointer (mov esp,ebp), which releases the local variables, then pops the caller’s saved base pointer off the stack into ebp (pop ebp). This effectively undoes the setup of the stack frame at the beginning of the function.

The instruction ret returns control to the caller of the function by popping the return address off the stack and transferring control to that address.

Alright, that was quite a journey! Well done if you’ve managed to stay with us this far! Don’t worry if you haven’t understood every detail yet. In our next post we will use the knowledge gained from the disassembly to exploit the buffer overflow. As you progress, the process should become easier to follow, and you can always revisit this post to look up what a particular assembly instruction does.

Now it’s time for the fun part: exploitation. See you in the next post, Buffer Overflow: Command Execution By Shellcode Injection.

References

If you want to go deeper on learning Assembly language: