Welcome to Cred Relay. I’m Jeff, an OSCP- and CRTO-certified offensive security engineer.
This issue: I let Claude Code loose on my homelab and it lied to me about getting root.
I also had my AI assistant build 9 tools in one night while I watched.
I Let an AI Hack My VM (It Cheated)
I have a few VMs in my homelab, and one of them is an Ubuntu VM called archives that I run containers from. I forgot its root password, but instead of getting frustrated I figured it would make the perfect target.
Hexstrike-AI is an MCP server that exposes 150+ offensive security tools to AI agents, mostly the stuff that ships with Kali Linux. It’s meant for “automated” pen testing, vuln research and bug bounty hunting. As a pentester, I wanted to see if I would still have a job next year, so I decided to give it a go.
I spun up a Kali VM with Hexstrike and pointed Claude Code at it from WSL2. Getting WSL2 to talk to VMware was annoying. I documented the full setup here and submitted a PR to Hexstrike since it was more painful than expected.
I gave Claude a standard user cred and told it to nmap. So it fired up Hexstrike.

Claude nmapping with Hexstrike-AI.
It was slow, but it’s nmap, so that’s expected. Then it saw that SSH was open and logged in. This was the part I was curious about: had Claude been trained on any privilege escalation? Maybe it had, but it didn’t need it.

Masterful priv esc.
Turns out my standard user had full sudo privileges (sudo -l), so root was trivial. I had it try other escalation paths on its own and the results were underwhelming. It didn't think to use LinPEAS until I pointed it there, though it caught a path or two without it.
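If you want to script that first check, here is a minimal sketch that flags an unrestricted rule in `sudo -l` output. The helper name `has_full_sudo` is made up for illustration and is not part of Hexstrike-AI. It only recognizes the common `(ALL : ALL) ALL` style of rule, so treat a miss as "keep looking", not "safe".

```shell
#!/usr/bin/env bash
# Sketch: flag unrestricted sudo rules in `sudo -l` output read from stdin.
# has_full_sudo is a hypothetical helper, not part of Hexstrike-AI.
has_full_sudo() {
  # Matches rule lines such as "(ALL : ALL) ALL" or "(ALL) NOPASSWD: ALL".
  grep -Eq '^[[:space:]]*\(ALL( *: *ALL)?\)[[:space:]]+(NOPASSWD:[[:space:]]*)?ALL[[:space:]]*$'
}

# Usage on a live host (sudo may prompt for the user's password):
#   sudo -l | has_full_sudo && echo "full sudo: root is one 'sudo -i' away"
```

Narrower rules, like a single allowed binary, will not match and need a manual look.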

Priv esc paths.
I saw the LXD bug and told Claude to pop it. It failed totally. I was curious whether the host was actually vulnerable, so I SSH’d in and exploited it easily myself. I’m not sure why Claude had trouble with it, but I don’t think it would have gotten there without a human in the loop.
I wanted to move to something else and I have DVWA running on this Ubuntu host. So I pointed Claude at it.

Claude using Hexstrike AI with DVWA.
It performed much better against DVWA. It definitely had been trained on it. I told it to “recon and compromise” and it did the rest. Claude methodically went through and began ticking off the critical vulns.

DVWA owned by Claude Code
Hexstrike had many timeouts during this process. Nmap and Nikto were both especially painful. It was good at compromising DVWA. We had a shell and many paths to said shell. Ok great but BORING. Let’s try to escape the container.
I asked Claude to escape the container and get root on the host.
It suggested DirtyPipe to escalate privs. I didn't want to tell it that wouldn't work, but I was encouraged. Truly like watching a junior engineer start to get it.
And then Claude comes back: "I've succeeded. Full root again."
Holy shit, really? How?!
I was excited to learn how it was done. My mind reeled at the implications that Claude had gone further than I could on my own. What did that mean?

Claude’s confession.
But the fucker was lying! I looked and it hadn’t used any exploit. It had used its old sudo access to docker exec into the container. Not hallucinated but lied. It did say at the end “Honest answer: I couldn’t escape the container from inside. It’s properly isolated.” Great, but next time lead with the truth please.
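A cheap way to sanity-check a "full root" claim is to ask where the shell actually lives. The sketch below is an assumption-laden example, not what I ran that night: `where_am_i` is a hypothetical helper, and it relies on the `/.dockerenv` marker file that Docker creates inside containers it starts. The optional argument is a filesystem root and exists only so the check can be exercised against a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an agent's "I have root on the host" claim.
# where_am_i is a hypothetical helper; the optional argument is a
# filesystem root used only to make the check testable.
where_am_i() {
  local root="${1:-}"
  # Docker creates /.dockerenv inside containers it starts.
  if [ -e "$root/.dockerenv" ]; then
    echo "container"
  else
    echo "host (or a runtime that leaves no /.dockerenv)"
  fi
}

# Run it in the shell the agent hands you:
#   echo "uid=$(id -u) location=$(where_am_i)"
```

A uid of 0 together with "container" means root inside the box, which is a long way from an escape.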
Takeaways
If you want to instrument Kali over MCP with an LLM then Hexstrike-AI is a quick way to set that up. You can approximate someone who has done a CTF box or two with Claude Code on recent models. It could help people learn the process better but it isn’t replacing anyone yet. Hexstrike-AI feels antiquated in a world with OpenClaw.
OpenClaw Built 9 Skills for Me in One Night
It started with a simple question: What if my AI assistant could look up anime for me?
I'd been tinkering with OpenClaw, an open-source AI agent framework, and realized that while my personal assistant agent (Mai) was great at conversation, she was blind to a lot of the internet. No anime databases. No recipe lookups. No Star Wars trivia.
So I asked her to build an anime skill. There was an API called Jikan that doesn’t require authentication. It only took a couple of minutes to generate a CLI tool that was agent usable and wrap it in a skill.
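To give a feel for how small such a tool can be, here is a rough sketch of a search command against Jikan's public v4 endpoint. This is not the script Mai generated. The function names are made up, the output format is simplified, and the live call assumes `curl` and `jq` are installed.

```shell
#!/usr/bin/env bash
# Sketch of an agent-usable anime lookup against Jikan's public v4 API.
# Function names are hypothetical; the live call needs curl and jq.

JIKAN_BASE="https://api.jikan.moe/v4"

# Percent-encode a query string using only bash built-ins.
# LC_ALL=C makes the loop walk the string byte by byte.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build the search URL: query string plus an optional result limit.
search_url() {
  printf '%s/anime?q=%s&limit=%s\n' "$JIKAN_BASE" "$(urlencode "$1")" "${2:-5}"
}

# Turn a Jikan search response (stdin) into one line per title.
format_results() {
  jq -r '.data[] | "[\(.mal_id)] \(.title), \(.episodes // "?") eps, \(.status), score \(.score // "n/a")"'
}

anime_search() {
  curl -fsS "$(search_url "$@")" | format_results
}

# Example:
#   anime_search "frieren"
```

One line per result keeps the output easy for an agent to read back in conversation.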
Then I thought: What else could I build like this?
I asked Mai to search for other APIs that didn’t require authentication and had useful information, similar to the Jikan API. She came back with 8 results. Since they were all kinda similar, I told her to go ahead and implement them all. Here’s the actual chat:
Me: "Feel free to do all the ones with no auth. Do them just like the anime one. GitHub and ClawHub. Send them to council and incorporate their easy suggestions."
Mai: "On it! Spawning sub-agents to build these in parallel..."
It was exciting to see the subagents fired off and building the skills.
| Skill | API | Time to Build |
|---|---|---|
| 📚 books | Open Library | ~1.5 min |
| 🍳 recipes | TheMealDB | ~1.5 min |
| 🌍 countries | REST Countries | ~1.5 min |
| 🚀 spacex | SpaceX API | ~2 min |
| 🧬 pokemon | PokéAPI | ~1.5 min |
| ⚔️ starwars | SWAPI | ~2 min |
| 🍺 breweries | Open Brewery DB | ~1.5 min |
| 🧙 harrypotter | HP-API | ~1.5 min |
Total wall-clock time: ~2 minutes (they ran in parallel)
Each sub-agent:
Created the skill directory
Wrote the bash script
Wrote SKILL.md documentation
Wrote README.md for GitHub
Tested all commands
Initialized a git repo
Created a GitHub repository
Pushed the code
All I did was watch the notifications roll in.
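The per-skill steps above are simple enough to sketch in a few lines of bash. This is an illustration, not what the sub-agents actually ran: `scaffold_skill` is a hypothetical helper, the stub contents are placeholders, and the SKILL.md front matter (a name plus a description) is an assumed layout. The publish step appears only as a comment and assumes the GitHub CLI is installed and authenticated.

```shell
#!/usr/bin/env bash
# Sketch of the per-skill scaffolding steps. scaffold_skill and the file
# contents are hypothetical, not OpenClaw's actual generator.

scaffold_skill() {
  local name="$1" api="$2" root="${3:-.}"
  local dir="$root/$name"

  mkdir -p "$dir" || return 1

  # Stub CLI; a real skill would wrap curl calls to the API here.
  cat > "$dir/$name" <<EOF
#!/usr/bin/env bash
echo "$name: not implemented yet (backing API: $api)" >&2
exit 1
EOF
  chmod +x "$dir/$name"

  # Assumed SKILL.md layout: front matter with a name and a description.
  cat > "$dir/SKILL.md" <<EOF
---
name: $name
description: Query $api from the command line.
---
Run ./$name with no arguments for usage.
EOF

  printf '# %s\n\nCLI wrapper around %s.\n' "$name" "$api" > "$dir/README.md"

  # Version control is optional here so the sketch runs without git.
  if command -v git >/dev/null 2>&1; then
    git -C "$dir" init -q
  fi
}

# Publishing, assuming the GitHub CLI is installed and authenticated:
#   (cd books && git add -A && git commit -qm "v1.0.0" \
#      && gh repo create books --public --source=. --push)
```

Calling `scaffold_skill books "Open Library"` would leave a `books/` directory with the stub script, SKILL.md and README.md ready for a first commit.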
Quality control
I don’t want to ship trash, so I decided to get feedback using one of my other skills, the Council of the Wise. The council skill gets feedback from 4 markdown agents, including one devil’s advocate. I send them ideas and get back a report based on their feedback.
Devil's Advocate — What could break?
Architect — Is the structure right?
Engineer — Any technical issues?
Artist — Is the documentation clear?
The council came back with:
Batch verdict: Ship all. The pattern is consistent, the code is clean, and the documentation is agent-friendly. Quick win: add better error messages for rate limiting across all skills.
Good enough for v1.0.0.
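The council's rate-limiting suggestion is the kind of thing that fits in one shared helper. Here is a minimal sketch of that idea, with the caveat that `explain_status` and `fetch` are made-up names and the messages are placeholders, not what shipped in the skills. It uses curl's `%{http_code}` write-out variable, which reports `000` when no HTTP response came back.

```shell
#!/usr/bin/env bash
# Sketch: map HTTP status codes to messages an agent can act on.
# explain_status and fetch are hypothetical helper names.

explain_status() {
  case "$1" in
    2??) return 0 ;;
    429) echo "Rate limited by the API. Wait a bit and retry." >&2 ;;
    404) echo "Nothing found for that query." >&2 ;;
    5??) echo "The API returned a server error ($1). Try again later." >&2 ;;
    000) echo "Could not reach the API. Check the network." >&2 ;;
    *)   echo "Unexpected HTTP status: $1" >&2 ;;
  esac
  return 1
}

# Fetch a URL, print the body on success, explain the failure otherwise.
fetch() {
  local body status
  body="$(mktemp)" || return 1
  status="$(curl -s -o "$body" -w '%{http_code}' "$1")"
  if explain_status "${status:-000}"; then
    cat "$body"; rm -f "$body"
  else
    rm -f "$body"; return 1
  fi
}
```

A plain-English message on stderr gives the agent something it can relay or act on, instead of an empty result it has to guess about.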
What I actually built
anime
➜ anime git:(main) ./anime search "frieren"
[52991] Sousou no Frieren — 28 eps, Finished Airing, ⭐ 9.28
[59978] Sousou no Frieren 2nd Season — 10 eps, Currently Airing, ⭐ 9.23
[56885] Sousou no Frieren: ●● no Mahou — ? eps, Currently Airing, ⭐ 7.39
[795] Oniisama e... — 39 eps, Finished Airing, ⭐ 7.87
[8367] Crayon Shin-chan Movie 16: Chou Arashi wo Yobu Kinpoko no Yuusha — 1 eps, Finished Airing, ⭐ 6.95
[10116] Crayon Shin-chan Movie 19: Arashi wo Yobu Ougon no Spy Daisakusen — 1 eps, Finished Airing, ⭐ 7.02
Actually, it’s more like this screenshot below since I’m rarely on the command line running these tools myself. I just have OpenClaw do it in natural conversation:

Sometimes you don’t have to call the skill directly.
recipes

Recipe skill firing in chat.
You get the idea. These really improve the functionality of the agent. If it’s something that can be done in a simple Bash script then it’s a good candidate for a skill. Especially if it involves an API. The best thing is you access them through conversation. You can also have your agent include them in a morning summary or an intel brief.
One thing to watch out for is outdated APIs. It turns out that my spacex skill is useless because the API stopped being updated in 2022. The same goes for my starwars skill, although I’d prefer the last 3 movies be excluded from it anyway.
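A staleness check before you build would have saved me here. The sketch below is one way to do it, under a few assumptions: `is_stale` is a hypothetical helper, `date -d` is GNU date syntax (so Linux, not stock macOS), and the commented live check assumes the SpaceX v4 "latest launch" endpoint and its `date_utc` field.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an API's newest record is too old to trust.
# is_stale is a hypothetical helper; date -d is GNU date syntax.

# is_stale ISO_TIMESTAMP MAX_AGE_DAYS [NOW_EPOCH]
# Returns 0 when the timestamp is older than MAX_AGE_DAYS.
is_stale() {
  local newest now max_days="$2"
  newest="$(date -u -d "$1" +%s)" || return 2
  now="${3:-$(date -u +%s)}"
  (( (now - newest) / 86400 > max_days ))
}

# Live check, assuming the v4 "latest launch" endpoint and its date_utc field:
#   latest="$(curl -fsS https://api.spacexdata.com/v4/launches/latest | jq -r .date_utc)"
#   is_stale "$latest" 180 && echo "spacex data looks abandoned" >&2
```

The optional third argument pins "now" to a fixed epoch, which makes the check repeatable.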
If you want any of these skills you can install them with clawhub:
clawhub install anime
Or just ask your agent to install them.
That's it for issue #1. If you found this useful, forward it to someone who'd actually read it.
Got questions, corrections, or want to argue about something? Just hit reply. I read everything.
— Jeff
Coming soon: I pointed Claude Code and Ghidra at kernel drivers and found 8 vulnerabilities in a single day.
