So a low-skill attacker pointed Claude Code and Codex at targets and basically let the AI hack for him. Researchers recovered over 1,000 agent sessions and found how easily he bypassed most guardrails. The trick? Just say it’s “red team research”. Guardrails that fold that easily aren’t really guardrails, and this only gets worse once it’s fully automated.
iamapizza on
Looking at this collection of prompts they uncovered, that’s not a low-skilled attacker. Low-skilled attackers don’t use Kali, for starters.
So you’re saying a company was so lazy they didn’t even bother to ask ChatGPT to red-team their website, then got hacked? Curious.
CymonSet on
Smart enough to be dangerous, not smart enough to understand the guardrails and when to enforce them. Sure, let’s pause progress here. At least for us, because bad actors are not going to observe the pause. Our access to tools to fix vulnerabilities will fall further behind the ability of bad actors to exploit them and create new ones.
Tetrite1955 on
“The attacker’s inexperience was also evident in his operational security failures. At one point he asked Claude to help edit his resume, which contained his full name, location, education history, and LinkedIn profile.”
Kek
marsshadows on
I’m still waiting for the day when these advanced LLMs drastically reduce the extreme hardware specs they need to run on, instead of leaving PC and console consumers paying a heavy price because of those hardware needs.
OneArmedZen on
It’s just the modern-day equivalent of *skids* (script kiddies).
DarkFantom on
Lmao this dude got caught because he was using one of his compromised Claude instances to work on his resume 😂 Gotta be one of the greatest fumbles of all time hahaha
IllIIllIllIIIlllll on
“Up next at 11, AI safeguards? Not so fast! High-powered artificial intelligence used by low-functioning natural intelligence to hack into dozens of corporations.”
unwarrend on
What a rude title: Unlettered muffin-top manages to do something useful with AI – news at 11.
SHORT_INFO_NEWS on
The detail that stuck out in the OALABS (Open Analysis) writeup behind this: across those 1,000-plus sessions, Claude Code logged only nine policy refusals and Codex just one. So it wasn’t a clever jailbreak: the “authorized red team” framing is the exact wording real pentesters use, so the models had no clean way to tell the two apart. The logs cover at least 14 breached firms but contain nothing showing the data was ever sold or turned into money. The operator’s tradecraft was rough too: he had the agent help rewrite his resume with his real name and LinkedIn, and at one point exposed his home IP to it.
pinkfootthegoose on
Imagine the lack of skill and security in those 14 companies that couldn’t keep out a low-skill attack.
https://research.openanalysis.net/claude/codex/hacking/ai%20hacking/llm/redteam/policy%20violation/2026/06/16/compromised-claude-hacking.html#Appendix-A—Post-Compromise-Timeline