I recently had the privilege of hosting and speaking at HackAIcon, Europe's first conference dedicated to the intersection of AI and cybersecurity, and honestly, it exceeded all expectations.
As a co-founder and CTO of Ethiack, hosting this conference was an incredible experience. Since this was our first time organizing such a big event, I was a bit concerned we couldn’t pull it off, but we did, and we did so with a full house. Bringing together so many exceptional speakers, researchers, and practitioners in one place generated the kind of positive energy and knowledge-sharing that pushes our entire industry forward. The discussions, the debates, the learnings – it was exactly what we hoped for when we set out to create this event. I'm grateful to everyone who attended and contributed to making HackAIcon such a success.
My talk focused on something we've been deeply invested in at Ethiack: building autonomous hackbots that can take penetration testing to the next level. This isn’t something coming out of 2077, but something that exists and works today.
My journey into AI-powered security testing started in 2021-2022 when I first encountered GitHub Copilot. Watching an AI assistant complete and write code that programmers actually wanted to use triggered a simple realization: if AI can generate code, maybe it can also find vulnerabilities. That spark of curiosity led us down a path that would fundamentally change how we approach offensive security.
To understand where we are today, it helps to trace the evolution of AI itself.
Classical machine learning required labeled datasets and extensive feature engineering – models trained for specific tasks on specific data. Then came deep learning, which automated feature extraction and enabled models to work across multiple domains with less manual intervention. Large Language Models (LLMs) represented another leap, built on the transformer architecture and trained on massive text corpora. These models developed something we perceive as general understanding and reasoning capabilities.
But the real breakthrough came with agentic AI: connecting LLMs with tools and the ability to interact with their environment. These systems can now plan, execute tasks, actively probe systems, adapt strategies based on results, and operate with autonomy. What does this mean for ethical hacking?
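To make "agentic" concrete, here is a minimal sketch of the plan-act-observe loop such systems run. This is purely illustrative: the tool stubs and the `llm` callable are hypothetical placeholders, not any particular product's API and not how Hackian is built.

```python
# Hypothetical agentic loop: the model plans, calls tools, observes results, and adapts.
import json

TOOLS = {
    "http_get": lambda url: f"(response body of {url})",    # stub: fetch a page
    "port_scan": lambda host: f"(open ports on {host})",    # stub: enumerate services
}

def run_agent(llm, objective: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": objective}]
    for _ in range(max_steps):
        # Ask the model for its next action as JSON: a tool call or a final report.
        reply = llm(history)                  # assumed: returns a JSON string
        action = json.loads(reply)
        if action["type"] == "finish":
            return action["report"]
        # Execute the requested tool and feed the observation back to the model.
        result = TOOLS[action["tool"]](action["argument"])
        history.append({"role": "assistant", "content": reply})
        history.append({"role": "tool", "content": result})
    return "step budget exhausted"
```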
[image]
Practical AI Applications in Hacking
Before diving into fully autonomous systems, let me share some practical ways AI already improves security research:
But even with all these improvements, current scanning tools still leave us with plenty of repetitive work. That's where fully autonomous hackbots come in.
Can AI agents hack like humans, or at least closer to humans? We believe the answer is yes, and here's how we're doing it:
Here's the critical part: AI agents that interact directly with systems can cause serious damage if not properly controlled. We've all seen reports of AI coding assistants accidentally deleting project folders or local databases. In penetration testing, the stakes are even higher.
We implemented three layers of guardrails that reduce the probability of destructive actions to nearly zero:
In early testing, our team had to babysit the hackbot 24/7. Now, with these guardrails in place, we can let it run autonomously and review traces after execution.
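To give a flavor of what one such layer can look like (a generic sketch, not Ethiack's actual guardrail stack), here is a pre-execution check that vetoes obviously destructive commands and records them so a reviewer can inspect the trace later:

```python
# Illustrative guardrail: block destructive shell commands before they reach the target.
import re

# Patterns for operations the agent must never run (simplified, non-exhaustive).
DENYLIST = [
    r"\brm\s+-rf\b",                    # recursive deletes
    r"\bdrop\s+(table|database)\b",     # destructive SQL
    r"\bmkfs\b|\bdd\s+if=",             # disk-level destruction
    r"\bshutdown\b|\breboot\b",
]

def is_destructive(command: str) -> bool:
    return any(re.search(p, command, re.IGNORECASE) for p in DENYLIST)

def guarded_execute(command: str, execute):
    """Run `command` via the `execute` callable only if it passes the guardrail."""
    if is_destructive(command):
        # Block and log instead of executing; the trace is reviewed after the run.
        return {"status": "blocked", "command": command}
    return {"status": "ok", "output": execute(command)}
```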
During my presentation, we had the first-ever show-and-tell from a hackbot. We generated a video from a trace in which Hackian, our autonomous agent, walked through two critical vulnerabilities it discovered during the DEFCON Bug Bounty Village CTF.
https://x.com/ethiack/status/1974165048810779131
Vulnerability 1 - Command Injection: Hackian discovered a debug endpoint executing ps aux commands. Through systematic fuzzing, it found the magic parameter and injection technique to break out of a grep context and achieve remote code execution.
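The challenge source isn't reproduced here, so the following Flask handler is only a hypothetical analogue of the vulnerable pattern: an attacker-controlled filter interpolated into a ps aux | grep pipeline.

```python
# Hypothetical analogue of the vulnerable debug endpoint (not the real challenge code).
import subprocess
from flask import Flask, request

app = Flask(__name__)

@app.route("/debug/processes")
def processes():
    # Attacker-controlled value ends up inside a shell pipeline.
    pattern = request.args.get("filter", ".")
    cmd = f"ps aux | grep {pattern}"    # vulnerable: no quoting or validation
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return out.stdout

# A filter value such as "x; id" breaks out of the grep context:
# the shell runs `grep x`, then `id`, i.e. arbitrary command execution.
```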
Vulnerability 2 - Arbitrary File Read: The endpoint Hackian analyzed used Clojure's slurp function to read file paths directly from the HTTP request body without validation. This was completely unintended by the organizers: Hackian found a bug they didn't even know existed.
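The real handler was Clojure, but a hypothetical Python equivalent makes the bug class obvious: whatever path the client sends in the request body gets read and returned, with no allow-list or sandboxing.

```python
# Python analogue of the pattern described above (the actual challenge code is not shown).
from flask import Flask, request

app = Flask(__name__)

@app.route("/analyze", methods=["POST"])    # endpoint name is hypothetical
def analyze():
    path = request.get_data(as_text=True).strip()
    # Equivalent of (slurp path): the client-supplied path is read directly.
    with open(path) as fh:                  # vulnerable: no validation of the path
        return fh.read()

# POSTing "/etc/passwd" as the body returns that file's contents.
```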
Here's what surprised me: when I participated in the same DEFCON challenge before deploying Hackian, I found completely different vulnerabilities. The RCE that Hackian discovered wasn't found by any human participant. Meanwhile, I found more flags than Hackian, but they were entirely different vulnerability classes.
This proves what we believe at Ethiack: humans and automation aren't competitors. They're complementary. While Hackian is designed to mimic a human hacker, we keep working with our force of Ethical Hackers, who bring depth and natural creativity to our pentesting efforts.
Our results speak to the impact and viability of this approach:
As attack surfaces grow exponentially, driven by AI-accelerated development and increasing technical debt, security must scale accordingly. That applies to both defensive and offensive operations.
The future of security isn't about replacing human hackers with AI. It's about having AI respond quickly to new attack vectors and run continuous testing, while humans focus on deeper research. Criminals keep improving their methods, and we must keep pace. The battle for Internet security and privacy won't be won by hoping for the best. It'll be won by sharing what we know, building smarter tools, and fighting every day for an Internet that's actually free and secure. Not just for some. For everyone.
We're still early in this journey, but the possibilities are exciting. If you're interested in learning more about autonomous security testing or meeting Hackian, drop me a message.
Until next time,
André Baptista (0xacb)