
The Brain Behind Next-Generation Cyber Attacks

Introduction
Last week, researchers at Carnegie Mellon University (CMU) revealed a finding that caught the attention of both the AI and cybersecurity worlds. Their work tackled a lingering challenge: whether today’s leading large language models (LLMs) can independently carry out complex, multi-host cyber-attacks from start to finish. In their raw form, these models routinely fail at such multi-step operations: they wander off-task, choose the wrong tools, or supply flawed parameters that derail the operation.
Their breakthrough came with the introduction of Incalmo, a structured abstraction layer that narrows planning to a precise set of attack actions and augments the model with memory and path-awareness. Instead of letting the model both plan and execute, Incalmo serves as the backbone and brain while LLMs act as action-specific agents. This concept matters because it addresses a fundamental weakness in current AI models, mirrors patterns already present in the deep and dark web, and aligns with the philosophy of a central “brain” orchestrating specialized security agents.
Abstract Is the New Black
Incalmo is an architecture that, at its core, cleanly separates planning from execution. Rather than letting an LLM wrestle with the vast space of raw commands in a complex, multi-host cyber-attack, it constrains the model to a defined vocabulary of high-level attack verbs such as scan network, move laterally, gain persistence, and exfiltrate data.

Figure 1: Incalmo’s architecture as a high-level attack abstraction layer for LLMs. Instead of LLM interaction with low-level shell tools, LLMs specify high-level actions.
(Singer, B., Lucas, K., Adiga, L., Jain, M., Bauer, L., & Sekar, V. (2025). On the feasibility of using LLMs to execute multistage network attacks (arXiv:2501.16466v3)).
When the model chooses a verb, a translation layer maps it to the right low-level implementation, invoking the best-fit agent, such as a port scanner, credential dumper, or lateral-movement toolkit. Around that sit two supporting services: the environment state, a live memory of the attack’s progress, and an attack graph service, a map of possible next moves. Together they give the AI a coherent, up-to-date picture of its world.
Without scaffolding, LLMs drift. They attempt a relevant step but misconfigure syntax, pick an ill-suited tool, or repeat unproductive actions. Incalmo fixes this by shrinking the decision space, embedding context into every choice, and decoupling “what to do” from “how to do it.” That means you can swap or upgrade agents without retraining the planner. The researchers showed that this allowed smaller models to succeed where larger models without the scaffolding failed, highlighting that design and orchestration can matter more than raw model power.
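To make the decoupling concrete, here is a minimal Python sketch of the pattern, assuming hypothetical names throughout (Action, EXECUTORS, execute) and inert stub executors that only record simulated progress. It illustrates the architectural idea, a constrained verb vocabulary, a shared state object, and a dispatch table separating “what to do” from “how to do it”; it is not the paper’s actual implementation.

from enum import Enum
from typing import Callable, Dict, List

class Action(Enum):
    """Constrained verb vocabulary the planner chooses from
    (illustrative; the paper's actual action set differs in detail)."""
    SCAN_NETWORK = "scan_network"
    MOVE_LATERALLY = "move_laterally"
    EXFILTRATE_DATA = "exfiltrate_data"

# Environment state: the live memory the planner consults between steps.
State = Dict[str, List[str]]

def scan_network(state: State) -> None:
    # Stub executor: a real system would invoke a scanning agent here.
    state["discovered_hosts"].append("simulated-host")

def move_laterally(state: State) -> None:
    # Stub executor: stands in for a lateral-movement toolkit.
    if state["discovered_hosts"]:
        state["compromised_hosts"].append(state["discovered_hosts"][-1])

def exfiltrate_data(state: State) -> None:
    # Stub executor: records that the step was reached, nothing more.
    state["log"].append("exfiltration step reached")

# Translation layer: maps each high-level verb to a low-level implementation.
EXECUTORS: Dict[Action, Callable[[State], None]] = {
    Action.SCAN_NETWORK: scan_network,
    Action.MOVE_LATERALLY: move_laterally,
    Action.EXFILTRATE_DATA: exfiltrate_data,
}

def execute(action: Action, state: State) -> State:
    # The planner only ever emits verbs; swapping or upgrading an
    # executor requires no change on the planning side.
    EXECUTORS[action](state)
    return state

state: State = {"discovered_hosts": [], "compromised_hosts": [], "log": []}
for step in (Action.SCAN_NETWORK, Action.MOVE_LATERALLY, Action.EXFILTRATE_DATA):
    execute(step, state)

Because the verb set is small and typed, an off-vocabulary choice fails fast instead of derailing the run, which is exactly the failure mode the researchers observed in unscaffolded models.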

Figure 2: Maximum percentage of successful attacks reached by LLMs with and without an abstraction layer. Across all environments, LLMs reached more relevant states with an abstraction layer than without.
(Singer, B., Lucas, K., Adiga, L., Jain, M., Bauer, L., & Sekar, V. (2025). On the feasibility of using LLMs to execute multistage network attacks (arXiv:2501.16466v3)).
A Breakthrough in Academia Is a Regular Monday in the Underground
Even though Incalmo is new to the scholarly literature, the pattern of abstraction, orchestration, and stateful coordination has been part of the criminal playbook for years. The deep and dark web were a proving ground for AI-driven attack orchestration long before “agentic AI” entered the mainstream security vocabulary.
Early malicious AI tools were basic: cracked chatbots with minimal data and poorly tuned open-source models. They could generate phishing content or obfuscated scripts but lacked reliability for complex operations. The shift came when underground developers began selling wrapper services instead of standalone models. WormGPT, FraudGPT, and DarkestGPT are examples. Most of these are thin orchestration layers wrapped around existing LLMs. The product is the interface, such as a Telegram bot, web panel, or API, that accepts natural-language attack goals and turns them into actionable outputs.

Figure 3: An in-the-wild version of the dark web WormGPT model.
For example, a user might type “Write me a macro that downloads and runs X, but obfuscate it to bypass Word security prompts.” The wrapper parses that intent, applies tested jailbreak prompts to the underlying model, runs the generated code through a library of obfuscation templates, and finally packages it into a deliverable payload. The customer sees only a clean, ready-to-use output. The complexity of tool choice, syntax handling, formatting, and retries is hidden within the orchestration layer.
Some services go further by chaining multiple AI and non-AI agents: one for reconnaissance, another for phishing kit customization, a third for infrastructure setup. Each agent handles its domain with expert reliability, and the orchestration layer stitches their outputs into a cohesive campaign. If a step fails, the system retries with adjusted parameters — a primitive but effective feedback loop. This is, in essence, Incalmo in the wild: constrained action space, specialized executors, shared state, and planning at the verb level.

Figure 4: Architecture of a Darkweb-based GPT-enabled attack orchestration, showing how the LLM abstraction layer coordinates the various components of a multi-host attack by delegating tasks to specialized malicious agents.
For threat actors, the incentive is clear: this architecture lowers the skill barrier, accelerates execution, and allows even low-tier actors to run campaigns that once required experienced operators. For defenders, however, the uncomfortable truth is that these underground toolchains are not hypothetical. They are advertised openly within closed forums, sold by subscription, and iterated like any SaaS product. And they will only get more capable as their operators borrow directly from academic proofs like Incalmo.
A New Implementation of a Well-Established Philosophy
The design described in the research aligns closely with an approach that has guided enterprise defense for years: a central intelligence that understands both intent and state, working in concert with a distributed mesh of specialized enforcement points.
Think of ThreatCloud AI as the planner and brain. It ingests telemetry and threat intelligence, maintains a live model of exposure and activity, and decides what needs to happen next — updating a prevention policy, isolating a user session, blocking a route, or rolling a configuration change. That intent is executed by distributed enforcement points — the agents across network, cloud, endpoint, and identity — each optimized for its domain. The plumbing matters: common verbs for prevention, shared state, and automatic feedback so each decision improves the next.
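As a rough sketch of that planner-and-enforcement-point loop, consider the Python fragment below. Every name in it (plan, enforce, ENFORCEMENT_POINTS, the verb strings) is hypothetical and invented for illustration; it mirrors the shape of the design, not any actual Check Point API.

from typing import Callable, Dict, List, Tuple

# Hypothetical prevention verbs and enforcement agents; the names are
# illustrative only, not real product interfaces.
ENFORCEMENT_POINTS: Dict[str, Callable[[dict], str]] = {
    "block_route":     lambda ctx: f"route to {ctx['dest']} blocked",
    "isolate_session": lambda ctx: f"session {ctx['session_id']} isolated",
    "update_policy":   lambda ctx: f"policy pushed to {ctx['gateway']}",
}

def plan(telemetry: dict) -> List[Tuple[str, dict]]:
    # The "brain": turns live telemetry and threat intel into intents.
    # Real planning logic would weigh exposure state and activity.
    intents: List[Tuple[str, dict]] = []
    if "malicious_ioc" in telemetry:
        intents.append(("block_route", {"dest": telemetry["malicious_ioc"]}))
    if "suspicious_session" in telemetry:
        intents.append(("isolate_session",
                        {"session_id": telemetry["suspicious_session"]}))
    return intents

def enforce(intents: List[Tuple[str, dict]], feedback: List[str]) -> None:
    # Distributed enforcement: each verb runs on its own specialized
    # agent, and the outcome feeds back into shared state.
    for verb, ctx in intents:
        feedback.append(ENFORCEMENT_POINTS[verb](ctx))

feedback: List[str] = []
enforce(plan({"malicious_ioc": "203.0.113.7", "suspicious_session": "s-42"}),
        feedback)
# feedback now holds the results that inform the next planning cycle.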

Figure 5: ThreatCloud AI combines advanced AI/ML threat detection with big data threat intelligence to deliver accurate prevention decisions, seamlessly integrating telemetry with Quantum, CloudGuard, Harmony, Horizon, and ThreatCloud APIs for end-to-end security coverage.
That is the conceptual overlap with Incalmo. Incalmo’s value is the decoupling of high-level actions and memory from domain-specific executors. ThreatCloud AI plays the same role for defense. Quantum, Cloud, Endpoint, Identity, and allied services act as the limbs and organs. Because the brain is separate from the muscles, either side can evolve independently. New prevention engines can be added, detection methods upgraded, or new signals integrated without rewriting how the system thinks. And because the brain has live context, the posture adapts in real time as IoCs land, policies propagate, and controls adjust.
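Continuing the hypothetical sketch above, that independence shows up directly in code: adding a new prevention engine is a registration step against the dispatch table, and the planner never changes because it reasons only in verbs.

# Hypothetical extension: a new enforcement agent is registered without
# touching the planner's logic.
ENFORCEMENT_POINTS["quarantine_file"] = (
    lambda ctx: f"file {ctx['sha256']} quarantined")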
For the Leaders
The CMU research validates an operating model already in use by sophisticated adversaries. In controlled conditions, decoupling planning from execution and giving the planner a limited, meaningful vocabulary plus state awareness transforms AI from a clever but clumsy assistant into a reliable operator.
For Defenders, the Implications Are Clear:
- Assume agentic AI will be part of the threat model against you. The criminal ecosystem is already deploying orchestrated AI services as a standard practice. Incalmo-style architecture will make them faster, cheaper, and more consistent, lowering the bar for less capable actors.
- Mirror or surpass that architecture in defense. A fragmented set of controls without a unifying brain will always lag behind a coordinated attack system. ThreatCloud AI, as the orchestrator, is what keeps Check Point’s prevention model responsive and adaptive across all surfaces.
Attackers have already embraced the principles this research formalizes. Staying secure means matching or exceeding that coordination with a defensive nervous system that can see, decide, and act faster than the threats it faces.