It started with a question
We saw two things that got us excited. First, Alexia Jolicoeur-Martineau showed in her paper "Less is More: Recursive Reasoning with Tiny Networks" that a single tiny recursive model with just 7M parameters could beat large language models on reasoning tasks. The key insight: you don't need billions of parameters if you can iterate. A small network, applied repeatedly, can build up complex computation step by step.
Second, we read a blog post from Percepta AI called "Can LLMs Be Computers?" where they showed that transformers can be trained to execute arbitrary C programs for millions of steps. They literally built a computer inside a transformer.
That made us wonder: what if we took a tiny recursive network, trained it to approximate a virtual machine that tracks memory state, and then fine-tuned a large language model alongside it? If we could backpropagate through a small neural network trained to simulate program execution, and attach it to an LLM that understands code semantics, we could create something that doesn't just pattern-match vulnerabilities. It would understand them.
The early experiments: can neural networks execute programs?
Before going big, we had to prove the concept. We built a custom abstract virtual machine with opcodes like MALLOC, FREE, WRITE, READ, CHECK, PUSH, POP, ADD, SUB, and BRANCH. Then we generated 500,000 synthetic abstract programs with perfect ground-truth labels for vulnerability states.
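To make the setup concrete, here is a minimal sketch of such an abstract VM in Python. The opcode names come from the post, but their exact semantics here (argument shapes, which states count as vulnerable) are our illustrative assumptions; CHECK is omitted and BRANCH is simplified.

```python
def run(program):
    """Execute a list of (opcode, arg) pairs; return (stack, vulnerability flags).

    Tracks live/freed allocations so double-free, use-after-free, and
    out-of-bounds accesses can be flagged with perfect ground truth.
    """
    heap = {}       # alloc id -> size, for live allocations
    freed = set()   # alloc ids that have been freed
    stack = []
    flags = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "MALLOC":
            aid, size = arg
            heap[aid] = size
        elif op == "FREE":
            if arg in freed:
                flags.append(("double-free", pc))
            heap.pop(arg, None)
            freed.add(arg)
        elif op in ("WRITE", "READ"):
            aid, offset = arg
            if aid in freed:
                flags.append(("use-after-free", pc))
            elif aid not in heap or offset >= heap[aid]:
                flags.append(("out-of-bounds", pc))
        elif op == "PUSH":
            stack.append(arg)
        elif op == "POP":
            stack.pop()
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "SUB":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "BRANCH":
            if stack and stack[-1] != 0:
                pc = arg     # jump target
                continue
        pc += 1
    return stack, flags

# A four-instruction program containing a use-after-free:
prog = [("MALLOC", (0, 4)), ("WRITE", (0, 2)), ("FREE", 0), ("READ", (0, 1))]
stack, flags = run(prog)
# flags -> [("use-after-free", 3)]
```

Because the interpreter itself decides what is vulnerable, every generated program comes with a free, exact label.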
We trained a tiny looped transformer (just 231K parameters) on these programs. The architecture uses shared weights across iterations. Each iteration of the transformer corresponds to one step of program execution. Same weights, applied again and again, like a recursive function.
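The shared-weight idea can be shown in a few lines of numpy: one tiny residual block (a plain MLP stands in here for the real attention block) applied repeatedly, so iteration t corresponds to execution step t. The widths and weights below are illustrative, not the trained model's.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # hidden width (the real looped block is similarly tiny)

# ONE set of weights, reused at every iteration.
W1 = rng.normal(0, 0.1, (D, D))
W2 = rng.normal(0, 0.1, (D, D))

def block(h):
    """One 'execution step': a residual update with the shared weights."""
    return h + np.tanh(h @ W1) @ W2

def looped_forward(h, n_steps):
    """Apply the SAME block n_steps times, like a recursive function."""
    for _ in range(n_steps):
        h = block(h)
    return h

h0 = rng.normal(size=(1, D))          # initial state (would encode the program)
h_final = looped_forward(h0, n_steps=10)
```

The parameter count is fixed no matter how many steps you run, which is exactly why a 231K-parameter model can simulate arbitrarily long executions.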
What we found:
- 98.8% accuracy on hard VM traces with branching, loops, and pointer operations
- 100% accuracy AND 100% adversarial robustness on abstract interpretation programs
- The model learned to correctly execute all opcodes and track program state
The really exciting part: we could probe the hidden states at each loop iteration and literally watch the model tracking which allocations are live or freed, the stack depth, whether an access is in-bounds or out-of-bounds, and the program counter position. Each loop iteration refined the model's understanding, converging toward the correct execution state. This is exactly what abstract interpretation does in formal methods, but learned end-to-end.
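A linear probe of the kind used here is just a least-squares fit from hidden states to a target such as the program counter, scored with R². A self-contained numpy sketch on synthetic stand-in states (in the real experiments the states come from the looped model, not random data):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 500, 32

# Stand-in data: pretend the hidden states linearly encode the program
# counter plus a little noise.
true_w = rng.normal(size=D)
H = rng.normal(size=(N, D))                   # hidden states
pc = H @ true_w + 0.05 * rng.normal(size=N)   # program counter targets

# Linear probe: least-squares fit from hidden state (plus bias) to target.
X = np.hstack([H, np.ones((N, 1))])
w, *_ = np.linalg.lstsq(X, pc, rcond=None)
pred = X @ w

# Coefficient of determination.
ss_res = np.sum((pc - pred) ** 2)
ss_tot = np.sum((pc - pc.mean()) ** 2)
r2 = 1 - ss_res / ss_tot   # close to 1.0 on this synthetic data
```

If the probe's R² is near 1, the quantity is linearly readable from the activations, which is the sense in which we could "watch" the model track state.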
What the model learned internally (linear probing results)
An R² of 0.991 on program counter tracking means the model's hidden states almost perfectly encode where execution is in the program. We could literally read the program counter from the neural activations. The increasing Cohen's d across layers shows that the model progressively separates vulnerability-related features, with each loop iteration adding more discriminative power.
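Cohen's d, the effect size behind that claim, is the gap between group means divided by the pooled standard deviation. A quick sketch on made-up feature values (the numbers below are illustrative, not our measurements):

```python
import numpy as np

def cohens_d(a, b):
    """Effect size between two groups: mean difference over pooled std."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

rng = np.random.default_rng(2)
safe = rng.normal(0.0, 1.0, 200)   # a feature's value on safe programs
vuln = rng.normal(1.5, 1.0, 200)   # the same feature, shifted on vulnerable ones
d = cohens_d(vuln, safe)           # large d -> the feature separates the classes
```

A d that grows layer by layer means each iteration pushes the two classes further apart in activation space.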
We built 17 different variants (looped_1Lx32, looped_2Lx16, abstract_abstract_1Lx16, and more) and evaluated them extensively. The results confirmed our hypothesis: shared-weight iteration IS learned abstract interpretation.
Scaling up: from toy VMs to real vulnerabilities
With the foundation proven, we needed to bridge the gap to real C code. Real vulnerabilities don't come in neat opcode sequences. They're buried in thousands of lines of complex, messy, real-world code.
Our approach has three phases:
Synthetic pre-training
Train the looped transformer block on 500K synthetic abstract programs. This gives the loop block a strong prior for tracking memory-safety state, and it reaches 100% accuracy on the synthetic task.
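Generating labeled programs at this scale is cheap because the generator knows the ground truth by construction. A toy sketch (a much-simplified, hypothetical stand-in for the real 500K-program generator):

```python
import random

def make_program(rng, n_ops=8):
    """Generate one abstract program plus a ground-truth label.

    Label is 1 if the program frees its allocation and then touches it
    (a use-after-free); 0 otherwise.
    """
    prog, label = [("MALLOC", 0)], 0
    freed = False
    for _ in range(n_ops):
        op = rng.choice(["WRITE", "READ", "FREE"])
        if op == "FREE" and not freed:
            prog.append(("FREE", 0))
            freed = True
        elif op in ("WRITE", "READ"):
            prog.append((op, 0))
            if freed:
                label = 1   # access after free: label known exactly
    return prog, label

rng = random.Random(0)
dataset = [make_program(rng) for _ in range(1000)]
```

Scaling the same idea to more opcodes, branching, and loops yields the 500K-program corpus with no labeling noise at all.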
Cached transfer learning
We take Google's Gemma-3-27B model and extract token embeddings from real C code (the R2Vul dataset of real-world CVEs). These embeddings are cached to disk so we don't need to rerun the 27B model during training. A learned token gate selects the top 256 most informative tokens. These get projected down from 5,376 dimensions to 2,048 and fed into our pre-trained looped transformer. Verdict and CWE classification heads sit on top.
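The cache, gate, and project pipeline can be sketched shape-by-shape in numpy. The dimensions (5,376 in, 2,048 out, top 256 tokens) come from the post; the gate weights and projection below are random stand-ins for the learned ones, and the cache path is hypothetical.

```python
import numpy as np
import os, tempfile

rng = np.random.default_rng(3)
T, D_IN, D_OUT, K = 1024, 5376, 2048, 256

# 1) Cache: frozen-backbone token embeddings are computed once,
#    saved to disk, and reloaded during training (no 27B forward pass).
emb = rng.normal(size=(T, D_IN)).astype(np.float32)   # stand-in embeddings
cache_path = os.path.join(tempfile.gettempdir(), "sample_emb.npy")
np.save(cache_path, emb)
emb = np.load(cache_path)

# 2) Token gate: a learned score per token; keep only the top-K.
gate_w = rng.normal(size=D_IN).astype(np.float32)     # would be trained
scores = emb @ gate_w
top = np.argsort(scores)[-K:]
selected = emb[np.sort(top)]                          # keep original token order

# 3) Projection down to the looped block's width.
W_proj = rng.normal(0, 0.02, (D_IN, D_OUT)).astype(np.float32)
h = selected @ W_proj                                 # (256, 2048) -> loop block
```

Because only the gate, projection, and loop block see gradients at this stage, training cost is independent of the 27B backbone.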
End-to-end fine-tuning
Jointly optimize with LoRA adapters on the Gemma backbone, aligning the code representations with the vulnerability detection objective.
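LoRA adds a trainable low-rank update on top of each frozen weight matrix; with the up-projection initialized to zero, the adapted layer starts out identical to the pretrained one. A minimal numpy sketch of the mechanism (dimensions and rank are illustrative, not Gemma's):

```python
import numpy as np

rng = np.random.default_rng(4)
D, R = 512, 8   # layer width and LoRA rank (illustrative values)

W = rng.normal(0, 0.02, (D, D))   # frozen pretrained weight (never updated)
A = rng.normal(0, 0.02, (R, D))   # trainable low-rank down-projection
B = np.zeros((D, R))              # trainable up-projection, zero-initialized
                                  # so the adapter starts as a no-op

def lora_forward(x):
    """Frozen path plus low-rank update; only A and B receive gradients."""
    return x @ W + (x @ A.T) @ B.T

x = rng.normal(size=(1, D))
# Zero-init B means the adapted layer initially matches the frozen one.
assert np.allclose(lora_forward(x), x @ W)
```

Training only A and B keeps the fine-tuning footprint tiny (2·D·R parameters per layer instead of D²) while still letting gradients from the vulnerability heads reshape the backbone's representations.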