
We ran the same build through every LLM — what won, what broke, and where local models stand

· 6 min read
Olaf Krasicki-Freund
Creator of AIFactory

AIFactory claims to be provider-agnostic: the same build task should run on Claude, Codex, Gemini, GitHub Copilot, or a local Ollama model. A claim like that is worth nothing until you test it — so we did, with the same task, on every provider, and we re-ran the tests ourselves rather than trusting the agent's word.

The short version: every managed provider produced a working, tested feature — and the process surfaced (and fixed) four real bugs. Local models are a different story, and that story is the interesting one.

Why we can't use Cursor at a bank — and what I built instead

· 4 min read
Olaf Krasicki-Freund
Creator of AIFactory

A friend who works at a bank told me their security team had just banned every cloud AI coding tool. Not because they're Luddites: these are sharp engineers who'd love the productivity. They banned them because they can't send proprietary source code to a third party, and they can't explain to an auditor where a given line of code came from. They wanted AI's help. They weren't allowed to have it.

I kept hearing versions of this. The more I looked, the more I realized it isn't a niche complaint; it's the unspoken default for a huge slice of the industry. So I built something for it, open-sourced it, and this post is about why.