AIFactory Blog

The week we stopped failing silently

July 13, 2026 · 5 min read

Olaf Krasicki-Freund

Creator of AIFactory

The most dangerous way for an autonomous coder to fail is quietly. A wrong patch gets caught by tests. A patch that never gets written looks, from a distance, a lot like a build that is still thinking. This cycle we put a number on that distance and then closed it.

Building on the latest models — with Copilot joining the coders

July 7, 2026 · 2 min read

Olaf Krasicki-Freund

Creator of AIFactory

AIFactory builds software autonomously: it takes a plan, writes the code in an isolated git worktree, and opens a pull request. This round of work moved it onto the current models, added a third coding runtime, and connected it to the rest of the family.

The build that runs anywhere: cutting the last node-pin

June 23, 2026 · 5 min read

Olaf Krasicki-Freund

Creator of AIFactory

When a build can only run on one node, you do not have a cluster — you have one machine with extra steps. This is the story of removing the last thing that pinned an AIFactory build to a single node, why it took three small changes instead of one, and exactly where the honest line sits between "shipped" and "proven."

Concurrency, durable job-state, and the road to Job-native execution

June 21, 2026 · 7 min read

Olaf Krasicki-Freund

Creator of AIFactory

This was a concurrency week. AIFactory started it as a single-instance app that ran one build at a time inside its own pod, and ended it with a control plane that can run many builds at once across replicas, plus the plumbing for moving the heavy work out of the web-server pod entirely and into per-task Kubernetes Jobs. The last part is not finished — and the honest version of that is the most useful thing in this post, so it gets its own section near the end.

Per-worker observability: who spent the tokens, and the security pass that came with it

June 13, 2026 · 7 min read

Olaf Krasicki-Freund

Creator of AIFactory

A few weeks ago we taught the build executor to run independent subtasks in parallel, across multiple LLM providers at once. That work was about throughput, and it worked. But it left a blind spot: when a build finished, the cost and the timing came back as a single aggregate. Total tokens. Total dollars. Total wall-clock. Useful, but it couldn't answer the question you actually ask after a parallel run: which worker spent that money, on which provider, in which model, and where did the time go?

This session was about closing that gap. The headline is per-worker observability. The theme is that all of it is additive — the parallel build spine didn't change, we just put instruments on it. The same session also carried a GitHub Actions security pass and a god-file decomposition, because they were sitting in the same backlog and they unblock the work above.

Proof over promises: putting AIFactory on a benchmark

June 12, 2026 · 5 min read

Olaf Krasicki-Freund

Creator of AIFactory

This week I did something uncomfortable: I stopped shipping features and reviewed AIFactory honestly — against its own goals, and against the 2026 field of autonomous coding tools. Not the demo-day version. The version you'd give an investor who's going to check.

The short verdict: the pipeline works, end to end, and on the axis that actually matters in 2026 — governance and verification — it's ahead of the pack. But we had published zero numbers. We built the part of the problem that doesn't commoditize and then never measured it. This post is what we found and what we're doing about it.

AIFactory 3.4: watch the build, message the agent, and a token-saving solo mode

June 2, 2026 · 3 min read

Olaf Krasicki-Freund

Creator of AIFactory

The 3.4 line is about seeing and steering a build while it runs — and trimming the overhead when a job is small. Here's what landed.

We ran the same build through every LLM — what won, what broke, and where local models stand

May 31, 2026 · 6 min read

Olaf Krasicki-Freund

Creator of AIFactory

AIFactory claims to be provider-agnostic: the same build task should run on Claude, Codex, Gemini, GitHub Copilot, or a local Ollama model. A claim like that is worth nothing until you test it — so we did, with the same task, on every provider, and we re-ran the tests ourselves rather than trusting the agent's word.

The short version: every managed provider produced a working, tested feature — and the process surfaced (and fixed) four real bugs. Local models are a different story, and that story is the interesting one.

Why we can't use Cursor at a bank — and what I built instead

May 30, 2026 · 4 min read

Olaf Krasicki-Freund

Creator of AIFactory

A friend who works at a bank told me their security team had just banned every cloud AI coding tool. Not because they're luddites — these are sharp engineers who'd love the productivity. They banned them because they can't send proprietary source code to a third party, and they can't explain to an auditor where a given line of code came from. They wanted AI's help. They weren't allowed to have it.

I kept hearing versions of this. The more I looked, the more I realized it isn't a niche complaint; it's the unspoken default for a huge slice of the industry. So I built something for it, open-sourced it, and this post is about why.