Why Auto-Healing Needs a Budget

The Yeti Build Agent in SnowCoder generates ServiceNow artifacts using the Fluent SDK, runs validation against your instance, and retries on failure. That last part is the problem if you do not control it. A single broken acceptance criterion can spiral into ten retries, twenty model calls, and a token bill nobody approved.

HealBudget exists to make that scenario impossible. It caps cost per build before any retry exhausts the wallet, and it does so at the orchestration layer, not the retry layer. The difference matters: the HealBudget check runs before the next attempt starts, so it decides whether that attempt is worth running instead of reporting an overrun after the money is already spent.

The 291-story build benchmark that ships with SnowCoder is the test case. Every story is generated, validated, and healed under the same budget envelope. If a story cannot heal within budget, it fails fast and surfaces a clear reason. No runaway spend, no silent retries.

Anatomy of a Healing Loop

A SnowCoder build cycle for a single story looks roughly like this. The Yeti Build Agent emits Fluent SDK code for one of the 42 supported artifact classes, deploys it to a target scope, runs acceptance checks, and either signs the work off or feeds the failures back to a heal step.

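// Pseudocode for a single story build cycle (helpers are placeholders)
async function buildStory(story, budget) {
  const maxAttempts = Math.min(budget.maxAttempts ?? 5, 5); // 5 is the hard ceiling
  let spend = 0;
  let patch = null; // heal output carried into the next attempt

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await generateAndDeploy(story, patch);
    spend += result.cost;

    if (result.passed) return { status: 'PASSED', spend, attempts: attempt };

    // No heal after the final attempt: a 6th attempt never runs
    if (attempt === maxAttempts) break;

    // HealBudget check happens BEFORE the next heal and attempt,
    // using a forecast of what that heal would cost
    if (spend + estimateNextHeal(result) > budget.cap) {
      return { status: 'BUDGET_EXCEEDED', spend, attempts: attempt };
    }

    const heal = await healAgainstDiffs(result.failedAcs);
    spend += heal.cost;
    patch = heal.patch;
  }

  return { status: 'MAX_RETRIES', spend, attempts: maxAttempts };
}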

Note where the budget gate sits. It does not run at the end of the loop, after a retry has already spent the money. It runs between a failed attempt and the heal that would follow it, and it includes a forecast of what that heal would cost. The forecast is what keeps spend bounded under load: a heal whose projected cost would push the story past the cap never starts.
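To see why the forecast matters, here is a minimal, self-contained sketch that compares a gate that checks a forecast before each heal with a naive check that only looks at spend already incurred. The step shape `{ cost, forecast }`, the function names, and the cent amounts are illustrative assumptions, not SnowCoder internals.

```javascript
// Hypothetical comparison: forecast-based gate vs. an after-the-fact check.
// Each step is { cost, forecast }: the real cost of a heal and the estimate
// available before running it. Amounts are integer cents to avoid float drift.

function runWithForecastGate(steps, capCents) {
  let spend = 0;
  for (const step of steps) {
    // Gate BEFORE the heal: stop if the forecast would cross the cap.
    if (spend + step.forecast > capCents) return { status: 'BUDGET_EXCEEDED', spend };
    spend += step.cost;
  }
  return { status: 'DONE', spend };
}

function runWithAfterTheFactCheck(steps, capCents) {
  let spend = 0;
  for (const step of steps) {
    spend += step.cost; // the heal runs first...
    // ...and the check only fires once the money is already gone.
    if (spend > capCents) return { status: 'BUDGET_EXCEEDED', spend };
  }
  return { status: 'DONE', spend };
}

const steps = [
  { cost: 80, forecast: 80 },
  { cost: 90, forecast: 90 },
  { cost: 120, forecast: 120 },
];

console.log(runWithForecastGate(steps, 250));
console.log(runWithAfterTheFactCheck(steps, 250));
```

With a 250-cent cap, the forecast gate stops at 170 cents, while the after-the-fact check only notices the problem at 290 cents. The bound holds only as long as the forecast does not underestimate; a forecast that runs low lets a heal overshoot the cap by the difference.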

The 5-Attempt Ceiling

Auto-heal makes at most 5 attempts per story within HealBudget. That number is deliberate. In benchmark data across 120+ ServiceNow scenarios, most stories that eventually pass do so within the first three attempts. The fourth and fifth attempts exist to absorb instance-side flakiness, such as transient ACL evaluation failures or update set commit conflicts, not fundamental code defects.
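If you want to check the "first three attempts" pattern against your own runs, a small helper can compute the cumulative share of passing stories by attempt number. This is a sketch only: the record shape `{ status, attempts }` and the name `cumulativePassShare` are assumptions, not a documented SnowCoder export.

```javascript
// Hypothetical helper: for stories that eventually PASSED, what fraction had
// passed by attempt 1, 2, ..., maxAttempts? Record shape is assumed.

function cumulativePassShare(records, maxAttempts = 5) {
  const passed = records.filter((r) => r.status === 'PASSED');
  const counts = new Array(maxAttempts + 1).fill(0);
  for (const r of passed) {
    // Ignore malformed attempt counts instead of growing the array.
    if (r.attempts >= 1 && r.attempts <= maxAttempts) counts[r.attempts] += 1;
  }

  const share = [];
  let running = 0;
  for (let n = 1; n <= maxAttempts; n++) {
    running += counts[n];
    share.push({ attempt: n, share: passed.length ? running / passed.length : 0 });
  }
  return share;
}
```

Reading the `share` value at attempt 3 tells you how much of your pass rate the fourth and fifth attempts actually buy.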

The interaction between the attempt counter and the budget cap is what makes the system safe:

Whichever limit is hit first wins. A story that exhausts budget on attempt 2 fails before attempt 3 runs.
The 5-attempt ceiling is a hard stop. Even with budget remaining, the Yeti Build Agent will not enter a 6th attempt.
Failed stories do not poison the queue. A budget-exceeded story is marked, surfaced in the run summary, and the next story starts with a fresh budget envelope.
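The three rules above can be sketched as a per-story runner plus a queue that gives every story a fresh envelope. This assumes each attempt is simulated by a callback returning `{ cost, passed, nextHealForecast }`, with heal cost folded into `cost` for brevity; `runStory`, `runQueue`, and that shape are hypothetical.

```javascript
// Hypothetical simulation of the attempt/budget interaction. Amounts in cents.
const MAX_ATTEMPTS = 5;

function runStory(attemptFn, capCents) {
  let spend = 0;
  for (let n = 1; n <= MAX_ATTEMPTS; n++) {
    const a = attemptFn(n);
    spend += a.cost;
    if (a.passed) return { status: 'PASSED', spend, attempts: n };
    // Hard stop: no 6th attempt, even with budget remaining.
    if (n === MAX_ATTEMPTS) break;
    // Whichever limit is hit first wins: the budget gate runs before attempt n + 1.
    if (spend + a.nextHealForecast > capCents) {
      return { status: 'BUDGET_EXCEEDED', spend, attempts: n };
    }
  }
  return { status: 'MAX_RETRIES', spend, attempts: MAX_ATTEMPTS };
}

// Failed stories do not poison the queue: each story starts with a fresh
// envelope, and a non-PASSED outcome is recorded rather than aborting the run.
function runQueue(stories, capCents) {
  return stories.map((s) => ({ id: s.id, ...runStory(s.attemptFn, capCents) }));
}
```

A story that spends 100 cents per attempt against a 250-cent cap stops after attempt 2, because attempt 3 would be forecast to cross the cap; the next story in the queue is unaffected.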

What HealBudget Counts

HealBudget tracks the full cost of a build attempt, not just the model call. That includes:

Generation tokens. Every prompt and completion involved in producing the Fluent SDK code.
Validation overhead. Knowledge base retrieval against the 100,000+ vector ServiceNow corpus and the 17,000+ code examples.
Heal cycle costs. Each per-AC diff round-trip, including the model reasoning over the failure trace.
Instance-side checks. Any test scripts, ATF flows, or live record probes triggered by the validator.

Treating these as a single envelope keeps the budget meaningful. A build that "passed" only because it skipped validation does not pass under HealBudget accounting.
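A single-envelope ledger can be sketched as a per-attempt accumulator over the four categories listed above. The category keys and the `AttemptLedger` class are illustrative assumptions, not SnowCoder's internal schema; amounts are integer cents.

```javascript
// Hypothetical per-attempt cost ledger. Category names mirror the list above.
const CATEGORIES = ['generation', 'validation', 'heal', 'instanceChecks'];

class AttemptLedger {
  constructor() {
    // Start every category at zero so nothing is silently left out of the total.
    this.totals = Object.fromEntries(CATEGORIES.map((c) => [c, 0]));
  }

  record(category, cents) {
    if (!CATEGORIES.includes(category)) throw new Error(`unknown cost category: ${category}`);
    if (!Number.isInteger(cents) || cents < 0) throw new Error('cents must be a non-negative integer');
    this.totals[category] += cents;
  }

  // The whole envelope counts against the cap, not only generation tokens.
  total() {
    return CATEGORIES.reduce((sum, c) => sum + this.totals[c], 0);
  }
}
```

Rejecting unknown categories is a design choice for the sketch: it keeps every cost inside one of the four buckets the cap is meant to cover.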

Configuring HealBudget for a Project

HealBudget exposes four knobs at the project level: cap, softWarnAt, maxAttempts, and rollupWindow. Most teams change only one of them after the first run.

{
  "project": "hr-service-portal",
  "healBudget": {
    "cap": 2.50,               // hard ceiling per story, in USD
    "softWarnAt": 0.75,        // fraction of cap; 75% logs a non-blocking warning
    "maxAttempts": 5,          // attempts per story; 5 is the maximum
    "rollupWindow": "story"    // alternative: "build" for a shared envelope
  }
}
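Before a config like this reaches a build, it helps to validate each knob. The sketch below assumes the semantics in the comments above; the function name, the defaults for softWarnAt and maxAttempts (taken from the example values), the choice to clamp rather than reject a maxAttempts above 5, and the error messages are all hypothetical. SnowCoder's own loader may behave differently.

```javascript
// Hypothetical validator for the healBudget block (object already parsed).
function normalizeHealBudget(raw) {
  const cfg = {
    cap: raw.cap,
    softWarnAt: raw.softWarnAt ?? 0.75, // assumed default, from the example
    maxAttempts: raw.maxAttempts ?? 5,  // assumed default, from the example
    rollupWindow: raw.rollupWindow ?? 'story', // per-story is the stated default
  };
  if (!(typeof cfg.cap === 'number' && cfg.cap > 0)) throw new Error('cap must be a positive USD amount');
  if (!(cfg.softWarnAt > 0 && cfg.softWarnAt <= 1)) throw new Error('softWarnAt must be a fraction in (0, 1]');
  if (!Number.isInteger(cfg.maxAttempts) || cfg.maxAttempts < 1) throw new Error('maxAttempts must be a positive integer');
  cfg.maxAttempts = Math.min(cfg.maxAttempts, 5); // 5 is the hard ceiling
  if (!['story', 'build'].includes(cfg.rollupWindow)) throw new Error('rollupWindow must be "story" or "build"');
  return cfg;
}
```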

The cap is per story by default. Setting rollupWindow to build shares the envelope across the entire run. That is useful when you want to greenlight a large generation pass but cap total spend, accepting that early stories may eat more of the budget than late ones.
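The difference between the two rollup windows can be sketched with an envelope provider. This assumes that in `"build"` mode the shared envelope is sized at the same cap, which the config does not spell out; `createEnvelopes` and `runWithWindow` are hypothetical names, and each story is reduced to a single cost in cents.

```javascript
// Hypothetical envelope provider for the two rollup windows (cents).
// "story": every story gets a fresh envelope of capCents.
// "build": all stories draw from one shared envelope of capCents (assumed size).
function createEnvelopes(capCents, rollupWindow) {
  const shared = { remaining: capCents }; // used only when rollupWindow === 'build'
  return {
    forStory() {
      return rollupWindow === 'build' ? shared : { remaining: capCents };
    },
  };
}

function runWithWindow(storyCosts, capCents, rollupWindow) {
  const envelopes = createEnvelopes(capCents, rollupWindow);
  return storyCosts.map((cost) => {
    const env = envelopes.forStory();
    if (cost > env.remaining) return 'BUDGET_EXCEEDED'; // gate before spending
    env.remaining -= cost;
    return 'RAN';
  });
}
```

With costs of 200, 200, and 50 cents against a 250-cent cap, `"story"` runs all three, while `"build"` lets the first story take most of the shared envelope, so the second is refused and only the cheap third still fits.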

softWarnAt is purely informational. It logs a warning to the Yeti Build Agent dashboard so engineers can spot stories that are consistently expensive to heal, often a sign of an underspecified acceptance criterion rather than a model failure.
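Spotting stories that are consistently expensive to heal can be sketched as a filter over spend history. The input shape (story id mapped to a list of per-run spends in cents), the `minRuns` threshold, and the function name are assumptions for illustration; like softWarnAt itself, this only reports and never blocks a build.

```javascript
// Hypothetical report: stories whose spend reached softWarnAt * cap in at
// least minRuns runs. history: { storyId: [spendCents, ...] }.
function consistentlyExpensive(history, capCents, softWarnAt, minRuns) {
  const threshold = capCents * softWarnAt;
  return Object.entries(history)
    .filter(([, spends]) => spends.filter((s) => s >= threshold).length >= minRuns)
    .map(([storyId]) => storyId);
}
```

Stories this flags are the first place to look for an underspecified acceptance criterion.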

How HealBudget Interacts with the 291-Story Benchmark

The 291-story benchmark is the internal stress test for the Yeti Build Agent. Every release runs the full corpus end to end with HealBudget enabled. Two things matter in the result set:

Pass rate. Stories that complete inside budget and attempt count.
Cost variance. The distribution of spend per story across the corpus, watched for regressions release over release.

When a model upgrade lands, the benchmark is the first thing that catches drift. If a new model burns more tokens per heal cycle, HealBudget exceedances surface immediately. That signal is more useful than aggregate accuracy because cost drift shows up before the pass rate drops.
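A release-over-release cost check can be sketched by summarizing each benchmark run and comparing the two. The result shape `{ status, spendCents }`, the nearest-rank p90, and the regression rule (an absolute tolerance on exceedance rate, a relative tolerance on p90 spend) are assumptions for illustration, not the team's actual criteria.

```javascript
// Hypothetical benchmark summary: pass rate, exceedance rate, mean and p90 spend.
function summarize(results) {
  const n = results.length;
  if (n === 0) throw new Error('no results');
  const spends = results.map((r) => r.spendCents).sort((a, b) => a - b);
  return {
    passRate: results.filter((r) => r.status === 'PASSED').length / n,
    exceededRate: results.filter((r) => r.status === 'BUDGET_EXCEEDED').length / n,
    meanSpend: spends.reduce((a, b) => a + b, 0) / n,
    p90Spend: spends[Math.min(n - 1, Math.ceil(0.9 * n) - 1)], // nearest-rank p90
  };
}

// Flags a cost regression when the exceedance rate rises by more than
// `tolerance` (absolute) or p90 spend grows by more than `tolerance` (relative).
function costRegression(prev, next, tolerance = 0.1) {
  const a = summarize(prev);
  const b = summarize(next);
  return {
    prev: a,
    next: b,
    regressed:
      b.exceededRate > a.exceededRate + tolerance ||
      b.p90Spend > a.p90Spend * (1 + tolerance),
  };
}
```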

Separately, across 120+ ServiceNow benchmarks, SnowCoder is 60% more accurate than generic ChatGPT or Claude. HealBudget is the cost-side guardrail that lets the team push accuracy forward without sacrificing the wallet.

When a Story Hits the Cap

A BUDGET_EXCEEDED outcome is not a failure of the Yeti Build Agent. It is a deliberate signal. The story summary records:

Total spend at the point of stop.
Number of attempts completed.
The last failing acceptance criterion and its diff.
An estimate of additional budget that the next heal cycle would have required.

That last item is the most useful one. It lets a developer decide quickly whether to lift the cap, split the story, or fix the underlying AC. In most cases the right answer is to fix the AC: stories that consistently exceed budget tend to have ambiguous requirements that the model keeps re-interpreting on each retry.
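The "lift the cap" option comes down to simple arithmetic on the recorded summary: the cap needed for one more heal is the spend so far plus the next-heal estimate. The field names below mirror the listed items but are assumptions, not SnowCoder's schema, and `capNeededForNextHeal` is a hypothetical helper.

```javascript
// Hypothetical helper over a BUDGET_EXCEEDED summary (amounts in cents).
// summary: { spendCents, attempts, lastFailingAc, nextHealEstimateCents }
function capNeededForNextHeal(summary, capCents) {
  const needed = summary.spendCents + summary.nextHealEstimateCents;
  return { neededCapCents: needed, shortfallCents: Math.max(0, needed - capCents) };
}
```

Lifting the cap by the shortfall buys exactly one more heal cycle, not a guaranteed pass, which is why fixing an ambiguous AC is usually the better trade.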

Where HealBudget Fits in the SnowCoder Tier Model

The Yeti Build Agent is an Enterprise-tier capability, and HealBudget ships everywhere the Yeti Build Agent does. There is no separate add-on, no extra license, and no "premium budget control" SKU. The intent is simple: if you are running auto-healing builds at all, you should have a spend ceiling on them by default.

MCP access is included on every tier, so the Yeti Build Agent can also be driven from Claude Code or Cursor through SnowCoder MCP. HealBudget applies the same way regardless of how the build is triggered.

Takeaway

Auto-healing is only useful when it is bounded. HealBudget bounds it. It caps spend per story, or per build when the envelope is shared, before any retry exhausts the wallet; it holds the line at 5 attempts; and it produces a clear signal when a story is not worth healing further. The 291-story benchmark is the proof point, and the Enterprise tier is where you put it to work.

If you want to see the budget gate in action, start the Yeti Build Agent on a sample backlog and watch the run telemetry. Stories that hit the cap will tell you more about your requirements than about your model.

How HealBudget Caps Spend on Auto-Healing ServiceNow Builds