Cognitive Core & Autonomy·May 18, 2026·8 min read

Multi-Candidate Decisions: Don't Let Agents Take the First Idea

A greedy agent that runs its first thought is brittle. Matrix's autonomous planner proposes several candidate actions, scores them, then commits.

By Matrix Team

The fastest way to build a flaky autonomous agent is to let it execute its first idea.

You've seen the pattern. A while loop wraps model.generate(), the model emits a tool call, you dispatch it, you feed the result back, repeat. It's the canonical ReAct shape, and it works right up until the model's first instinct is wrong — which, over a six-step task, it will be at least once. There's no second opinion. The agent commits to a single thought per step and lives with it. One bad turn early and the whole run drifts.

This post is about the alternative Matrix ships for its autonomous agents: a planner that doesn't trust its first idea. It proposes several candidate actions, scores them against the full picture, and only then commits to one. That's the difference between ai agent planning and ai agent guessing.

The problem with single-shot loops

A single-shot ReAct loop has one structural weakness: the proposal step and the commitment step are the same step. The model says "call search_knowledge with this query" and that is the decision. There is no moment where the agent considers an alternative and rejects it.

In an interactive chat that's tolerable — a human is in the loop, watching, ready to correct. But strip the human out and run the agent against a goal for six or eight steps, and every greedy turn compounds. The agent picks the first plausible tool, gets a so-so result, picks the next plausible tool conditioned on the so-so result, and you watch it confidently walk down a path nobody would have chosen on reflection.

The fix isn't a smarter prompt. It's a decision architecture that separates proposing from choosing.

How Matrix decides: propose → evaluate → select

Matrix's autonomous agents run on a CoALA-style cognitive cycle — PERCEIVE → DECIDE → ACT → LEARN — and the interesting part lives in DECIDE. (If you haven't read how interactive and autonomous agents share one cycle, start with One Decision Cycle for Interactive and Autonomous Agents.)

The component that owns DECIDE for autonomous runs is AutonomousDriver. It is a full multi-candidate planner, not a thin wrapper around one generation:

Propose — one structured LLM call proposes K candidate actions. Each candidate is either a tool invocation or a decision to finish.
Evaluate — a second LLM call scores the candidates against the current situation.
Select — the best-scoring candidate is chosen and handed to ACT.

Two model calls per decision instead of one. That's the cost. In exchange, the agent never commits to an action it hasn't weighed against alternatives.

The candidate shape

Every candidate is one of two things — a grounding move (call a tool) or a terminal move (finish the task). In structured form:

// A proposal is K of these. Each is either {tool, args, rationale} ...
{
  "tool": "search_knowledge",
  "args": { "knowledge_key": "policies", "query": "refund window for damaged items" },
  "rationale": "The goal hinges on the refund policy; retrieve it before drafting a reply."
}

// ... or a decision to stop:
{
  "finish": true,
  "message": "Refund eligibility confirmed and summarized; objective met."
}

The rationale field isn't decoration. It forces the proposing call to justify each candidate in writing, which both improves proposal quality and gives the scoring call something concrete to compare. The evaluator isn't ranking opaque tool names — it's ranking arguments.

It reasons over the whole working memory

A candidate is only as good as the context it's proposed against. Matrix doesn't hand the planner a bare goal string. It reasons over the full WorkingMemory projection assembled fresh each cycle:

persona — who the agent is
objective — the goal it was dispatched with
memory — long-term facts recalled once per cycle into a snapshot
action space — the typed set of moves available (REASONING · RETRIEVAL · LEARNING · GROUNDING), so the model knows what kinds of actions it actually has
progress — the observations accumulated from prior steps in this run

That projection is the same one the rest of the cognitive core uses — the PERCEIVE hub assembles it once and the driver reads it. The planner proposes candidates knowing what it already tried and what it learned, not from a cold start. That's why the second-step proposals are usually better than the first-step ones: progress is in the projection.

The full autonomous step

DECIDE is one seam of a five-part loop. Here's the whole thing, lightly paraphrased from the engine (AgentRuntime.runAutonomous):

loop (step ≤ maxSteps):
  wm   = assemble(AUTONOMOUS, goal, recalled-memory, action-space, observations)  # PERCEIVE
  d    = autonomousDriver.decide(wm)                                              # DECIDE  (propose→evaluate→select)
  if d.finish: write EPISODIC result; return COMPLETED
  res  = toolDispatcher.invoke(toolCtx, callbacks, d.tool, d.args)                # ACT
  observations += summarize(d, res)                                              # OBSERVE
  write EPISODIC step                                                            # LEARN
return budget-exhausted  # FAILED

Read it as a sentence: perceive the situation, deliberate over candidates, act on the winner, summarize what happened, write it to memory, repeat — until the planner chooses finish or the step budget runs out. Every cycle appends one entry to the run's stepsJson log and writes one episodic memory row, so the whole trajectory is auditable after the fact. (For why memory writes are a first-class step rather than an afterthought, see CoALA in Production.)

The finish branch matters as much as the tool branch. A multi-candidate planner that can't decide it's done will burn its whole step budget. Making "finish" a first-class candidate — competing on score against the tool options — is what lets the agent stop on time instead of padding.

Tuning K: how many candidates?

The number of candidates per decision is a single config knob:

matrix.runtime.decision-candidates=3   # K, default

K = 3 is the default and a sensible floor — enough that the planner is choosing among real alternatives rather than rubber-stamping one. Raising K widens the search at each step (more diverse proposals to score) at the cost of a larger proposal call. Lowering it toward 1 collapses back toward the greedy single-shot behavior this whole design exists to avoid. The step budget itself is a separate per-task knob (maxSteps, default 8), so you tune breadth (K) and depth (steps) independently.

Why thinking is disabled on the LLM calls

Both the propose and evaluate calls go through VertexTextClient.generateForStructuredOutput — and that method disables the model's thinking budget on purpose. This is not a detail to gloss over.

Gemini 2.5 Flash, run with a naive generate(), spends its response budget on thinking tokens before it ever emits your structured output. You ask for a JSON array of three candidates and the response comes back truncated because the model "thought" its way through the entire output allowance first. generateForStructuredOutput fixes this by bounding maxOutputTokens and setting the thinking budget to zero, so the full response budget goes to the candidates and the score — the thing you actually parse.

The irony is worth stating plainly: the deliberation that makes this planner good lives in the two-call propose-then-score structure, not in the model's internal monologue. We get our "thinking" from the architecture. Letting the model also burn tokens on private thinking just starves the structured output we need. So we turn it off where structure matters and let the multi-candidate loop do the reasoning.

Graceful degradation when there's no model

VertexTextClient is gated behind matrix.gcp.auth.enabled. On a local boot where that bean isn't present, there's no model to propose or score candidates. Rather than hang a background task waiting on a planner that will never answer, the driver finishes immediately. An autonomous run started without deliberation infrastructure terminates cleanly instead of stalling — degradation, not deadlock. It's a small thing that saves a lot of "why is this task stuck" debugging.

Dispatch an autonomous task

You drive the whole loop with one call:

POST /api/orgs/{slug}/tasks
{
  "name": "draft refund reply",
  "agentId": 42,
  "assigneeKind": "AGENT",
  "payload": { "goal": "Confirm refund eligibility and draft a reply", "maxSteps": 6 }
}

The dispatched TaskRun accumulates the step log in stepsJson — one row per propose→evaluate→select→act→observe cycle — and writes episodic memory per step. The run ends COMPLETED when the planner chooses finish, or FAILED when the step budget is exhausted. Autonomous deliberation requires matrix.gcp.auth.enabled=true (the VertexTextClient bean); the full design lives in docs/COGNITIVE_CORE.md under The decision cycle (autonomous).

Takeaway

Greedy agents are brittle because proposing and committing are the same step — there's no moment to reject a bad idea. Matrix splits those two steps apart for autonomous runs: AutonomousDriver proposes K candidates over the full working-memory projection, scores them in a second call, and only then commits the winner. Thinking is disabled on both calls so the structured output survives, the deliberation lives in the architecture, and a missing model degrades to a clean finish instead of a hang. The cost is one extra LLM call per decision. The payoff is an agent that doesn't walk confidently down the first path it sees.

Build an agent that thinks before it acts. Spin up a workspace, set an agent to AUTONOMOUS mode, POST /tasks with a goal, and watch the stepsJson log show you the decision at every step — propose, evaluate, select. Then read the companion deep-dives on the shared decision cycle and CoALA in production to see how it all fits together.

#ai agent planning#decision making#autonomous agents

Build your first agent on Matrix

Spin up a workspace, wire up tools and knowledge, give your agent a voice, and talk to it in real time — no agent code required.

Create a workspace Read more articles