Probabilistic Tiny Recursive Model: Test-Time Compute Scaling for Iterative Reasoning

This presentation explores how the Probabilistic Tiny Recursive Model (PTRM) unlocks large accuracy gains in compact reasoning models through test-time compute scaling. By introducing stochastic parallel exploration at inference, PTRM turns deterministic recursive models into systems that can escape local solution traps and nearly double their accuracy without retraining. The talk shows how width scaling and solution verification enable tiny models to outperform ensembles of large language models at a fraction of the computational cost.

Script

Deterministic reasoning models get stuck in bad solutions like a ball trapped in a valley. The Probabilistic Tiny Recursive Model escapes by running many noisy parallel searches and picking the best answer, achieving accuracy gains that retraining alone cannot reach.

Tiny Recursive Models use compact architectures with iterative refinement, but their deterministic paths often converge to incorrect local solutions. The authors introduce PTRM as a pure inference-time extension that requires no retraining, just parallel rollouts with Gaussian noise injected at every latent step.
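The rollout idea can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `refine` is a hypothetical stand-in for one deterministic TRM latent update, and the step count, noise scale `sigma`, and the choice to add noise after each update are placeholders rather than the authors' settings.

```python
import random

def refine(z):
    """Hypothetical stand-in for one deterministic latent update (not the real TRM)."""
    return [0.5 * v + 1.0 for v in z]

def noisy_rollout(z0, steps, sigma, rng):
    """Run one rollout, adding Gaussian noise to the latent after every step."""
    z = list(z0)
    for _ in range(steps):
        z = refine(z)
        z = [v + rng.gauss(0.0, sigma) for v in z]
    return z

def parallel_rollouts(z0, width, steps=8, sigma=0.1, seed=0):
    """Return `width` independent noisy rollouts from the same start state."""
    rng = random.Random(seed)
    return [noisy_rollout(z0, steps, sigma, rng) for _ in range(width)]

# Four noisy rollouts from one shared starting latent.
rollouts = parallel_rollouts([0.0, 0.0], width=4)
```

Setting `sigma=0.0` recovers the deterministic path, so in this sketch the stochastic version is a strict extension of ordinary inference and needs no change to the trained weights.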

When the researchers analyzed TRM failures, they found three classes of trajectory in latent space: quick successes, delayed successes that escape bad basins late, and failures that remain trapped. The Q head, originally trained for early stopping, turns out to be a strong predictor of which trajectories will succeed.
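One way to use such a predictor is to rank candidate answers by their Q-head score and keep the top one. The sketch below assumes exactly that; the `q_score` field, the example values, and the plain arg-max rule are hypothetical and may differ from the selection procedure in the paper.

```python
def select_best(candidates):
    """Pick the candidate whose (hypothetical) Q-head score is highest."""
    return max(candidates, key=lambda c: c["q_score"])

# Illustrative candidates from three rollouts; the scores are made up.
candidates = [
    {"answer": "A", "q_score": 0.12},
    {"answer": "B", "q_score": 0.87},
    {"answer": "C", "q_score": 0.40},
]
best = select_best(candidates)
```

Here `best` is the candidate with answer `"B"`, the one with the highest score.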

Stochastic rollouts reveal that correct solutions exist in distant latent regions that deterministic inference cannot reach. Even when standard TRM consistently fails on hard instances, a small fraction of noisy rollouts escape the trap and find correct answers. As rollout width increases from 1 to 100, accuracy climbs monotonically.
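A simple idealized calculation suggests why width helps. If each rollout independently solved an instance with probability p, the chance that at least one of K rollouts succeeds would be 1 - (1 - p)^K, which never decreases as K grows. Independence and the value p = 0.05 are assumptions for illustration, not figures from the paper, and this quantity only measures whether a correct answer is present among the rollouts, not whether it gets selected.

```python
def coverage(p, width):
    """Chance that at least one of `width` independent rollouts succeeds."""
    return 1.0 - (1.0 - p) ** width

# Illustrative per-rollout success rate of 5 percent (assumed, not measured).
for width in (1, 10, 100):
    print(width, round(coverage(0.05, width), 3))
```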

On the Pencil Puzzle Bench, PTRM raises aggregate accuracy from 62.6 percent to 91.2 percent. It outperforms the best single large language model by 56 percentage points and a stacked ensemble of seven state-of-the-art models by 36 points, all while consuming less than one ten-thousandth of the inference cost.

PTRM demonstrates that test-time compute scaling through width can unlock accuracy previously reserved for far larger models, turning compact recursive architectures into practical alternatives in resource-constrained settings. Explore how probabilistic reasoning and compute tradeoffs are reshaping model design at emergentmind.com, where you can create your own videos on the latest research.