Reliable Chain-of-Thought via Prefix Consistency

Naoto Iwase, Yuki Ichihara, Mohammad Atif Quamar, Junpei Komiyama
Nagoya University · NAIST · MBZUAI · RIKEN AIP
Preprint, 2026

Abstract

Large language models often improve accuracy on reasoning tasks by sampling multiple chain-of-thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique known as self-consistency. When we truncate a CoT partway through and regenerate the remainder, traces that ended in a correct answer reproduce that answer more often than traces that ended in a wrong one. We use this gap as a reliability signal we call prefix consistency: the rate at which a sample's answer reappears under regeneration. Weighting each vote by this signal yields prefix-consistency-weighted majority voting (PC-WMV), which requires no access to token log-probabilities or self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and PC-WMV reaches Standard MV's plateau accuracy with up to 21× fewer tokens (median 4.6×).

Truncate a CoT partway through, then regenerate the remainder.

Correct answers come back. Wrong ones don't.

If correct answers reproduce more, their votes should count more.

PC-WMV beats Standard MV.


PC-WMV uses 21× fewer tokens than Standard MV.
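The weighting scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the `(answer, regenerations)` sample format, and the choice to weight each vote by its raw consistency rate are all assumptions for exposition.

```python
from collections import defaultdict

def prefix_consistency(original_answer, regenerated_answers):
    # Fraction of regenerations (continuations of a truncated CoT
    # prefix) whose final answer matches the sample's original answer.
    if not regenerated_answers:
        return 0.0
    matches = sum(a == original_answer for a in regenerated_answers)
    return matches / len(regenerated_answers)

def pc_weighted_majority_vote(samples):
    # samples: list of (original_answer, regenerated_answers) pairs.
    # Each sample's vote counts proportionally to its prefix consistency,
    # so answers that "come back" under regeneration dominate the vote.
    scores = defaultdict(float)
    for answer, regens in samples:
        scores[answer] += prefix_consistency(answer, regens)
    return max(scores, key=scores.get)
```

Note how this can overturn a plain majority: two samples answering "17" whose regenerations never reproduce "17" contribute zero weight, so a single self-consistent "42" sample wins.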

BibTeX

@article{iwase2026prefixconsistency,
  title={Reliable Chain-of-Thought via Prefix Consistency},
  author={Naoto Iwase and Yuki Ichihara and Mohammad Atif Quamar and Junpei Komiyama},
  journal={arXiv preprint arXiv:2605.07654},
  year={2026},
  url={https://arxiv.org/abs/2605.07654},
}