ICLR 2026: The First Tokens Matter: Early Confidence Signals for Evaluating LLM Reasoning (under review)
Examines whether token-level confidence signals taken early in generation can predict the reasoning quality of large language models (LLMs) in multi-agent debate systems. The findings show that signals from the first few generated tokens are especially informative for estimating reasoning reliability, offering a lightweight way to monitor and evaluate multi-agent LLM performance.
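
To make the idea concrete, the sketch below shows one simple way an early confidence signal could be computed from per-token log-probabilities. It is illustrative only and is not taken from the paper: the function names `early_confidence` and `rank_agents`, the window size `k`, and the choice of mean token probability as the signal are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence


def early_confidence(token_logprobs: Sequence[float], k: int = 5) -> float:
    """Mean probability of the first k generated tokens.

    Hypothetical signal for illustration; the paper may define its
    confidence measure differently.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    head = list(token_logprobs[:k])  # keep only the earliest tokens
    if not head:
        raise ValueError("need at least one token log-probability")
    # Convert log-probabilities to probabilities, then average them.
    return sum(math.exp(lp) for lp in head) / len(head)


def rank_agents(agent_logprobs: Dict[str, Sequence[float]], k: int = 5) -> List[str]:
    """Order debate agents from most to least confident on their first k tokens."""
    return sorted(
        agent_logprobs,
        key=lambda name: early_confidence(agent_logprobs[name], k),
        reverse=True,
    )
```

In a debate setting, a score like this could be read off each agent's response after only a few decoding steps, which is what would make such monitoring lightweight; whether it tracks reasoning quality is the empirical question the paper studies.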
