Selected Research

ICLR 2026: The First Tokens Matter: Early Confidence Signals for Evaluating LLM Reasoning (under review)

Examines whether early token-level confidence signals can predict the reasoning quality of large language models in multi-agent debate systems. The findings show that signals from the first few generated tokens are especially informative for estimating reasoning reliability, offering a lightweight way to monitor and evaluate multi-agent LLM performance.

Paper link →

ACL 2026: From Advocacy to Judgment: Training-Free Analytic Essay Scoring with Multi-Agent Debate and Exemplar Retrieval

Presents MADRAG, a training-free analytic essay scoring framework that combines multi-agent debate with retrieval-augmented exemplar grounding. The system improves calibration, reduces middle-score bias, and achieves strong trait-level scoring performance that is competitive with supervised essay scoring approaches.

Paper link →

LAK 2026: Disagreement as Data: Reasoning Trace Analytics in Multi-Agent Systems

Proposes a new way to analyze large language model reasoning traces in multi-agent systems by treating disagreement as a meaningful analytic signal. The work shows how semantic similarity between agent reasoning can help detect coding ambiguity, improve qualitative analysis workflows, and strengthen human–AI collaboration in educational research.

Paper link →

NeurIPS 2025: Application of Multi-Agent Systems for Essay Scoring (Oral)

Introduces multi-agent system architectures for essay scoring, comparing approaches such as supervisor, collaboration, and debate. The work highlights how multi-agent debate can improve scoring reliability and move automated essay evaluation closer to human-level agreement while making advanced AI concepts more accessible for educators and assessment professionals.

NeurIPS Oral Presentation →

First Year PhD Poster: MADEST (Multi-Agent Debate Essay Scoring Triangulation)

Presents MADEST, an approach to automated essay scoring using multi-agent systems. By leveraging debate between specialized agents, the framework provides more reliable and nuanced evaluation than traditional single-agent methods and brings scoring closer to human-level agreement.

Zenodo preprint →

ICCKE 2024: Automating Theory of Mind Assessment with a LLaMA-3-Powered Chatbot

Automates Theory of Mind assessment for individuals with autism through a LLaMA-3 chatbot that administers the Faux Pas Recognition Test. The system uses a multi-phase pipeline: interactive rehabilitation dialogues with adaptive hints, automated scoring with an LLM-as-judge, and generation of clinician-ready reports. The goal is to enhance social cognition and improve faux pas detection.

View on IEEE Xplore →

CSICC 2025: iTAG: Easy, Rapid, Automatic Intelligent Tagging for Educational Contents

Presents iTAG, a tool for automatic metadata tagging of educational content to enhance Intelligent Tutoring Systems (ITS). Leveraging NLP, machine learning, and domain-specific rules, it extracts both general and pedagogical metadata to improve personalized learning recommendations.

View on IEEE Xplore →

AI-powered Digital Framework for Personalized Economical Quality Learning at Scale

Proposes an AI-powered digital learning framework designed for scalable, personalized, and cost-effective education. The framework integrates AI-based learner modeling, a personalized recommender system, and AI-assisted support for both learners and facilitators, aiming to address challenges in educational access, soft skills development, and large-scale implementation.

arXiv preprint →

Reflective Practice, Journal: Students’ Reflection on Collaborative Learning in Online Reflective Platforms

Explores the impact of collaborative learning (CL) on the learning environment and academic achievement in online reflective platforms (ORPs). The mixed-methods study finds that CL enhances student engagement, promotes a supportive learning atmosphere, and improves academic performance. Students also reflect positively on the collaborative and interactive nature of the online learning experience.

Journal page →

IJ Early Years Education, Journal: Challenges & Solutions in Cooperative Learning: Primary Teachers’ Lived Experiences

Identifies challenges in implementing cooperative learning (CL) in Iran's primary schools, including entrenched traditional teaching methods, limited stakeholder readiness, and a lack of facilities. Proposed solutions include improving the curriculum, fostering positive attitudes, and enhancing teacher training for effective CL implementation.

Journal page →

Ali Keramati
