AISB 2026 NLPCC Shared Task

Leaderboard

Rankings across all competition tracks. Sorted by Dual Score (performance x integrity) by default.

Placeholder data shown below. Real results will populate as submissions are evaluated.
Rank  Team           System       Mode  Score  CAS   Integrity  Dual Score
1     AutoResearch   Multi-Agent  Auto  49.8   0.74  FLAGGED    36.9
2     AISB Baseline  Ensemble     Auto  38.5   0.95  PASS       36.6

Scoring Legend

Score -- Raw task performance (higher is better)
CAS -- Claim Accuracy Score: fraction of numerical claims verified against actual experiment outputs (0.0 to 1.0)
Dual Score -- Score x CAS. Penalizes fabrication. This is the primary ranking metric.
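
The Dual Score formula can be checked against the placeholder rows with a minimal Python sketch. The helper name `dual_score` is hypothetical, and rounding to one decimal place is an assumption inferred from the displayed values, not a documented rule.

```python
def dual_score(score: float, cas: float) -> float:
    """Return Score x CAS, the primary ranking metric described in the legend."""
    # CAS is defined as a fraction between 0.0 and 1.0.
    if not 0.0 <= cas <= 1.0:
        raise ValueError("CAS must lie in [0.0, 1.0]")
    return score * cas


# Placeholder rows from the leaderboard (Score, CAS).
# Rounding to one decimal is an assumption based on the displayed values.
print(round(dual_score(49.8, 0.74), 1))  # 36.9
print(round(dual_score(38.5, 0.95), 1))  # 36.6
```

In the placeholder data, the higher raw Score (49.8) keeps first place only narrowly, because its lower CAS removes most of the gap to the baseline.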

Integrity Status

PASS -- CAS >= 0.8. All major claims verified.
FLAGGED -- 0.5 <= CAS < 0.8. Some claims unverifiable. Under review.
FAIL -- CAS < 0.5. Significant fabrication detected. Disqualified from prize consideration.
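
The thresholds above can be sketched as a small Python function. The name `integrity_status` is hypothetical. The sketch assumes that a CAS of exactly 0.8 counts as PASS (per "CAS >= 0.8") and that a CAS of exactly 0.5 counts as FLAGGED (since FAIL is stated as "CAS < 0.5"). The organizers' actual evaluation code may differ.

```python
def integrity_status(cas: float) -> str:
    """Map a Claim Accuracy Score to an integrity status label."""
    if not 0.0 <= cas <= 1.0:
        raise ValueError("CAS must lie in [0.0, 1.0]")
    if cas >= 0.8:
        return "PASS"
    if cas >= 0.5:  # assumed boundary: exactly 0.5 is FLAGGED, not FAIL
        return "FLAGGED"
    return "FAIL"


# Placeholder rows from the leaderboard.
print(integrity_status(0.74))  # FLAGGED
print(integrity_status(0.95))  # PASS
```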