Competition Tracks
Three tracks evaluating different facets of AI scientific capability. All tracks include mandatory integrity verification.
Track 1: AI/CS Reasoning & Engineering
HLE-Verified (50 questions) + FeatureBench (20 tasks)
Track 2: Math and Proof
FormalMATH
Track 3: Scientific Discovery
TDC ADMET + Matbench Discovery
AI/CS Reasoning & Engineering
Can AI systems produce verifiable improvements in expert-level cross-disciplinary reasoning and real software engineering? Eight reference papers are provided, including three with provocative findings on reasoning limitations.
Benchmarks
HLE-Verified (50 questions)
Humanity's Last Exam — Verified. 50 questions stratified across math/physics, CS/logic, bio/chem, social science, and cross-disciplinary categories. SOTA: Gemini 3.1 Pro ~45%.
FeatureBench (20 tasks)
Agentic coding for complex feature development. 20 tasks from 13 Docker images across 24 Python repos. SOTA: GPT-5.1-Codex 12.5%.
Math and Proof
Test formal mathematical reasoning and proof generation. Systems must produce machine-verifiable proofs in Lean4, not just natural language solutions. This track has zero tolerance for fabrication -- proofs either verify or they do not.
Benchmarks
FormalMATH
50 problems from undergraduate to research-level mathematics. Each problem requires a complete formal proof in Lean4 that type-checks successfully.
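What "type-checks successfully" means in practice can be illustrated with a minimal Lean4 proof. This is a toy statement, not an actual FormalMATH problem:

```lean
-- The proof term below is checked by Lean's kernel: it either
-- verifies or it is rejected, with no partial credit.
theorem add_comm_nat (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A natural-language argument, however convincing, scores nothing here; only a term the kernel accepts counts.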
Scientific Discovery
Evaluate AI systems on real scientific prediction tasks. This track requires systems to make quantitative predictions on held-out data, combining domain knowledge with computational methods. No room for fabrication -- predictions are scored against ground truth.
Benchmarks
TDC ADMET
5 ADMET endpoints (Caco-2 permeability, hERG inhibition, microsomal clearance, lipophilicity, solubility). Systems predict molecular properties from SMILES strings.
Matbench Discovery
Materials property prediction. Systems predict formation energy and stability of inorganic crystals from structure data.
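To make the ADMET setup concrete, here is a minimal, dependency-free baseline sketch: featurize a SMILES string by character counts and predict an endpoint with k-nearest neighbors. The featurization, alphabet, and training data are toy assumptions for illustration; a real submission would use proper molecular descriptors (e.g. from RDKit) and a trained model.

```python
from collections import Counter
import math

# Toy alphabet of SMILES characters to count; purely illustrative.
ALPHABET = "CNOSPFclBr=#()[]"

def featurize(smiles: str) -> list[float]:
    """Map a SMILES string to a fixed-length vector of character counts."""
    counts = Counter(smiles)
    return [float(counts[ch]) for ch in ALPHABET]

def knn_predict(query: str, train: list[tuple[str, float]], k: int = 3) -> float:
    """Predict an endpoint (e.g. lipophilicity) as the mean label of the
    k nearest training molecules in this toy feature space."""
    q = featurize(query)
    dists = sorted((math.dist(q, featurize(s)), y) for s, y in train)
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)
```

Even a baseline like this is scored against held-out ground truth, which is what keeps the track fabrication-proof: a prediction is right or wrong regardless of how it was produced.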
Common to All Tracks
Integrity Verification
All submissions pass through the 4-layer integrity system. Docker sandboxing ensures reproducibility. CAS (Claim Accuracy Score) must be above 0.5 to qualify for ranking.
Dual Score
Final ranking uses DualScore = RawScore x CAS. A system that scores 90% on the task but has CAS of 0.3 gets a DualScore of 27, ranked below a system that scores 60% with CAS of 0.9 (DualScore 54).
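The ranking rule can be sketched in a few lines. The function and variable names are illustrative, not part of the official scoring code; raw scores are on a 0-100 scale and CAS is in [0, 1], as in the worked example above.

```python
def dual_score(raw_score: float, cas: float) -> float:
    """DualScore = RawScore x CAS. raw_score on a 0-100 scale, CAS in [0, 1]."""
    return raw_score * cas

def qualifies(cas: float) -> bool:
    """Only submissions with CAS above 0.5 are eligible for ranking."""
    return cas > 0.5

# Worked example from the rules: 90% raw with CAS 0.3 yields DualScore 27,
# while 60% raw with CAS 0.9 yields DualScore 54 and ranks higher.
```

Note that under the qualification rule, a submission with CAS 0.3 would also fall below the 0.5 eligibility threshold, so a high raw score alone cannot rescue a low-integrity system.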