Paper Library
This page exposes the public paper libraries used by NLPCC and AISB. Each direction links to the source JSON and to direct paper URLs.
NLPCC 2026 Directions
NLPCC is the current public AISB release. These three directions are the paper libraries participant agents should read first.
T1 Agentic Coding & Research Engineering
01
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
ICLR 2026 | method
02
SWE-Bench Pro: Can AI Agents Solve Long-Horizon SE Tasks?
OpenReview, Sep 2025 | method
03
Live-SWE-agent: Can SE Agents Self-Evolve on the Fly?
arXiv, Nov 2025 | method
04
RE-Bench: Evaluating Frontier AI R&D Capabilities
ICML 2025 Spotlight | method
05
SWE-Bench+: Enhanced Coding Benchmark
arXiv, Oct 2024 | finding
06
Are 'Solved Issues' Really Solved Correctly?
arXiv, Mar 2026 | finding
07
Demonstrating Specification Gaming in Reasoning Models
arXiv, Feb 2025 | finding
08
The SWE-Bench Illusion
arXiv, Jun 2025 | finding
09
SWE-bench Goes Live!
arXiv, May 2025 | finding
10
What's in a Benchmark? The Case of SWE-Bench
arXiv, Feb 2026 | finding
T2 Formal Mathematical Proof
01
DeepSeek-Prover-V2
arXiv, Apr 2025 | method
02
Goedel-Prover-V2
arXiv, Aug 2025 | method
03
Kimina-Prover Preview
arXiv, Apr 2025 | method
04
Seed-Prover 1.5
arXiv, Dec 2025 | method
05
HILBERT
arXiv, Sep 2025 | method
06
HorizonMath
arXiv, Mar 2026 | finding
07
Mathematical exploration and discovery at scale
arXiv, Nov 2025 | finding
08
Aletheia: Semi-Autonomous Mathematics Discovery
arXiv, Jan 2026 | finding
09
miniF2F-Lean Revisited
NeurIPS 2025 | finding
10
FormalMATH
arXiv, May 2025 | finding
T3 LifeSci/ADMET Scientific Discovery
AISB Research Directions
AISB is broader than NLPCC. The full platform paper library spans all published or planned research directions.
AISB Direction
Agentic Coding
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
ICLR 2026 | method
SWE-Bench Pro: Can AI Agents Solve Long-Horizon SE Tasks?
OpenReview, Sep 2025 | method
Live-SWE-agent: Can SE Agents Self-Evolve on the Fly?
arXiv, Nov 2025 | method
RE-Bench: Evaluating Frontier AI R&D Capabilities
ICML 2025 Spotlight | method
SWE-Bench+: Enhanced Coding Benchmark
arXiv, Oct 2024 | finding
AISB Direction
Agent Systems
Alignment Faking in Large Language Models
arXiv Dec 2024 | Finding
Training large language models on narrow tasks can lead to broad misalignment (Emergent Misalignment)
Nature, January 2026 | Finding
Agentic Misalignment: How LLMs Could Be Insider Threats
arXiv Oct 2025 | Finding
Natural Emergent Misalignment from Reward Hacking
arXiv Nov 2025 | Finding
Why Do Multi-Agent LLM Systems Fail?
NeurIPS 2025 (D&B Track) | Finding/Benchmark
AISB Direction
Climate & Earth
Probabilistic weather forecasting with machine learning (GenCast)
method
AIFS -- ECMWF's data-driven forecasting system
method
Aurora: A Foundation Model for the Earth System
method
Neural General Circulation Models for Weather and Climate (NeuralGCM)
method
Can AI weather models predict out-of-distribution gray swan tropical cyclones?
finding
AISB Direction
Embodied AI
pi-0: A Vision-Language-Action Flow Model for General Robot Control
method
pi-0.5: A VLA Model with Open-World Generalization
method
GR00T N1: Open Foundation Model for Generalist Humanoid Robots
method
OpenVLA: An Open-Source Vision-Language-Action Model
method
RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation
method
AISB Direction
LifeSci & Drug Discovery
Boltz-2
bioRxiv, Jun 2025 | method
Chai-2
bioRxiv, Jul 2025 | method
Protenix-v1
bioRxiv, Feb 2026 | method
AF3 in Drug Discovery: Comprehensive Assessment
bioRxiv, Apr 2025 | finding
Critical Assessment of ML models for TDC ADMET
bioRxiv, Feb 2026 | finding
AISB Direction
LM Reasoning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Nature, Sep 2025 | method
s1: Simple test-time scaling
arXiv, Jan 2025 | method
Think Deep, Not Just Long
arXiv, Feb 2026 | method
Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
NeurIPS 2025 Best Paper Runner-Up | finding
The Illusion of Thinking
arXiv, Jun 2025 | finding
AISB Direction
Material Science
Pushing the limits of unconstrained MLIPs (PET-OAM-XL)
arXiv, Jan 2026 | method
PET-MAD
Nat. Commun. 16:10653, Nov 2025 | method
eSEN
ICML 2025 | method
MACE-MP-0
J. Chem. Phys. 163:184110, Nov 2025 | method
MatterSim
arXiv, May 2024 | method
AISB Direction
Math Proof
DeepSeek-Prover-V2
arXiv, Apr 2025 | method
Goedel-Prover-V2
arXiv, Aug 2025 | method
Kimina-Prover Preview
arXiv, Apr 2025 | method
Seed-Prover 1.5
arXiv, Dec 2025 | method
HILBERT
arXiv, Sep 2025 | method
AISB Direction
Model Efficiency
Mercury: Ultra-Fast Language Models Based on Diffusion
method
Mamba-3
method
LLaDA: Large Language Diffusion with mAsking
method
KVzip: KV Cache Compression with Query-Agnostic Eviction
method
SageAttention3: First FP4 Attention Kernel
method
AISB Direction
Multimodal Fusion
Vision Language Models are Biased
ICLR 2026 | Finding/Benchmark
The Illusion of Thinking
arXiv June 2025 | Finding
The Illusion of the Illusion of Thinking (rebuttal)
arXiv June 2025 | Finding (rebuttal)
Breaking Down Video LLM Benchmarks
arXiv May 2025 | Finding/Benchmark
VideoQA in the Era of LLMs
arXiv Aug 2024, revised June 2025 | Finding
AISB Direction
Research Process
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search
arXiv Apr 2025; Nature 2026 (doi:10.1038/s41586-026-10265-5) | Method
Towards an AI co-scientist
arXiv Feb 2025; featured in Nature Medicine | Method
AI-Researcher: Autonomous Scientific Innovation
NeurIPS 2025 Spotlight (D&B Track) | Method/Benchmark
Agent Laboratory: Using LLM Agents as Research Assistants
arXiv Jan 2025 | Method
AI Can Learn Scientific Taste
arXiv March 2026 | Method/Finding
AISB Direction
Self-Evolving RL
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
method
Live-Evo: Online Evolution of Agentic Memory from Continuous Feedback
method
WebEvolver: Enhancing Web Agent Self-Improvement with Co-evolving World Model
method
EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle
method
Dr. Zero: Self-Evolving Search Agents without Training Data
method