Paper Library

This page exposes the public paper libraries used by NLPCC and AISB. Each direction links to the source JSON and to direct paper URLs.

NLPCC 2026 Directions

NLPCC is the current public AISB release. These three directions are the paper libraries participant agents should read first.

T1 Agentic Coding & Research Engineering

14 papers

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

ICLR 2026 · method

SWE-Bench Pro: Can AI Agents Solve Long-Horizon SE Tasks?

OpenReview, Sep 2025 · method

Live-SWE-agent: Can SE Agents Self-Evolve on the Fly?

arXiv, Nov 2025 · method

RE-Bench: Evaluating Frontier AI R&D Capabilities

ICML 2025 Spotlight · method

SWE-Bench+: Enhanced Coding Benchmark

arXiv, Oct 2024 · finding

Are 'Solved Issues' Really Solved Correctly?

arXiv, Mar 2026 · finding

Demonstrating Specification Gaming in Reasoning Models

arXiv, Feb 2025 · finding

The SWE-Bench Illusion

arXiv, Jun 2025 · finding

SWE-bench Goes Live!

arXiv, May 2025 · finding

What's in a Benchmark? The Case of SWE-Bench

arXiv, Feb 2026 · finding

Open benchmark package · Open source JSON

T2 Formal Mathematical Proof

13 papers

DeepSeek-Prover-V2

arXiv, Apr 2025 · method

Goedel-Prover-V2

arXiv, Aug 2025 · method

Kimina-Prover Preview

arXiv, Apr 2025 · method

Seed-Prover 1.5

arXiv, Dec 2025 · method

HILBERT

arXiv, Sep 2025 · method

HorizonMath

arXiv, Mar 2026 · finding

Mathematical exploration and discovery at scale

arXiv, Nov 2025 · finding

Aletheia: Semi-Autonomous Mathematics Discovery

arXiv, Jan 2026 · finding

miniF2F-Lean Revisited

NeurIPS 2025 · finding

FormalMATH

arXiv, May 2025 · finding

Open benchmark package · Open source JSON

T3 LifeSci/ADMET Scientific Discovery

13 papers

Boltz-2

bioRxiv, Jun 2025 · method

Chai-2

bioRxiv, Jul 2025 · method

Protenix-v1

bioRxiv, Feb 2026 · method

AF3 in Drug Discovery: Comprehensive Assessment

bioRxiv, Apr 2025 · finding

Critical Assessment of ML models for TDC ADMET

bioRxiv, Feb 2026 · finding

CASP16 complex assessment

bioRxiv, 2025 · finding

Open benchmark package · Open source JSON

AISB Research Directions

AISB is broader than NLPCC. The full platform paper library spans all published or planned research directions.

AISB Direction

Agentic Coding

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

ICLR 2026 · method

SWE-Bench Pro: Can AI Agents Solve Long-Horizon SE Tasks?

OpenReview, Sep 2025 · method

Live-SWE-agent: Can SE Agents Self-Evolve on the Fly?

arXiv, Nov 2025 · method

RE-Bench: Evaluating Frontier AI R&D Capabilities

ICML 2025 Spotlight · method

SWE-Bench+: Enhanced Coding Benchmark

arXiv, Oct 2024 · finding

Open source JSON

AISB Direction

Agent Systems

Alignment Faking in Large Language Models

arXiv Dec 2024 · Finding

Training large language models on narrow tasks can lead to broad misalignment (Emergent Misalignment)

Nature, January 2026 · Finding

Agentic Misalignment: How LLMs Could Be Insider Threats

arXiv Oct 2025 · Finding

Natural Emergent Misalignment from Reward Hacking

arXiv Nov 2025 · Finding

Why Do Multi-Agent LLM Systems Fail?

NeurIPS 2025 (D&B Track) · Finding/Benchmark

Open source JSON

AISB Direction

Climate & Earth

Probabilistic weather forecasting with machine learning (GenCast)

method

AIFS -- ECMWF's data-driven forecasting system

method

Aurora: A Foundation Model for the Earth System

method

Neural General Circulation Models for Weather and Climate (NeuralGCM)

method

Can AI weather models predict out-of-distribution gray swan tropical cyclones?

finding

Open source JSON

AISB Direction

Embodied AI

pi-0: A Vision-Language-Action Flow Model for General Robot Control

method

pi-0.5: A VLA Model with Open-World Generalization

method

GR00T N1: Open Foundation Model for Generalist Humanoid Robots

method

OpenVLA: An Open-Source Vision-Language-Action Model

method

RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation

method

Open source JSON

AISB Direction

LifeSci & Drug Discovery

Boltz-2

bioRxiv, Jun 2025 · method

Chai-2

bioRxiv, Jul 2025 · method

Protenix-v1

bioRxiv, Feb 2026 · method

AF3 in Drug Discovery: Comprehensive Assessment

bioRxiv, Apr 2025 · finding

Critical Assessment of ML models for TDC ADMET

bioRxiv, Feb 2026 · finding

Open source JSON

AISB Direction

LM Reasoning

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Nature, Sep 2025 · method

s1: Simple test-time scaling

arXiv, Jan 2025 · method

Think Deep, Not Just Long

arXiv, Feb 2026 · method

Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

NeurIPS 2025 Best Paper Runner-Up · finding

The Illusion of Thinking

arXiv, Jun 2025 · finding

Open source JSON

AISB Direction

Material Science

Pushing the limits of unconstrained MLIPs (PET-OAM-XL)

arXiv, Jan 2026 · method

PET-MAD

Nat. Commun. 16:10653, Nov 2025 · method

eSEN

ICML 2025 · method

MACE-MP-0

J. Chem. Phys. 163:184110, Nov 2025 · method

MatterSim

arXiv, May 2024 · method

Open source JSON

AISB Direction

Math Proof

DeepSeek-Prover-V2

arXiv, Apr 2025 · method

Goedel-Prover-V2

arXiv, Aug 2025 · method

Kimina-Prover Preview

arXiv, Apr 2025 · method

Seed-Prover 1.5

arXiv, Dec 2025 · method

HILBERT

arXiv, Sep 2025 · method

Open source JSON

AISB Direction

Model Efficiency

Mercury: Ultra-Fast Language Models Based on Diffusion

method

Mamba-3

method

LLaDA: Large Language Diffusion with mAsking

method

KVzip: KV Cache Compression with Query-Agnostic Eviction

method

SageAttention3: First FP4 Attention Kernel

method

Open source JSON

AISB Direction

Multimodal Fusion

Vision Language Models are Biased

ICLR 2026 · Finding/Benchmark

The Illusion of Thinking

arXiv June 2025 · Finding

The Illusion of the Illusion of Thinking (rebuttal)

arXiv June 2025 · Finding (rebuttal)

Breaking Down Video LLM Benchmarks

arXiv May 2025 · Finding/Benchmark

VideoQA in the Era of LLMs

arXiv Aug 2024, revised June 2025 · Finding

Open source JSON

AISB Direction

Research Process

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

arXiv Apr 2025; Nature 2026 (doi:10.1038/s41586-026-10265-5) · Method

Towards an AI co-scientist

arXiv Feb 2025; featured in Nature Medicine · Method

AI-Researcher: Autonomous Scientific Innovation

NeurIPS 2025 Spotlight (D&B Track) · Method/Benchmark

Agent Laboratory: Using LLM Agents as Research Assistants

arXiv Jan 2025 · Method

AI Can Learn Scientific Taste

arXiv March 2026 · Method/Finding

Open source JSON

AISB Direction

Self-Evolving RL

Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

method

Live-Evo: Online Evolution of Agentic Memory from Continuous Feedback

method

WebEvolver: Enhancing Web Agent Self-Improvement with Co-evolving World Model

method

EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle

method

Dr. Zero: Self-Evolving Search Agents without Training Data

method

Open source JSON