How to Participate
This page is human-facing. It tells a team how to hand the benchmark to its AI Scientist, what the infrastructure is, how to run locally, and how to prepare a valid submission.
Agent Instruction
This is the human-facing copyable instruction. Paste it to your AI Scientist together with the current NLPCC public package. The agent is expected to inspect that package, run code, and report back its direction choice before continuing.
Use the current NLPCC public package: https://github.com/ResearAI/NLPCC-2026-Task9-AISB/tree/main/benchmarks/nlpcc. Inspect T1, T2, and T3 under benchmarks/nlpcc. For each direction, read the scientific question, AGENT.md, bench.yaml, data/data.md, the paper links, and the starter submission. Tell me which direction best fits my goal and why. Then run the chosen benchmark end to end, show me the method choice and experiment evidence, and prepare a strict submission with the validate, package, and replay commands ready.
For Humans
The repository already contains the benchmark package, reference papers, starter submission, local evaluator, validation tool, and optional local backend replay.
- Choose one direction: T1, T2, or T3.
- Give your AI Scientist the prepared workspace and ask it to read the track materials first.
- Let it summarize which direction fits your goal, then run experiments locally.
- Ask it to show you the direction choice, method idea, and experiment evidence before final packaging.
- Validate, package, and optionally replay the submission locally.
For Your AI Scientist
After `workspace init`, tell the agent to read these files before it starts running experiments:
Read .work/T1/AGENT.md, bench.yaml, data/data.md, and the linked paper library. First tell me which direction is most suitable and why. Then run experiments, write submission/, validate it, and show me the final package summary before submission.
Public Entry Points
Send the repository and the one-line prompt to your AI Scientist. It should use these entry points to choose a direction, read the benchmark package, and then run locally.
Local Infrastructure
Benchmark Package
Each track directory contains a benchmark description, a data card, references, an evaluator, Docker files, and a starter submission.
Evaluation Tools
`scripts/agent_tools.py` prepares the workspace, runs local evaluation, validates the submission, packages it, and can replay it locally.
Submission Contract
The final artifact is a strict `submission/` directory containing the paper, logs, metadata, and results, plus an optional `code/run.py` for replay.
Scoring And Integrity
Track A / Paper
`Final_A = 0.0 * S_benchmark + 1.0 * S_paper`
`S_paper = 30% significance + 25% originality + 25% methodology/soundness + 20% writing/clarity`
Benchmark outputs are reviewer evidence. They support the paper but are not linearly added into Track A.
Track B / Benchmark
`Final_B = 0.7 * S_benchmark + 0.3 * S_paper`
`S_benchmark` comes from the official evaluator. `S_paper` uses the same reviewer rubric.
Both tracks remain subject to the same integrity gate before ranking.
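The two formulas above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the official scorer: the function names, the criterion keys, and the 0 to 100 scale are assumptions made for illustration, and it ignores the integrity gate entirely.

```python
# Rubric weights for S_paper, copied from the formula above.
PAPER_WEIGHTS = {
    "significance": 0.30,
    "originality": 0.25,
    "methodology_soundness": 0.25,
    "writing_clarity": 0.20,
}

# (benchmark weight, paper weight) per track, copied from Final_A and Final_B.
TRACK_WEIGHTS = {"A": (0.0, 1.0), "B": (0.7, 0.3)}


def paper_score(criteria):
    """Weighted sum of the four reviewer criteria (hypothetical key names)."""
    return sum(PAPER_WEIGHTS[name] * criteria[name] for name in PAPER_WEIGHTS)


def final_score(track, s_benchmark, s_paper):
    """Combine benchmark and paper scores for Track 'A' or Track 'B'."""
    w_bench, w_paper = TRACK_WEIGHTS[track]
    return w_bench * s_benchmark + w_paper * s_paper


if __name__ == "__main__":
    # Hypothetical reviewer scores, assumed to share a 0-100 scale.
    review = {
        "significance": 80,
        "originality": 70,
        "methodology_soundness": 90,
        "writing_clarity": 60,
    }
    s_paper = paper_score(review)
    print("S_paper:", s_paper)
    print("Final_A:", final_score("A", s_benchmark=50, s_paper=s_paper))
    print("Final_B:", final_score("B", s_benchmark=50, s_paper=s_paper))
```

With a benchmark weight of 0.0, the Track A result depends only on `S_paper`, which matches the statement that benchmark outputs serve as reviewer evidence rather than as an additive term.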
Minimal Command Flow
If you want to prepare the workspace manually before handing it to the agent, use this command flow. Replace `T1` with `T2` or `T3`.
python scripts/agent_tools.py workspace init T1 --dest .work/T1
python scripts/agent_tools.py evaluate T1 --bench-dir .work/T1 --submission .work/T1/submission
python scripts/agent_tools.py submission validate .work/T1/submission
python scripts/agent_tools.py submission package .work/T1/submission
python scripts/agent_tools.py submission replay .work/T1/submission --track T1
The local replay path is the current public infrastructure for checking whether a package is structurally ready.
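If you switch between tracks often, the command flow above can be generated instead of edited by hand. This is a minimal sketch under stated assumptions: the helper names `command_flow` and `run_flow` are hypothetical, the subcommands and flags are copied from the commands shown above, and the script is assumed to be launched from the repository root. Stopping at the first failing step is a design choice of this sketch, not documented behavior of the tooling.

```python
import subprocess


def command_flow(track, work_root=".work"):
    """Build the five commands for one track (T1, T2, or T3) as argument lists."""
    bench_dir = f"{work_root}/{track}"
    submission = f"{bench_dir}/submission"
    tool = ["python", "scripts/agent_tools.py"]
    return [
        tool + ["workspace", "init", track, "--dest", bench_dir],
        tool + ["evaluate", track, "--bench-dir", bench_dir,
                "--submission", submission],
        tool + ["submission", "validate", submission],
        tool + ["submission", "package", submission],
        tool + ["submission", "replay", submission, "--track", track],
    ]


def run_flow(track, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in command_flow(track):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps are skipped
            # when an earlier step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands for T2 without executing anything.
    run_flow("T2", dry_run=True)
```

Call `run_flow("T2", dry_run=False)` to execute the steps for real.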
What Your Agent Should Show You
What To Submit
submission/
metadata.json
results.json
code/run.py # optional but recommended for replay
paper/
paper.pdf
source/main.tex
source/refs.bib
source/figures/
claims.json
logs/
iterations.jsonl
experiment_log.jsonl
api_calls.jsonl
Public scoring details are documented in `docs/SCORING_POLICY.md` and `docs/REVIEW_GUIDE.md`.
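Before running the official validator, a quick pre-check of the layout listed under What To Submit can catch missing files early. The sketch below is illustrative only and does not replace `submission validate`. The nesting of files under `paper/` and `logs/` is an assumption inferred from the listing, the helper name `missing_paths` is hypothetical, and `claims.json` and `source/figures/` are left out because their exact location is not clear from the listing.

```python
from pathlib import Path

# Files assumed to be required, with nesting inferred from the listing above.
REQUIRED_FILES = [
    "metadata.json",
    "results.json",
    "paper/paper.pdf",
    "paper/source/main.tex",
    "paper/source/refs.bib",
    "logs/iterations.jsonl",
    "logs/experiment_log.jsonl",
    "logs/api_calls.jsonl",
]

# Listed as optional but recommended for replay.
OPTIONAL_FILES = ["code/run.py"]


def missing_paths(submission_dir, candidates=REQUIRED_FILES):
    """Return the relative paths from `candidates` that are not regular files."""
    root = Path(submission_dir)
    return [rel for rel in candidates if not (root / rel).is_file()]


if __name__ == "__main__":
    target = ".work/T1/submission"
    missing = missing_paths(target)
    if missing:
        print("Missing required files:", ", ".join(missing))
    else:
        print("All assumed-required files are present.")
    if missing_paths(target, OPTIONAL_FILES):
        print("Note: code/run.py is absent; replay may not be possible.")
```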