Browse by Workflow

Jump to entries by scientific workflow stage or capability.

Canonical lifecycle stages

These six slugs form the lifecycle axis used to tag every entry on the site; they are lifted directly from the project mission (specs/mission.md, kept local):

  • literature-intelligence — search, summarization, citation graphs, knowledge extraction.
  • hypothesis-generation — proposing novel, testable scientific claims.
  • experiment-planning — protocols, simulations, study design, adaptive experimentation.
  • tool-use-execution — code generation, lab automation, simulators, agent frameworks acting on the world.
  • evaluation — benchmarks, reproducibility checks, result validation.
  • scientific-communication — writing, figure generation, peer-review assistance, reproducible artifacts.

Some cards below are capability tracks (e.g., multi-agent systems, safety & governance) rather than lifecycle stages — they cut across multiple stages and are not part of the canonical lifecycle axis.

See start-here for the full three-axis tagging convention.

Browse the cards

Review

Literature review

Retrieval, synthesis, contradiction checks, metadata graphs, and QA.

Tags: lifecycle:literature-intelligence · domain:cross-domain · type:tool,framework

Ideation

Hypothesis generation

Novelty detection, idea generation, and structured research ideation.

Tags: lifecycle:hypothesis-generation · domain:cross-domain · type:agent-system,paper

Planning

Experiment planning

Optimization, adaptive design, and experiment selection.

Tags: lifecycle:experiment-planning · domain:cross-domain · type:framework,tool

Execution

Tool use

Lab systems, APIs, cloud labs, and tooling that lets agents act.

Tags: lifecycle:tool-use-execution · domain:cross-domain · type:framework,tool,agent-system

Agents

Multi-agent systems

Debate, coordination, specialist roles, and agent collaboration patterns.

Capability track (cuts across lifecycle stages). Tags: lifecycle:hypothesis-generation,tool-use-execution · domain:cross-domain · type:framework,agent-system

Evaluation

Benchmarks

Capability checks for reasoning, coding, and scientific tasks.

Tags: lifecycle:evaluation · domain:cross-domain · type:benchmark

Reporting

Communication

Manuscript generation, figures, writing, and research reporting tools.

Tags: lifecycle:scientific-communication · domain:cross-domain · type:tool,framework

Safety

Safety & governance

Dual-use risk, evaluation, policy, and guardrail resources.

Capability track (cuts across lifecycle stages). Tags: lifecycle:evaluation,scientific-communication · domain:cross-domain · type:paper,blog-essay

Resources by lifecycle stage

Hand-picked entries grouped by canonical lifecycle slug. Every bullet carries the full three-axis tag set (lifecycle: · domain: · type:) locked in Phase 2. Anchors below match the lifecycle slugs verbatim so external links stay stable.

Literature intelligence

Search, summarization, citation graphs, and structured knowledge extraction over the scholarly record. This is where most "AI scientist" projects start: turning the long tail of papers into a queryable substrate that downstream stages can reason over.

  • Semantic Scholar Academic Graph API — Open API over ~200M papers with citation context, embeddings, and TLDRs from Allen AI. Tags: lifecycle:literature-intelligence · domain:cross-domain · type:tool
  • OpenAlex — Fully open scholarly graph (works, authors, institutions, concepts) with a permissive API; a free replacement for Microsoft Academic Graph. Tags: lifecycle:literature-intelligence · domain:cross-domain · type:dataset,tool
  • PaperQA2 — Retrieval-augmented QA agent over PDFs that produces grounded, citation-rich answers; reference implementation from FutureHouse. Tags: lifecycle:literature-intelligence · domain:cross-domain · type:framework,agent-system
  • SPECTER2 — Scientific document embeddings from Allen AI for retrieval, recommendation, and citation prediction. Tags: lifecycle:literature-intelligence · domain:cross-domain · type:model
  • STORM — Stanford OVAL system that drafts Wikipedia-style articles by retrieving and synthesizing sources end-to-end. Tags: lifecycle:literature-intelligence,scientific-communication · domain:cross-domain · type:agent-system,framework
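Most of the tools above expose plain HTTP APIs. As a minimal sketch, here is how a paper search against the Semantic Scholar Academic Graph API might be assembled with the standard library; only the URL is constructed (no request is issued), and the chosen `fields` and `limit` values are illustrative, not recommendations:

```python
from urllib.parse import urlencode

# Paper-search endpoint of the Semantic Scholar Academic Graph API (v1).
S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def build_search_url(query, fields=("title", "abstract", "citationCount"), limit=20):
    """Return a GET URL for a keyword paper search; the caller issues the request."""
    params = {"query": query, "fields": ",".join(fields), "limit": limit}
    return f"{S2_SEARCH}?{urlencode(params)}"

url = build_search_url("autonomous hypothesis generation")
```

The same pattern (endpoint plus query-string parameters) carries over to OpenAlex, which serves a similarly open REST API.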

Hypothesis generation

Proposing novel, testable scientific claims grounded in prior work. Resources here range from idea-generation agents to multi-agent debate systems that surface contradictions in the literature.

  • The AI Scientist — Sakana AI's end-to-end pipeline that ideates, runs, and writes up ML research papers; a useful reference for full-loop hypothesis systems. Tags: lifecycle:hypothesis-generation,tool-use-execution · domain:cross-domain · type:agent-system,paper
  • ResearchAgent — Iterative LLM agent for research idea generation that critiques and refines hypotheses against retrieved literature. Tags: lifecycle:hypothesis-generation · domain:cross-domain · type:paper
  • SciAgents — Multi-agent graph-reasoning framework (Buehler) for materials hypothesis generation across an ontological knowledge graph. Tags: lifecycle:hypothesis-generation · domain:materials-science · type:paper,agent-system
  • Towards an AI co-scientist — Google Research's multi-agent system that generates, debates, and ranks hypotheses for a target research goal. Tags: lifecycle:hypothesis-generation · domain:cross-domain · type:blog-essay,agent-system
  • Coscientist — Boiko et al.'s autonomous agent that designs, plans, and executes chemical experiments; the Nature paper is a canonical hypothesis-to-execution case study. Tags: lifecycle:hypothesis-generation,tool-use-execution · domain:drug-discovery-chemistry · type:agent-system,paper
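The systems above share a common skeleton: propose a claim, critique it against evidence, revise, repeat. The sketch below shows that loop in pure Python; `propose`, `critique`, and `revise` are hypothetical stand-ins for LLM calls grounded in retrieved literature and do not come from any of the frameworks listed:

```python
# Toy generate-critique-refine loop; the three stub functions stand in for
# model calls in systems like ResearchAgent or the AI co-scientist.

def propose(topic):
    return f"Increasing X improves Y in {topic}"

def critique(hypothesis):
    # A real critic would check novelty and testability against retrieved papers;
    # this stub only flags a claim with no stated measurement as untestable.
    return "untestable" if "measured by" not in hypothesis else "ok"

def revise(hypothesis, feedback):
    return hypothesis + ", measured by Z" if feedback == "untestable" else hypothesis

def refine(topic, rounds=3):
    """Iterate critique and revision until the critic accepts or rounds run out."""
    h = propose(topic)
    for _ in range(rounds):
        feedback = critique(h)
        if feedback == "ok":
            break
        h = revise(h, feedback)
    return h
```

The interesting design choices in real systems live inside `critique`: retrieval against the literature, novelty scoring, and multi-agent debate all slot into that step.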

Experiment planning

Designing protocols, simulations, and adaptive study procedures. The dominant pattern here is sequential decision-making — Bayesian optimization, active learning, and multi-fidelity surrogates that pick the next experiment to run.

  • BoTorch — PyTorch-native Bayesian optimization library underpinning much of modern adaptive experimentation. Tags: lifecycle:experiment-planning · domain:cross-domain · type:framework
  • Ax — Adaptive Experimentation Platform from Meta; high-level API over BoTorch for managing real experiment campaigns. Tags: lifecycle:experiment-planning · domain:cross-domain · type:framework,tool
  • Optuna — Define-by-run hyperparameter and experiment optimization framework with pruning and distributed trials. Tags: lifecycle:experiment-planning · domain:cross-domain · type:framework
  • Emukit — Toolkit for emulation, multi-fidelity modelling, and experimental design over expensive simulators. Tags: lifecycle:experiment-planning · domain:cross-domain · type:framework
  • BayBE — Merck's Bayesian Back End, a domain-aware experiment planner aimed at chemistry and formulations campaigns. Tags: lifecycle:experiment-planning · domain:drug-discovery-chemistry · type:framework,tool
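The sequential pattern these libraries implement can be sketched in a few lines: fit a surrogate to what has been measured, score unmeasured candidates by predicted value plus uncertainty, and run the best-scoring one next. The toy below uses a nearest-neighbor "surrogate" and distance-based "uncertainty" purely for illustration; real tools like BoTorch or Ax use Gaussian processes and principled acquisition functions:

```python
# Toy UCB-style experiment selection over 1-D candidate settings.

def select_next(observed, candidates, kappa=2.0):
    """Pick the candidate maximizing mean-plus-uncertainty.

    observed: dict mapping tried settings to measured outcomes.
    candidates: untried settings to score.
    """
    def score(x):
        # Surrogate mean: outcome at the nearest tried setting.
        nearest = min(observed, key=lambda xo: abs(xo - x))
        mean = observed[nearest]
        # Crude uncertainty proxy: distance to that nearest observation.
        return mean + kappa * abs(nearest - x)
    return max(candidates, key=score)

# One step of a campaign: three measured settings, choose the next to try.
observed = {0.0: 1.0, 0.5: 1.4, 1.0: 0.9}
next_x = select_next(observed, [0.1, 0.3, 0.8])
```

Here 0.3 wins: it sits near the best observed outcome (1.4 at 0.5) while still being far enough from any measurement to be informative, which is exactly the explore-exploit trade-off the real libraries formalize.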

Tool use & execution

Code generation, lab automation, simulators, and agent frameworks that let a system act — calling tools, running code, or driving instruments — rather than just reasoning in text.

  • ChemCrow — LLM agent equipped with 18 expert chemistry tools (synthesis, safety, search) for autonomous chemical reasoning. Tags: lifecycle:tool-use-execution · domain:drug-discovery-chemistry · type:agent-system,framework
  • LangChain — Widely used framework for composing LLMs with tools, retrievers, and workflows; common substrate for scientific agents. Tags: lifecycle:tool-use-execution · domain:cross-domain · type:framework
  • AutoGen — Microsoft framework for multi-agent conversation patterns and code execution; used in several published scientific agent setups. Tags: lifecycle:tool-use-execution · domain:cross-domain · type:framework
  • smolagents — Hugging Face's minimalist code-acting agent library; agents emit Python that is executed in a sandbox. Tags: lifecycle:tool-use-execution · domain:cross-domain · type:framework
  • Opentrons — Open-source software stack for OT-2 / Flex liquid-handling robots; the canonical "agent talks to a wet lab" surface. Tags: lifecycle:tool-use-execution · domain:genomics-biology · type:framework,tool
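Under all of these frameworks sits the same mechanism: a registry of callable tools and a dispatcher that executes whatever action the model emits. The sketch below shows that loop with made-up tool names and a hard-coded "model decision"; it is not any library's actual API:

```python
# Minimal tool-dispatch pattern; frameworks like LangChain and AutoGen wrap
# this loop with schemas, retries, and sandboxing. Tool names are invented.

TOOLS = {
    "lookup_melting_point": lambda compound: {"NaCl": 801}.get(compound),
    "convert_c_to_k": lambda c: c + 273.15,
}

def run_tool_call(call):
    """Execute one {'tool': name, 'args': [...]} action emitted by a model."""
    fn = TOOLS.get(call["tool"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['tool']}")
    return fn(*call["args"])

# One agent step: the "model" chained two tool calls.
celsius = run_tool_call({"tool": "lookup_melting_point", "args": ["NaCl"]})
kelvin = run_tool_call({"tool": "convert_c_to_k", "args": [celsius]})
```

Swapping the lambdas for API clients or an Opentrons protocol driver is what turns this pattern into a lab-facing agent; the control flow stays the same.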

Evaluation

Benchmarks, reproducibility checks, and result validation. Without these, "AI scientist" claims do not survive contact with peer review — this stage is what makes a system credible.

  • SWE-bench — Benchmark of real GitHub issues that tests whether agents can produce repository-level patches; the de facto bar for code-acting agents. Tags: lifecycle:evaluation · domain:cross-domain · type:benchmark,dataset
  • GPQA — Graduate-level Google-proof Q&A in biology, physics, and chemistry; a reasoning benchmark widely cited in frontier model reports. Tags: lifecycle:evaluation · domain:cross-domain · type:benchmark,dataset
  • MLE-bench — OpenAI benchmark of 75 Kaggle competitions for measuring whether agents can run end-to-end ML engineering work. Tags: lifecycle:evaluation · domain:cross-domain · type:benchmark
  • LAB-Bench — FutureHouse benchmark for biology research tasks (literature, figures, protocols, DBQA, cloning); aimed squarely at scientific agents. Tags: lifecycle:evaluation · domain:genomics-biology · type:benchmark,dataset
  • ScienceQA — Multimodal multiple-choice science benchmark with chain-of-thought explanations across grade-school science domains. Tags: lifecycle:evaluation · domain:cross-domain · type:benchmark,dataset
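For multiple-choice suites like GPQA and ScienceQA, the scoring core reduces to normalized exact match over gold labels. A minimal harness skeleton, with made-up question IDs and answers, might look like:

```python
# Minimal exact-match scoring skeleton for a multiple-choice benchmark.
# Item IDs and labels below are illustrative, not drawn from any real suite.

def accuracy(predictions, gold):
    """Fraction of gold items whose normalized prediction matches the label.

    Unanswered items count as wrong, so scores stay comparable across runs.
    """
    norm = lambda s: s.strip().upper()
    correct = sum(norm(predictions.get(qid, "")) == norm(ans)
                  for qid, ans in gold.items())
    return correct / len(gold)

gold = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
preds = {"q1": "a", "q2": "C", "q3": "D"}  # q4 left unanswered
```

Real harnesses add answer extraction from free-form model output, which is where most scoring disagreements between papers come from; the denominator convention above (score over all gold items, not just answered ones) is the part worth pinning down first.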

Scientific communication

Manuscript drafting, figure generation, peer-review assistance, and reproducible artifacts. The output stage of the loop — how findings get turned into something other scientists can read, run, and review.

  • Quarto — Open scientific publishing system that renders Markdown / Jupyter / R notebooks to articles, sites, slides, and books from one source. Tags: lifecycle:scientific-communication · domain:cross-domain · type:framework,tool
  • Manubot — Tooling for writing scholarly manuscripts collaboratively over Git, with automatic citation rendering and reproducible builds. Tags: lifecycle:scientific-communication · domain:cross-domain · type:framework,tool
  • STORM — Stanford system for grounded, citation-rich long-form article generation; useful as a reference implementation for AI-assisted writing. Tags: lifecycle:scientific-communication,literature-intelligence · domain:cross-domain · type:agent-system,framework
  • OpenReview — Open peer-review platform powering NeurIPS, ICLR, COLM, etc.; the surface where automated reviewer-assist tooling is increasingly evaluated. Tags: lifecycle:scientific-communication,evaluation · domain:cross-domain · type:tool,community-event
  • Reviewer2 — Two-stage paper review generation pipeline; representative of the emerging body of work on AI-assisted peer review. Tags: lifecycle:scientific-communication,evaluation · domain:cross-domain · type:paper
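As a flavor of what the reproducible-publishing tools above look like in practice, here is a minimal Quarto document header: one YAML front-matter block drives rendering to multiple formats with citations resolved from a bibliography file. Title, author, and filenames are placeholders:

```yaml
---
title: "Hypothetical AI-for-Science Report"   # placeholder title
author: "Placeholder Author"
format:
  html: default
  pdf: default
bibliography: references.bib                   # placeholder .bib file
execute:
  echo: true   # show code alongside results for reproducibility
---
```

Rendering the same source to HTML and PDF from one header is the core appeal; Manubot takes a different route to the same goal, building the manuscript from Git history instead.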