The curation bar for every entry in this repository, applied to every PR that adds or refreshes an entry.
This rubric is the formal scoring model for the four-field “Entry Rubric (v1)” summary in
CONTRIBUTING.md. The v1 fields stay as the quick self-assessment contributors fill in; reviewers use the seven dimensions below to score and gate the merge.
| # | Dimension | Weight | What it measures |
|---|---|---|---|
| 1 | Reliability | ×3 | Production-readiness, API stability, known deployments, operator track record, failure-mode transparency. |
| 2 | Evidence | ×3 | Claims are anchored in [official] docs, [benchmark] results with methodology, [field report] write-ups, or [author assessment] synthesized from the above. See appendix/benchmark-and-evidence-policy.md. |
| 3 | Agentic relevance | ×3 | Direct fit to the mission scope: agent architectures, orchestration, memory, evaluation, protocols, tool use, multi-agent. |
| 4 | Uniqueness | ×2 | What this entry adds that existing entries in the list do not. No duplicates without added structure or judgement. |
| 5 | Maturity | ×2 | Release cadence, active maintenance, project age, breaking-change discipline, docs depth. |
| 6 | Licensing / openness | ×1 | License clarity: does the license permit the use described in the entry? |
| 7 | Community signal | ×1 | Stars, adoption, ecosystem traction. Tiebreaker only. Never sufficient on its own. |
Each dimension is scored 0–3:
| Score | Meaning |
|---|---|
| 0 | Fails — no credible signal, or signal contradicts the claim. |
| 1 | Weak — partial or thin signal; would not hold up to scrutiny. |
| 2 | Solid — clear, verifiable signal; meets the bar. |
| 3 | Exemplary — well-documented, widely verified, best-in-class signal. |
Candidate: FictionalFlow — a hypothetical orchestration framework for typed multi-agent DAGs.
| Dimension | Score | Weighted | Justification |
|---|---|---|---|
| Reliability | 2 | 6 | [field report] from one public production deployment; stable v1 API for 9 months. |
| Evidence | 2 | 6 | [official] docs thorough; one [benchmark] on a realistic workload; no independent replication yet. |
| Agentic relevance | 3 | 9 | Core agent orchestration primitive; first-class tool use, memory, and approval gates. |
| Uniqueness | 2 | 4 | Typed DAG + checkpointing distinct from existing entries; overlaps partially with LangGraph. |
| Maturity | 2 | 4 | Semantic versioning, ~14 months old, weekly releases, responsive issue triage. |
| Licensing | 3 | 3 | Apache-2.0, clear contributor agreement. |
| Community signal | 2 | 2 | Moderate adoption; not a primary decision factor here. |
| **Total** | | **34 / 45** | Merge: hard gates pass, total ≥ 27. |
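The weighted arithmetic above can be sketched as a short script. The dictionary names and helper function are illustrative only (not part of any repo tooling); the weights, the 0–3 score scale, the FictionalFlow scores, and the ≥ 27 merge threshold come directly from this rubric.

```python
# Illustrative sketch of the weighted scoring model; names are hypothetical.
WEIGHTS = {
    "Reliability": 3, "Evidence": 3, "Agentic relevance": 3,
    "Uniqueness": 2, "Maturity": 2,
    "Licensing / openness": 1, "Community signal": 1,
}
MERGE_THRESHOLD = 27
MAX_TOTAL = 3 * sum(WEIGHTS.values())  # each dimension tops out at 3 -> 45

def weighted_total(scores: dict) -> int:
    """Each score is 0-3; the total is the sum of score x weight."""
    return sum(WEIGHTS[dim] * s for dim, s in scores.items())

# FictionalFlow worked example, scores taken from the table above.
fictionalflow = {
    "Reliability": 2, "Evidence": 2, "Agentic relevance": 3,
    "Uniqueness": 2, "Maturity": 2,
    "Licensing / openness": 3, "Community signal": 2,
}
total = weighted_total(fictionalflow)
print(f"{total} / {MAX_TOTAL}")  # 34 / 45
print("merge" if total >= MERGE_THRESHOLD else "hold")
```

Note that the threshold check is necessary but not sufficient: the hard gates must also pass, and those are evaluated separately from the weighted total.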
Reviewer note: Approve for merge. Flag Evidence for re-review in 6 months if independent benchmarks materialize.
Contributors fill in four fields (Reliability, Evidence, Uniqueness, Maturity) in the PR template as the quick self-assessment. Reviewers expand that into the full seven-dimension weighted score above. A PR merges only when the reviewer's scored table meets the threshold and passes the hard gates.
See ANTI-PATTERNS.md for concrete rejection patterns mapped to the dimensions they fail.
This rubric is reviewed quarterly. Changes to dimensions, weights, or thresholds require a PR against main citing the review that prompted the change.