A public-data-only model catalog plus a selector: a skill declares what it NEEDS, and the catalog picks the best CURRENT model instead of a hard-coded id.
CANDIDATE tooling — not official OpenAPI. Suluk is a single-contributor candidate for OpenAPI Specification v4.0 ("Moonwalk"), unaffiliated with the OpenAPI Initiative and unable to ratify anything on the SIG's behalf.
```sh
bun add @suluk/models
```
- HardFilters (capabilities, min-context, price caps, governance allowlist) + Preferences (a named profile, or ≤4 small integer weights). selectModel filters the catalog down to survivors, then ranks them and explains why.
- Benchmark results (intel.*) are bucketed to `frontier | strong | mid | basic` with `{source, asOf}`, never a fake 2-decimal score.
- Missing facts are null (never imputed to "worst", which would kill new models). Hard filters fail closed on unknown; soft axes surface as coverageGaps on the winner.
- Two catalogs ship: SEED_CATALOG (small, hand-curated, for tests/examples) and OPENROUTER_CATALOG (the committed, content-addressed ~300-model artifact, regenerated weekly by scripts/refresh.ts).

Reach for this when: you are picking a model for a skill or agent (this is what @suluk/agents consumes). Declare what the task needs and let the catalog resolve the current best. And when you need governance: modelAllowlist / allowedRetention form a MEET that a preference can never widen, plus a snapshotHash so the choice is reproducible week-over-week.

Reach elsewhere when: you need to actually call a model (that's the runtime client, e.g. @suluk/chat's OpenRouter wiring); this package only selects. And when you want per-endpoint region/latency facts: those are deliberately not in the schema yet (OpenRouter routes endpoints at runtime), so gov.region / dataRetention stay UNKNOWN at the model level rather than being faked.
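The unknown-handling rule (hard filters fail closed, soft axes report coverage gaps) can be sketched in a few lines. This is a minimal illustration, not the package's code: `Row`, `passesMinWindow`, and `coverageGapsOf` are hypothetical names, and `null` stands in for an UNKNOWN cell.

```typescript
// Hypothetical row shape: each fact is either known or null (UNKNOWN), never imputed.
type Tier = "frontier" | "strong" | "mid" | "basic";
type Row = { id: string; contextWindow: number | null; latencyTier: Tier | null };

// Hard filter: fail closed. A null window cannot prove it meets the minimum, so the row is excluded.
function passesMinWindow(row: Row, min: number): boolean {
  return row.contextWindow !== null && row.contextWindow >= min;
}

// Soft axis: a null value does not exclude the row; it is reported as a coverage gap instead.
function coverageGapsOf(row: Row): string[] {
  return row.latencyTier === null ? ["latency"] : [];
}

const rows: Row[] = [
  { id: "a/known-window", contextWindow: 200_000, latencyTier: null },
  { id: "b/unknown-window", contextWindow: null, latencyTier: "strong" },
];

const survivors = rows.filter((r) => passesMinWindow(r, 200_000));
const gaps = coverageGapsOf(survivors[0]); // the winner's missing latency is surfaced, not penalized
```

Note the asymmetry: `b/unknown-window` has a known, good latency tier but is still excluded, because a hard filter only passes rows that can prove compliance.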
```ts
import { selectModel, OPENROUTER_CATALOG } from "@suluk/models";

const result = selectModel(
  // HardFilters: these FILTER (can empty the set ⇒ fail loud), never rank
  { needsTools: true, minWindowRequired: 200_000 },
  // Preferences: a named profile is the 90% case
  { profile: "tool-reliable" },
  OPENROUTER_CATALOG,
);

if (result.ranked.length === 0) {
  // FAIL LOUD: names the unsatisfiable filter(s), e.g. "min-window>=200000 (excluded 142)"
  throw new Error(`no model fits: ${result.unsatisfiable?.join("; ")}`);
}

const best = result.ranked[0];
best.id;                     // e.g. "anthropic/claude-sonnet-4.5"
best.why.passedFilters;      // ["tool-calling", "min-window>=200000", "status-active", ...]
best.why.decidingPreference; // "intelligence (weight 3)"
best.why.tierByAxis;         // { intelligence: {tier, source, asOf}, latency: {...}, cost: {...} }
result.candidateCount;       // survivors after hard filtering
result.coverageGaps;         // soft axes with no data on the winner (honesty surface)
```
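The fail-loud message in the example (`"min-window>=200000 (excluded 142)"`) suggests each hard filter reports how many rows it removed. A minimal sketch of building such a report follows; `Filter`, `applyFilters`, and the choice to count exclusions sequentially (each filter sees only the previous filter's survivors) are assumptions, not the package's documented behavior.

```typescript
// Hypothetical filter: a label used in the report plus a predicate over a row.
type Filter<T> = { label: string; keep: (row: T) => boolean };
type FilterReport<T> = { survivors: T[]; unsatisfiable?: string[] };

// Apply filters in order, recording how many rows each one excluded.
function applyFilters<T>(rows: T[], filters: Filter<T>[]): FilterReport<T> {
  let current = rows;
  const notes: string[] = [];
  for (const f of filters) {
    const next = current.filter(f.keep);
    const excluded = current.length - next.length;
    if (excluded > 0) notes.push(`${f.label} (excluded ${excluded})`);
    current = next;
  }
  // Attach the explanation only when nothing survived, mirroring the fail-loud path.
  return current.length === 0 ? { survivors: current, unsatisfiable: notes } : { survivors: current };
}

type M = { id: string; window: number; tools: boolean };
const models: M[] = [
  { id: "x", window: 128_000, tools: true },
  { id: "y", window: 1_000_000, tools: false },
];
const report = applyFilters(models, [
  { label: "tool-calling", keep: (m) => m.tools },
  { label: "min-window>=200000", keep: (m) => m.window >= 200_000 },
]);
```

Here `report.unsatisfiable` names both filters, because each one removed a model and together they emptied the set.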
A profile is a preset. Power users can override it with up to four small integer weights (0..3) and route the single "intelligence" knob to one INTEL sub-axis via taskShape.
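```ts
import { selectModel, OPENROUTER_CATALOG } from "@suluk/models";

const r = selectModel(
  { needsStructured: true },
  // Explicit weights instead of a profile; taskShape routes "intelligence" to the coding sub-axis
  { prefer: { intelligence: 3, cost: 1, speed: 0, context: 1 }, taskShape: "coding" },
  OPENROUTER_CATALOG,
);
```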
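One way to picture how small integer weights and `taskShape` interact: map each tier to an ordinal, take a weighted mean over the axes that are known, and let `taskShape` pick which intel sub-axis feeds the `intelligence` weight. This is a hypothetical sketch; the tier ordinals, the sub-axis keys, and the weighted-mean rule are assumptions, and the package's actual ranking rule may differ.

```typescript
type Tier = "frontier" | "strong" | "mid" | "basic";
// Hypothetical ordinals; higher is better on every axis (for cost, "frontier" means cheapest here).
const TIER_VALUE: Record<Tier, number> = { frontier: 3, strong: 2, mid: 1, basic: 0 };

type Weights = { intelligence: number; cost: number; speed: number; context: number };
type Candidate = {
  id: string;
  intel: Partial<Record<string, Tier>>; // hypothetical sub-axes keyed by task shape, e.g. "coding"
  cost: Tier | null;
  speed: Tier | null;
  context: Tier | null;
};

// Weighted mean over axes that are both weighted and known.
// Unknown axes are skipped instead of being scored as "basic".
function score(c: Candidate, w: Weights, taskShape: string): number {
  const cells: Array<[number, Tier | null]> = [
    [w.intelligence, c.intel[taskShape] ?? null], // taskShape picks which intel sub-axis counts
    [w.cost, c.cost],
    [w.speed, c.speed],
    [w.context, c.context],
  ];
  let total = 0;
  let weightSum = 0;
  for (const [weight, tier] of cells) {
    if (weight === 0 || tier === null) continue;
    total += weight * TIER_VALUE[tier];
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

function rank(cs: Candidate[], w: Weights, taskShape: string): string[] {
  return [...cs].sort((a, b) => score(b, w, taskShape) - score(a, w, taskShape)).map((c) => c.id);
}

const weights: Weights = { intelligence: 3, cost: 1, speed: 0, context: 1 };
const candidates: Candidate[] = [
  { id: "q/cheap", intel: { coding: "mid" }, cost: "frontier", speed: "frontier", context: "mid" },
  { id: "p/coder", intel: { coding: "frontier" }, cost: "mid", speed: null, context: "strong" },
];
const order = rank(candidates, weights, "coding");
```

With `speed` at weight 0, `q/cheap`'s frontier speed contributes nothing, and `p/coder`'s unknown speed costs it nothing either; the `intelligence: 3` weight on the coding sub-axis decides the order.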
The six built-in profiles (PROFILES): tool-reliable, cheap-fast, balanced, max-reasoning, long-context, vision. Each is preset weights + auto-wired implied filters (e.g. tool-reliable implies needsTools, vision implies inputModalities: ["image"]).
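Since a profile is preset weights plus implied filters, resolving one can be pictured as a table lookup followed by a merge into the caller's own filters. In this sketch only the implied filters for `tool-reliable` and `vision` come from the text above; the weight values, `PROFILE_TABLE`, and `resolveProfile` are hypothetical.

```typescript
type Filters = { needsTools?: boolean; inputModalities?: string[]; minWindowRequired?: number };
type Weights = { intelligence: number; cost: number; speed: number; context: number };
type ProfileDef = { weights: Weights; implies: Filters };

// Hypothetical preset table. The implied filters follow the README; the weights are invented.
const PROFILE_TABLE: Record<"tool-reliable" | "vision", ProfileDef> = {
  "tool-reliable": { weights: { intelligence: 2, cost: 1, speed: 1, context: 0 }, implies: { needsTools: true } },
  vision: { weights: { intelligence: 2, cost: 1, speed: 1, context: 0 }, implies: { inputModalities: ["image"] } },
};

// Merge a profile's implied filters into the caller's own. Every filter only narrows the set,
// so taking the union of requirements can never widen what the caller asked for.
function resolveProfile(name: keyof typeof PROFILE_TABLE, filters: Filters): { weights: Weights; filters: Filters } {
  const p = PROFILE_TABLE[name];
  const merged: Filters = { ...filters };
  if (p.implies.needsTools) merged.needsTools = true;
  const modalities = Array.from(new Set([...(filters.inputModalities ?? []), ...(p.implies.inputModalities ?? [])]));
  if (modalities.length > 0) merged.inputModalities = modalities;
  return { weights: p.weights, filters: merged };
}

const visionResolved = resolveProfile("vision", { minWindowRequired: 100_000 });
const toolResolved = resolveProfile("tool-reliable", {});
```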
policy is fail-closed and non-overridable: a preference can never widen it, and an unknown governance cell is excluded.
```ts
import { selectModel, OPENROUTER_CATALOG } from "@suluk/models";

selectModel(
  { policy: { modelAllowlist: ["google/gemini-2.5-flash"], allowedRetention: ["zero", "ephemeral"] } },
  { profile: "max-reasoning" }, // even max-reasoning can't escape the allowlist
  OPENROUTER_CATALOG,
);
```
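The non-overridable policy can be sketched as a pre-filter that runs before any preference is consulted: a row survives only if its id is on the allowlist and its retention is both known and allowed. `PolicyRow`, `applyPolicy`, `meetAllowlists`, the `example/unknown-retention` id, and the per-row retention values are all hypothetical; the `zero` / `ephemeral` values are the ones from the example above.

```typescript
type PolicyRow = { id: string; retention: string | null }; // null = UNKNOWN retention
type Policy = { modelAllowlist?: string[]; allowedRetention?: string[] };

// Runs before any preference is consulted, so no profile or weight can bring an excluded row back.
function applyPolicy(rows: PolicyRow[], policy: Policy): PolicyRow[] {
  return rows.filter((r) => {
    if (policy.modelAllowlist && !policy.modelAllowlist.includes(r.id)) return false;
    // Fail closed: an unknown retention cell cannot prove compliance, so it is excluded.
    if (policy.allowedRetention && (r.retention === null || !policy.allowedRetention.includes(r.retention))) {
      return false;
    }
    return true;
  });
}

// Stacking two allowlists takes their intersection (a meet): each layer can only narrow.
function meetAllowlists(a: string[], b: string[]): string[] {
  return a.filter((id) => b.includes(id));
}

const policy: Policy = {
  modelAllowlist: ["google/gemini-2.5-flash", "example/unknown-retention"],
  allowedRetention: ["zero", "ephemeral"],
};
const allowed = applyPolicy(
  [
    { id: "google/gemini-2.5-flash", retention: "zero" }, // hypothetical retention value
    { id: "example/unknown-retention", retention: null }, // hypothetical id, allowlisted but unknown
    { id: "anthropic/claude-sonnet-4.5", retention: "zero" }, // not on the allowlist
  ],
  policy,
);
```

Only the first row survives: the second is allowlisted but cannot prove its retention, and the third is outside the allowlist regardless of how well it would rank.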
deriveRequirements is the seam that turns "this agent has tool routes + needs a 120k window" into filters:
```ts
import { deriveRequirements } from "@suluk/models";

deriveRequirements({ hasRoutes: true, minWindowRequired: 120_000 });
// → { needsTools: true, minWindowRequired: 120000 }
```
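A minimal sketch of the kind of mapping `deriveRequirements` performs, written to agree with the documented example. `AgentShape`, `Filters`, and `deriveFilters` are hypothetical names, and the real function may read more fields than these two.

```typescript
// Hypothetical input: what an agent/skill declares about itself.
type AgentShape = { hasRoutes?: boolean; minWindowRequired?: number };
type Filters = { needsTools?: boolean; minWindowRequired?: number };

// Structure in, hard filters out: tool routes imply tool-calling; a declared window becomes a minimum.
function deriveFilters(shape: AgentShape): Filters {
  const out: Filters = {};
  if (shape.hasRoutes) out.needsTools = true;
  if (shape.minWindowRequired !== undefined) out.minWindowRequired = shape.minWindowRequired;
  return out;
}

const f = deriveFilters({ hasRoutes: true, minWindowRequired: 120_000 });
```

Declaring nothing yields no filters at all, so an agent with no routes and no window requirement is matched against the whole catalog.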
The committed catalog is regenerated, not hand-edited. The transform is pure + unit-tested; the live fetch is a thin wrapper.
```ts
import { fetchOpenRouterCatalog, applyTierOverlay, KNOWN_TIERS } from "@suluk/models";

// Class A: OpenRouter /models → the decidable fact-cell catalog (NETWORK: run from a script/CI, not tests)
let catalog = await fetchOpenRouterCatalog("2026-06-21");

// Class B: overlay coarse, cited benchmark tiers onto intel.* (KNOWN_TIERS is the conservative bootstrap seed)
catalog = applyTierOverlay(catalog, KNOWN_TIERS, { source: "public-leaderboard-consensus", asOf: "2026-06-21" });
```
The repo ships this as a script:
```sh
bun scripts/refresh.ts [asOf]   # → writes src/openrouter-catalog.json (committed, content-addressed)
```
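Content addressing means the catalog's identity is a hash of its contents, so two refreshes that produce identical rows produce the same hash. A minimal sketch using SHA-256 over key-sorted JSON with Node's `node:crypto`; the algorithm, the canonicalization, and the `contentHash` name are assumptions, not necessarily what `snapshotHash` does.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted at every level, so key order cannot change the hash.
// Array order is preserved: row order is part of the content.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// Hypothetical stand-in for snapshotHash(rows): SHA-256 over the canonical form, hex-encoded.
function contentHash(rows: unknown[]): string {
  return createHash("sha256").update(canonicalJson(rows)).digest("hex");
}

const h1 = contentHash([{ id: "a", contextWindow: 200_000 }]);
const h2 = contentHash([{ contextWindow: 200_000, id: "a" }]);
const h3 = contentHash([{ id: "b", contextWindow: 200_000 }]);
```

`h1` and `h2` match despite different key order, while `h3` differs; that property is what lets a committed catalog file be compared week-over-week by hash alone.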
For the pure pieces: normalizeOpenRouter(models, asOf) / normalizeOpenRouterModel(m, asOf) turn raw ORModel[] into fact cells, catalogFrom(rows, asOf) wraps them with a snapshotHash, and applyBucketing(axis, score) maps a raw leaderboard number to a tier per the committed BUCKETING_RULES.
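`applyBucketing` maps a raw leaderboard number to a coarse tier using committed thresholds. A minimal sketch of that idea; `AxisRuleSketch`, `bucket`, the descending-threshold rule, and the numbers in `exampleRule` are invented for illustration and are not the package's `BUCKETING_RULES`.

```typescript
type Tier = "frontier" | "strong" | "mid" | "basic";

// Hypothetical rule: minimum score for each tier above "basic", plus a citation for the thresholds.
type AxisRuleSketch = { source: string; minScore: { frontier: number; strong: number; mid: number } };

// Check thresholds from the top down; anything below "mid" is "basic".
function bucket(score: number, rule: AxisRuleSketch): Tier {
  if (score >= rule.minScore.frontier) return "frontier";
  if (score >= rule.minScore.strong) return "strong";
  if (score >= rule.minScore.mid) return "mid";
  return "basic";
}

// Invented thresholds for a 0..100 metric.
const exampleRule: AxisRuleSketch = { source: "example-leaderboard", minScore: { frontier: 80, strong: 65, mid: 45 } };
```

Coarse buckets are the point: `bucket(83.47, exampleRule)` and `bucket(80.1, exampleRule)` both come out `"frontier"`, so a fraction-of-a-point leaderboard difference never becomes a fake precise score.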
| Export | What it does |
|---|---|
| `selectModel(reqs, prefs, catalog)` | Filter (`HardFilters`) then rank (`Preferences`) → `SelectResult` with a per-model `why`. |
| `deriveRequirements(input)` | Map an agent/skill's declared structure → `HardFilters`. |
| `PROFILES` | The six named profiles → preset weights + implied filters (`ResolvedProfile`). |
| `OPENROUTER_CATALOG` | The committed, content-addressed ~300-model catalog (real prices/context/caps + a frontier intel seed). |
| `SEED_CATALOG` | A small hand-curated catalog for tests/examples. |
| `fetchOpenRouterCatalog(asOf, opts?)` | Live OpenRouter `/models` fetch → fact-cell `ModelCatalog`. |
| `normalizeOpenRouter` / `normalizeOpenRouterModel` | Pure `ORModel[]` → fact-cell records. |
| `catalogFrom(rows, asOf)` / `snapshotHash(rows)` | Wrap rows into a content-addressed `ModelCatalog`. |
| `applyTierOverlay(catalog, tiers, opts)` | Overlay coarse `intel.*` tiers onto matching rows; re-hashes. |
| `applyBucketing(axis, score)` / `BUCKETING_RULES` | Map a raw leaderboard metric → a coarse `Tier` per the committed, cited rules. |
| `KNOWN_TIERS` | A small, cited seed of frontier standings (the Class-B bootstrap). |
Types: Tier, Cell, DataRetention, ModelRecord, ModelCatalog, HardFilters, Profile, Preferences, RankedModel, SelectResult, ResolvedProfile, AxisRule, IntelAxis, ORModel.
This package selects; it does not call. It hands back ranked ids plus a why-explainer. Resolving that to a runtime target (pin a concrete id, fence OpenRouter's openrouter/auto router to the survivor set, or defer to a ~latest alias) and actually invoking the model live downstream in @suluk/agents and the runtime client.
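The three resolution strategies named above can be modelled as a discriminated union. This is a hypothetical sketch of a downstream type, not an export of this package or of `@suluk/agents`; `RuntimeTarget`, `toTarget`, the field names, and passing the alias as a plain caller-supplied string are assumptions.

```typescript
// Hypothetical downstream target type for a selection result.
type RuntimeTarget =
  | { kind: "pin"; model: string } // pin the top-ranked concrete id
  | { kind: "fenced-auto"; router: "openrouter/auto"; allowed: string[] } // auto-route, but only among survivors
  | { kind: "alias"; alias: string }; // defer to a caller-supplied ~latest-style alias

function toTarget(rankedIds: string[], kind: RuntimeTarget["kind"], alias?: string): RuntimeTarget {
  if (rankedIds.length === 0) throw new Error("no survivors to resolve");
  switch (kind) {
    case "pin":
      return { kind: "pin", model: rankedIds[0] };
    case "fenced-auto":
      return { kind: "fenced-auto", router: "openrouter/auto", allowed: [...rankedIds] };
    case "alias":
      if (!alias) throw new Error("alias strategy needs an alias");
      return { kind: "alias", alias };
  }
}
```

Whichever strategy is chosen, the survivor set is the boundary: pinning takes its head, and fencing keeps a runtime router from wandering outside it.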
The fact transform is pure and never reaches for the clock or the network: asOf is always injected (so a run is reproducible) and the only network call, fetchOpenRouterCatalog, is a thin wrapper you run from a weekly script/CI. The benchmark-tier overlay (intel.*) is a separate, lower-cadence, human-reviewed step — see REFRESH.md. Anything the catalog can't prove stays UNKNOWN; it is never imputed.
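Injecting `asOf` instead of reading the clock is what makes the transform reproducible: the same input and the same `asOf` always yield the same output. A minimal sketch of a pure normalizer in that style; the raw fields (`id`, `context_length`, `pricing.prompt` as a string) are assumed to resemble an OpenRouter `/models` entry, and `FactCell`, `RawModel`, and `normalizeRow` are hypothetical.

```typescript
// Hypothetical fact cell: a value plus where it came from and when; a null value means UNKNOWN.
type FactCell<T> = { value: T | null; source: string; asOf: string };

// Assumed subset of a raw /models entry; every field except id may be missing.
type RawModel = { id: string; context_length?: number; pricing?: { prompt?: string } };

type NormalizedRow = { id: string; contextWindow: FactCell<number>; promptPrice: FactCell<number> };

// Pure: no Date.now(), no fetch. asOf is a parameter, and missing data stays null.
function normalizeRow(m: RawModel, asOf: string): NormalizedRow {
  const rawPrice = m.pricing?.prompt;
  const price = rawPrice === undefined ? null : Number(rawPrice);
  return {
    id: m.id,
    contextWindow: { value: m.context_length ?? null, source: "openrouter", asOf },
    promptPrice: { value: price !== null && Number.isFinite(price) ? price : null, source: "openrouter", asOf },
  };
}

const known = normalizeRow({ id: "x/y", context_length: 128_000, pricing: { prompt: "0.000003" } }, "2026-06-21");
const sparse = normalizeRow({ id: "z/w" }, "2026-06-21");
```

`sparse` keeps both cells as `null` rather than guessing a window or a price, which is what lets a later hard filter fail closed on it.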
Apache-2.0