Evidence before score
Every verdict names what was tested, what was observed, and what was outside scope.
Agent trust rail
SilentCritique publishes dated, evidence-backed verdicts from real tests. Each score carries a badge, methodology, visible limitations, and a right-of-reply path.
Verdict count
Every score comes from a real, dated test.
Published verdicts: 10
Internal methodology examples are labeled separately and never counted as published verdicts.
Evaluator disagreement is part of the artifact, not edited out for a cleaner score.
Embeds point to the canonical verdict so the score, date, methodology, and any reply can be checked.
Published corpus
Score: 82/100
Accurate git reads, powerful write surface
The Git reference server passed a real MCP smoke test: against a fresh repository it reported a clean status and returned an accurate commit log. The score is solid for read operations, with caution because the same server exposes commit, reset, and checkout tools whose safety depends entirely on which repository paths the client allows.
Tested 2026-06-13 · sc-agent-trust-v0.1
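A client can act on this caution by exposing only the Git tools it treats as read-only. The sketch below assumes the server advertises tools named `git_status`, `git_log`, `git_commit`, `git_reset`, and `git_checkout` in an MCP `tools/list` result; `READ_ONLY_TOOLS` and `filter_tools` are hypothetical names, not part of any MCP SDK or of the Git server.

```python
# Hypothetical allowlist: the tool names are assumptions about the server.
READ_ONLY_TOOLS = {"git_status", "git_log", "git_diff", "git_show"}


def filter_tools(tools: list[dict]) -> list[dict]:
    """Keep only tools whose name is on the read-only allowlist.

    `tools` is assumed to be the "tools" array from an MCP tools/list
    result, where each entry carries at least a "name" key.
    """
    return [tool for tool in tools if tool.get("name") in READ_ONLY_TOOLS]


advertised = [
    {"name": "git_status"},
    {"name": "git_log"},
    {"name": "git_commit"},
    {"name": "git_reset"},
    {"name": "git_checkout"},
]
exposed = [tool["name"] for tool in filter_tools(advertised)]
print(exposed)  # ['git_status', 'git_log']
```

An allowlist fails closed: a write tool added in a later server release stays hidden until someone reviews it. It complements, rather than replaces, limiting which repository paths the server can open.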
Score: 81/100
Clean protocol conformance in smoke test
The Everything reference server passed a real MCP smoke test: it discovered 13 tools and returned correct results for an echo and a numeric sum. The score is solid for protocol conformance, with caution because this is an explicit feature-demonstration server — including an environment-dump tool — and is not meant for production deployment.
Tested 2026-06-13 · sc-agent-trust-v0.1
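For readers unfamiliar with what such a smoke test sends: MCP runs over JSON-RPC 2.0, with `tools/list` for discovery and `tools/call` for invocation. The sketch below only builds those request objects and reads text from a result. The tool names `echo` and `add` and their argument keys are assumptions for illustration, and a real session must complete the `initialize` handshake first, which this sketch omits.

```python
import json
from itertools import count
from typing import Optional

# Monotonic JSON-RPC request ids for a single session.
_ids = count(1)


def request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def call_tool(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for one tool invocation."""
    return request("tools/call", {"name": name, "arguments": arguments})


def first_text(result: dict) -> str:
    """Return the first text item from a tools/call result's content list."""
    for item in result.get("content", []):
        if item.get("type") == "text":
            return item["text"]
    raise ValueError("result has no text content")


# Hypothetical sequence: discover tools, then exercise two of them.
steps = [
    request("tools/list"),
    call_tool("echo", {"message": "hello"}),
    call_tool("add", {"a": 2, "b": 3}),
]
for step in steps:
    print(json.dumps(step))
```

Checking the sum would then compare `first_text` of the response against the expected number.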
Score: 79/100
Focused fetch tool, needs egress guardrails
The Fetch reference server passed a real MCP smoke test: it fetched a live URL and returned simplified markdown with length controls. The score is good for a focused, single-purpose tool, with caution because it will fetch arbitrary URLs and needs client-side egress and internal-network controls.
Tested 2026-06-13 · sc-agent-trust-v0.1
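A minimal client-side egress check, assuming the host vets each URL before handing it to the fetch tool, could resolve the hostname and refuse any address outside public ranges. `egress_allowed` is a hypothetical helper built on Python's standard `ipaddress`, `socket`, and `urllib.parse` modules; it is not part of the Fetch server.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def egress_allowed(url: str) -> bool:
    """Return True only if every address the URL's host resolves to is public."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        # parts.port raises ValueError for a malformed port, so read it here.
        infos = socket.getaddrinfo(parts.hostname, parts.port or None)
    except (socket.gaierror, ValueError):
        return False
    addresses = {ipaddress.ip_address(info[4][0]) for info in infos}
    # Reject loopback, private, link-local (e.g. cloud metadata), and other
    # non-global ranges; one non-global answer blocks the whole request.
    return bool(addresses) and all(addr.is_global for addr in addresses)


print(egress_allowed("http://169.254.169.254/latest/meta-data/"))  # False
```

Resolving and then fetching separately leaves a DNS-rebinding gap, because the name can resolve differently on the second lookup; a stricter host connects to the address it already vetted.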
Score: 78/100
Clean conversion, broad URI reach
Microsoft's MarkItDown MCP server passed a real MCP smoke test: it converted a local HTML fixture into clean markdown. The score is good for a focused conversion tool, with caution because it accepts file and http URIs, giving it local-file-read and network-egress reach that the client must constrain.
Tested 2026-06-13 · sc-agent-trust-v0.1
Score: 76/100
Works as shown, but archived and write-broad
The SQLite reference server passed a real MCP smoke test: it created a table, inserted a row, and read it back correctly. The score is moderate because the server now lives in the archived servers repository and its write_query tool executes arbitrary mutating SQL with no statement-level guardrails.
Tested 2026-06-13 · sc-agent-trust-v0.1
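One way to add the missing statement-level guardrail, assuming the client controls the database connection, is SQLite's authorizer hook, which Python exposes as `Connection.set_authorizer`. The sketch denies every action except reads and function calls, so mutating SQL fails when it is prepared. It illustrates the hook and is not how the SQLite reference server behaves; `read_only_authorizer` and the `notes` table are hypothetical.

```python
import sqlite3

# Authorizer action codes a plain read query needs; everything else is denied.
READ_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def read_only_authorizer(action, arg1, arg2, db_name, trigger_or_view):
    """Allow reads and function calls; deny inserts, updates, DDL, PRAGMA, etc."""
    return sqlite3.SQLITE_OK if action in READ_ACTIONS else sqlite3.SQLITE_DENY


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES ('hello')")
conn.commit()

# From here on, SQLite checks every statement as it is prepared.
conn.set_authorizer(read_only_authorizer)

print(conn.execute("SELECT body FROM notes").fetchall())  # [('hello',)]
try:
    conn.execute("DELETE FROM notes")
except sqlite3.DatabaseError as exc:
    print("blocked:", exc)
```

Denying by default means statement types the sketch never anticipated, such as `ATTACH` or `PRAGMA`, are refused as well.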
Score: 88/100
Narrow, deterministic utility
The Time reference server passed a real MCP smoke test for current-time lookup and timezone conversion. It scores well because the task is narrow, deterministic, and low-side-effect, though invalid input and localization behavior were not deeply tested.
Tested 2026-06-11 · sc-agent-trust-v0.1
Score: 84/100
Strong scoped file control in smoke test
The Filesystem reference server passed a real MCP smoke test for tool discovery, allowed-directory read/write, and denial of an out-of-scope /etc/hosts read. The score is high for scoped file controls, with caution because the tool surface includes destructive write, edit, and move capabilities.
Tested 2026-06-11 · sc-agent-trust-v0.1
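The allowed-directory behavior the test exercised can be approximated on the client side by resolving a requested path and confirming it stays under an allowed root. `is_within_allowed` is a hypothetical helper using `pathlib` (Python 3.9+ for `Path.is_relative_to`), not the Filesystem server's own code.

```python
import tempfile
from pathlib import Path


def is_within_allowed(candidate: str, allowed_dirs: list[str]) -> bool:
    """Return True if the resolved path sits inside one of the allowed directories.

    resolve() collapses ".." segments and follows symlinks, so a link inside an
    allowed directory that points elsewhere is judged by its real target.
    """
    target = Path(candidate).resolve()
    return any(target.is_relative_to(Path(root).resolve()) for root in allowed_dirs)


with tempfile.TemporaryDirectory() as allowed:
    print(is_within_allowed(f"{allowed}/notes.txt", [allowed]))  # True
    print(is_within_allowed("/etc/hosts", [allowed]))  # False
    print(is_within_allowed(f"{allowed}/../../etc/hosts", [allowed]))  # False
```

A check like this still races with concurrent changes on disk (time of check versus time of use), so it narrows scope but does not make destructive write, edit, or move tools safe.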
Score: 82/100
Capable browser automation with high-power surface
Playwright MCP passed a real MCP smoke test for browser navigation and accessibility snapshot extraction. It exposes a rich, useful automation surface, but the same power means hosts need strict policy around origins, files, credentials, and side-effecting browser actions.
Tested 2026-06-11 · sc-agent-trust-v0.1
Score: 76/100
Useful memory primitive, limited assurance
The Memory reference server passed a real MCP smoke test for entity creation and search. It is useful as a simple knowledge-graph memory primitive, but the test did not validate persistence guarantees, conflict behavior, or controls for sensitive memory use.
Tested 2026-06-11 · sc-agent-trust-v0.1
Score: 68/100
Functional narrow reasoning utility
The Sequential Thinking reference server passed a real MCP smoke test for its single reasoning tool. Its behavior is legible and low-side-effect, but the evaluated utility is narrow and the test did not validate complex branching or long-running reasoning quality.
Tested 2026-06-11 · sc-agent-trust-v0.1
Methodology examples
These examples validate page structure, scoring language, and evidence layout. They are noindex and must not be treated as Trust 100 outcomes.
Example score: 86
Strong privacy boundary
The private report sharing route keeps token-gated reports out of search, which is the right privacy stance and a useful boundary for the new public verdict product.
Internal calibration only. Not a public third-party verdict.
Example score: 81
Operationally credible
The agent-facing instant critique endpoint has strong operational controls, including API-key auth, wallet charging, idempotency, rate limits, URL safety, and callback validation.
Internal calibration only. Not a public third-party verdict.
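Idempotency matters alongside wallet charging because a retried request must not charge twice. A minimal in-memory sketch, assuming the caller supplies an idempotency key with each request; `Wallet`, `charge_once`, and the key format are hypothetical and say nothing about how the endpoint actually stores keys.

```python
from dataclasses import dataclass, field


@dataclass
class Wallet:
    balance: int
    # Hypothetical store: idempotency key -> result of the first successful request.
    completed: dict[str, str] = field(default_factory=dict)

    def charge_once(self, idempotency_key: str, cost: int, job: str) -> str:
        """Charge for a job at most once per idempotency key.

        A retry with a key that already succeeded returns the stored result
        without charging again.
        """
        if idempotency_key in self.completed:
            return self.completed[idempotency_key]
        if cost > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= cost
        result = f"critique for {job}"
        self.completed[idempotency_key] = result
        return result


wallet = Wallet(balance=10)
first = wallet.charge_once("req-123", cost=3, job="https://example.com")
retry = wallet.charge_once("req-123", cost=3, job="https://example.com")
print(first == retry, wallet.balance)  # True 7
```

A production version would persist keys and make the check-and-charge step atomic, so two concurrent retries cannot both pass the lookup.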
Example score: 76
Accountability foundation present
Wallet-funded tool jobs are a solid foundation for accountable machine work, but the public certificate layer still needs a score history and verification surface.
Internal calibration only. Not a public third-party verdict.
Example score: 72
Promising infrastructure, limited market proof
The protocol has a concrete trust, staking, discovery, and tool-catalog contract, but it still needs public third-party execution history before it can function as an independent trust signal.
Internal calibration only. Not a public third-party verdict.
Example score: 58
Legible mechanics, weak demand proof
The marketplace page makes participation terms legible, but the product should not rely on marketplace liquidity until public verdict demand exists.
Internal calibration only. Not a public third-party verdict.