
# Benchmarks

We evaluate Vue Docs MCP against Context7, a general-purpose documentation MCP server supporting 9000+ libraries, using 173 Vue.js questions scored by an LLM judge.

## Methodology

Each question has a ground-truth answer with expected API names and documentation paths. Both providers receive the same question and return documentation context. A Gemini judge (temperature 0) scores the retrieved context on five dimensions, each on a 1-5 scale. API recall measures whether the expected API names appear in the response. See the `eval/` directory in the repository for the full evaluation framework.
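As a rough illustration, API recall can be computed as the fraction of expected API names found in the retrieved context. This is a minimal sketch; the function name, signature, and exact-substring matching rule are assumptions, and the real framework in `eval/` may normalize names differently:

```python
import re


def api_recall(expected_apis: list[str], retrieved_context: str) -> float:
    """Fraction of expected API names that appear in the retrieved context.

    Matching rule (an assumption): a name counts as found if it appears
    verbatim anywhere in the context.
    """
    if not expected_apis:
        return 1.0
    hits = sum(
        1 for name in expected_apis
        if re.search(re.escape(name), retrieved_context)
    )
    return hits / len(expected_apis)


context = "Use defineProps and defineEmits in <script setup>."
# 2 of the 3 expected names are present in the context
print(api_recall(["defineProps", "defineEmits", "defineExpose"], context))
```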

## Overall Scores

| Metric | Vue Docs MCP | Context7 |
| --- | --- | --- |
| Relevance | 4.93 🏆 | 2.09 |
| Completeness | 4.83 🏆 | 1.67 |
| Correctness | 4.87 🏆 | 1.86 |
| API Coverage | 4.53 🏆 | 1.90 |
| Conciseness | 4.95 | 4.55 |
| Composite | 4.82 🏆 | 2.41 |
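The composite figures are consistent with an unweighted mean of the five judge dimensions. That formula is an inference from the numbers above, not something the source documents, but it reproduces both values:

```python
def composite(scores: dict[str, float]) -> float:
    """Unweighted mean of the judge dimension scores (assumed formula),
    rounded to two decimals to match the reported precision."""
    return round(sum(scores.values()) / len(scores), 2)


vue = {"relevance": 4.93, "completeness": 4.83, "correctness": 4.87,
       "api_coverage": 4.53, "conciseness": 4.95}
ctx7 = {"relevance": 2.09, "completeness": 1.67, "correctness": 1.86,
        "api_coverage": 1.90, "conciseness": 4.55}

print(composite(vue), composite(ctx7))  # 4.82 2.41, matching the table
```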

## Scores by Difficulty

| Difficulty | Questions | Vue Docs MCP | Context7 |
| --- | --- | --- | --- |
| Easy | 29 | 4.87 🏆 | 2.75 |
| Medium | 27 | 4.84 🏆 | 2.24 |
| Hard | 66 | 4.89 🏆 | 2.20 |
| Extreme | 51 | 4.69 🏆 | 2.58 |

## Scores by Question Type

| Intent | Questions | Vue Docs MCP | Context7 |
| --- | --- | --- | --- |
| API Lookup | 18 | 4.93 🏆 | 2.17 |
| How-To | 62 | 4.86 🏆 | 2.43 |
| Debugging | 41 | 4.82 🏆 | 2.17 |
| Comparison | 20 | 4.83 🏆 | 2.75 |
| Conceptual | 30 | 4.65 🏆 | 2.56 |


## Retrieval and Cost

| Metric | Vue Docs MCP | Context7 |
| --- | --- | --- |
| API Recall | 98.7% 🏆 | 53.1% |
| Avg Response Tokens | 4,213 | 1,739 |
| Avg Latency | 1.44s 🏆 | 1.72s |
| P95 Latency | 3.61s | 2.10s |
| Cost per Query (internal) | $0.0003 | N/A |
| Cost per Query (user-facing) | Free 🏆 | $0.002 |

## Pass Rates

Percentage of questions where all judge dimensions scored at or above the threshold:

| Threshold | Vue Docs MCP | Context7 |
| --- | --- | --- |
| All dimensions >= 5 | 83.8% 🏆 | 6.4% |
| All dimensions >= 4 | 86.7% 🏆 | 9.2% |
| All dimensions >= 3 | 88.4% 🏆 | 13.3% |
| All dimensions >= 2 | 90.8% 🏆 | 23.7% |
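A pass-rate check of this kind is straightforward to sketch: a question passes at a given threshold only if every judge dimension meets it. The function and the dimension names below are illustrative, not taken from the `eval/` framework:

```python
def pass_rate(per_question_scores: list[dict[str, int]], threshold: int) -> float:
    """Fraction of questions whose judge scores all meet the threshold.

    Each dict maps a judge dimension to its 1-5 score.
    """
    passed = sum(
        1 for scores in per_question_scores
        if all(score >= threshold for score in scores.values())
    )
    return passed / len(per_question_scores)


questions = [
    {"relevance": 5, "completeness": 5, "correctness": 5,
     "api_coverage": 4, "conciseness": 5},
    {"relevance": 5, "completeness": 5, "correctness": 5,
     "api_coverage": 5, "conciseness": 5},
]
# Only the second question passes at >= 5 (the first has api_coverage = 4)
print(pass_rate(questions, 5))  # 0.5
print(pass_rate(questions, 4))  # 1.0
```

This also explains why the pass rates rise monotonically as the threshold drops: any question passing at a stricter threshold necessarily passes at a looser one.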

## Notes on Fairness

- Path recall (97% vs 0.6%) is excluded from headline comparisons because our ground truth uses vuejs.org paths. Context7 returns context7.com URLs, making this metric structurally unfair.
- Context7 returns Vue 2 content for some Vue 3 questions, which legitimately affects its scores.
- Context7 is a general-purpose service covering 9000+ libraries. Vue Docs MCP is purpose-built for the Vue ecosystem. The comparison shows the quality advantage of specialization.
- The evaluation framework is open source in the `eval/` directory. Run `make eval-compare` to reproduce these results.