
# Benchmarks

We evaluate Vue Docs MCP against Context7, a general-purpose documentation MCP server supporting 9000+ libraries, using 173 Vue.js questions scored by an LLM judge.

## Methodology

Each question has a ground-truth answer with expected API names and documentation paths. Both providers receive the same question and return documentation context. A Gemini judge (temperature 0) scores the retrieved context on five dimensions, each on a 1-5 scale. API recall measures whether the expected API names appear in the response. See the `eval/` directory in the repository for the full evaluation framework.
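As a rough illustration, API recall can be computed as the fraction of expected API names found in the retrieved context. This is a minimal sketch; the function name, signature, and exact-substring matching rule are assumptions, and the real framework in `eval/` may normalize names differently:

```python
import re


def api_recall(expected_apis: list[str], retrieved_context: str) -> float:
    """Fraction of expected API names that appear in the retrieved context.

    Matching rule (an assumption): a name counts as found if it appears
    verbatim anywhere in the context.
    """
    if not expected_apis:
        return 1.0
    hits = sum(
        1 for name in expected_apis
        if re.search(re.escape(name), retrieved_context)
    )
    return hits / len(expected_apis)


context = "Use defineProps and defineEmits in <script setup>."
# 2 of the 3 expected names are present in the context
print(api_recall(["defineProps", "defineEmits", "defineExpose"], context))
```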

## Overall Scores

| Metric | Vue Docs MCP | Context7 |
| --- | --- | --- |
| Relevance | 4.93 🏆 | 2.09 |
| Completeness | 4.83 🏆 | 1.67 |
| Correctness | 4.87 🏆 | 1.86 |
| API Coverage | 4.53 🏆 | 1.90 |
| Conciseness | 4.95 | 4.55 |
| Composite | 4.82 🏆 | 2.41 |
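The composite figures are consistent with an unweighted mean of the five judge dimensions. That formula is an inference from the numbers above, not something the source documents, but it reproduces both values:

```python
def composite(scores: dict[str, float]) -> float:
    """Unweighted mean of the judge dimension scores (assumed formula),
    rounded to two decimals to match the reported precision."""
    return round(sum(scores.values()) / len(scores), 2)


vue = {"relevance": 4.93, "completeness": 4.83, "correctness": 4.87,
       "api_coverage": 4.53, "conciseness": 4.95}
ctx7 = {"relevance": 2.09, "completeness": 1.67, "correctness": 1.86,
        "api_coverage": 1.90, "conciseness": 4.55}

print(composite(vue), composite(ctx7))  # 4.82 2.41, matching the table
```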

## Scores by Difficulty

| Difficulty | Questions | Vue Docs MCP | Context7 |
| --- | --- | --- | --- |
| Easy | 29 | 4.87 🏆 | 2.75 |
| Medium | 27 | 4.84 🏆 | 2.24 |
| Hard | 66 | 4.89 🏆 | 2.20 |
| Extreme | 51 | 4.69 🏆 | 2.58 |

## Scores by Question Type

| Intent | Questions | Vue Docs MCP | Context7 |
| --- | --- | --- | --- |
| API Lookup | 18 | 4.93 🏆 | 2.17 |
| How-To | 62 | 4.86 🏆 | 2.43 |
| Debugging | 41 | 4.82 🏆 | 2.17 |
| Comparison | 20 | 4.83 🏆 | 2.75 |
| Conceptual | 30 | 4.65 🏆 | 2.56 |


## Retrieval and Cost

| Metric | Vue Docs MCP | Context7 |
| --- | --- | --- |
| API Recall | 98.7% 🏆 | 53.1% |
| Avg Response Tokens | 4,213 | 1,739 |
| Avg Latency | 1.44s 🏆 | 1.72s |
| P95 Latency | 3.61s | 2.10s |
| Cost per Query (internal) | $0.0003 | N/A |
| Cost per Query (user-facing) | Free 🏆 | $0.002 |

## Pass Rates

Percentage of questions where all judge dimensions scored at or above the threshold:

| Threshold | Vue Docs MCP | Context7 |
| --- | --- | --- |
| All dimensions >= 5 | 83.8% 🏆 | 6.4% |
| All dimensions >= 4 | 86.7% 🏆 | 9.2% |
| All dimensions >= 3 | 88.4% 🏆 | 13.3% |
| All dimensions >= 2 | 90.8% 🏆 | 23.7% |
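A pass-rate check of this kind is straightforward to sketch: a question passes at a given threshold only if every judge dimension meets it. The function and the dimension names below are illustrative, not taken from the `eval/` framework:

```python
def pass_rate(per_question_scores: list[dict[str, int]], threshold: int) -> float:
    """Fraction of questions whose judge scores all meet the threshold.

    Each dict maps a judge dimension to its 1-5 score.
    """
    passed = sum(
        1 for scores in per_question_scores
        if all(score >= threshold for score in scores.values())
    )
    return passed / len(per_question_scores)


questions = [
    {"relevance": 5, "completeness": 5, "correctness": 5,
     "api_coverage": 4, "conciseness": 5},
    {"relevance": 5, "completeness": 5, "correctness": 5,
     "api_coverage": 5, "conciseness": 5},
]
# Only the second question passes at >= 5 (the first has api_coverage = 4)
print(pass_rate(questions, 5))  # 0.5
print(pass_rate(questions, 4))  # 1.0
```

This also explains why the pass rates rise monotonically as the threshold drops: any question passing at a stricter threshold necessarily passes at a looser one.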

## Notes on Fairness

- Path recall (97% vs 0.6%) is excluded from headline comparisons because our ground truth uses vuejs.org paths. Context7 returns context7.com URLs, making this metric structurally unfair.
- Context7 returns Vue 2 content for some Vue 3 questions, which legitimately affects its scores.
- Context7 is a general-purpose service covering 9000+ libraries. Vue Docs MCP is purpose-built for the Vue ecosystem. The comparison shows the quality advantage of specialization.
- The evaluation framework is open source in the `eval/` directory. Run `make eval-compare` to reproduce these results.