# Benchmarks
We evaluate Vue Docs MCP against Context7, a general-purpose documentation MCP server supporting 9000+ libraries, using 173 Vue.js questions scored by an LLM judge.
## Methodology
Each question has a ground-truth answer with expected API names and documentation paths. Both providers receive the same question and return documentation context. A Gemini judge (temperature 0) scores the retrieved context on five dimensions, each on a 1–5 scale. API recall measures whether the expected API names appear in the returned context. See the `eval/` directory in the repository for the full evaluation framework.
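As an illustration, API recall could be computed roughly as follows. This is a minimal sketch; the function name and the exact substring-matching rule are assumptions, not the framework's actual code (see `eval/` for that):

```python
def api_recall(expected_apis: list[str], context: str) -> float:
    """Fraction of expected API names that appear verbatim in the
    retrieved documentation context (hypothetical helper)."""
    if not expected_apis:
        return 1.0
    found = sum(1 for api in expected_apis if api in context)
    return found / len(expected_apis)

# Example: two of three expected APIs appear in the retrieved context.
context = "Use defineProps and defineEmits in <script setup>."
print(api_recall(["defineProps", "defineEmits", "defineExpose"], context))
```

The real metric may normalize names or match more loosely; the sketch only conveys the shape of the calculation.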
## Overall Scores
| Metric | Vue Docs MCP | Context7 |
|---|---|---|
| Relevance | 4.93 🏆 | 2.09 |
| Completeness | 4.83 🏆 | 1.67 |
| Correctness | 4.87 🏆 | 1.86 |
| API Coverage | 4.53 🏆 | 1.90 |
| Conciseness | 4.95 | 4.55 |
| Composite | 4.82 🏆 | 2.41 |
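The composite figures are consistent with an unweighted mean of the five judge dimensions. A quick check, assuming equal weighting (the table does not state the weighting explicitly):

```python
# Dimension scores in table order: relevance, completeness,
# correctness, API coverage, conciseness.
vue_docs = [4.93, 4.83, 4.87, 4.53, 4.95]
context7 = [2.09, 1.67, 1.86, 1.90, 4.55]

print(round(sum(vue_docs) / len(vue_docs), 2))  # 4.82, matching the table
print(round(sum(context7) / len(context7), 2))  # 2.41, matching the table
```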
## Scores by Difficulty
| Difficulty | Questions | Vue Docs MCP | Context7 |
|---|---|---|---|
| Easy | 29 | 4.87 🏆 | 2.75 |
| Medium | 27 | 4.84 🏆 | 2.24 |
| Hard | 66 | 4.89 🏆 | 2.20 |
| Extreme | 51 | 4.69 🏆 | 2.58 |
## Scores by Question Type
| Intent | Questions | Vue Docs MCP | Context7 |
|---|---|---|---|
| API Lookup | 18 | 4.93 🏆 | 2.17 |
| How-To | 62 | 4.86 🏆 | 2.43 |
| Debugging | 41 | 4.82 🏆 | 2.17 |
| Comparison | 20 | 4.83 🏆 | 2.75 |
| Conceptual | 30 | 4.65 🏆 | 2.56 |
## Judge Dimension Breakdown
## Retrieval and Cost
| Metric | Vue Docs MCP | Context7 |
|---|---|---|
| API Recall | 98.7% 🏆 | 53.1% |
| Avg Response Tokens | 4,213 | 1,739 |
| Avg Latency | 1.44s 🏆 | 1.72s |
| P95 Latency | 3.61s | 2.10s |
| Cost per Query (internal) | $0.0003 | N/A |
| Cost per Query (user-facing) | Free 🏆 | $0.002 |
## Pass Rates
Percentage of questions where all judge dimensions scored at or above the threshold:
| Threshold | Vue Docs MCP | Context7 |
|---|---|---|
| All dimensions >= 5 | 83.8% 🏆 | 6.4% |
| All dimensions >= 4 | 86.7% 🏆 | 9.2% |
| All dimensions >= 3 | 88.4% 🏆 | 13.3% |
| All dimensions >= 2 | 90.8% 🏆 | 23.7% |
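A sketch of how these pass rates could be derived from per-question judge scores. The helper name and the toy data are illustrative only; the actual computation lives in `eval/`:

```python
def pass_rate(per_question_scores: list[dict[str, int]], threshold: int) -> float:
    """Share of questions where every judge dimension meets the threshold."""
    passed = sum(1 for scores in per_question_scores
                 if all(s >= threshold for s in scores.values()))
    return passed / len(per_question_scores)

# Toy example with three judged questions (illustrative scores only):
scores = [
    {"relevance": 5, "completeness": 5, "correctness": 5},
    {"relevance": 5, "completeness": 4, "correctness": 5},
    {"relevance": 3, "completeness": 2, "correctness": 4},
]
print(pass_rate(scores, 4))  # 2 of 3 questions pass at >= 4
```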
## Notes on Fairness
- Path recall (97% vs 0.6%) is excluded from headline comparisons because our ground truth uses `vuejs.org` paths while Context7 returns `context7.com` URLs, making this metric structurally unfair.
- Context7 returns Vue 2 content for some Vue 3 questions, which legitimately affects its scores.
- Context7 is a general-purpose service covering 9000+ libraries. Vue Docs MCP is purpose-built for the Vue ecosystem. The comparison shows the quality advantage of specialization.
- The evaluation framework is open source in the `eval/` directory. Run `make eval-compare` to reproduce these results.