How It Works
Vue Docs MCP uses a two-stage architecture: an offline ingestion pipeline that indexes the Vue documentation, and an online query pipeline that handles search requests.
Ingestion Pipeline
The ingestion pipeline runs periodically (every 24 hours on the hosted server) and processes the official Vue.js documentation source.
Vue docs (markdown) -> Parse -> Enrich -> Embed -> Store

1. Structure-Aware Parsing
Markdown files are parsed respecting the heading hierarchy. Each document is split into structural chunks at heading boundaries instead of fixed-size token windows. Code examples stay paired with their explanations.
The parser uses markdown-it-py and produces chunks at multiple levels:
- Page: top-level document
- Section: H2 headings
- Subsection: H3 headings
- Code block: standalone code examples
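A minimal sketch of the heading-boundary chunking. The real parser walks the markdown-it-py token stream; this stand-in matches ATX headings with a regex, and the function and field names are illustrative, not the pipeline's actual API:

```python
import re

def chunk_by_headings(markdown: str) -> list[dict]:
    """Split a markdown document into chunks at H2/H3 boundaries.

    Simplified stand-in for the markdown-it-py based parser: chunks are
    cut where a new heading opens, so code fences and their surrounding
    prose stay together inside one chunk.
    """
    chunks = []
    current = {"level": "page", "heading": None, "lines": []}
    for line in markdown.splitlines():
        m = re.match(r"^(#{2,3})\s+(.*)", line)
        if m:
            chunks.append(current)
            level = "section" if len(m.group(1)) == 2 else "subsection"
            current = {"level": level, "heading": m.group(2), "lines": []}
        else:
            current["lines"].append(line)
    chunks.append(current)
    return chunks
```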
2. Entity Extraction
API entities are extracted deterministically from the documentation. Each entity gets a type (lifecycle hook, composable, directive, component, compiler macro, global API, etc.) and is linked to its documentation section.
A curated synonym table maps common aliases and variations. For example, v-model maps to two-way binding, and ref maps to reactivity primitives.
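In code, the synonym lookup is just a dictionary from alias to canonical API name. The entries below combine the two examples from the text with a few assumed extras; the real table lives in PostgreSQL:

```python
# Illustrative synonym table (alias -> canonical Vue API name).
# The "two-way binding" and reactivity entries come from the docs;
# the rest are assumed examples.
SYNONYMS = {
    "two-way binding": "v-model",
    "2-way binding": "v-model",
    "reactivity primitive": "ref",
    "watch effect": "watchEffect",
}

def resolve_alias(term: str) -> str:
    """Map a colloquial alias to its canonical name; pass through unknowns."""
    return SYNONYMS.get(term.lower().strip(), term)
```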
3. Contextual Enrichment
Each chunk receives a contextual prefix generated by Gemini that situates it within the broader documentation. This helps the embedding model understand chunks that would be ambiguous in isolation.
4. HyPE Question Generation
For key chunks, the system generates HyPE (Hypothetical Prompt Embeddings) questions: synthetic developer questions that someone might ask when looking for this content. These are embedded alongside the original content as separate Qdrant points to improve recall at query time.
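The key property of a HyPE point is that it carries a pointer back to the content chunk it was generated from, so question hits can be resolved to real documentation at query time. A sketch of that structure, with assumed field names (the actual Qdrant payload schema may differ):

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class HypePoint:
    """A synthetic-question point stored alongside content points."""
    question: str
    parent_chunk_id: str  # resolved back to this content chunk at query time
    point_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def make_hype_points(chunk_id: str, questions: list[str]) -> list[HypePoint]:
    """Wrap LLM-generated questions as points linked to their source chunk."""
    return [HypePoint(question=q, parent_chunk_id=chunk_id) for q in questions]
```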
5. Embedding
All content is embedded using Jina's unified embedding model (jina-embeddings-v4). Dense embeddings and BM25 sparse vectors are generated for every chunk and stored in Qdrant.
6. Storage
- Qdrant: Dense and sparse vectors for hybrid search
- PostgreSQL: Entities, synonyms, page metadata, BM25 model, and index state
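Each stored chunk ends up as one Qdrant point carrying both vector types. The shape below mirrors Qdrant's named-vector layout as a plain dict; the vector names ("dense", "sparse") and payload fields are assumptions, and the real pipeline uses the qdrant-client models rather than raw dicts:

```python
def build_point(chunk_id, text, dense, sparse_indices, sparse_values):
    """Shape of a hybrid-search point: one dense and one sparse named vector."""
    return {
        "id": chunk_id,
        "vector": {
            "dense": dense,  # jina-embeddings-v4 output
            "sparse": {"indices": sparse_indices, "values": sparse_values},  # BM25
        },
        "payload": {"text": text},
    }
```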
Query Pipeline
When your AI assistant calls vue_docs_search, the query goes through a 6-step pipeline:
Query -> Embed & Detect -> Hybrid Search -> Resolve HyPE -> Expand -> Rerank -> Reconstruct

1. Embed Query & Detect Entities
Three things happen in parallel:
- Dense embedding. The query is embedded via Jina (jina-embeddings-v4) into the same vector space as the documentation.
- Sparse vector. A BM25 sparse vector is generated locally for keyword matching.
- Entity detection. The query is scanned for Vue API names using exact matching, bigram matching (e.g., "watch effect" matches watchEffect), synonym lookup, and fuzzy matching via rapidfuzz for typo tolerance.
All of this is deterministic except the Jina embedding call. No LLM is involved at query time.
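The exact, bigram, and fuzzy stages of entity detection can be sketched as below. The real system uses rapidfuzz; the standard library's difflib stands in here so the sketch is self-contained, and the entity set is an illustrative subset:

```python
import difflib

ENTITIES = {"watchEffect", "ref", "computed", "defineProps"}  # illustrative subset

def detect_entities(query: str) -> set[str]:
    """Scan a query for Vue API names: exact, bigram, then fuzzy matching."""
    words = query.split()
    lowered = {e.lower(): e for e in ENTITIES}
    found = set()
    # Exact single-word matches (case-insensitive)
    for w in words:
        if w.lower() in lowered:
            found.add(lowered[w.lower()])
    # Bigram matches: "watch effect" -> "watcheffect" -> watchEffect
    for a, b in zip(words, words[1:]):
        if (a + b).lower() in lowered:
            found.add(lowered[(a + b).lower()])
    # Fuzzy matches for typo tolerance (rapidfuzz in the real pipeline)
    for w in words:
        for cand in difflib.get_close_matches(w.lower(), list(lowered), n=1, cutoff=0.85):
            found.add(lowered[cand])
    return found
```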
2. Hybrid Search
The dense and sparse vectors are sent to Qdrant in a single hybrid search query, retrieving up to 50 candidates. Qdrant combines dense semantic similarity with BM25 keyword matching internally.
If the search scope is narrowed (e.g., scope: "guide/components") and returns no results, the pipeline automatically falls back to searching all scopes.
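The scope fallback is a simple retry: run the scoped search, and if it comes back empty, rerun without the filter. In this sketch `search_fn` stands in for the Qdrant hybrid query (the name and signature are assumptions):

```python
def search_with_fallback(search_fn, query_vecs, scope=None, limit=50):
    """Scoped hybrid search that falls back to all scopes on zero hits."""
    hits = search_fn(query_vecs, scope=scope, limit=limit)
    if not hits and scope is not None:
        # Narrowed scope returned nothing: retry across all documentation
        hits = search_fn(query_vecs, scope=None, limit=limit)
    return hits
```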
3. Resolve HyPE Hits
HyPE question chunks (synthetic questions generated during ingestion) are resolved back to their parent content chunks. If both a HyPE question and its parent appear in results, they are deduplicated, keeping the highest score.
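The resolution step can be sketched as a map from each hit to its target chunk, keeping the best score per chunk. The hit fields (`chunk_id`, `score`, optional `parent_id` on HyPE points) are assumed names:

```python
def resolve_hype(hits: list[dict]) -> list[dict]:
    """Resolve HyPE question hits to parent chunks, deduplicating by best score."""
    best: dict[str, dict] = {}
    for hit in hits:
        # HyPE points carry parent_id; content points resolve to themselves
        target = hit.get("parent_id") or hit["chunk_id"]
        resolved = {**hit, "chunk_id": target}
        if target not in best or resolved["score"] > best[target]["score"]:
            best[target] = resolved
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)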
4. Cross-Reference Expansion
Top results are checked for outgoing cross-references in their metadata. Referenced documentation sections are fetched from Qdrant and added as additional candidates:
- High-priority cross-references (guide to API) are always followed
- Medium-priority (same folder) are followed for the top 10 results
- Low-priority (cross-folder) are followed for the top 5 results only
Only one hop is followed. There is no recursive expansion.
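The three-tier rules above amount to a single pass over the ranked results; since only the targets of this pass are fetched, there is no recursion. Field names and priority labels here are assumptions:

```python
def refs_to_follow(results: list[dict]) -> list[str]:
    """Collect cross-reference targets to fetch, by priority tier (one hop)."""
    targets = []
    for rank, result in enumerate(results):
        for priority, target in result.get("refs", []):
            if priority == "high":                    # guide -> API: always
                targets.append(target)
            elif priority == "medium" and rank < 10:  # same folder: top 10
                targets.append(target)
            elif priority == "low" and rank < 5:      # cross-folder: top 5
                targets.append(target)
    return targets
```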
5. Reranking
All candidates are reranked by Jina's reranker model (jina-reranker-v3) for final precision. Hits scoring below -0.5 after reranking are discarded.
6. Reconstruction
Final results are sorted by their position in the documentation (by reading order, not by score) and reassembled into readable markdown fragments. Adjacent chunks from the same page are merged. The result reads like a coherent mini-document rather than a list of search snippets.
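A sketch of the reconstruction step, assuming each hit carries a `page` and a `position` (the chunk's index within that page, both assumed field names): sort by reading order, then merge runs of adjacent chunks from the same page into one fragment:

```python
def reconstruct(hits: list[dict]) -> list[str]:
    """Reassemble hits into reading-order fragments, merging adjacent chunks."""
    ordered = sorted(hits, key=lambda h: (h["page"], h["position"]))
    fragments: list[str] = []
    prev = None
    for hit in ordered:
        if prev and prev["page"] == hit["page"] and hit["position"] == prev["position"] + 1:
            fragments[-1] += "\n" + hit["text"]  # directly adjacent: merge
        else:
            fragments.append(hit["text"])
        prev = hit
    return fragments
```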
Cost
The hosted server is free. For self-hosted instances, per-query cost is approximately $0.0005, dominated by the Jina embedding and reranking calls. No LLM calls happen at query time.