How It Works
Vue Docs MCP uses a two-stage architecture: an offline ingestion pipeline that indexes the Vue documentation, and an online query pipeline that handles search requests.
Ingestion Pipeline
The ingestion pipeline runs periodically (every 24 hours on the hosted server) and processes the official Vue.js documentation source.
Vue docs (markdown) -> Parse -> Enrich -> Embed -> Store

1. Structure-Aware Parsing
Markdown files are parsed respecting the heading hierarchy. Each document is split into structural chunks at heading boundaries instead of fixed-size token windows. Code examples stay paired with their explanations.
The parser uses markdown-it-py and produces chunks at multiple levels:
- Page: top-level document
- Section: H2 headings
- Subsection: H3 headings
- Code block: standalone code examples
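A minimal sketch of the heading-boundary chunking. The real parser walks the markdown-it-py token stream; this stand-in matches ATX headings with a regex, and the function and field names are illustrative, not the pipeline's actual API:

```python
import re

def chunk_by_headings(markdown: str) -> list[dict]:
    """Split a markdown document into chunks at H2/H3 boundaries.

    Simplified stand-in for the markdown-it-py based parser: chunks are
    cut where a new heading opens, so code fences and their surrounding
    prose stay together inside one chunk.
    """
    chunks = []
    current = {"level": "page", "heading": None, "lines": []}
    for line in markdown.splitlines():
        m = re.match(r"^(#{2,3})\s+(.*)", line)
        if m:
            chunks.append(current)
            level = "section" if len(m.group(1)) == 2 else "subsection"
            current = {"level": level, "heading": m.group(2), "lines": []}
        else:
            current["lines"].append(line)
    chunks.append(current)
    return chunks
```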
2. Entity Extraction
API entities are extracted deterministically from the documentation. Each entity gets a type (lifecycle hook, composable, directive, component, compiler macro, global API, etc.) and is linked to its documentation section.
A curated synonym table maps common aliases and variations. For example, v-model maps to two-way binding, and ref maps to reactivity primitives.
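In code, the synonym lookup is just a dictionary from alias to canonical API name. The entries below combine the two examples from the text with a few assumed extras; the real table lives in PostgreSQL:

```python
# Illustrative synonym table (alias -> canonical Vue API name).
# The "two-way binding" and reactivity entries come from the docs;
# the rest are assumed examples.
SYNONYMS = {
    "two-way binding": "v-model",
    "2-way binding": "v-model",
    "reactivity primitive": "ref",
    "watch effect": "watchEffect",
}

def resolve_alias(term: str) -> str:
    """Map a colloquial alias to its canonical name; pass through unknowns."""
    return SYNONYMS.get(term.lower().strip(), term)
```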
3. Contextual Enrichment
Each chunk receives a contextual prefix generated by Gemini that situates it within the broader documentation. This helps the embedding model understand chunks that would be ambiguous in isolation.
4. HyPE Question Generation
For key chunks, the system generates HyPE (Hypothetical Prompt Embeddings) questions: synthetic developer questions that someone might ask when looking for this content. These are embedded alongside the original content as separate Qdrant points to improve recall at query time.
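The key property of a HyPE point is that it carries a pointer back to the content chunk it was generated from, so question hits can be resolved to real documentation at query time. A sketch of that structure, with assumed field names (the actual Qdrant payload schema may differ):

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class HypePoint:
    """A synthetic-question point stored alongside content points."""
    question: str
    parent_chunk_id: str  # resolved back to this content chunk at query time
    point_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def make_hype_points(chunk_id: str, questions: list[str]) -> list[HypePoint]:
    """Wrap LLM-generated questions as points linked to their source chunk."""
    return [HypePoint(question=q, parent_chunk_id=chunk_id) for q in questions]
```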
5. Embedding
All content is embedded using Jina's unified embedding model (jina-embeddings-v4). Dense embeddings and BM25 sparse vectors are generated for every chunk and stored in Qdrant.
6. Storage
- Qdrant: Dense and sparse vectors for hybrid search
- PostgreSQL: Entities, synonyms, page metadata, BM25 model, and index state
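Each stored chunk ends up as one Qdrant point carrying both vector types. The shape below mirrors Qdrant's named-vector layout as a plain dict; the vector names ("dense", "sparse") and payload fields are assumptions, and the real pipeline uses the qdrant-client models rather than raw dicts:

```python
def build_point(chunk_id, text, dense, sparse_indices, sparse_values):
    """Shape of a hybrid-search point: one dense and one sparse named vector."""
    return {
        "id": chunk_id,
        "vector": {
            "dense": dense,  # jina-embeddings-v4 output
            "sparse": {"indices": sparse_indices, "values": sparse_values},  # BM25
        },
        "payload": {"text": text},
    }
```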
Query Pipeline
When your AI assistant calls vue_docs_search, the query goes through a 6-step pipeline:
Query -> Embed & Detect -> Hybrid Search -> Resolve HyPE -> Expand -> Rerank -> Reconstruct

1. Embed Query & Detect Entities
Three things happen in parallel:
- Dense embedding. The query is embedded via Jina (jina-embeddings-v4) into the same vector space as the documentation.
- Sparse vector. A BM25 sparse vector is generated locally for keyword matching.
- Entity detection. The query is scanned for Vue API names using exact matching, bigram matching (e.g., "watch effect" matches watchEffect), synonym lookup, and fuzzy matching via rapidfuzz for typo tolerance.
All of this is deterministic except the Jina embedding call. No LLM is involved at query time.
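The exact, bigram, and fuzzy stages of entity detection can be sketched as below. The real system uses rapidfuzz; the standard library's difflib stands in here so the sketch is self-contained, and the entity set is an illustrative subset:

```python
import difflib

ENTITIES = {"watchEffect", "ref", "computed", "defineProps"}  # illustrative subset

def detect_entities(query: str) -> set[str]:
    """Scan a query for Vue API names: exact, bigram, then fuzzy matching."""
    words = query.split()
    lowered = {e.lower(): e for e in ENTITIES}
    found = set()
    # Exact single-word matches (case-insensitive)
    for w in words:
        if w.lower() in lowered:
            found.add(lowered[w.lower()])
    # Bigram matches: "watch effect" -> "watcheffect" -> watchEffect
    for a, b in zip(words, words[1:]):
        if (a + b).lower() in lowered:
            found.add(lowered[(a + b).lower()])
    # Fuzzy matches for typo tolerance (rapidfuzz in the real pipeline)
    for w in words:
        for cand in difflib.get_close_matches(w.lower(), list(lowered), n=1, cutoff=0.85):
            found.add(lowered[cand])
    return found
```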
2. Hybrid Search
The dense and sparse vectors are sent to Qdrant in a single hybrid search query, retrieving up to 50 candidates. Qdrant combines dense semantic similarity with BM25 keyword matching internally.
If the search scope is narrowed (e.g., scope: "guide/components") and returns no results, the pipeline automatically falls back to searching all scopes.
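The scope fallback is a simple retry: run the scoped search, and if it comes back empty, rerun without the filter. In this sketch `search_fn` stands in for the Qdrant hybrid query (the name and signature are assumptions):

```python
def search_with_fallback(search_fn, query_vecs, scope=None, limit=50):
    """Scoped hybrid search that falls back to all scopes on zero hits."""
    hits = search_fn(query_vecs, scope=scope, limit=limit)
    if not hits and scope is not None:
        # Narrowed scope returned nothing: retry across all documentation
        hits = search_fn(query_vecs, scope=None, limit=limit)
    return hits
```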
3. Resolve HyPE Hits
HyPE question chunks (synthetic questions generated during ingestion) are resolved back to their parent content chunks. If both a HyPE question and its parent appear in results, they are deduplicated, keeping the highest score.
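The resolution step can be sketched as a map from each hit to its target chunk, keeping the best score per chunk. The hit fields (`chunk_id`, `score`, optional `parent_id` on HyPE points) are assumed names:

```python
def resolve_hype(hits: list[dict]) -> list[dict]:
    """Resolve HyPE question hits to parent chunks, deduplicating by best score."""
    best: dict[str, dict] = {}
    for hit in hits:
        # HyPE points carry parent_id; content points resolve to themselves
        target = hit.get("parent_id") or hit["chunk_id"]
        resolved = {**hit, "chunk_id": target}
        if target not in best or resolved["score"] > best[target]["score"]:
            best[target] = resolved
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)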
4. Cross-Reference Expansion
Top results are checked for outgoing cross-references in their metadata. Referenced documentation sections are fetched from Qdrant and added as additional candidates:
- High-priority cross-references (guide to API) are always followed
- Medium-priority (same folder) are followed for the top 10 results
- Low-priority (cross-folder) are followed for the top 5 results only
Only one hop is followed. There is no recursive expansion.
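The three-tier rules above amount to a single pass over the ranked results; since only the targets of this pass are fetched, there is no recursion. Field names and priority labels here are assumptions:

```python
def refs_to_follow(results: list[dict]) -> list[str]:
    """Collect cross-reference targets to fetch, by priority tier (one hop)."""
    targets = []
    for rank, result in enumerate(results):
        for priority, target in result.get("refs", []):
            if priority == "high":                    # guide -> API: always
                targets.append(target)
            elif priority == "medium" and rank < 10:  # same folder: top 10
                targets.append(target)
            elif priority == "low" and rank < 5:      # cross-folder: top 5
                targets.append(target)
    return targets
```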
5. Reranking
All candidates are reranked by Jina's reranker model (jina-reranker-v3) for final precision. Hits scoring below -0.5 after reranking are discarded.
6. Reconstruction
Final results are sorted by their position in the documentation (by reading order, not by score) and reassembled into readable markdown fragments. Adjacent chunks from the same page are merged. The result reads like a coherent mini-document rather than a list of search snippets.
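A sketch of the reconstruction step, assuming each hit carries a `page` and a `position` (the chunk's index within that page, both assumed field names): sort by reading order, then merge runs of adjacent chunks from the same page into one fragment:

```python
def reconstruct(hits: list[dict]) -> list[str]:
    """Reassemble hits into reading-order fragments, merging adjacent chunks."""
    ordered = sorted(hits, key=lambda h: (h["page"], h["position"]))
    fragments: list[str] = []
    prev = None
    for hit in ordered:
        if prev and prev["page"] == hit["page"] and hit["position"] == prev["position"] + 1:
            fragments[-1] += "\n" + hit["text"]  # directly adjacent: merge
        else:
            fragments.append(hit["text"])
        prev = hit
    return fragments
```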
Cost
The hosted server is free. For self-hosted instances, per-query cost is approximately $0.0005, dominated by the Jina embedding and reranking calls. No LLM calls happen at query time.