top of page

MCP addition to RAG workflow project

  • Ken Munson
  • Mar 2
  • 3 min read

Updated: Mar 31


From RAG to MCP: Turning a Document Search System into a

Composable AI Capability.

Executive Summary

In this project, I extended an earlier Retrieval-Augmented Generation (RAG) system into a fully operational MCP (Model Context Protocol) server, exposing document search as a standardized, inspectable AI capability.

The result was not just a working tool — it was a deeper understanding of modern AI architecture: separating model reasoning from orchestration, retrieval, and execution.

This post documents:

  • The original RAG system (Vertex AI + FAISS)

  • Why MCP matters

  • How I wrapped RAG behind an MCP server

  • What I learned about orchestration, context, and AI systems design


At a super high level, here is how to think about MCP architecture:


Phase 1: Building a Production-Style RAG System

The original project implemented a structured RAG pipeline:

Architecture Overview

Documents → Chunking → Embeddings → FAISS Vector Store                                     ↓                              Query Embedding                                     ↓                           Similarity / MMR Search                                     ↓                           Context Injection → LLM

Stack

  • Embeddings: Vertex AI (text-embedding-004)

  • Vector store: FAISS (local persistent index)

  • Chunking strategy: 600–1000 token segments with overlap

  • Search modes: Similarity + MMR

  • Interface: CLI + optional FastAPI

What This Solved

  • Scalable retrieval from large PDFs

  • Context-aware answers grounded in source documents

  • Reduced hallucination risk

  • Full control over chunking, ranking, and injection

This project established something important:

The model does not “know” your documents. The application layer retrieves and injects relevant context.

That insight set up the next phase.

Phase 2: Understanding the Missing Layer — Orchestration

During the RAG build, a larger architectural realization emerged:

  • The LLM is stateless.

  • The LLM never accesses external systems directly.

  • The application/orchestration layer:

    • Maintains context

    • Decides when retrieval is needed

    • Calls tools

    • Injects results into the prompt

This led directly into MCP.

Phase 3: Wrapping RAG Behind MCP

Instead of treating RAG as an internal pipeline, I exposed it as a standardized capability using MCP.

What MCP Adds:

MCP is a protocol that allows AI clients to:

  • Discover tools

  • Read resources

  • Execute structured operations

  • Maintain strict boundaries between reasoning and execution

Instead of hard-wiring document search into an application, I defined:

MCP Resources

  • askyourdocs://statsReturns FAISS index metadata (vector count, mapped chunks)

MCP Tools

  • search_docs

    • Parameters:

      • query

      • k

      • method (similarity or MMR)

    • Returns structured search results

The RAG logic remained intact — but now it was accessible through a clean, inspectable protocol boundary.

The Architecture After MCP

User  ↓Application / Orchestration Layer  ↓LLM  ↓ (structured tool request)MCP Client  ↓MCP Server (my server.py)  ↓RAG Engine (FAISS + embeddings)  ↓Results injected back into LLM

Important:

  • The model never touches FAISS.

  • The model never executes search.

  • The model emits structured intent.

  • The application layer executes the request.

This separation is critical in enterprise environments.

What I Learned (The Real Value)

1. RAG Is About Knowledge

RAG solves:

  • Large unstructured documents or data (the new buzz word is corpora)

  • Context window limits

  • Scalable semantic recall

It is a retrieval strategy, not a protocol.

2. MCP Is About Capability Boundaries

MCP solves:

  • Standardized tool exposure

  • Clean separation of reasoning vs execution

  • Multi-tool composability

  • Auditable, controlled system integration

It is a control-plane protocol, not a retrieval engine.

3. The LLM Is Not the System

This project reinforced something fundamental:

The LLM is a reasoning engine. The application layer is the system.

The intelligence of the overall solution depends more on:

  • Context construction

  • Tool exposure

  • Retrieval strategy

  • Guardrails

  • Orchestration logic

than on the raw model itself.

4. MCP Does Not Replace RAG

At first glance, it might seem MCP makes RAG unnecessary. It does not.

  • MCP standardizes how tools are called

  • RAG determines what knowledge is retrieved

In practice, MCP often wraps RAG.

Operational Lessons

Dev Mode vs Production Mode

The MCP Inspector:

  • Launches the server

  • Proxies stdio traffic

  • Allows manual tool invocation

  • Is a development and validation client

It is not part of production runtime.

Structured Tool Calls

The model does not execute search.

It emits:

{  "tool": "search_docs",  "arguments": { ... }}

The orchestration layer:

  • Parses this

  • Executes the tool

  • Injects results into the model prompt

The model only ever sees text.

Why This Matters

This project demonstrates competency in:

  • Vector-based RAG design

  • Embedding workflows

  • Chunking strategy

  • Tool schema design

  • MCP protocol integration

  • Context orchestration

  • AI systems architecture

More importantly, it shows understanding of:

  • Separation of concerns

  • Capability boundaries

  • Model vs control-plane architecture

  • Secure tool invocation patterns

This is not just “using AI. ”This is building AI systems correctly.

Final Reflection

The most valuable realization from this project was not technical — it was architectural:

Models reason. Applications orchestrate. Protocols expose capabilities. Retrieval selects knowledge.

Once this framework clicked, MCP, RAG, and tool use all became coherent parts of the same system.

Comments


bottom of page