{"id":1294,"date":"2025-10-10T07:04:06","date_gmt":"2025-10-10T07:04:06","guid":{"rendered":"https:\/\/www.ucssolutions.com\/blog\/?p=1294"},"modified":"2025-10-10T07:14:29","modified_gmt":"2025-10-10T07:14:29","slug":"building-a-production-ready-private-rag-system","status":"publish","type":"post","link":"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/","title":{"rendered":"Building a Production-Ready Private RAG System"},"content":{"rendered":"\n<p id=\"272ae8cf-eb88-8022-8c38-e7de7ae47b1f\">Every powerful software product is backed by extensive knowledge. For a product like <strong>SalesWorx<\/strong>, this knowledge is codified in detailed user guides, technical manuals, and implementation documents covering everything from commission structures to complex trade deal rules. This information is invaluable, but it&#8217;s often trapped in static PDF and Markdown files, making it difficult for customer support, implementation specialists, and even sales teams to find precise, contextual answers quickly.<\/p>\n\n\n\n<p id=\"272ae8cf-eb88-80d7-bf47-e7360cd02725\">Standard keyword search is no longer enough. It lacks the semantic understanding to answer complex questions like, <em>&#8220;How do the multi-transaction bonus rules in SalesWorx impact the commission payout slabs for sales reps?&#8221;<\/em><\/p>\n\n\n\n<p id=\"272ae8cf-eb88-8082-90c0-deefce6b1fd9\">This is where a <strong>Retrieval-Augmented Generation (RAG)<\/strong> system comes in. But while many tutorials show how to build a basic RAG prototype, they often fall short in a real-world production environment. They are brittle, struggle with complex queries, and lack the robustness required for business-critical applications.<\/p>\n\n\n\n<p id=\"272ae8cf-eb88-8072-9f32-d7aa3e3893de\">This guide is different. We will walk you through the complete, end-to-end process of building a <strong>production-ready RAG system<\/strong>. 
We&#8217;ll cover the architecture, the detailed workflow, the full setup and implementation, and the final output, using the official <strong>SalesWorx product documentation<\/strong> as our real-world knowledge base.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"272ae8cf-eb88-80c0-bf4d-c9462e45202f\"><strong>&nbsp;Features<\/strong><\/h2>\n\n\n\n<ul id=\"272ae8cf-eb88-80da-86cf-f9c13f605048\" class=\"wp-block-list\">\n<li><strong>Privacy-Focused:<\/strong>\u00a0Local processing ensures data security without cloud dependency.<\/li>\n\n\n\n<li><strong>Library-First Ingestion Pipeline:<\/strong>\u00a0LlamaIndex\u00a0<code>IngestionPipeline<\/code>\u00a0orchestrates Unstructured parsing, deterministic hashing, DuckDB caching, and AES-GCM page image handling with OpenTelemetry spans for each run.<\/li>\n\n\n\n<li><strong>Versatile Document Handling:<\/strong>\u00a0Supports multiple file formats:\n<ul class=\"wp-block-list\">\n<li>\ud83d\udcc4 PDF<\/li>\n\n\n\n<li>\ud83d\udcd1 DOCX<\/li>\n\n\n\n<li>\ud83d\udcdd TXT<\/li>\n\n\n\n<li>\ud83d\udcca XLSX<\/li>\n\n\n\n<li>\ud83c\udf10 MD (Markdown)<\/li>\n\n\n\n<li>\ud83d\uddc3\ufe0f JSON<\/li>\n\n\n\n<li>\ud83d\uddc2\ufe0f XML<\/li>\n\n\n\n<li>\ud83d\udd24 RTF<\/li>\n\n\n\n<li>\ud83d\udcc7 CSV<\/li>\n\n\n\n<li>\ud83d\udce7 MSG (Email)<\/li>\n\n\n\n<li>\ud83d\udda5\ufe0f PPTX (PowerPoint)<\/li>\n\n\n\n<li>\ud83d\udcd8 ODT (OpenDocument Text)<\/li>\n\n\n\n<li>\ud83d\udcda EPUB (E-book)<\/li>\n\n\n\n<li>\ud83d\udcbb Code files (PY, JS, JAVA, TS, TSX, C, CPP, H, and more)<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Multi-Agent Coordination:<\/strong>\u00a0LangGraph supervisor coordinating 5 specialized agents: query router, query planner, retrieval expert, result synthesizer, and response validator.<\/li>\n\n\n\n<li><strong>Retrieval\/Router:<\/strong>\u00a0RouterQueryEngine composed via\u00a0<code>router_factory<\/code>\u00a0with 
tools\u00a0<code>semantic_search<\/code>,\u00a0<code>hybrid_search<\/code>\u00a0(Qdrant server\u2011side fusion), and optional\u00a0<code>knowledge_graph<\/code>; uses async\/batching where appropriate.<\/li>\n\n\n\n<li><strong>Hybrid Retrieval:<\/strong>\u00a0Qdrant Query API server\u2011side fusion (RRF default, DBSF optional) over named vectors\u00a0<code>text-dense<\/code>\u00a0(BGE\u2011M3; COSINE) and\u00a0<code>text-sparse<\/code>\u00a0(FastEmbed BM42\/BM25 with IDF). Dense via LlamaIndex; sparse via FastEmbed.<\/li>\n\n\n\n<li><strong>Knowledge Graph (optional):<\/strong>\u00a0Adds a\u00a0<code>knowledge_graph<\/code>\u00a0router tool when a PropertyGraphIndex is present and healthy; uses spaCy entity extraction; selector prefers\u00a0<code>PydanticSingleSelector<\/code>\u00a0then\u00a0<code>LLMSingleSelector<\/code>; falls back to vector\/hybrid when absent.<\/li>\n\n\n\n<li><strong>Multimodal Processing:<\/strong>\u00a0Unstructured hi\u2011res parsing for PDFs with text, tables, and images; visual features scored with SigLIP by default (CLIP optional).<\/li>\n\n\n\n<li><strong>Always-on Reranking:<\/strong>\u00a0Text via BGE Cross-Encoder and visual via SigLIP; optional ColPali on capable GPUs. 
Deterministic, batch\u2011wise cancellation; fail\u2011open; SigLIP loader cached.<\/li>\n\n\n\n<li><strong>Offline-First Design:<\/strong>\u00a0100% local processing with no external API dependencies.<\/li>\n\n\n\n<li><strong>GPU Acceleration:<\/strong>\u00a0CUDA support with mixed precision and FP8 quantization via vLLM FlashInfer backend for optimized performance.<\/li>\n\n\n\n<li><strong>Session Persistence:<\/strong>\u00a0SQLite WAL with local multi-process support for concurrent access.<\/li>\n\n\n\n<li><strong>Docker Support:<\/strong>\u00a0Easy deployment with Docker and Docker Compose.<\/li>\n\n\n\n<li><strong>Intelligent Caching:<\/strong>\u00a0High-performance document processing cache for rapid re-analysis.<\/li>\n\n\n\n<li><strong>Robust Error Handling:<\/strong>\u00a0Reliable retry strategies with exponential backoff.<\/li>\n\n\n\n<li><strong>Structured Logging:<\/strong>\u00a0Contextual logging with automatic rotation and JSON output.<\/li>\n\n\n\n<li><strong>Encrypted Page Images (AES-GCM):<\/strong>\u00a0Optional at-rest encryption for rendered PDF page images using AES-GCM with KID as AAD;\u00a0<code>.enc<\/code>\u00a0files are decrypted just-in-time for visual scoring and immediately cleaned up.<\/li>\n\n\n\n<li><strong>Simple Configuration:<\/strong>\u00a0Environment variables and Streamlit native config for easy setup.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"272ae8cf-eb88-804d-a029-c5f36a69cafc\"><strong>The Core Problem: Why Basic RAG Fails in Production<\/strong><\/h2>\n\n\n\n<p id=\"272ae8cf-eb88-8085-842f-d1b873a5026d\">A simple RAG system typically follows a two-step process: retrieve relevant text chunks and feed them to a Large Language Model (LLM) to generate an answer. 
This works for simple questions but breaks down when faced with real-world complexity:<\/p>\n\n\n\n<ul id=\"272ae8cf-eb88-80bf-b5c0-e8ec4f4880c6\" class=\"wp-block-list\">\n<li><strong>Poor Retrieval Accuracy:<\/strong> A simple vector search might miss documents that use different terminology for the same concept or fail to find documents that require both keyword and semantic matching.<\/li>\n\n\n\n<li><strong>Context-Blindness:<\/strong> It can&#8217;t handle multi-part questions that require synthesizing information from different documents.<\/li>\n\n\n\n<li><strong>Lack of Scalability:<\/strong> The architecture isn&#8217;t designed for a growing knowledge base or concurrent users.<\/li>\n\n\n\n<li><strong>No Robustness:<\/strong> It lacks proper error handling, monitoring, and the ability to trace how an answer was generated.<\/li>\n<\/ul>\n\n\n\n<p id=\"272ae8cf-eb88-801f-92c6-f84f3c62de62\">A production system must overcome all these challenges.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"272ae8cf-eb88-806a-a341-ff4e5e2a4430\"><strong>Architectural Deep Dive: The Anatomy of a Production RAG System<\/strong><\/h2>\n\n\n\n<p id=\"272ae8cf-eb88-80a1-a68d-ca4ad56bae13\">Our system is built for robustness and intelligence. It&#8217;s divided into two main workflows: an <strong>offline Ingestion Workflow<\/strong> to process and index knowledge, and a <strong>real-time Query Workflow<\/strong> to intelligently answer user questions.<\/p>\n\n\n\n<p id=\"272ae8cf-eb88-804e-9c5f-cecacb1527bb\">This system combines hybrid search (dense + sparse embeddings), knowledge graph extraction, and a 5-agent coordination system to extract and analyze information from your PDFs, Office docs, and multimedia content. 
Built on LlamaIndex pipelines with LangGraph supervisor orchestration and <code>Qwen3-4B-Instruct-2507<\/code>&#8216;s FULL 262K context capability through INT8 KV cache optimization, it provides document intelligence that runs entirely on your hardware\u2014with GPU acceleration and agent coordination.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\" id=\"272ae8cf-eb88-8060-8b03-e4ca1267f971\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1024\" src=\"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10-768x1024.png\" alt=\"\" class=\"wp-image-1304\" srcset=\"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10-768x1024.png 768w, https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10-225x300.png 225w, https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10.png 1066w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/figure>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"272ae8cf-eb88-80e2-9c82-ed5f91d576ed\"><strong>Phase 1: The Ingestion Workflow (Building the Knowledge Foundation)<\/strong><\/h3>\n\n\n\n<p id=\"272ae8cf-eb88-807c-9c19-e544c8f583c7\">This is the crucial first step where we convert our raw product manuals into a structured, queryable knowledge base.<\/p>\n\n\n\n<ol start=\"1\" id=\"272ae8cf-eb88-801f-8fad-c31a626ef081\" class=\"wp-block-list\">\n<li><strong>Data Sourcing &amp; Parsing:<\/strong> We start with the official <strong>SalesWorx product documentation<\/strong>. Instead of just reading text, we use the <code>unstructured<\/code> library to parse these files. 
This intelligently extracts text, tables, and titles, preserving the document&#8217;s original structure.<\/li>\n\n\n\n<li><strong>Strategic Chunking:<\/strong> We employ a title-based chunking strategy. This is superior to fixed-size chunks because it keeps related paragraphs grouped under their original headings, maintaining vital semantic context.<\/li>\n\n\n\n<li><strong>Hybrid Embeddings (BGE-M3):<\/strong> Each chunk is transformed into numerical vectors. We use the <code>BGE-M3<\/code> model, which is exceptional because it generates both <strong>dense vectors<\/strong> (capturing semantic meaning, e.g., &#8220;incentive&#8221; is similar to &#8220;bonus&#8221;) and <strong>sparse vectors<\/strong> (capturing keyword relevance) in a single pass. This is the foundation of our advanced hybrid search.<\/li>\n\n\n\n<li><strong>Vector Storage (Qdrant):<\/strong> The chunks and their vectors are stored in <strong>Qdrant<\/strong>, a production-grade vector database. We chose Qdrant for its key features:\n<ul class=\"wp-block-list\">\n<li><strong>Named Vectors:<\/strong> It can store both our dense and sparse vectors for every single data point.<\/li>\n\n\n\n<li><strong>Server-Side Fusion:<\/strong> It can combine search results from both vector types on the server using <strong>Reciprocal Rank Fusion (RRF)<\/strong>, which is highly efficient and improves retrieval accuracy.<\/li>\n\n\n\n<li><strong>Scalability &amp; Performance:<\/strong> Built in Rust, it&#8217;s incredibly fast and can be deployed in a distributed cluster.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"272ae8cf-eb88-8006-b391-fcc09391aa65\"><strong>Phase 2: The Query Workflow (The Multi-Agent &#8220;Brain&#8221;)<\/strong><\/h3>\n\n\n\n<p id=\"272ae8cf-eb88-8029-ad33-dd7b8b2af168\">When a user asks a question, a sophisticated, real-time process is initiated, managed by a team of 
five specialized AI agents orchestrated by <strong>LangGraph<\/strong>.<\/p>\n\n\n\n<p id=\"272ae8cf-eb88-80ab-b70b-ebc66d0613ac\">Let&#8217;s trace a complex query: <em>&#8220;Summarize the &#8216;Simple Bonus by Item Quantity&#8217; feature in SalesWorx and explain how it differs from the &#8216;Assortment Bonus by Overall Quantity&#8217;.&#8221;<\/em><\/p>\n\n\n\n<ol start=\"1\" id=\"272ae8cf-eb88-80a7-b43a-fd7835fd3bfd\" class=\"wp-block-list\">\n<li><strong>Query Router:<\/strong> The query first hits this agent. It analyzes the structure and identifies two distinct parts: one about &#8220;Simple Bonus&#8221; and another about &#8220;Assortment Bonus.&#8221; It flags the query as complex and requiring a multi-step plan.<\/li>\n\n\n\n<li><strong>Query Planner:<\/strong> This agent creates a logical plan:\n<ul class=\"wp-block-list\">\n<li><strong>Task 1:<\/strong> Retrieve information specifically about the &#8220;Simple Bonus by Item Quantity&#8221; feature, including its types (&#8220;Point&#8221; and &#8220;Recurring&#8221;).<\/li>\n\n\n\n<li><strong>Task 2:<\/strong> Retrieve information about the &#8220;Assortment Bonus by Overall Quantity&#8221; feature.<\/li>\n\n\n\n<li><strong>Task 3:<\/strong> Synthesize the retrieved information, focusing on the key differences between the two features.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Retrieval Expert:<\/strong> This agent executes the plan. For each task, it performs a <strong>hybrid search<\/strong> on Qdrant, looking for chunks that are both semantically similar and contain relevant keywords. 
The results are then passed to a <strong>reranker model<\/strong> to push the most accurate chunks to the very top.<\/li>\n\n\n\n<li><strong>Result Synthesizer:<\/strong> This agent takes the curated context from both retrieval tasks and combines them into a single, coherent block of information.<\/li>\n\n\n\n<li><strong>Response Validator:<\/strong> Before the final answer is generated, this agent performs a quality check. Does the synthesized context accurately address both parts of the original query? Only after this validation is the context passed to the LLM to generate the final, human-readable answer.<\/li>\n<\/ol>\n\n\n\n<p id=\"272ae8cf-eb88-802a-83f4-f10e47d42cbc\">This structured, agent-based workflow allows our system to deconstruct and answer complex questions about the SalesWorx product with a level of precision that a simple RAG system cannot match.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"272ae8cf-eb88-80b2-8651-e043b4630858\"><strong>Step-by-Step Implementation: Let&#8217;s Build It<\/strong><\/h4>\n\n\n\n<p id=\"272ae8cf-eb88-80b6-a276-c410ff5f9642\">Here\u2019s how you can set up and run this entire system on your local machine.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"272ae8cf-eb88-8031-b1b9-e12006825f87\"><strong>Step 0: Prerequisites<\/strong><\/h4>\n\n\n\n<ul id=\"272ae8cf-eb88-80c6-a03c-cc6f02e84215\" class=\"wp-block-list\">\n<li><strong>Docker &amp; Docker Compose:<\/strong> To run our vector database.<\/li>\n\n\n\n<li><strong>Python 3.11+<\/strong> and the <code>uv<\/code> package manager.<\/li>\n\n\n\n<li><strong>Ollama:<\/strong> For running a local LLM. 
Install it and pull a model: <code>ollama pull qwen3-4b-instruct-2507<\/code><\/li>\n<\/ul>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"272ae8cf-eb88-80f5-9c04-d74579b77b4d\"><strong>Step 1: Project Setup<\/strong><\/h4>\n\n\n\n<p id=\"272ae8cf-eb88-80df-8056-f04691480b04\">First, get the project code and set up the Python environment.<\/p>\n\n\n\n<pre id=\"272ae8cf-eb88-8084-8647-eae10e8027dd\" class=\"wp-block-code\"><code># Clone the repository (replace with your actual repo URL)\ngit clone https:\/\/github.com\/your-repo\/production-rag-salesworx.git\ncd production-rag-salesworx\n\n# Install all dependencies from pyproject.toml\nuv sync\n\n# Create your environment configuration file\ncp .env.example .env\n<\/code><\/pre>\n\n\n\n<p>Make sure your <code>.env<\/code> file points to your local Ollama instance and sets the correct model name.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Step 2: Launching Core Services with Docker<\/strong><\/h4>\n\n\n\n<p>We use Docker Compose to manage our Qdrant instance. 
The <code>docker-compose.yml<\/code> file defines the service.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># docker-compose.yml\nversion: '3.8'\nservices:\n  qdrant:\n    image: qdrant\/qdrant:latest\n    ports:\n      - \"6333:6333\"\n      - \"6334:6334\"\n    volumes:\n      - .\/qdrant_storage:\/qdrant\/storage\n<\/code><\/pre>\n\n\n\n<p id=\"272ae8cf-eb88-8022-940c-f8f0993ee44f\">Launch the service:<\/p>\n\n\n\n<pre id=\"272ae8cf-eb88-807e-a6ab-d38586b6f172\" class=\"wp-block-code\"><code>docker-compose up -d<\/code><\/pre>\n\n\n\n<p id=\"272ae8cf-eb88-8093-89b5-fb8ec37cede0\">You can verify that Qdrant is running by visiting <code>http:\/\/localhost:6333\/dashboard<\/code> in your browser.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\" id=\"272ae8cf-eb88-800c-879a-faa2b45af77b\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"475\" src=\"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.30.58-1024x475.png\" alt=\"\" class=\"wp-image-1302\" srcset=\"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.30.58-1024x475.png 1024w, https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.30.58-300x139.png 300w, https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.30.58-768x356.png 768w, https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.30.58-1536x712.png 1536w, https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.30.58-2048x949.png 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"272ae8cf-eb88-8086-93a3-c2718336c84d\"><strong>Step 3: The Ingestion Process (Code and Execution)<\/strong><\/h4>\n\n\n\n<p 
id=\"272ae8cf-eb88-8094-95cc-c5dedf776d13\">Create a folder named 
<code>data<\/code> and place your SalesWorx product documentation files inside it. Now, create an <code>ingest.py<\/code> script to process these files and load them into Qdrant.<\/p>\n\n\n\n<pre id=\"272ae8cf-eb88-80df-bbb9-f5e4413510c6\" class=\"wp-block-code\"><code># ingest.py\nimport asyncio\nfrom pathlib import Path\nfrom src.config import settings\nfrom src.utils.document import load_documents_unstructured\nfrom src.utils.embedding import create_index_async\n\nasync def main():\n    \"\"\"\n    Main function to run the ingestion pipeline.\n    It processes all supported product documents in the 'data' folder and indexes them.\n    \"\"\"\n    print(\"Starting the ingestion process...\")\n    data_folder = Path(\".\/data\")\n\n    supported_extensions = {'.md', '.pdf'}\n    documents_paths = &#91;\n        f for f in data_folder.rglob(\"*\")\n        if f.suffix.lower() in supported_extensions\n    ]\n\n    if not documents_paths:\n        print(\"No supported documents found in the '.\/data' folder.\")\n        return\n\n    print(f\"Found {len(documents_paths)} documents to process...\")\n\n    # 1. Load and parse documents using the Unstructured library\n    documents = await load_documents_unstructured(documents_paths, settings)\n\n    # 2. Create embeddings and store them in Qdrant\n    # This function handles chunking, embedding, and indexing\n    await create_index_async(documents, settings)\n\n    print(\"\u2705 Ingestion complete. 
The SalesWorx knowledge base is ready.\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n<\/code><\/pre>\n\n\n\n<p id=\"272ae8cf-eb88-8047-9bad-c509fa9f15c1\">Run the script from your terminal:<\/p>\n\n\n\n<pre id=\"272ae8cf-eb88-800b-9902-e009063012d6\" class=\"wp-block-code\"><code>uv run python ingest.py<\/code><\/pre>\n\n\n\n<p>You will see log messages indicating progress. When the script completes, your knowledge base is indexed and ready in Qdrant.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Step 4: The Query Process (Code and Execution)<\/strong><\/h4>\n\n\n\n<p>Now for the exciting part: asking questions. Create a <code>query.py<\/code> script to interact with our multi-agent system.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># query.py\nimport asyncio\nimport sys\nfrom src.agents.coordinator import MultiAgentCoordinator\n\nasync def main(user_query: str):\n    \"\"\"\n    Initializes the multi-agent coordinator and processes a user query.\n    \"\"\"\n    print(f\"Processing query: '{user_query}'\")\n\n    # Initialize the 5-agent system\n    coordinator = MultiAgentCoordinator()\n\n    # Process the query and get the response\n    response = coordinator.process_query(user_query, context=None)\n\n    print(\"\\n--- \ud83e\udd16 Answer ---\")\n    print(response)\n    print(\"--------------------\")\n\nif __name__ == \"__main__\":\n    if len(sys.argv) &gt; 1:\n        query = \" \".join(sys.argv&#91;1:])\n        asyncio.run(main(query))\n    else:\n        print(\"Usage: uv run python query.py &lt;your question here&gt;\")\n<\/code><\/pre>\n\n\n\n<p id=\"272ae8cf-eb88-8024-b461-c819db56a504\">Now, you can ask questions about the SalesWorx product directly from your command line.<\/p>\n\n\n\n<p><strong>Putting It All Together: The Final Output<\/strong><\/p>\n\n\n\n<p id=\"272ae8cf-eb88-80eb-90f2-c073a77135ad\">Let&#8217;s run our 
<code>query.py<\/code> script with the same questions we used as examples.<\/p>\n\n\n\n<p id=\"272ae8cf-eb88-80bd-8d53-ce5848b10fe7\"><strong>Query 1: A direct, comparative question<\/strong><\/p>\n\n\n\n<pre 
id=\"272ae8cf-eb88-80a9-af5c-c3bb2ba9ad24\" class=\"wp-block-code\"><code>uv run python query.py \"Summarize the 'Simple Bonus by Item Quantity' feature in SalesWorx and explain how it differs from the 'Assortment Bonus by Overall Quantity'.\"\n<\/code><\/pre>\n\n\n\n<p id=\"272ae8cf-eb88-8093-bd7e-fad835bfdb04\"><strong>Expected Output:<\/strong><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8212; \ud83e\udd16 Answer &#8212;<br>The key difference between the &#8216;Simple Bonus by Item Quantity&#8217; and &#8216;Assortment Bonus by Overall Quantity&#8217; features in SalesWorx lies in the items a customer must purchase to qualify for the bonus.<\/p>\n\n\n\n<ol start=\"1\" id=\"272ae8cf-eb88-8074-8eeb-ccdd87ac606d\" class=\"wp-block-list\">\n<li><strong>Simple Bonus by Item Quantity:<\/strong> This bonus is tied to the purchase of a <em>single, specific item<\/em>. A customer must buy a certain quantity of &#8216;Product A&#8217; to receive a Free of Charge (FOC) item. The bonus can be &#8216;Point&#8217; (a one-time reward for buying within a quantity range) or &#8216;Recurring&#8217; (a reward that multiplies as the purchase quantity increases).<\/li>\n\n\n\n<li><strong>Assortment Bonus by Overall Quantity:<\/strong> This bonus is more flexible. A customer can qualify by purchasing a combination of <em>different items<\/em> from a predefined &#8216;Order Items List&#8217;. The system looks at the total quantity of all eligible items in the cart. 
For example, a customer could buy 3 units of Product A, 4 units of Product B, and 3 units of Product C to reach a total of 10 units and qualify for the bonus associated with that quantity slab.<\/li>\n<\/ol>\n\n\n\n<p id=\"272ae8cf-eb88-80af-a878-e21edc6b8a88\">In short, the Simple Bonus feature is for promoting a single product, while the Assortment Bonus feature is for encouraging the purchase of a range of different products.<\/p>\n<\/blockquote>\n\n\n\n<p id=\"272ae8cf-eb88-808d-95ff-d729a2b189ab\"><strong>Query 2: A step-by-step process question<\/strong><\/p>\n\n\n\n<pre id=\"272ae8cf-eb88-802d-9159-d641bbddf836\" class=\"wp-block-code\"><code>uv run python query.py \"How do I set up a multi-transaction bonus rule in SalesWorx?\"<\/code><\/pre>\n\n\n\n<p id=\"272ae8cf-eb88-8009-a630-f3230dde894a\"><strong>Expected Output:<\/strong><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8212; \ud83e\udd16 Answer &#8212;<br>To set up a multi-transaction bonus rule in the SalesWorx product, follow these steps:<\/p>\n\n\n\n<ol start=\"1\" id=\"272ae8cf-eb88-80ed-b287-f3c038de2e5c\" class=\"wp-block-list\">\n<li><strong>Navigate to the Menu:<\/strong> From the main menu, go to <code>Product Management<\/code> -&gt; <code>Bonus\/Discount Management<\/code>.<\/li>\n\n\n\n<li><strong>Select the Rule Type:<\/strong> Click on <code>Multi Transaction Bonus Rules<\/code>.<\/li>\n\n\n\n<li><strong>Choose Organization:<\/strong> Select the organization (e.g., Site &#8211; 2100) for which you are creating the deal.<\/li>\n\n\n\n<li><strong>Add a New Rule:<\/strong> Click the <code>Add<\/code> button.<\/li>\n\n\n\n<li><strong>Define the Rule:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Sales Item:<\/strong> Select the product the customer needs to purchase over time (e.g., &#8216;200082-Al Ain 5 Gallon&#8217;).<\/li>\n\n\n\n<li><strong>Sales Qty:<\/strong> Enter the total quantity the customer 
must achieve across multiple invoices (e.g., 100).<\/li>\n\n\n\n<li><strong>Promo Item:<\/strong> Select the item that will be given as a bonus.<\/li>\n\n\n\n<li><strong>Promo Qty:<\/strong> Enter the quantity of the bonus item to be awarded.<\/li>\n\n\n\n<li><strong>Validity:<\/strong> Set the &#8216;Valid From&#8217; and &#8216;Valid To&#8217; dates for the promotion period.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Save:<\/strong> Click <code>OK<\/code> to save and activate the trade deal. The system will then track customer purchases of the sales item between the valid dates.<\/li>\n<\/ol>\n<\/blockquote>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"272ae8cf-eb88-80a1-9b66-c7aa49349451\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p id=\"272ae8cf-eb88-80e4-98bc-d02709d28d5e\">By moving beyond basic prototypes and embracing a production-ready architecture, we have transformed the static <strong>SalesWorx product documentation<\/strong> into a dynamic and intelligent knowledge base. This system, built with a robust ingestion pipeline, a scalable Qdrant vector database, and a sophisticated multi-agent workflow, can understand and answer complex, real-world questions about the product&#8217;s features with remarkable accuracy.<\/p>\n\n\n\n<p id=\"272ae8cf-eb88-806a-be50-fd51eba4f545\">This approach doesn&#8217;t just make information accessible; it makes it actionable, empowering your teams to make faster, more informed decisions.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Every powerful software product is backed by extensive knowledge. For a product like SalesWorx, this knowledge is codified in detailed user guides, technical manuals, and implementation documents covering everything from commission structures to complex trade deal rules. 
This information is invaluable, but it&#8217;s often trapped in static PDF and Markdown files, making it difficult for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1294","post","type-post","status-publish","format-standard","hentry","category-blog"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Building a Production-Ready Private RAG System - Unique Computer Systems<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building a Production-Ready Private RAG System - Unique Computer Systems\" \/>\n<meta property=\"og:description\" content=\"Every powerful software product is backed by extensive knowledge. For a product like SalesWorx, this knowledge is codified in detailed user guides, technical manuals, and implementation documents covering everything from commission structures to complex trade deal rules. 
This information is invaluable, but it&#8217;s often trapped in static PDF and Markdown files, making it difficult for [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/\" \/>\n<meta property=\"og:site_name\" content=\"Unique Computer Systems\" \/>\n<meta property=\"article:published_time\" content=\"2025-10-10T07:04:06+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-10-10T07:14:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10-768x1024.png\" \/>\n<meta name=\"author\" content=\"ucs_admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"ucs_admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/\",\"url\":\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/\",\"name\":\"Building a Production-Ready Private RAG System - Unique Computer 
Systems\",\"isPartOf\":{\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10-768x1024.png\",\"datePublished\":\"2025-10-10T07:04:06+00:00\",\"dateModified\":\"2025-10-10T07:14:29+00:00\",\"author\":{\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/#\/schema\/person\/7386c7cca4b130fa0e54206971ddcd5b\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/#primaryimage\",\"url\":\"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10.png\",\"contentUrl\":\"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10.png\",\"width\":1066,\"height\":1422},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.ucssolutions.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building a Production-Ready Private RAG System\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/#website\",\"url\":\"https:\/\/www.ucssolutions.com\/blog\/\",\"name\":\"Unique Computer 
Systems\",\"description\":\"Technology Blog learn about technology updates and tech innovations in Sharjah, Dubai, Abu Dhabi, UAE and the world.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.ucssolutions.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/#\/schema\/person\/7386c7cca4b130fa0e54206971ddcd5b\",\"name\":\"ucs_admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ucssolutions.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/4db329918bcfa8f657c30c839798cc77?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/4db329918bcfa8f657c30c839798cc77?s=96&d=mm&r=g\",\"caption\":\"ucs_admin\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Building a Production-Ready Private RAG System - Unique Computer Systems","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/","og_locale":"en_US","og_type":"article","og_title":"Building a Production-Ready Private RAG System - Unique Computer Systems","og_description":"Every powerful software product is backed by extensive knowledge. For a product like SalesWorx, this knowledge is codified in detailed user guides, technical manuals, and implementation documents covering everything from commission structures to complex trade deal rules. 
This information is invaluable, but it&#8217;s often trapped in static PDF and Markdown files, making it difficult for [&hellip;]","og_url":"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/","og_site_name":"Unique Computer Systems","article_published_time":"2025-10-10T07:04:06+00:00","article_modified_time":"2025-10-10T07:14:29+00:00","og_image":[{"url":"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10-768x1024.png"}],"author":"ucs_admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"ucs_admin","Est. reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/","url":"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/","name":"Building a Production-Ready Private RAG System - Unique Computer Systems","isPartOf":{"@id":"https:\/\/www.ucssolutions.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/#primaryimage"},"image":{"@id":"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/#primaryimage"},"thumbnailUrl":"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10-768x1024.png","datePublished":"2025-10-10T07:04:06+00:00","dateModified":"2025-10-10T07:14:29+00:00","author":{"@id":"https:\/\/www.ucssolutions.com\/blog\/#\/schema\/person\/7386c7cca4b130fa0e54206971ddcd5b"},"breadcrumb":{"@id":"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ucssolutions
.com\/blog\/building-a-production-ready-private-rag-system\/#primaryimage","url":"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10.png","contentUrl":"https:\/\/www.ucssolutions.com\/blog\/wp-content\/uploads\/2025\/10\/Screenshot_2025-09-18_at_11.17.10.png","width":1066,"height":1422},{"@type":"BreadcrumbList","@id":"https:\/\/www.ucssolutions.com\/blog\/building-a-production-ready-private-rag-system\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ucssolutions.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Building a Production-Ready Private RAG System"}]},{"@type":"WebSite","@id":"https:\/\/www.ucssolutions.com\/blog\/#website","url":"https:\/\/www.ucssolutions.com\/blog\/","name":"Unique Computer Systems","description":"Technology Blog learn about technology updates and tech innovations in Sharjah, Dubai, Abu Dhabi, UAE and the world.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ucssolutions.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.ucssolutions.com\/blog\/#\/schema\/person\/7386c7cca4b130fa0e54206971ddcd5b","name":"ucs_admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ucssolutions.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/4db329918bcfa8f657c30c839798cc77?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4db329918bcfa8f657c30c839798cc77?s=96&d=mm&r=g","caption":"ucs_admin"}}]}},"_links":{"self":[{"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/posts\/1294"}],"collection":[{"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/types\/post
"}],"author":[{"embeddable":true,"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/comments?post=1294"}],"version-history":[{"count":13,"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/posts\/1294\/revisions"}],"predecessor-version":[{"id":1314,"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/posts\/1294\/revisions\/1314"}],"wp:attachment":[{"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/media?parent=1294"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/categories?post=1294"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ucssolutions.com\/blog\/wp-json\/wp\/v2\/tags?post=1294"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}