
Private RAG Knowledge System

On-premise retrieval-augmented generation system for a professional services firm handling sensitive client documents.

Ollama · LangChain · PostgreSQL + pgvector · Docker · React
100% data privacy — zero cloud exposure
Sub-2s average query response time
10,000+ documents indexed
Deployed entirely on-premise

The Challenge

A legal-adjacent services firm needed internal AI search over 10,000+ client documents but couldn't use cloud AI tools due to data sensitivity and compliance constraints.

The Approach

Assessed the firm's existing infrastructure and compliance constraints, then designed a fully air-gapped architecture built on local models and local vector storage: Ollama for model serving, pgvector for embedding storage, so no document or query ever leaves the network.
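The indexing side of this architecture comes down to splitting documents into overlapping chunks before embedding them. A minimal sketch of that step in plain Python — the function name and parameters are illustrative, not the firm's actual code, and LangChain's text splitters do roughly this with smarter boundary handling:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks for embedding.

    A simplified stand-in for a LangChain text splitter: each chunk
    shares `overlap` characters with the previous one so that context
    spanning a chunk boundary is not lost at retrieval time.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk is then embedded locally and written to pgvector; the overlap is what lets a query match a passage even when the relevant sentence straddles two chunks.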

The Solution

Deployed a private RAG stack: Ollama running Mistral 7B, LangChain for document chunking and retrieval, PostgreSQL with pgvector for embeddings, and a clean React UI for staff queries. All running on their existing server infrastructure.
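At query time, retrieval reduces to nearest-neighbour search over embedding vectors, which pgvector runs directly in SQL (`ORDER BY embedding <=> query LIMIT k` for cosine distance). A pure-Python sketch of the same top-k cosine retrieval, with illustrative document IDs and vectors rather than anything from the deployed system:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (doc_id, score) pairs most similar to the query.

    `store` is a list of (doc_id, embedding) pairs -- an in-memory
    stand-in for the pgvector table the production system queries.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```

In the deployed stack, the chunks returned by this search are stuffed into the prompt and sent to Mistral 7B via Ollama, so answers are grounded in the firm's own documents without any external API call.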

[Architecture diagram placeholder]

Ready to build something
that matters?

Let's discuss your project. I typically respond within 24 hours.