RAG System
The ExtendedLM RAG (Retrieval-Augmented Generation) system provides advanced document search using vector embeddings, knowledge graphs, and hybrid retrieval.
RAG enhances LLM responses by retrieving relevant information from your documents before generation. It combines semantic search with generative AI for accurate, context-aware answers.
Key Features
- Vector Search: Semantic similarity using pgvector and Valkey Search
- Knowledge Graphs: Entity/relation extraction with GraphRAG
- Hybrid Retrieval: Combine vector search + graph + reranking
- RAPTOR: Hierarchical chunking and summarization
- Multi-format Support: PDF, DOCX, images, code, etc.
- OCR: Tesseract.js for scanned documents
- Caching: Valkey-based result caching
- Personal & Global: Conversation-scoped and user-wide search
RAG Types
- Personal RAG: Conversation-scoped documents
- Global RAG: All user documents
- GraphRAG: Knowledge graph retrieval
Architecture
Components
- Document Store: Supabase Storage
- Vector DB: PostgreSQL + pgvector
- Global Store: Valkey Search (Redis-compatible)
- Knowledge Graph: PostgreSQL (entities/relations)
- Embedding Service: OpenAI/Google/Ollama
- Ingestion Pipeline: Text extraction, chunking, embedding
- Cache Layer: Valkey (query results)
Data Flow
- User uploads document
- Extract text (pdf-parse, OCR, markitdown)
- Chunk text (configurable size)
- Generate embeddings (dimension depends on the provider, e.g. 1536 for OpenAI)
- Store in vector DB (pgvector/Valkey)
- Extract entities/relations (optional GraphRAG)
- Query retrieves relevant chunks
- LLM generates answer with context
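A simplified sketch of the ingestion half of this flow (steps 1-6). The helper names here are illustrative; the real implementations live in the src/server/rag/ modules listed below:

// Illustrative pipeline; extractText, generateEmbeddings, storeChunks and extractKnowledgeGraph are placeholder names
async function ingestPipeline(userId: string, conversationId: string, buffer: Buffer, mimeType: string) {
  const text = await extractText(buffer, mimeType)              // pdf-parse, OCR, markitdown
  const chunks = chunkText(text, 1000, 200)                     // see Chunking Strategy
  const embeddings = await generateEmbeddings(chunks)           // see Embedding Generation
  await storeChunks(userId, conversationId, chunks, embeddings) // pgvector / Valkey
  await extractKnowledgeGraph(chunks)                           // optional GraphRAG step
}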
Directory Structure
src/server/rag/
├── ingest.ts # Document ingestion pipeline
├── chunking.ts # Text chunking strategies
├── embeddings.ts # Embedding generation
├── raptor.ts # RAPTOR hierarchical chunking
├── kg.ts # Knowledge graph extraction
├── graph.ts # Graph query and traversal
├── valkeySearchStore.ts # Valkey Search integration
├── pdfOcr.ts # PDF OCR processing
└── rerank.ts # Result reranking
PostgreSQL + pgvector
Overview
The pgvector extension adds vector similarity search to PostgreSQL.
Features
- Vector similarity search for semantic document retrieval
- Efficient indexing for fast approximate nearest neighbor search
- Multiple distance metrics for different use cases
- User-scoped document storage with conversation context
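As a minimal sketch of a cosine-distance query with pgvector (the table and column names here are illustrative, not the project's actual schema):

import { Pool } from 'pg'

const pool = new Pool()

// Return the topK chunks closest to the query embedding; <=> is pgvector's cosine-distance operator
async function pgVectorSearch(queryEmbedding: number[], userId: string, topK = 10) {
  const vector = `[${queryEmbedding.join(',')}]`
  const { rows } = await pool.query(
    `SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
       FROM rag_chunks
      WHERE user_id = $2
      ORDER BY embedding <=> $1::vector
      LIMIT $3`,
    [vector, userId, topK]
  )
  return rows
}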
Valkey Search (Redis-compatible)
Overview
Valkey Search provides distributed vector search with HNSW indexing.
Features
- Distributed vector search with advanced indexing
- Hybrid search combining vector and full-text search
- High-performance nearest neighbor search
- Redis-compatible interface for easy integration
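A minimal sketch of creating an HNSW index and running a KNN query over the Redis-compatible interface, assuming the RediSearch-style FT.* commands and a node-redis client; the index and field names are illustrative:

import { createClient } from 'redis'

async function valkeyKnnSearch(queryEmbedding: number[]) {
  const client = createClient({ url: 'redis://localhost:6379' })
  await client.connect()

  // HNSW vector index over hash keys prefixed with "chunk:"
  await client.sendCommand([
    'FT.CREATE', 'rag-index', 'ON', 'HASH', 'PREFIX', '1', 'chunk:',
    'SCHEMA', 'content', 'TEXT',
    'embedding', 'VECTOR', 'HNSW', '6',
    'TYPE', 'FLOAT32', 'DIM', '1536', 'DISTANCE_METRIC', 'COSINE'
  ])

  // KNN query: the 10 chunks nearest to the query vector
  const vec = Buffer.from(new Float32Array(queryEmbedding).buffer)
  return client.sendCommand([
    'FT.SEARCH', 'rag-index', '*=>[KNN 10 @embedding $vec AS score]',
    'PARAMS', '2', 'vec', vec,
    'SORTBY', 'score', 'DIALECT', '2'
  ])
}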
Document Upload
Supported Formats
- Documents: PDF, DOCX, PPTX, XLSX, TXT, MD
- Images: JPG, PNG, GIF (vision LLM descriptions)
- Code: JS, TS, PY, etc.
- Data: JSON, CSV, XML
Storage
Files are stored in Supabase Storage:
- Bucket: files
- Path: user_id/conversation_id/filename
- Metadata: Stored in the storage_files table
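For illustration, an upload that follows this convention with supabase-js might look like the following (environment variable names and the surrounding variables are placeholders; error handling omitted):

import { createClient } from '@supabase/supabase-js'

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!)

// Store the file under user_id/conversation_id/filename in the "files" bucket
const path = `${userId}/${conversationId}/${fileName}`
const { data, error } = await supabase.storage
  .from('files')
  .upload(path, fileBuffer, { contentType: mimeType })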
Auto-Ingestion
After upload, documents are automatically ingested:
// Automatic after upload
await ingestDocument({
  userId,
  conversationId,
  fileKey: storageKey,
  fileName,
  mimeType
})
Text Extraction
PDF Extraction
Tool: pdf-parse + Tesseract.js OCR
import pdfParse from 'pdf-parse'

// Extract text from the PDF (pdf-parse resolves to an object with a .text field)
const parsed = await pdfParse(buffer)
let text = parsed.text

// If the extracted text is sparse, fall back to OCR
if (text.length < 100) {
  const images = await extractPdfImages(buffer)
  const ocrText = await performOcr(images)
  text += ocrText
}
Office Documents
Tool: markitdown-ts
import { markitdown } from 'markitdown-ts'

const markdown = await markitdown({
  file: buffer,
  mimeType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
})
Image Descriptions
For images, a vision LLM generates a text description that is ingested in place of the image:
const description = await generateText({
  model: 'openai:gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in detail.' },
        { type: 'image', image: imageBuffer }
      ]
    }
  ]
})
OCR (Tesseract.js)
import Tesseract from 'tesseract.js'

const result = await Tesseract.recognize(
  imageBuffer,
  'eng+jpn',
  {
    logger: progress => console.log(progress)
  }
)
const text = result.data.text
Chunking Strategy
Character-based Chunking
Split text into fixed-size chunks with overlap:
function chunkText(text: string, chunkSize = 1000, overlap = 200) {
  const chunks = []
  let start = 0
  while (start < text.length) {
    const end = start + chunkSize
    const chunk = text.slice(start, end)
    chunks.push(chunk)
    start = end - overlap
  }
  return chunks
}
Sentence-aware Chunking
Split at sentence boundaries for better context:
function sentenceAwareChunk(text: string, maxSize = 1000) {
  const sentences = text.match(/[^.!?]+[.!?]+/g) || []
  const chunks = []
  let current = ''
  for (const sentence of sentences) {
    if ((current + sentence).length > maxSize) {
      // Avoid pushing an empty chunk when a single sentence exceeds maxSize
      if (current) chunks.push(current.trim())
      current = sentence
    } else {
      current += sentence
    }
  }
  if (current) chunks.push(current.trim())
  return chunks
}
Configuration
{
  "chunking": {
    "strategy": "sentence-aware",
    "chunkSize": 1000,
    "overlap": 200,
    "minChunkSize": 100
  }
}
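The strategy field selects between the two functions above. A sketch of how these settings could be applied (the config shape matches the JSON above; the dispatcher itself is illustrative, and the "character" strategy name is an assumption):

interface ChunkingConfig {
  strategy: 'character' | 'sentence-aware'
  chunkSize: number
  overlap: number
  minChunkSize: number
}

function chunkDocument(text: string, cfg: ChunkingConfig): string[] {
  const chunks = cfg.strategy === 'sentence-aware'
    ? sentenceAwareChunk(text, cfg.chunkSize) // chunkSize acts as maxSize here
    : chunkText(text, cfg.chunkSize, cfg.overlap)
  // Drop fragments smaller than minChunkSize
  return chunks.filter(chunk => chunk.length >= cfg.minChunkSize)
}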
Embedding Generation
Embedding Providers
- OpenAI: text-embedding-3-small (1536 dims)
- Google: text-embedding-004 (768 dims)
- Ollama: mxbai-embed-large (1024 dims)
Configuration
File: global-rag-settings.json
{
  "embedding": {
    "active": "ollama",
    "providers": {
      "openai": {
        "modelName": "text-embedding-3-small",
        "dimensions": 1536
      },
      "google": {
        "modelName": "text-embedding-004",
        "dimensions": 768
      },
      "ollama": {
        "modelName": "mxbai-embed-large:latest",
        "baseUrl": "localhost",
        "port": 11434
      }
    }
  }
}
Generate Embeddings
import { embed } from 'ai'
import { openai } from '@ai-sdk/openai'

const { embedding } = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: 'This is the text to embed'
})
// embedding is an array of 1536 numbers
Batch Embedding
import { embedMany } from 'ai'

const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: chunks // Array of text chunks
})
// embeddings is an array of vectors, one per chunk
RAPTOR (Hierarchical Chunking)
Overview
RAPTOR builds a tree of summaries over the document chunks, enabling retrieval at multiple levels of detail:
- Chunk document into small segments
- Group chunks by similarity (clustering)
- Summarize each group
- Repeat for multiple levels
- Store all levels for retrieval
Tree Structure
Level 0: Original chunks (leaf nodes)
Level 1: Group summaries (10 chunks → 1 summary)
Level 2: Meta-summaries (10 L1 summaries → 1 summary)
Level 3: Document summary (root)
Implementation
async function buildRaptorTree(chunks: string[], clusterSize = 10) {
  // Level 0: the original chunks (leaf nodes)
  let level = 0
  let currentChunks = chunks
  await storeLevelChunks(currentChunks, level)
  while (currentChunks.length > 1) {
    // Cluster similar chunks
    const clusters = await clusterChunks(currentChunks, clusterSize)
    // Summarize each cluster
    const summaries = await Promise.all(
      clusters.map(cluster => summarizeCluster(cluster))
    )
    // Store this level of summaries
    level++
    await storeLevelChunks(summaries, level)
    currentChunks = summaries
  }
}
Retrieval Strategy
Query all levels and combine results:
- Search Level 0 (detailed chunks)
- Search Level 1 (group summaries)
- Search Level 2 (meta-summaries)
- Combine and rerank results
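A sketch of that strategy, assuming a level-aware search helper (searchLevel) and the rerank step described later:

async function raptorRetrieve(query: string, maxLevels: number, topN: number) {
  // Search every level of the tree in parallel (level 0 = detailed chunks)
  const perLevel = await Promise.all(
    Array.from({ length: maxLevels + 1 }, (_, level) => searchLevel(query, level))
  )
  // Combine detailed chunks and summaries, then rerank for the final context
  const combined = perLevel.flat()
  return rerank(query, combined, topN)
}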
Configuration
{
  "raptor": {
    "enabled": true,
    "clusterSize": 10,
    "maxLevels": 3,
    "summaryModel": "openai:gpt-4o-mini"
  }
}
Knowledge Graph (GraphRAG)
Overview
GraphRAG extracts entities and relationships from documents to build a knowledge graph.
Entity Types
- PERSON - People, characters
- ORGANIZATION - Companies, institutions
- LOCATION - Places, cities, countries
- EVENT - Occurrences, incidents
- CONCEPT - Abstract ideas, technologies
- DATE - Temporal references
- PRODUCT - Products, services
Relation Types
- WORKS_FOR - Employment relationships
- LOCATED_IN - Geographic relationships
- PART_OF - Hierarchical relationships
- RELATED_TO - General associations
- CREATED - Creation relationships
- INFLUENCES - Influence relationships
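For reference, the corresponding TypeScript shapes (illustrative; the field names mirror the extraction schema in the next section):

type EntityType = 'PERSON' | 'ORGANIZATION' | 'LOCATION' | 'EVENT' | 'CONCEPT' | 'DATE' | 'PRODUCT'
type RelationType = 'WORKS_FOR' | 'LOCATED_IN' | 'PART_OF' | 'RELATED_TO' | 'CREATED' | 'INFLUENCES'

interface KgEntity {
  name: string
  type: EntityType
  description: string
}

interface KgRelation {
  source: string   // entity name
  target: string   // entity name
  type: RelationType
  strength: number // 0..1
}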
Entity Extraction
LLM-based Extraction
Use structured output to extract entities:
import { z } from 'zod'
import { generateObject } from 'ai'
import { openai } from '@ai-sdk/openai'

const schema = z.object({
  entities: z.array(z.object({
    name: z.string(),
    type: z.enum(['PERSON', 'ORGANIZATION', 'LOCATION', 'EVENT', 'CONCEPT']),
    description: z.string()
  })),
  relations: z.array(z.object({
    source: z.string(),
    target: z.string(),
    type: z.string(),
    strength: z.number().min(0).max(1)
  }))
})

const result = await generateObject({
  model: openai('gpt-4o'),
  schema,
  prompt: `Extract entities and relations from: ${text}`
})
Entity Resolution
Merge duplicate entities with fuzzy matching:
function resolveEntities(entities) {
  const resolved = new Map()
  for (const entity of entities) {
    const canonical = findCanonicalForm(entity.name, resolved)
    if (canonical) {
      // Merge with the existing entity
      resolved.get(canonical).mentions++
    } else {
      // First occurrence: track it with a mention count
      resolved.set(entity.name, { ...entity, mentions: 1 })
    }
  }
  return Array.from(resolved.values())
}
Community Detection
Overview
Group related entities into communities using graph algorithms.
Louvain Algorithm
Modularity-based community detection:
function detectCommunities(graph) {
  // Initialize: each node is its own community
  let communities = initializeCommunities(graph)
  let improved = true
  while (improved) {
    improved = false
    for (const node of graph.nodes) {
      const bestCommunity = findBestCommunity(node, communities)
      if (bestCommunity !== communities[node.id]) {
        communities[node.id] = bestCommunity
        improved = true
      }
    }
  }
  return communities
}
Community Summarization
Generate summaries for each community:
async function summarizeCommunity(communityEntities) {
  const entityDescriptions = communityEntities
    .map(e => `${e.name}: ${e.description}`)
    .join('\n')
  const summary = await generateText({
    model: 'openai:gpt-4o-mini',
    prompt: `Summarize this group of related entities:\n${entityDescriptions}`
  })
  return summary
}
Storage
-- Add community_id to entities
UPDATE kg_entities
SET community_id = $1
WHERE entity_id = ANY($2);
Graph Query
Entity Mention Boost
Boost chunks that mention graph entities:
// 1. Perform vector search
const chunks = await vectorSearch(query)
// 2. Check for entity mentions
const entities = await getQueryEntities(query)
// 3. Boost chunks with entity mentions
chunks.forEach(chunk => {
  const mentions = countEntityMentions(chunk, entities)
  chunk.score += mentions * 0.1 // Boost weight
})
Neighbor Boost
Boost chunks mentioning neighbor entities:
// 1. Find entities in the query
const queryEntities = await extractEntities(query)
// 2. Get their 1-hop neighbors
const neighbors = await getNeighbors(queryEntities, 1)
// 3. Boost chunks mentioning neighbors
chunks.forEach(chunk => {
  const neighborMentions = countEntityMentions(chunk, neighbors)
  chunk.score += neighborMentions * 0.15
})
Community Boost
Boost chunks whose community matches the query's dominant community:
// 1. Determine the query's community
const queryEntities = await extractEntities(query)
const community = getMostCommonCommunity(queryEntities)
// 2. Boost chunks in the same community
chunks.forEach(chunk => {
  if (chunk.community_id === community) {
    chunk.score += 0.12 // Community boost
  }
})
Configuration
{
  "kgBoost": {
    "enabled": true,
    "weights": {
      "mention": 0.1,
      "neighbor": 0.15,
      "community": 0.12
    }
  },
  "edgeBoost": {
    "enabled": true,
    "weight": 0.1
  }
}
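Putting the three boosts together with the weights from this configuration (a sketch; the helper names follow the snippets above):

async function applyKgBoosts(chunks: Chunk[], query: string, weights: { mention: number; neighbor: number; community: number }) {
  const queryEntities = await extractEntities(query)
  const neighbors = await getNeighbors(queryEntities, 1)
  const community = getMostCommonCommunity(queryEntities)
  for (const chunk of chunks) {
    chunk.score += countEntityMentions(chunk, queryEntities) * weights.mention
    chunk.score += countEntityMentions(chunk, neighbors) * weights.neighbor
    if (chunk.community_id === community) chunk.score += weights.community
  }
  return chunks
}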
Caching Strategy
Global RAG Cache
Backend: Valkey (Redis-compatible)
TTL: 600 seconds (configurable)
// Cache key format
const cacheKey = `global-rag-cache:${hashQuery(query)}`
// Check cache
const cached = await valkey.get(cacheKey)
if (cached) {
  return JSON.parse(cached)
}
// Perform search
const results = await performRAGSearch(query)
// Store in cache
await valkey.setEx(cacheKey, 600, JSON.stringify(results))
Personal RAG Cache
TTL: 300 seconds
Scope: Per-user, per-conversation
const cacheKey = `personal-rag:${userId}:${conversationId}:${hash}`
await valkey.setEx(cacheKey, 300, JSON.stringify(results))
Configuration
# .env.local
GLOBAL_RAG_CACHE_ENABLED=true
GLOBAL_RAG_CACHE_TTL_SECONDS=600
PERSONAL_RAG_CACHE_TTL_SECONDS=300
Cache Invalidation
Clear cache when documents are added/deleted:
// DEL does not accept glob patterns, so enumerate matching keys with SCAN and delete them
// (assumes a node-redis-style client with scanIterator)
// On document upload
for await (const key of valkey.scanIterator({ MATCH: `personal-rag:${userId}:${conversationId}:*` })) {
  await valkey.del(key)
}
// On document deletion: same pattern with MATCH: 'global-rag-cache:*'
Reranking
Overview
Rerank search results using a specialized model for better relevance.
Ollama Reranking
Model: bge-m3 (or similar cross-encoder)
async function rerank(query: string, chunks: Chunk[], topN: number) {
  const scores = await Promise.all(
    chunks.map(async chunk => {
      const score = await ollama.rerank({
        model: 'bge-m3',
        query,
        document: chunk.content
      })
      return { chunk, score }
    })
  )
  return scores
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map(s => s.chunk)
}
Configuration
{
  "rerank": {
    "enabled": true,
    "modelName": "bge-m3",
    "topN": 8,
    "provider": "ollama"
  }
}
When to Rerank
- After vector search (before LLM)
- After hybrid search (vector + graph)
- When precision is critical
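In pipeline form (a sketch combining the stages above; the signatures are illustrative, and topK, topN, and the weights come from the configuration below):

async function retrieve(query: string, cfg: RagConfig) {
  // 1. Vector search (pgvector or Valkey Search)
  let chunks = await vectorSearch(query, cfg.retrieval.topK)
  // 2. Optional knowledge-graph boosts
  if (cfg.kgBoost.enabled) {
    chunks = await applyKgBoosts(chunks, query, cfg.kgBoost.weights)
  }
  // 3. Optional reranking before the chunks are passed to the LLM
  if (cfg.rerank.enabled) {
    chunks = await rerank(query, chunks, cfg.rerank.topN)
  }
  return chunks
}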
RAG Configuration
File: global-rag-settings.json
Full Configuration
{
  "ragModelKey": "openai:gpt-4o",
  "embedding": {
    "active": "ollama",
    "providers": {
      "openai": {
        "modelName": "text-embedding-3-small",
        "dimensions": 1536
      },
      "google": {
        "modelName": "text-embedding-004",
        "dimensions": 768
      },
      "ollama": {
        "modelName": "mxbai-embed-large:latest",
        "baseUrl": "localhost",
        "port": 11434
      }
    }
  },
  "chunking": {
    "chunkSize": 1000,
    "overlap": 200
  },
  "retrieval": {
    "topK": 10,
    "similarityThreshold": 0.7
  },
  "rerank": {
    "enabled": false,
    "modelName": "bge-m3",
    "topN": 8
  },
  "kgBoost": {
    "enabled": false,
    "weights": {
      "mention": 0.1,
      "neighbor": 0.15,
      "community": 0.12
    }
  },
  "edgeBoost": {
    "enabled": true,
    "weight": 0.1
  },
  "raptor": {
    "enabled": false,
    "clusterSize": 10,
    "maxLevels": 3
  }
}
RAG API Reference
Ingest Document
Endpoint: POST /api/rag/ingest/file
curl -X POST http://localhost:3000/api/rag/ingest/file \
-F "file=@document.pdf" \
-F "conversationId=conv_123"
Ingest from URL
Endpoint: POST /api/rag/ingest/url
{
  "url": "https://example.com/article",
  "conversationId": "conv_123"
}
Ingest Text
Endpoint: POST /api/rag/ingest/text
{
  "text": "This is the content to ingest...",
  "title": "My Document",
  "conversationId": "conv_123"
}
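For example, from TypeScript (the body matches the shape above; authentication headers are omitted):

const res = await fetch('http://localhost:3000/api/rag/ingest/text', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'This is the content to ingest...',
    title: 'My Document',
    conversationId: 'conv_123'
  })
})
const result = await res.json()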
List Documents
Endpoint: GET /api/rag/documents
{
  "documents": [
    {
      "id": "doc_123",
      "file_name": "document.pdf",
      "chunks": 45,
      "created_at": "2025-01-01T00:00:00Z"
    }
  ]
}
Delete Document
Endpoint: DELETE /api/rag/documents/[id]
Build Knowledge Graph
Endpoint: POST /api/rag/graph/build
{
  "documentIds": ["doc_123", "doc_456"],
  "extractionModel": "openai:gpt-4o"
}
Graph Overview
Endpoint: GET /api/rag/graph/overview
{
  "entities": 142,
  "relations": 238,
  "communities": 12
}
Update Settings
Endpoint: POST /api/rag/settings
{
  "embedding": {
    "active": "ollama"
  },
  "rerank": {
    "enabled": true
  }
}