RAG System

ExtendedLM RAG (Retrieval-Augmented Generation) System provides advanced document search using vector embeddings, knowledge graphs, and hybrid retrieval.

What is RAG?

RAG enhances LLM responses by retrieving relevant information from your documents before generation. It combines semantic search with generative AI for accurate, context-aware answers.

Key Features

  • Vector Search: Semantic similarity using pgvector and Valkey Search
  • Knowledge Graphs: Entity/relation extraction with GraphRAG
  • Hybrid Retrieval: Combine vector search + graph + reranking
  • RAPTOR: Hierarchical chunking and summarization
  • Multi-format Support: PDF, DOCX, images, code, etc.
  • OCR: Tesseract.js for scanned documents
  • Caching: Valkey-based result caching
  • Personal & Global: Conversation-scoped and user-wide search

RAG Types

  • Personal RAG: Conversation-scoped documents
  • Global RAG: All user documents
  • GraphRAG: Knowledge graph retrieval

Architecture

Components

  • Document Store: Supabase Storage
  • Vector DB: PostgreSQL + pgvector
  • Global Store: Valkey Search (Redis-compatible)
  • Knowledge Graph: PostgreSQL (entities/relations)
  • Embedding Service: OpenAI/Google/Ollama
  • Ingestion Pipeline: Text extraction, chunking, embedding
  • Cache Layer: Valkey (query results)

Data Flow

  1. User uploads document
  2. Extract text (pdf-parse, OCR, markitdown)
  3. Chunk text (configurable size)
  4. Generate embeddings (1536-dim vectors)
  5. Store in vector DB (pgvector/Valkey)
  6. Extract entities/relations (optional GraphRAG)
  7. Query retrieves relevant chunks
  8. LLM generates answer with context
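
Steps 7 and 8 can be sketched end to end as follows. This is a minimal illustration, assuming a hypothetical vectorSearch() helper that returns chunks with a content field; the real pipeline lives under src/server/rag/.

import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

async function answerWithRag(query: string) {
  // 7. Retrieve the most relevant chunks (vectorSearch is an assumed helper)
  const chunks = await vectorSearch(query)
  const context = chunks.map(c => c.content).join('\n---\n')

  // 8. Generate the answer grounded in the retrieved context
  const { text } = await generateText({
    model: openai('gpt-4o'),
    prompt: `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}`
  })

  return text
}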

Directory Structure

src/server/rag/
├── ingest.ts              # Document ingestion pipeline
├── chunking.ts            # Text chunking strategies
├── embeddings.ts          # Embedding generation
├── raptor.ts              # RAPTOR hierarchical chunking
├── kg.ts                  # Knowledge graph extraction
├── graph.ts               # Graph query and traversal
├── valkeySearchStore.ts   # Valkey Search integration
├── pdfOcr.ts              # PDF OCR processing
└── rerank.ts              # Result reranking

PostgreSQL + pgvector

Overview

The pgvector extension adds vector similarity search to PostgreSQL.

Features

  • Vector similarity search for semantic document retrieval
  • Efficient indexing for fast approximate nearest neighbor search
  • Multiple distance metrics for different use cases
  • User-scoped document storage with conversation context
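
At query time, retrieval compares a query embedding against stored chunk embeddings using a pgvector distance operator. The sketch below is a minimal illustration with node-postgres; the table and column names (rag_chunks, embedding, content) and the pool handle are assumptions, not the actual schema.

// Minimal pgvector similarity query (cosine distance via the <=> operator).
// Table and column names are assumed for illustration only.
const { rows } = await pool.query(
  `SELECT content, 1 - (embedding <=> $1::vector) AS similarity
     FROM rag_chunks
    WHERE user_id = $2
    ORDER BY embedding <=> $1::vector
    LIMIT $3`,
  [JSON.stringify(queryEmbedding), userId, 10]
)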

Document Upload

Supported Formats

  • Documents: PDF, DOCX, PPTX, XLSX, TXT, MD
  • Images: JPG, PNG, GIF (vision LLM descriptions)
  • Code: JS, TS, PY, etc.
  • Data: JSON, CSV, XML

Storage

Files are stored in Supabase Storage:

  • Bucket: files
  • Path: user_id/conversation_id/filename
  • Metadata: Stored in storage_files table

Auto-Ingestion

After upload, documents are automatically ingested:

// Automatic after upload
await ingestDocument({
  userId,
  conversationId,
  fileKey: storageKey,
  fileName,
  mimeType
})

Text Extraction

PDF Extraction

Tool: pdf-parse + Tesseract.js OCR

import pdfParse from 'pdf-parse'

// Extract text from the PDF (pdf-parse resolves to an object with a `text` field)
const pdfData = await pdfParse(buffer)
let text = pdfData.text

// If the extracted text is sparse, fall back to OCR on the page images
if (text.length < 100) {
  const images = await extractPdfImages(buffer)
  const ocrText = await performOcr(images)
  text += ocrText
}

Office Documents

Tool: markitdown-ts

import { markitdown } from 'markitdown-ts'

const markdown = await markitdown({
  file: buffer,
  mimeType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
})

Image Descriptions

For images, use vision LLM to generate descriptions:

import { generateText } from 'ai'

const { text: description } = await generateText({
  model: 'openai:gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in detail.' },
        { type: 'image', image: imageBuffer }
      ]
    }
  ]
})

OCR (Tesseract.js)

import Tesseract from 'tesseract.js'

const result = await Tesseract.recognize(
  imageBuffer,
  'eng+jpn',
  {
    logger: progress => console.log(progress)
  }
)

const text = result.data.text

Chunking Strategy

Character-based Chunking

Split text into fixed-size chunks with overlap:

function chunkText(text: string, chunkSize = 1000, overlap = 200) {
  const chunks = []
  let start = 0

  while (start < text.length) {
    const end = start + chunkSize
    const chunk = text.slice(start, end)
    chunks.push(chunk)
    start = end - overlap
  }

  return chunks
}

Sentence-aware Chunking

Split at sentence boundaries for better context:

function sentenceAwareChunk(text: string, maxSize = 1000) {
  const sentences = text.match(/[^.!?]+[.!?]+/g) || []
  const chunks = []
  let current = ''

  for (const sentence of sentences) {
    if (current && (current + sentence).length > maxSize) {
      chunks.push(current.trim())
      current = sentence
    } else {
      current += sentence
    }
  }

  if (current) chunks.push(current.trim())
  return chunks
}

Configuration

{
  "chunking": {
    "strategy": "sentence-aware",
    "chunkSize": 1000,
    "overlap": 200,
    "minChunkSize": 100
  }
}
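
A small dispatcher can tie this configuration to the two strategies above; the sketch below assumes the JSON has already been loaded into a ChunkingConfig object.

interface ChunkingConfig {
  strategy: 'character' | 'sentence-aware'
  chunkSize: number
  overlap: number
  minChunkSize: number
}

function chunkDocument(text: string, cfg: ChunkingConfig): string[] {
  const chunks = cfg.strategy === 'sentence-aware'
    ? sentenceAwareChunk(text, cfg.chunkSize)
    : chunkText(text, cfg.chunkSize, cfg.overlap)

  // Drop fragments below the configured minimum size
  return chunks.filter(c => c.trim().length >= cfg.minChunkSize)
}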

Embedding Generation

Embedding Providers

  • OpenAI: text-embedding-3-small (1536 dims)
  • Google: text-embedding-004 (768 dims)
  • Ollama: mxbai-embed-large (1024 dims)

Configuration

File: global-rag-settings.json

{
  "embedding": {
    "active": "ollama",
    "providers": {
      "openai": {
        "modelName": "text-embedding-3-small",
        "dimensions": 1536
      },
      "google": {
        "modelName": "text-embedding-004",
        "dimensions": 768
      },
      "ollama": {
        "modelName": "mxbai-embed-large:latest",
        "baseUrl": "localhost",
        "port": 11434
      }
    }
  }
}

Generate Embeddings

import { embed } from 'ai'
import { openai } from '@ai-sdk/openai'

const { embedding } = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: 'This is the text to embed'
})

// embedding is an array of 1536 numbers

Batch Embedding

import { embedMany } from 'ai'

const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: chunks // Array of text chunks
})

// embeddings is an array of vectors, one per input chunk

RAPTOR (Hierarchical Chunking)

Overview

RAPTOR creates a tree structure of summarizations for better retrieval:

  1. Chunk document into small segments
  2. Group chunks by similarity (clustering)
  3. Summarize each group
  4. Repeat for multiple levels
  5. Store all levels for retrieval

Tree Structure

Level 0: Original chunks (leaf nodes)
Level 1: Group summaries (10 chunks → 1 summary)
Level 2: Meta-summaries (10 L1 summaries → 1 summary)
Level 3: Document summary (root)

Implementation

async function buildRaptorTree(chunks: string[], clusterSize = 10) {
  // Level 0: original chunks (leaf nodes)
  let level = 0
  let currentChunks = chunks
  await storeLevelChunks(currentChunks, level)

  while (currentChunks.length > 1) {
    // Cluster similar chunks
    const clusters = await clusterChunks(currentChunks, clusterSize)

    // Summarize each cluster
    const summaries = await Promise.all(
      clusters.map(cluster => summarizeCluster(cluster))
    )

    // Store this level of summaries
    level++
    await storeLevelChunks(summaries, level)

    currentChunks = summaries
  }
}
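
clusterChunks and summarizeCluster above are assumed helpers. One simple way to approximate the clustering step is to greedily group adjacent chunks by embedding cosine similarity; a rough sketch, assuming an embedText() wrapper around the configured embedding provider:

import { cosineSimilarity } from 'ai'

// Greedy, order-preserving grouping by embedding similarity.
// embedText() is an assumed wrapper around the embedding provider.
async function clusterChunks(chunks: string[], clusterSize: number): Promise<string[][]> {
  if (chunks.length === 0) return []

  const vectors = await Promise.all(chunks.map(c => embedText(c)))
  const clusters: string[][] = []
  let current: string[] = [chunks[0]]

  for (let i = 1; i < chunks.length; i++) {
    const similar = cosineSimilarity(vectors[i - 1], vectors[i]) > 0.8
    if (similar && current.length < clusterSize) {
      current.push(chunks[i])
    } else {
      clusters.push(current)
      current = [chunks[i]]
    }
  }

  clusters.push(current)
  return clusters
}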

Retrieval Strategy

Query all levels and combine results:

  • Search Level 0 (detailed chunks)
  • Search Level 1 (group summaries)
  • Search Level 2 (meta-summaries)
  • Combine and rerank results
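
One way to implement this is a collapsed-tree search that queries every stored level and pools the candidates; the sketch below assumes a searchLevel(query, level) helper and the rerank() function described later in this document.

async function raptorRetrieve(query: string, maxLevels: number, topN = 8) {
  // Search every level of the tree (level 0 through maxLevels) and pool the candidates
  const levels = await Promise.all(
    Array.from({ length: maxLevels + 1 }, (_, level) => searchLevel(query, level))
  )
  const candidates = levels.flat()

  // Rerank the pooled candidates and keep the best topN
  return rerank(query, candidates, topN)
}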

Configuration

{
  "raptor": {
    "enabled": true,
    "clusterSize": 10,
    "maxLevels": 3,
    "summaryModel": "openai:gpt-4o-mini"
  }
}

Knowledge Graph (GraphRAG)

Overview

GraphRAG extracts entities and relationships from documents to build a knowledge graph.

Entity Types

  • PERSON - People, characters
  • ORGANIZATION - Companies, institutions
  • LOCATION - Places, cities, countries
  • EVENT - Occurrences, incidents
  • CONCEPT - Abstract ideas, technologies
  • DATE - Temporal references
  • PRODUCT - Products, services

Relation Types

  • WORKS_FOR - Employment relationships
  • LOCATED_IN - Geographic relationships
  • PART_OF - Hierarchical relationships
  • RELATED_TO - General associations
  • CREATED - Creation relationships
  • INFLUENCES - Influence relationships
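
These catalogs map naturally onto TypeScript union types for extracted records; the field names below are illustrative and mirror the extraction schema shown in the next section.

type EntityType =
  | 'PERSON' | 'ORGANIZATION' | 'LOCATION' | 'EVENT'
  | 'CONCEPT' | 'DATE' | 'PRODUCT'

type RelationType =
  | 'WORKS_FOR' | 'LOCATED_IN' | 'PART_OF'
  | 'RELATED_TO' | 'CREATED' | 'INFLUENCES'

interface KgEntity {
  name: string
  type: EntityType
  description: string
}

interface KgRelation {
  source: string        // entity name
  target: string        // entity name
  type: RelationType
  strength: number      // 0..1
}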

Entity Extraction

LLM-based Extraction

Use structured output to extract entities:

import { z } from 'zod'
import { generateObject } from 'ai'
import { openai } from '@ai-sdk/openai'

const schema = z.object({
  entities: z.array(z.object({
    name: z.string(),
    type: z.enum(['PERSON', 'ORGANIZATION', 'LOCATION', 'EVENT', 'CONCEPT']),
    description: z.string()
  })),
  relations: z.array(z.object({
    source: z.string(),
    target: z.string(),
    type: z.string(),
    strength: z.number().min(0).max(1)
  }))
})

const { object } = await generateObject({
  model: openai('gpt-4o'),
  schema,
  prompt: `Extract entities and relations from: ${text}`
})

// object.entities and object.relations follow the schema above

Entity Resolution

Merge duplicate entities with fuzzy matching:

function resolveEntities(entities) {
  const resolved = new Map()

  for (const entity of entities) {
    const canonical = findCanonicalForm(entity.name, resolved)

    if (canonical) {
      // Merge with the existing entity and count the extra mention
      const existing = resolved.get(canonical)
      existing.mentions = (existing.mentions || 1) + 1
    } else {
      resolved.set(entity.name, { ...entity, mentions: 1 })
    }
  }

  return Array.from(resolved.values())
}
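
findCanonicalForm above is an assumed helper; a simple version can normalize names (case, punctuation, whitespace) before comparing, which already catches most near-duplicates.

// Naive canonicalization: exact match after normalization.
// A production version might add edit distance or alias tables.
function findCanonicalForm(name: string, resolved: Map<string, any>): string | null {
  const normalize = (s: string) =>
    s.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, '').replace(/\s+/g, ' ').trim()

  const target = normalize(name)
  for (const existing of resolved.keys()) {
    if (normalize(existing) === target) return existing
  }
  return null
}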

Community Detection

Overview

Group related entities into communities using graph algorithms.

Louvain Algorithm

Modularity-based community detection:

// Simplified Louvain local-moving phase (graph helpers are assumed)
function detectCommunities(graph) {
  // Initialize: each node is its own community
  let communities = initializeCommunities(graph)
  let improved = true

  while (improved) {
    improved = false

    for (const node of graph.nodes) {
      const bestCommunity = findBestCommunity(node, communities)

      if (bestCommunity !== communities[node.id]) {
        communities[node.id] = bestCommunity
        improved = true
      }
    }
  }

  return communities
}

Community Summarization

Generate summaries for each community:

async function summarizeCommunity(communityEntities) {
  const entityDescriptions = communityEntities
    .map(e => `${e.name}: ${e.description}`)
    .join('\n')

  const { text: summary } = await generateText({
    model: 'openai:gpt-4o-mini',
    prompt: `Summarize this group of related entities:\n${entityDescriptions}`
  })

  return summary
}

Storage

-- Add community_id to entities
UPDATE kg_entities
SET community_id = $1
WHERE entity_id = ANY($2);

Graph Query

Entity Mention Boost

Boost chunks that mention graph entities:

// 1. Perform vector search
const chunks = await vectorSearch(query)

// 2. Check for entity mentions
const entities = await getQueryEntities(query)

// 3. Boost chunks with entity mentions
chunks.forEach(chunk => {
  const mentions = countEntityMentions(chunk, entities)
  chunk.score += mentions * 0.1  // Boost weight
})

Neighbor Boost

Boost chunks mentioning neighbor entities:

// 1. Find entities in query
const queryEntities = await extractEntities(query)

// 2. Get neighbors (1-hop)
const neighbors = await getNeighbors(queryEntities, 1)  // 1-hop

// 3. Boost chunks mentioning neighbors
chunks.forEach(chunk => {
  const neighborMentions = countEntityMentions(chunk, neighbors)
  chunk.score += neighborMentions * 0.15
})

Community Boost

Boost chunks from same community:

// 1. Determine query community
const queryEntities = await extractEntities(query)
const community = getMostCommonCommunity(queryEntities)

// 2. Boost chunks in same community
chunks.forEach(chunk => {
  if (chunk.community_id === community) {
    chunk.score += 0.12  // Community boost
  }
})

Configuration

{
  "kgBoost": {
    "enabled": true,
    "weights": {
      "mention": 0.1,
      "neighbor": 0.15,
      "community": 0.12
    }
  },
  "edgeBoost": {
    "enabled": true,
    "weight": 0.1
  }
}
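
The three boosts can be combined into a single pass that reads its weights from this configuration; a rough sketch reusing the chunk shape and the countEntityMentions helper from the snippets above.

interface ScoredChunk { content: string; score: number; community_id?: string }

function applyKgBoosts(
  chunks: ScoredChunk[],
  queryEntities: { name: string }[],
  neighbors: { name: string }[],
  community: string | undefined,
  weights: { mention: number; neighbor: number; community: number }
) {
  for (const chunk of chunks) {
    chunk.score += countEntityMentions(chunk, queryEntities) * weights.mention  // direct mentions
    chunk.score += countEntityMentions(chunk, neighbors) * weights.neighbor     // 1-hop neighbors
    if (chunk.community_id === community) chunk.score += weights.community      // same community
  }
  return chunks
}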

Caching Strategy

Global RAG Cache

Backend: Valkey (Redis-compatible)

TTL: 600 seconds (configurable)

// Cache key format
const cacheKey = `global-rag-cache:${hashQuery(query)}`

// Check cache
const cached = await valkey.get(cacheKey)
if (cached) {
  return JSON.parse(cached)
}

// Perform search
const results = await performRAGSearch(query)

// Store in cache
await valkey.setEx(cacheKey, 600, JSON.stringify(results))

Personal RAG Cache

TTL: 300 seconds

Scope: Per-user, per-conversation

const cacheKey = `personal-rag:${userId}:${conversationId}:${hash}`
await valkey.setEx(cacheKey, 300, JSON.stringify(results))

Configuration

# .env.local
GLOBAL_RAG_CACHE_ENABLED=true
GLOBAL_RAG_CACHE_TTL_SECONDS=600
PERSONAL_RAG_CACHE_TTL_SECONDS=300

Cache Invalidation

Clear cache when documents are added/deleted:

// DEL does not expand wildcards; find matching keys with SCAN, then delete them
// (scanIterator assumes a node-redis-compatible client)
async function clearByPattern(pattern: string) {
  for await (const key of valkey.scanIterator({ MATCH: pattern })) await valkey.del(key)
}

await clearByPattern(`personal-rag:${userId}:${conversationId}:*`)  // on document upload
await clearByPattern('global-rag-cache:*')                          // on document deletion

Reranking

Overview

Rerank search results using a specialized model for better relevance.

Ollama Reranking

Model: bge-m3 (or a similar reranking model)

async function rerank(query: string, chunks: Chunk[], topN: number) {
  const scores = await Promise.all(
    chunks.map(async chunk => {
      const score = await ollama.rerank({
        model: 'bge-m3',
        query,
        document: chunk.content
      })
      return { chunk, score }
    })
  )

  return scores
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map(s => s.chunk)
}

Configuration

{
  "rerank": {
    "enabled": true,
    "modelName": "bge-m3",
    "topN": 8,
    "provider": "ollama"
  }
}

When to Rerank

  • After vector search (before LLM)
  • After hybrid search (vector + graph)
  • When precision is critical
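
Put together, reranking slots in between retrieval and generation. The sketch below assumes the rerank() function above, a vectorSearch(query, topK) helper, and the retrieval.topK / rerank.topN values from the settings file.

async function retrieveWithRerank(
  query: string,
  settings: { retrieval: { topK: number }; rerank: { enabled: boolean; topN: number } }
) {
  // Pull a wider candidate set first
  const candidates = await vectorSearch(query, settings.retrieval.topK)

  // Then let the reranker keep only the most relevant topN
  if (!settings.rerank.enabled) return candidates
  return rerank(query, candidates, settings.rerank.topN)
}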

RAG Configuration

File: global-rag-settings.json

Full Configuration

{
  "ragModelKey": "openai:gpt-4o",
  "embedding": {
    "active": "ollama",
    "providers": {
      "openai": {
        "modelName": "text-embedding-3-small",
        "dimensions": 1536
      },
      "google": {
        "modelName": "text-embedding-004",
        "dimensions": 768
      },
      "ollama": {
        "modelName": "mxbai-embed-large:latest",
        "baseUrl": "localhost",
        "port": 11434
      }
    }
  },
  "chunking": {
    "chunkSize": 1000,
    "overlap": 200
  },
  "retrieval": {
    "topK": 10,
    "similarityThreshold": 0.7
  },
  "rerank": {
    "enabled": false,
    "modelName": "bge-m3",
    "topN": 8
  },
  "kgBoost": {
    "enabled": false,
    "weights": {
      "mention": 0.1,
      "neighbor": 0.15,
      "community": 0.12
    }
  },
  "edgeBoost": {
    "enabled": true,
    "weight": 0.1
  },
  "raptor": {
    "enabled": false,
    "clusterSize": 10,
    "maxLevels": 3
  }
}

RAG API Reference

Ingest Document

Endpoint: POST /api/rag/ingest/file

curl -X POST http://localhost:3000/api/rag/ingest/file \
  -F "file=@document.pdf" \
  -F "conversationId=conv_123"

Ingest from URL

Endpoint: POST /api/rag/ingest/url

{
  "url": "https://example.com/article",
  "conversationId": "conv_123"
}
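
From client code, the same request might look like the following fetch sketch (authentication headers are omitted and depend on the deployment).

const res = await fetch('/api/rag/ingest/url', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url: 'https://example.com/article',
    conversationId: 'conv_123'
  })
})

const result = await res.json()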

Ingest Text

Endpoint: POST /api/rag/ingest/text

{
  "text": "This is the content to ingest...",
  "title": "My Document",
  "conversationId": "conv_123"
}

List Documents

Endpoint: GET /api/rag/documents

{
  "documents": [
    {
      "id": "doc_123",
      "file_name": "document.pdf",
      "chunks": 45,
      "created_at": "2025-01-01T00:00:00Z"
    }
  ]
}

Delete Document

Endpoint: DELETE /api/rag/documents/[id]

Build Knowledge Graph

Endpoint: POST /api/rag/graph/build

{
  "documentIds": ["doc_123", "doc_456"],
  "extractionModel": "openai:gpt-4o"
}

Graph Overview

Endpoint: GET /api/rag/graph/overview

{
  "entities": 142,
  "relations": 238,
  "communities": 12
}

Update Settings

Endpoint: POST /api/rag/settings

{
  "embedding": {
    "active": "ollama"
  },
  "rerank": {
    "enabled": true
  }
}