RAG System

ExtendedLM RAG (Retrieval-Augmented Generation) System provides advanced document search using vector embeddings, knowledge graphs, and hybrid retrieval.

What is RAG?

RAG enhances LLM responses by retrieving relevant information from your documents before generation. It combines semantic search with generative AI for accurate, context-aware answers.

Key Features

  • Vector Search: Semantic similarity using pgvector and Valkey Search
  • Knowledge Graphs: Entity/relation extraction with GraphRAG
  • Hybrid Retrieval: Combine vector search + graph + reranking
  • RAPTOR: Hierarchical chunking and summarization
  • Multi-format Support: PDF, DOCX, images, code, etc.
  • OCR: Tesseract.js for scanned documents
  • Caching: Valkey-based result caching
  • Personal & Global: Conversation-scoped and user-wide search

RAG Types

  • Personal RAG: Conversation-scoped documents
  • Global RAG: All user documents
  • GraphRAG: Knowledge graph retrieval

Architecture

Components

  • Document Store: Supabase Storage
  • Vector DB: PostgreSQL + pgvector
  • Global Store: Valkey Search (Redis-compatible)
  • Knowledge Graph: PostgreSQL (entities/relations)
  • Embedding Service: OpenAI/Google/Ollama
  • Ingestion Pipeline: Text extraction, chunking, embedding
  • Cache Layer: Valkey (query results)

Data Flow

  1. User uploads document
  2. Extract text (pdf-parse, OCR, markitdown)
  3. Chunk text (configurable size)
  4. Generate embeddings (1536-dim vectors)
  5. Store in vector DB (pgvector/Valkey)
  6. Extract entities/relations (optional GraphRAG)
  7. Query retrieves relevant chunks
  8. LLM generates answer with context
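
Steps 7 and 8 can be sketched end to end as follows. This is a minimal illustration, assuming a hypothetical vectorSearch() helper that returns chunks with a content field; the real pipeline lives under src/server/rag/.

import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

async function answerWithRag(query: string) {
  // 7. Retrieve the most relevant chunks (vectorSearch is an assumed helper)
  const chunks = await vectorSearch(query)
  const context = chunks.map(c => c.content).join('\n---\n')

  // 8. Generate the answer grounded in the retrieved context
  const { text } = await generateText({
    model: openai('gpt-4o'),
    prompt: `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}`
  })

  return text
}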

Directory Structure

src/server/rag/
├── ingest.ts              # Document ingestion pipeline
├── chunking.ts            # Text chunking strategies
├── embeddings.ts          # Embedding generation
├── raptor.ts              # RAPTOR hierarchical chunking
├── kg.ts                  # Knowledge graph extraction
├── graph.ts               # Graph query and traversal
├── valkeySearchStore.ts   # Valkey Search integration
├── pdfOcr.ts              # PDF OCR processing
└── rerank.ts              # Result reranking

PostgreSQL + pgvector

Overview

The pgvector extension adds vector similarity search to PostgreSQL.

Features

  • Vector similarity search for semantic document retrieval
  • Efficient indexing for fast approximate nearest neighbor search
  • Multiple distance metrics for different use cases
  • User-scoped document storage with conversation context
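
At query time, retrieval compares a query embedding against stored chunk embeddings using a pgvector distance operator. The sketch below is a minimal illustration with node-postgres; the table and column names (rag_chunks, embedding, content) and the pool handle are assumptions, not the actual schema.

// Minimal pgvector similarity query (cosine distance via the <=> operator).
// Table and column names are assumed for illustration only.
const { rows } = await pool.query(
  `SELECT content, 1 - (embedding <=> $1::vector) AS similarity
     FROM rag_chunks
    WHERE user_id = $2
    ORDER BY embedding <=> $1::vector
    LIMIT $3`,
  [JSON.stringify(queryEmbedding), userId, 10]
)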

Document Upload

Supported Formats

  • Documents: PDF, DOCX, PPTX, XLSX, TXT, MD
  • Images: JPG, PNG, GIF (vision LLM descriptions)
  • Code: JS, TS, PY, etc.
  • Data: JSON, CSV, XML

Storage

Files are stored in Supabase Storage:

  • Bucket: files
  • Path: user_id/conversation_id/filename
  • Metadata: Stored in storage_files table

Auto-Ingestion

After upload, documents are automatically ingested:

// Automatic after upload
await ingestDocument({
  userId,
  conversationId,
  fileKey: storageKey,
  fileName,
  mimeType
})

Text Extraction

PDF Extraction

Tool: pdf-parse + Tesseract.js OCR

import pdfParse from 'pdf-parse'

// Extract text from the PDF (pdf-parse resolves to an object with a `text` field)
const pdfData = await pdfParse(buffer)
let text = pdfData.text

// If the extracted text is sparse, fall back to OCR on the page images
if (text.length < 100) {
  const images = await extractPdfImages(buffer)
  const ocrText = await performOcr(images)
  text += ocrText
}

Office Documents

Tool: markitdown-ts

import { markitdown } from 'markitdown-ts'

const markdown = await markitdown({
  file: buffer,
  mimeType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
})

Image Descriptions

For images, use vision LLM to generate descriptions:

import { generateText } from 'ai'

const { text: description } = await generateText({
  model: 'openai:gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in detail.' },
        { type: 'image', image: imageBuffer }
      ]
    }
  ]
})

OCR (Tesseract.js)

import Tesseract from 'tesseract.js'

const result = await Tesseract.recognize(
  imageBuffer,
  'eng+jpn',
  {
    logger: progress => console.log(progress)
  }
)

const text = result.data.text

Chunking Strategy

Character-based Chunking

Split text into fixed-size chunks with overlap:

function chunkText(text: string, chunkSize = 1000, overlap = 200) {
  const chunks = []
  let start = 0

  while (start < text.length) {
    const end = start + chunkSize
    const chunk = text.slice(start, end)
    chunks.push(chunk)
    start = end - overlap
  }

  return chunks
}

Sentence-aware Chunking

Split at sentence boundaries for better context:

function sentenceAwareChunk(text: string, maxSize = 1000) {
  const sentences = text.match(/[^.!?]+[.!?]+/g) || []
  const chunks = []
  let current = ''

  for (const sentence of sentences) {
    if (current && (current + sentence).length > maxSize) {
      chunks.push(current.trim())
      current = sentence
    } else {
      current += sentence
    }
  }

  if (current) chunks.push(current.trim())
  return chunks
}

Configuration

{
  "chunking": {
    "strategy": "sentence-aware",
    "chunkSize": 1000,
    "overlap": 200,
    "minChunkSize": 100
  }
}
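
A small dispatcher can tie this configuration to the two strategies above; the sketch below assumes the JSON has already been loaded into a ChunkingConfig object.

interface ChunkingConfig {
  strategy: 'character' | 'sentence-aware'
  chunkSize: number
  overlap: number
  minChunkSize: number
}

function chunkDocument(text: string, cfg: ChunkingConfig): string[] {
  const chunks = cfg.strategy === 'sentence-aware'
    ? sentenceAwareChunk(text, cfg.chunkSize)
    : chunkText(text, cfg.chunkSize, cfg.overlap)

  // Drop fragments below the configured minimum size
  return chunks.filter(c => c.trim().length >= cfg.minChunkSize)
}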

Embedding Generation

Embedding Providers

  • OpenAI: text-embedding-3-small (1536 dims)
  • Google: text-embedding-004 (768 dims)
  • Ollama: mxbai-embed-large (1024 dims)

Configuration

File: global-rag-settings.json

{
  "embedding": {
    "active": "ollama",
    "providers": {
      "openai": {
        "modelName": "text-embedding-3-small",
        "dimensions": 1536
      },
      "google": {
        "modelName": "text-embedding-004",
        "dimensions": 768
      },
      "ollama": {
        "modelName": "mxbai-embed-large:latest",
        "baseUrl": "localhost",
        "port": 11434
      }
    }
  }
}

Generate Embeddings

import { embed } from 'ai'
import { openai } from '@ai-sdk/openai'

const { embedding } = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: 'This is the text to embed'
})

// embedding is an array of 1536 numbers

Batch Embedding

import { embedMany } from 'ai'

const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: chunks // Array of text chunks
})

// embeddings is an array of vectors, one per input chunk

RAPTOR (Hierarchical Chunking)

Overview

RAPTOR creates a tree structure of summarizations for better retrieval:

  1. Chunk document into small segments
  2. Group chunks by similarity (clustering)
  3. Summarize each group
  4. Repeat for multiple levels
  5. Store all levels for retrieval

Tree Structure

Level 0: Original chunks (leaf nodes)
Level 1: Group summaries (10 chunks → 1 summary)
Level 2: Meta-summaries (10 L1 summaries → 1 summary)
Level 3: Document summary (root)

Implementation

async function buildRaptorTree(chunks: string[], clusterSize = 10) {
  // Level 0: original chunks (leaf nodes)
  let level = 0
  let currentChunks = chunks
  await storeLevelChunks(currentChunks, level)

  while (currentChunks.length > 1) {
    // Cluster similar chunks
    const clusters = await clusterChunks(currentChunks, clusterSize)

    // Summarize each cluster
    const summaries = await Promise.all(
      clusters.map(cluster => summarizeCluster(cluster))
    )

    // Store this level of summaries
    level++
    await storeLevelChunks(summaries, level)

    currentChunks = summaries
  }
}
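
clusterChunks and summarizeCluster above are assumed helpers. One simple way to approximate the clustering step is to greedily group adjacent chunks by embedding cosine similarity; a rough sketch, assuming an embedText() wrapper around the configured embedding provider:

import { cosineSimilarity } from 'ai'

// Greedy, order-preserving grouping by embedding similarity.
// embedText() is an assumed wrapper around the embedding provider.
async function clusterChunks(chunks: string[], clusterSize: number): Promise<string[][]> {
  if (chunks.length === 0) return []

  const vectors = await Promise.all(chunks.map(c => embedText(c)))
  const clusters: string[][] = []
  let current: string[] = [chunks[0]]

  for (let i = 1; i < chunks.length; i++) {
    const similar = cosineSimilarity(vectors[i - 1], vectors[i]) > 0.8
    if (similar && current.length < clusterSize) {
      current.push(chunks[i])
    } else {
      clusters.push(current)
      current = [chunks[i]]
    }
  }

  clusters.push(current)
  return clusters
}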

Retrieval Strategy

Query all levels and combine results:

  • Search Level 0 (detailed chunks)
  • Search Level 1 (group summaries)
  • Search Level 2 (meta-summaries)
  • Combine and rerank results
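
One way to implement this is a collapsed-tree search that queries every stored level and pools the candidates; the sketch below assumes a searchLevel(query, level) helper and the rerank() function described later in this document.

async function raptorRetrieve(query: string, maxLevels: number, topN = 8) {
  // Search every level of the tree (level 0 through maxLevels) and pool the candidates
  const levels = await Promise.all(
    Array.from({ length: maxLevels + 1 }, (_, level) => searchLevel(query, level))
  )
  const candidates = levels.flat()

  // Rerank the pooled candidates and keep the best topN
  return rerank(query, candidates, topN)
}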

Configuration

{
  "raptor": {
    "enabled": true,
    "clusterSize": 10,
    "maxLevels": 3,
    "summaryModel": "openai:gpt-4o-mini"
  }
}

Knowledge Graph (GraphRAG)

Overview

GraphRAG extracts entities and relationships from documents to build a knowledge graph.

Entity Types

  • PERSON - People, characters
  • ORGANIZATION - Companies, institutions
  • LOCATION - Places, cities, countries
  • EVENT - Occurrences, incidents
  • CONCEPT - Abstract ideas, technologies
  • DATE - Temporal references
  • PRODUCT - Products, services

Relation Types

  • WORKS_FOR - Employment relationships
  • LOCATED_IN - Geographic relationships
  • PART_OF - Hierarchical relationships
  • RELATED_TO - General associations
  • CREATED - Creation relationships
  • INFLUENCES - Influence relationships
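
These catalogs map naturally onto TypeScript union types for extracted records; the field names below are illustrative and mirror the extraction schema shown in the next section.

type EntityType =
  | 'PERSON' | 'ORGANIZATION' | 'LOCATION' | 'EVENT'
  | 'CONCEPT' | 'DATE' | 'PRODUCT'

type RelationType =
  | 'WORKS_FOR' | 'LOCATED_IN' | 'PART_OF'
  | 'RELATED_TO' | 'CREATED' | 'INFLUENCES'

interface KgEntity {
  name: string
  type: EntityType
  description: string
}

interface KgRelation {
  source: string        // entity name
  target: string        // entity name
  type: RelationType
  strength: number      // 0..1
}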

Entity Extraction

LLM-based Extraction

Use structured output to extract entities:

import { z } from 'zod'
import { generateObject } from 'ai'
import { openai } from '@ai-sdk/openai'

const schema = z.object({
  entities: z.array(z.object({
    name: z.string(),
    type: z.enum(['PERSON', 'ORGANIZATION', 'LOCATION', 'EVENT', 'CONCEPT']),
    description: z.string()
  })),
  relations: z.array(z.object({
    source: z.string(),
    target: z.string(),
    type: z.string(),
    strength: z.number().min(0).max(1)
  }))
})

const { object } = await generateObject({
  model: openai('gpt-4o'),
  schema,
  prompt: `Extract entities and relations from: ${text}`
})

// object.entities and object.relations follow the schema above

Entity Resolution

Merge duplicate entities with fuzzy matching:

function resolveEntities(entities) {
  const resolved = new Map()

  for (const entity of entities) {
    const canonical = findCanonicalForm(entity.name, resolved)

    if (canonical) {
      // Merge with the existing entity and count the extra mention
      const existing = resolved.get(canonical)
      existing.mentions = (existing.mentions || 1) + 1
    } else {
      resolved.set(entity.name, { ...entity, mentions: 1 })
    }
  }

  return Array.from(resolved.values())
}
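
findCanonicalForm above is an assumed helper; a simple version can normalize names (case, punctuation, whitespace) before comparing, which already catches most near-duplicates.

// Naive canonicalization: exact match after normalization.
// A production version might add edit distance or alias tables.
function findCanonicalForm(name: string, resolved: Map<string, any>): string | null {
  const normalize = (s: string) =>
    s.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, '').replace(/\s+/g, ' ').trim()

  const target = normalize(name)
  for (const existing of resolved.keys()) {
    if (normalize(existing) === target) return existing
  }
  return null
}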

Community Detection

Overview

Group related entities into communities using graph algorithms.

Louvain Algorithm

Modularity-based community detection:

// Simplified Louvain local-moving phase (graph helpers are assumed)
function detectCommunities(graph) {
  // Initialize: each node is its own community
  let communities = initializeCommunities(graph)
  let improved = true

  while (improved) {
    improved = false

    for (const node of graph.nodes) {
      const bestCommunity = findBestCommunity(node, communities)

      if (bestCommunity !== communities[node.id]) {
        communities[node.id] = bestCommunity
        improved = true
      }
    }
  }

  return communities
}

Community Summarization

Generate summaries for each community:

async function summarizeCommunity(communityEntities) {
  const entityDescriptions = communityEntities
    .map(e => `${e.name}: ${e.description}`)
    .join('\n')

  const { text: summary } = await generateText({
    model: 'openai:gpt-4o-mini',
    prompt: `Summarize this group of related entities:\n${entityDescriptions}`
  })

  return summary
}

Storage

-- Add community_id to entities
UPDATE kg_entities
SET community_id = $1
WHERE entity_id = ANY($2);

Graph Query

Entity Mention Boost

Boost chunks that mention graph entities:

// 1. Perform vector search
const chunks = await vectorSearch(query)

// 2. Check for entity mentions
const entities = await getQueryEntities(query)

// 3. Boost chunks with entity mentions
chunks.forEach(chunk => {
  const mentions = countEntityMentions(chunk, entities)
  chunk.score += mentions * 0.1  // Boost weight
})

Neighbor Boost

Boost chunks mentioning neighbor entities:

// 1. Find entities in query
const queryEntities = await extractEntities(query)

// 2. Get neighbors (1-hop)
const neighbors = await getNeighbors(queryEntities, 1)  // 1-hop

// 3. Boost chunks mentioning neighbors
chunks.forEach(chunk => {
  const neighborMentions = countEntityMentions(chunk, neighbors)
  chunk.score += neighborMentions * 0.15
})

Community Boost

Boost chunks from same community:

// 1. Determine query community
const queryEntities = await extractEntities(query)
const community = getMostCommonCommunity(queryEntities)

// 2. Boost chunks in same community
chunks.forEach(chunk => {
  if (chunk.community_id === community) {
    chunk.score += 0.12  // Community boost
  }
})

Configuration

{
  "kgBoost": {
    "enabled": true,
    "weights": {
      "mention": 0.1,
      "neighbor": 0.15,
      "community": 0.12
    }
  },
  "edgeBoost": {
    "enabled": true,
    "weight": 0.1
  }
}
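
The three boosts can be combined into a single pass that reads its weights from this configuration; a rough sketch reusing the chunk shape and the countEntityMentions helper from the snippets above.

interface ScoredChunk { content: string; score: number; community_id?: string }

function applyKgBoosts(
  chunks: ScoredChunk[],
  queryEntities: { name: string }[],
  neighbors: { name: string }[],
  community: string | undefined,
  weights: { mention: number; neighbor: number; community: number }
) {
  for (const chunk of chunks) {
    chunk.score += countEntityMentions(chunk, queryEntities) * weights.mention  // direct mentions
    chunk.score += countEntityMentions(chunk, neighbors) * weights.neighbor     // 1-hop neighbors
    if (chunk.community_id === community) chunk.score += weights.community      // same community
  }
  return chunks
}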

Caching Strategy

Global RAG Cache

Backend: Valkey (Redis-compatible)

TTL: 600 seconds (configurable)

// Cache key format
const cacheKey = `global-rag-cache:${hashQuery(query)}`

// Check cache
const cached = await valkey.get(cacheKey)
if (cached) {
  return JSON.parse(cached)
}

// Perform search
const results = await performRAGSearch(query)

// Store in cache
await valkey.setEx(cacheKey, 600, JSON.stringify(results))

Personal RAG Cache

TTL: 300 seconds

Scope: Per-user, per-conversation

const cacheKey = `personal-rag:${userId}:${conversationId}:${hash}`
await valkey.setEx(cacheKey, 300, JSON.stringify(results))

Configuration

# .env.local
GLOBAL_RAG_CACHE_ENABLED=true
GLOBAL_RAG_CACHE_TTL_SECONDS=600
PERSONAL_RAG_CACHE_TTL_SECONDS=300

Cache Invalidation

Clear cache when documents are added/deleted:

// DEL does not expand wildcards; find matching keys with SCAN, then delete them
// (scanIterator assumes a node-redis-compatible client)
async function clearByPattern(pattern: string) {
  for await (const key of valkey.scanIterator({ MATCH: pattern })) await valkey.del(key)
}

await clearByPattern(`personal-rag:${userId}:${conversationId}:*`)  // on document upload
await clearByPattern('global-rag-cache:*')                          // on document deletion

Reranking

Overview

Rerank search results using a specialized model for better relevance.

Ollama Reranking

Model: bge-m3 (or a similar reranking model)

async function rerank(query: string, chunks: Chunk[], topN: number) {
  const scores = await Promise.all(
    chunks.map(async chunk => {
      const score = await ollama.rerank({
        model: 'bge-m3',
        query,
        document: chunk.content
      })
      return { chunk, score }
    })
  )

  return scores
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map(s => s.chunk)
}

Configuration

{
  "rerank": {
    "enabled": true,
    "modelName": "bge-m3",
    "topN": 8,
    "provider": "ollama"
  }
}

When to Rerank

  • After vector search (before LLM)
  • After hybrid search (vector + graph)
  • When precision is critical
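
Put together, reranking slots in between retrieval and generation. The sketch below assumes the rerank() function above, a vectorSearch(query, topK) helper, and the retrieval.topK / rerank.topN values from the settings file.

async function retrieveWithRerank(
  query: string,
  settings: { retrieval: { topK: number }; rerank: { enabled: boolean; topN: number } }
) {
  // Pull a wider candidate set first
  const candidates = await vectorSearch(query, settings.retrieval.topK)

  // Then let the reranker keep only the most relevant topN
  if (!settings.rerank.enabled) return candidates
  return rerank(query, candidates, settings.rerank.topN)
}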

RAG Configuration

File: global-rag-settings.json

Full Configuration

{
  "ragModelKey": "openai:gpt-4o",
  "embedding": {
    "active": "ollama",
    "providers": {
      "openai": {
        "modelName": "text-embedding-3-small",
        "dimensions": 1536
      },
      "google": {
        "modelName": "text-embedding-004",
        "dimensions": 768
      },
      "ollama": {
        "modelName": "mxbai-embed-large:latest",
        "baseUrl": "localhost",
        "port": 11434
      }
    }
  },
  "chunking": {
    "chunkSize": 1000,
    "overlap": 200
  },
  "retrieval": {
    "topK": 10,
    "similarityThreshold": 0.7
  },
  "rerank": {
    "enabled": false,
    "modelName": "bge-m3",
    "topN": 8
  },
  "kgBoost": {
    "enabled": false,
    "weights": {
      "mention": 0.1,
      "neighbor": 0.15,
      "community": 0.12
    }
  },
  "edgeBoost": {
    "enabled": true,
    "weight": 0.1
  },
  "raptor": {
    "enabled": false,
    "clusterSize": 10,
    "maxLevels": 3
  }
}

RAG API Reference

Ingest Document

Endpoint: POST /api/rag/ingest/file

curl -X POST http://localhost:3000/api/rag/ingest/file \
  -F "file=@document.pdf" \
  -F "conversationId=conv_123"

Ingest from URL

Endpoint: POST /api/rag/ingest/url

{
  "url": "https://example.com/article",
  "conversationId": "conv_123"
}
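
From client code, the same request might look like the following fetch sketch (authentication headers are omitted and depend on the deployment).

const res = await fetch('/api/rag/ingest/url', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url: 'https://example.com/article',
    conversationId: 'conv_123'
  })
})

const result = await res.json()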

Ingest Text

Endpoint: POST /api/rag/ingest/text

{
  "text": "This is the content to ingest...",
  "title": "My Document",
  "conversationId": "conv_123"
}

List Documents

Endpoint: GET /api/rag/documents

{
  "documents": [
    {
      "id": "doc_123",
      "file_name": "document.pdf",
      "chunks": 45,
      "created_at": "2025-01-01T00:00:00Z"
    }
  ]
}

Delete Document

Endpoint: DELETE /api/rag/documents/[id]

Build Knowledge Graph

Endpoint: POST /api/rag/graph/build

{
  "documentIds": ["doc_123", "doc_456"],
  "extractionModel": "openai:gpt-4o"
}

Graph Overview

Endpoint: GET /api/rag/graph/overview

{
  "entities": 142,
  "relations": 238,
  "communities": 12
}

Update Settings

Endpoint: POST /api/rag/settings

{
  "embedding": {
    "active": "ollama"
  },
  "rerank": {
    "enabled": true
  }
}