Agent System

ExtendedLM includes 12 specialized agents, each designed for a specific task and equipped with its own capabilities and tools.

What are Agents?

Agents are specialized AI assistants built on top of LLMs with access to specific tools and capabilities. Each agent is optimized for particular tasks, from document search to browser automation.

Available Agents

  • Standard Agent: General Q&A
  • RAG Agent: Document search
  • Global RAG: Cross-user search
  • Translation: EN/JP translation
  • Weather: Weather info
  • Computer Use: System automation
  • MCP Agent: Dynamic tools
  • PDF Reading: PDF analysis
  • Report: Report generation
  • Slide Assistant: Presentations
  • Browser: Web automation
  • Image Creator: Image generation

Agent Architecture

Base Agent Class

All agents extend the abstract BaseAgent class:

abstract class BaseAgent {
  abstract execute(
    messages: Message[],
    runtimeContext: RuntimeContext,
    tools?: Tool[],
    memory?: Memory
  ): Promise<AgentResponse>
}

Runtime Context

Agents receive a runtime context containing the following fields (a minimal interface sketch follows the list):

  • modelKey: Selected LLM model
  • userId: Current user ID
  • conversationId: Active conversation
  • abortSignal: Cancellation signal
  • metadata: Additional context
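
A minimal TypeScript sketch of what this context might look like, derived from the fields above; the field types are assumptions, not the actual ExtendedLM definition:

// Hypothetical shape of the runtime context; the real type in ExtendedLM may differ.
interface RuntimeContext {
  modelKey: string                   // e.g. 'openai:gpt-4o'
  userId: string
  conversationId: string
  abortSignal: AbortSignal           // lets the orchestrator cancel in-flight work
  metadata?: Record<string, unknown> // additional, agent-specific context
}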

Agent Tools

Agents use tools to perform specific actions:

  • vectorQueryTool: RAG document search
  • weatherTool: Weather API queries
  • playwrightTools: Browser automation
  • generateHtmlPresentationTool: Slide creation

Agent Communication Protocol

Message Flow

  1. User sends message via chat interface
  2. Frontend sends request to backend
  3. Orchestrator selects appropriate agent
  4. Agent processes with LLM + tools
  5. Response streams back in real-time

Streaming Responses

Agents support real-time streaming using Server-Sent Events (SSE):

// Streaming response example
for await (const chunk of stream) {
  yield {
    type: 'text-delta',
    textDelta: chunk.content
  }
}
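
On the client side, the stream can be consumed with a fetch reader. The sketch below is illustrative only: the endpoint path and the exact SSE payload shape are assumptions, and a production reader would also buffer frames split across chunks:

// Minimal SSE consumer (sketch); assumes each frame is "data: <json>\n\n"
async function readChatStream(
  url: string,
  body: unknown,
  onDelta: (text: string) => void
): Promise<void> {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  })
  if (!res.body) throw new Error('No response body')
  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    for (const line of decoder.decode(value, { stream: true }).split('\n')) {
      if (!line.startsWith('data: ')) continue
      const event = JSON.parse(line.slice('data: '.length))
      if (event.type === 'text-delta') onDelta(event.textDelta)
    }
  }
}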

Standard Agent

Purpose: General-purpose conversational AI for Q&A without external tools.

Features

  • Direct LLM interaction
  • No document search or external APIs
  • Model-agnostic (uses runtime context model)
  • Fast response times

System Prompt

You are a helpful assistant. Provide direct, clear answers to user questions.

Use Cases

  • General Q&A
  • Code generation
  • Text summarization
  • Creative writing

RAG Agent

Purpose: Document search and retrieval-augmented generation using vector similarity.

Features

  • Vector similarity search for semantic retrieval
  • Conversation-scoped document search
  • Citation tracking (##n$$ format)
  • Chunk reference system

Tools

  • vectorQueryTool: Search documents by semantic similarity

Usage Example

const response = await ragAgent.execute(
  messages,
  {
    userId: 'user-123',
    conversationId: 'conv-456',
    modelKey: 'openai:gpt-4o'
  },
  [vectorQueryTool]
)

Citation Format

The RAG agent includes citations in responses:

According to the documentation ##1$$, ExtendedLM supports multiple LLM providers.

Citations link to specific chunks in the source documents.
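
Because the marker format is fixed, citation indices can be pulled out of a response with a simple regular expression. This is a sketch of one possible post-processing step, not the orchestrator's actual implementation:

// Extract citation indices such as ##1$$ or ##12$$ from an agent response
function extractCitationIndices(text: string): number[] {
  const matches = text.matchAll(/##(\d+)\$\$/g)
  return [...new Set([...matches].map((m) => Number(m[1])))]
}

// extractCitationIndices('... ##1$$ ... ##3$$ ...') => [1, 3]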

Global RAG Agent

Purpose: Search across all user documents (not limited to conversation).

Features

  • User-wide document search
  • Valkey-based caching (configurable TTL)
  • Hybrid search (vector + text)
  • Configurable in global-rag-settings.json

Configuration

{
  "ragModelKey": "openai:gpt-4o",
  "embedding": {
    "active": "ollama",
    "providers": {
      "ollama": {
        "modelName": "mxbai-embed-large:latest"
      }
    }
  }
}

Caching

Global RAG uses a Valkey cache with a default TTL of 600 seconds. Cache keys are derived from a hash of the query.
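
A sketch of how such a cache lookup could work, using ioredis (Valkey speaks the Redis protocol). The client choice and key scheme are assumptions for illustration, not the actual ExtendedLM code:

import Redis from 'ioredis'
import { createHash } from 'node:crypto'

const valkey = new Redis(process.env.VALKEY_URL ?? 'redis://localhost:6379')
const TTL_SECONDS = 600 // default TTL from the Global RAG settings

async function cachedGlobalRagQuery(
  userId: string,
  query: string,
  runQuery: (q: string) => Promise<string>
): Promise<string> {
  // Cache key derived from a hash of the query, scoped to the user (assumed scheme)
  const key = `global-rag:${userId}:${createHash('sha256').update(query).digest('hex')}`
  const hit = await valkey.get(key)
  if (hit !== null) return hit

  const result = await runQuery(query)
  await valkey.set(key, result, 'EX', TTL_SECONDS)
  return result
}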

Translation Agent

Purpose: English ↔ Japanese translation using the PLaMo-2-translate model.

Features

  • Dedicated translation model (PLaMo-2)
  • Auto language detection
  • Low temperature (0.0) for deterministic results
  • Gateway llama.cpp integration

Model Configuration

{
  "key": "gateway:plamo-2-translate",
  "provider": "gateway",
  "modelName": "PLaMo-2-translate-Q4_0",
  "temperature": 0.0
}

Usage

User: Translate to Japanese: Hello, how are you?
Agent: こんにちは、お元気ですか?

Weather Agent

Purpose: Provide current weather information for any location.

Features

  • Open-Meteo API integration (no API key required)
  • Temperature, humidity, wind, precipitation
  • Location translation support
  • Natural language queries

Tools

  • weatherTool: Query weather by location name or coordinates

Example Query

User: What's the weather in Tokyo?
Agent: Currently in Tokyo:
- Temperature: 15°C
- Conditions: Partly cloudy
- Humidity: 65%
- Wind: 12 km/h NE
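
Under the hood, a tool like weatherTool can call the Open-Meteo forecast endpoint directly, since no API key is needed. The sketch below assumes coordinates are already resolved; the real tool also handles location names and translation:

// Fetch current conditions from Open-Meteo for a lat/lon pair (sketch)
async function fetchCurrentWeather(latitude: number, longitude: number) {
  const url =
    'https://api.open-meteo.com/v1/forecast' +
    `?latitude=${latitude}&longitude=${longitude}` +
    '&current=temperature_2m,relative_humidity_2m,wind_speed_10m,precipitation'
  const res = await fetch(url)
  if (!res.ok) throw new Error(`Open-Meteo request failed: ${res.status}`)
  const data = await res.json()
  return data.current // { temperature_2m, relative_humidity_2m, wind_speed_10m, ... }
}

// fetchCurrentWeather(35.68, 139.69) -> current conditions for Tokyo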

Computer Use Agent

Purpose: AI-powered computer automation via the Mate backend.

Features

  • Browser automation (Playwright)
  • File operations (read/write/execute)
  • Shell command execution
  • VNC real-time monitoring
  • Secure sandbox environment (Docker)

Architecture

The Computer Use agent communicates with Mate (a Python FastAPI backend):

  • Frontend: ExtendedLM chat UI
  • Backend: Mate (port 8000)
  • Sandbox: Docker containers per session
  • Browser: Chrome driven via CDP (Chrome DevTools Protocol)

Capabilities

  • Navigate websites
  • Fill forms and click buttons
  • Extract data from pages
  • Execute Python scripts
  • Perform web searches (Bing)

Learn more about Computer Use →

MCP Agent

Purpose: Model Context Protocol integration for dynamic tool loading.

Features

  • Dynamic tool discovery from MCP servers
  • Multi-server support (SwitchBot, cal2prompt, jgrants)
  • Natural language → MCP tool invocation
  • Extensible architecture

Configured MCP Servers

  • cal2prompt: Calendar integration
  • SwitchBot: Smart home device control
  • jgrants: Grant management system

Example Usage

User: Turn on the living room light
Agent: [Uses SwitchBot MCP server to control smart light]
Response: Living room light has been turned on.
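
A rough sketch of how dynamic tool discovery can look with the MCP TypeScript SDK (@modelcontextprotocol/sdk). The server command, tool name, and paths are hypothetical, and the actual MCP agent wiring in ExtendedLM may differ:

import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js'

// Connect to a configured MCP server (e.g. SwitchBot) over stdio
const transport = new StdioClientTransport({
  command: 'node',
  args: ['./mcp-servers/switchbot/index.js'] // hypothetical path
})
const client = new Client({ name: 'extendedlm-mcp-agent', version: '1.0.0' }, { capabilities: {} })
await client.connect(transport)

// Discover the tools the server exposes, then invoke one by name
const { tools } = await client.listTools()
console.log(tools.map((t) => t.name))

const result = await client.callTool({
  name: 'turn_on_device',                          // hypothetical tool name
  arguments: { deviceName: 'Living room light' }
})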

Learn more about MCP Integration →

PDF Reading Agent

Purpose: PDF document analysis with text extraction and OCR.

Features

  • PDF text extraction (pdf-parse)
  • OCR support (Tesseract.js)
  • Page-by-page processing
  • Image extraction from PDFs

Processing Pipeline

  1. Extract text from PDF using pdf-parse
  2. If text is sparse, perform OCR on page images
  3. Extract images and analyze with vision LLM
  4. Combine text + image descriptions
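
A condensed sketch of steps 1–2 using pdf-parse and Tesseract.js. The sparse-text threshold is an assumption, and rendering PDF pages to images for OCR is omitted here:

import pdf from 'pdf-parse'
import Tesseract from 'tesseract.js'

// Extract text from a PDF buffer, falling back to OCR when the text layer is sparse
async function extractPdfText(buffer: Buffer, pageImages: Buffer[]): Promise<string> {
  const parsed = await pdf(buffer)          // step 1: text layer via pdf-parse
  if (parsed.text.trim().length > 100) {    // assumed threshold for "enough text"
    return parsed.text
  }
  // Step 2: OCR each pre-rendered page image with Tesseract.js
  const ocrPages: string[] = []
  for (const image of pageImages) {
    const { data } = await Tesseract.recognize(image, 'eng+jpn')
    ocrPages.push(data.text)
  }
  return ocrPages.join('\n')
}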

Report Agent

Purpose: Generate structured reports from conversation or documents.

Features

  • Structured report generation
  • Multiple report formats
  • Summary + detailed sections
  • Markdown output

Slide Assistant Agent

Purpose: Generate HTML presentations from natural language instructions.

Features

  • Multiple themes (default, corporate, creative)
  • Template system
  • Interactive HTML slides
  • Export-ready presentations

Tools

  • generateHtmlPresentationTool: Create HTML slides from outline

Example

User: Create a presentation about ExtendedLM with 5 slides
Agent: [Generates HTML presentation with title, features, architecture, use cases, conclusion]

Browser Agent

Purpose: Web automation using Playwright.

Features

  • Page navigation
  • Element interaction (click, type, select)
  • Screenshot capture
  • Data extraction

Tools

  • playwrightTools: Browser automation primitives
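
These primitives map closely onto Playwright's own API. Here is a minimal sketch of the kind of operations the tools wrap; the URL and selector are placeholders:

import { chromium } from 'playwright'

// Navigate, interact, capture a screenshot, and extract data (sketch)
async function runBrowserTask(): Promise<string> {
  const browser = await chromium.launch({ headless: true })
  const page = await browser.newPage()
  try {
    await page.goto('https://example.com')             // page navigation
    await page.click('text=More information')          // element interaction
    await page.screenshot({ path: 'screenshot.png' })  // screenshot capture
    return await page.innerText('body')                // data extraction
  } finally {
    await browser.close()
  }
}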

Image Creator Agent

Purpose: AI image generation from text prompts.

Features

  • Text-to-image generation
  • Multiple styles and formats
  • Image enhancement
  • Prompt optimization

Multi-Agent Orchestration

File: src/server/chat/orchestrator.ts

Orchestrator Responsibilities

  • Route messages to appropriate agents
  • Handle tool execution results
  • Manage conversation persistence
  • Extract and store citations
  • Error handling and recovery
  • Stream responses via SSE

Agent Selection

The orchestrator selects agents based on the following (a simplified routing sketch follows the list):

  • User's explicit agent choice (UI selection)
  • Conversation context
  • Required tools/capabilities
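
A simplified sketch of this routing decision. The agent keys and the fallback rule are illustrative, not the actual logic in orchestrator.ts:

// Pick an agent key from the UI selection, falling back to context-based rules (sketch)
function selectAgent(
  explicitChoice: string | undefined,
  conversationHasDocuments: boolean
): string {
  if (explicitChoice) return explicitChoice   // user's UI selection wins
  if (conversationHasDocuments) return 'rag'  // documents attached -> RAG agent
  return 'standard'                           // default: general Q&A
}

// selectAgent(undefined, true) => 'rag'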

Creating Custom Agents

Step 1: Create Agent File

Create a new agent file:

// my-custom-agent.ts
import { BaseAgent } from '../executor/BaseAgent'
// Adjust this import path to wherever the shared agent types live in your tree
import type { Message, RuntimeContext, Tool, Memory, AgentResponse } from '../executor/types'

export class MyCustomAgent extends BaseAgent {
  async execute(
    messages: Message[],
    runtimeContext: RuntimeContext,
    tools?: Tool[],
    memory?: Memory
  ): Promise<AgentResponse> {
    // Your agent logic here
    return {
      type: 'text',
      content: 'Agent response'
    }
  }
}

Step 2: Register Agent

Register your agent:

import { MyCustomAgent } from './my-custom-agent'

export const agents = {
  standard: standardAgent,
  rag: ragAgent,
  myCustom: new MyCustomAgent() // Add here
}

Step 3: Add UI Option

Update agent selector in chat UI to include your new agent.

Step 4: Create Tools (Optional)

If your agent needs tools, create custom tool definitions:

import { tool } from 'ai' // assuming the AI SDK tool() helper is used here
import { z } from 'zod'

export const myCustomTool = tool({
  description: 'Does something useful',
  parameters: z.object({
    input: z.string()
  }),
  execute: async ({ input }) => {
    // Tool logic goes here; return whatever the agent should receive
    return { result: input }
  }
})