Agent System

ExtendedLM includes 12 specialized agents, each built for a specific task and equipped with its own capabilities and tools.

What are Agents?

Agents are specialized AI assistants built on top of LLMs with access to specific tools and capabilities. Each agent is optimized for particular tasks, from document search to browser automation.

Available Agents

Standard Agent: General Q&A
RAG Agent: Document search
Global RAG: Cross-user search
Translation: EN/JP translation
Weather: Weather info
Computer Use: System automation
MCP Agent: Dynamic tools
PDF Reading: PDF analysis
Report: Report generation
Slide Assistant: Presentations
Browser: Web automation
Image Creator: Image generation

Agent Architecture

Base Agent Class

All agents extend the abstract BaseAgent class and implement its execute method:

abstract class BaseAgent {
  abstract execute(
    messages: Message[],
    runtimeContext: RuntimeContext,
    tools?: Tool[],
    memory?: Memory
  ): Promise<AgentResponse>
}

Runtime Context

Agents receive a runtime context with:

modelKey: Selected LLM model
userId: Current user ID
conversationId: Active conversation
abortSignal: Cancellation signal
metadata: Additional context
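
The fields above could be typed roughly as follows. This is only a sketch: the field names come from the list, but the types, the optional markers, and the isCancelled helper are assumptions rather than the project's actual definitions.

```typescript
// Hypothetical shape of the runtime context. Field names come from the list
// above; the types and optional markers are assumptions.
interface RuntimeContext {
  modelKey: string                    // e.g. 'openai:gpt-4o'
  userId: string
  conversationId: string
  abortSignal?: AbortSignal           // standard AbortSignal (Web / Node.js)
  metadata?: Record<string, unknown>
}

// Hypothetical helper: lets an agent check for cancellation between steps.
function isCancelled(context: RuntimeContext): boolean {
  return context.abortSignal?.aborted ?? false
}

const controller = new AbortController()
const exampleContext: RuntimeContext = {
  modelKey: 'openai:gpt-4o',
  userId: 'user-123',
  conversationId: 'conv-456',
  abortSignal: controller.signal
}

const cancelledBeforeAbort = isCancelled(exampleContext)
controller.abort() // e.g. the user pressed "stop" in the chat UI
const cancelledAfterAbort = isCancelled(exampleContext)
```

An agent that checks the signal between LLM and tool calls can stop early instead of finishing work nobody is waiting for.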

Agent Tools

Agents use tools to perform specific actions:

vectorQueryTool: RAG document search
weatherTool: Weather API queries
playwrightTools: Browser automation
generateHtmlPresentationTool: Slide creation

Agent Communication Protocol

Message Flow

User sends message via chat interface
Frontend sends request to backend
Orchestrator selects appropriate agent
Agent processes with LLM + tools
Response streams back in real-time

Streaming Responses

Agents support real-time streaming using Server-Sent Events (SSE):

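// Streaming response example: an async generator that re-emits model
// chunks as text-delta events (the function name is illustrative)
async function* streamTextDeltas(stream: AsyncIterable<{ content: string }>) {
  for await (const chunk of stream) {
    yield {
      type: 'text-delta' as const,
      textDelta: chunk.content
    }
  }
}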
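
On the wire, an SSE stream is a sequence of frames, each made of one or more "data:" lines terminated by a blank line. The sketch below shows how a text-delta event like the one above might be framed and read back. It assumes the payload is sent as JSON in the data field, which this page does not confirm; toSseFrame and fromSseFrame are hypothetical names.

```typescript
// Event shape taken from the streaming example above.
interface TextDeltaEvent {
  type: 'text-delta'
  textDelta: string
}

// Encode one event as an SSE frame: a "data:" line followed by a blank line.
// Assumption: the payload is serialized as JSON.
function toSseFrame(payload: TextDeltaEvent): string {
  return `data: ${JSON.stringify(payload)}\n\n`
}

// Decode a frame produced by toSseFrame (single-line data fields only).
function fromSseFrame(frameText: string): TextDeltaEvent {
  const dataLine = frameText.split('\n').find((line) => line.startsWith('data: '))
  if (dataLine === undefined) {
    throw new Error('no data field in frame')
  }
  return JSON.parse(dataLine.slice('data: '.length)) as TextDeltaEvent
}

const helloFrame = toSseFrame({ type: 'text-delta', textDelta: 'Hello' })
const decodedHello = fromSseFrame(helloFrame)
```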

Standard Agent

Purpose: General-purpose conversational AI for Q&A without external tools.

Features

Direct LLM interaction
No document search or external APIs
Model-agnostic (uses runtime context model)
Fast response times

System Prompt

You are a helpful assistant. Provide direct, clear answers to user questions.

Use Cases

General Q&A
Code generation
Text summarization
Creative writing

RAG Agent

Purpose: Document search and retrieval-augmented generation using vector similarity.

Features

Vector similarity search for semantic retrieval
Conversation-scoped document search
Citation tracking (##n$$ format)
Chunk reference system

Tools

vectorQueryTool: Search documents by semantic similarity

Usage Example

const response = await ragAgent.execute(
  messages,
  {
    userId: 'user-123',
    conversationId: 'conv-456',
    modelKey: 'openai:gpt-4o'
  },
  [vectorQueryTool]
)

Citation Format

The RAG agent includes citations in responses:

According to the documentation ##1$$, ExtendedLM supports multiple LLM providers.

Citations link to specific chunks in the source documents.
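
A client that wants to render or resolve these markers has to find them first. The sketch below pulls the chunk numbers out of a response and rewrites the markers as bracketed references. The function names and the bracketed output style are illustrative assumptions; only the ##n$$ marker format comes from this page.

```typescript
// Collect the distinct citation numbers in order of first appearance.
function extractCitationIds(text: string): number[] {
  const pattern = /##(\d+)\$\$/g // matches markers such as ##1$$
  const ids: number[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(text)) !== null) {
    const id = Number(match[1])
    if (ids.indexOf(id) === -1) {
      ids.push(id)
    }
  }
  return ids
}

// Replace each ##n$$ marker with a reader-friendly [n].
function renderCitations(text: string): string {
  return text.replace(/##(\d+)\$\$/g, '[$1]')
}

const sampleAnswer =
  'According to the documentation ##1$$, ExtendedLM supports multiple LLM providers ##2$$ ##1$$.'
const sampleIds = extractCitationIds(sampleAnswer)
const sampleRendered = renderCitations('See the setup guide ##3$$.')
```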

Global RAG Agent

Purpose: Search across all user documents, not only those in the current conversation.

Features

User-wide document search
Valkey-based caching (configurable TTL)
Hybrid search (vector + text)
Configurable in global-rag-settings.json

Configuration

{
  "ragModelKey": "openai:gpt-4o",
  "embedding": {
    "active": "openai",
    "providers": {
      "openai": {
        "modelName": "text-embedding-3-small"
      }
    }
  }
}
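
Reading this configuration, the "active" value appears to select one entry under "providers". A minimal sketch of that lookup follows; the GlobalRagSettings type and the activeEmbeddingModel helper are hypothetical and only mirror the JSON shown above.

```typescript
// Hypothetical type mirroring the JSON above.
interface GlobalRagSettings {
  ragModelKey: string
  embedding: {
    active: string
    providers: Record<string, { modelName: string }>
  }
}

// Resolve the embedding model name for the active provider.
function activeEmbeddingModel(settings: GlobalRagSettings): string {
  const provider = settings.embedding.providers[settings.embedding.active]
  if (provider === undefined) {
    throw new Error(`Unknown embedding provider: ${settings.embedding.active}`)
  }
  return provider.modelName
}

const exampleSettings: GlobalRagSettings = JSON.parse(`{
  "ragModelKey": "openai:gpt-4o",
  "embedding": {
    "active": "openai",
    "providers": {
      "openai": { "modelName": "text-embedding-3-small" }
    }
  }
}`)

const resolvedModel = activeEmbeddingModel(exampleSettings)
```

Failing loudly on an unknown provider name surfaces a typo in the settings file at startup rather than at query time.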

Caching

Global RAG caches results in Valkey with a default TTL of 600 seconds. Cache keys are derived from a hash of the query.
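
The caching behaviour can be sketched without a running Valkey instance. In the example below an in-memory map stands in for Valkey, and the key prefix, the SHA-256 hash, and the query normalization are all assumptions; this page only states that keys are based on a query hash and that the default TTL is 600 seconds.

```typescript
import { createHash } from 'node:crypto'

const DEFAULT_TTL_SECONDS = 600

// Hypothetical key scheme: a fixed prefix plus the SHA-256 of the normalized query.
function globalRagCacheKey(query: string): string {
  const normalized = query.trim().toLowerCase()
  return 'global-rag:' + createHash('sha256').update(normalized).digest('hex')
}

// In-memory stand-in for Valkey, used only to illustrate TTL expiry.
class InMemoryTtlCache {
  private entries = new Map<string, { value: string; expiresAt: number }>()
  private ttlMs: number

  constructor(ttlSeconds: number = DEFAULT_TTL_SECONDS) {
    this.ttlMs = ttlSeconds * 1000
  }

  set(key: string, value: string, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs })
  }

  get(key: string, now: number = Date.now()): string | undefined {
    const entry = this.entries.get(key)
    if (entry === undefined) return undefined
    if (now >= entry.expiresAt) {
      this.entries.delete(key) // expired: drop the entry and report a miss
      return undefined
    }
    return entry.value
  }
}
```

With a real Valkey client, the expiry would normally be handed to the server when the value is written, so the manual check in get would not be needed.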

Translation Agent

Purpose: English ↔ Japanese translation using the PLaMo-2-translate model.

Features

Dedicated translation model (PLaMo-2)
Auto language detection
Low temperature (0.0) for deterministic results
Gateway llama.cpp integration

Model Configuration

{
  "key": "gateway:plamo-2-translate",
  "provider": "gateway",
  "modelName": "PLaMo-2-translate-Q4_0",
  "temperature": 0.0
}

Usage

User: Translate to Japanese: Hello, how are you?
Agent: こんにちは、お元気ですか？
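
How the agent detects the source language is not described here; it may simply be left to the model. As one possible heuristic, the sketch below picks the translation direction by checking for Japanese script. The detectDirection name and the character ranges chosen are assumptions.

```typescript
type Direction = 'ja-to-en' | 'en-to-ja'

// Hiragana, Katakana, and CJK Unified Ideographs.
const JAPANESE_CHARS = /[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/

// Heuristic: any Japanese character means the input is treated as Japanese.
function detectDirection(text: string): Direction {
  return JAPANESE_CHARS.test(text) ? 'ja-to-en' : 'en-to-ja'
}

const englishInput = detectDirection('Hello, how are you?')
const japaneseInput = detectDirection('こんにちは、お元気ですか？')
```

A check like this misclassifies mixed-language input and text in other CJK languages, so it is best treated as a fallback rather than a full detector.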

Weather Agent

Purpose: Provide current weather information for any location.

Features

Open-Meteo API integration (no API key required)
Temperature, humidity, wind, precipitation
Location translation support
Natural language queries

Tools

weatherTool: Query weather by location name or coordinates

Example Query

User: What's the weather in Tokyo?
Agent: Currently in Tokyo:
- Temperature: 15°C
- Conditions: Partly cloudy
- Humidity: 65%
- Wind: 12 km/h NE
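
For reference, a request to the Open-Meteo forecast endpoint can be built as in the sketch below. Which variables the real weatherTool requests is an assumption, as is the buildForecastUrl helper; resolving a place name to coordinates is a separate step that is not shown.

```typescript
// Current-weather variables to request (assumed selection).
const CURRENT_VARIABLES = [
  'temperature_2m',
  'relative_humidity_2m',
  'precipitation',
  'wind_speed_10m'
]

// Build a forecast URL for the given coordinates. No API key is needed.
function buildForecastUrl(latitude: number, longitude: number): string {
  const params = new URLSearchParams({
    latitude: String(latitude),
    longitude: String(longitude),
    current: CURRENT_VARIABLES.join(',')
  })
  return `https://api.open-meteo.com/v1/forecast?${params.toString()}`
}

const tokyoForecastUrl = buildForecastUrl(35.68, 139.69)
```

Passing tokyoForecastUrl to fetch should return a JSON document from which the agent can compose a reply like the one above.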

Computer Use Agent

Purpose: AI-powered computer automation via the Mate backend.

Features

Browser automation (Playwright)
File operations (read/write/execute)
Shell command execution
VNC real-time monitoring
Secure sandbox environment (Docker)

Architecture

The Computer Use agent communicates with Mate, a Python FastAPI backend:

Frontend: ExtendedLM chat UI
Backend: Mate (port 8000)
Sandbox: Docker containers per session
Browser: Chrome via the Chrome DevTools Protocol (CDP)

Capabilities

Navigate websites
Fill forms and click buttons
Extract data from pages
Execute Python scripts
Perform web searches (Bing)

Learn more about Computer Use →

MCP Agent

Purpose: Model Context Protocol integration for dynamic tool loading.

Features

Dynamic tool discovery from MCP servers
Multi-server support (SwitchBot, cal2prompt, jgrants)
Natural language → MCP tool invocation
Extensible architecture

Configured MCP Servers

cal2prompt: Calendar integration
SwitchBot: Smart home device control
jgrants: Grant management system

Example Usage

User: Turn on the living room light
Agent: [Uses SwitchBot MCP server to control smart light]
Response: Living room light has been turned on.

Learn more about MCP Integration →

PDF Reading Agent

Purpose: PDF document analysis with text extraction and OCR.

Features

PDF text extraction (pdf-parse)
OCR support (Tesseract.js)
Page-by-page processing
Image extraction from PDFs

Processing Pipeline

Extract text from PDF using pdf-parse
If text is sparse, perform OCR on page images
Extract images and analyze with vision LLM
Combine text + image descriptions
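
The "sparse text" decision in the second step can be sketched as a per-page check. The 50-character threshold and both function names are hypothetical; the actual pipeline may use a different criterion.

```typescript
type PageSource = 'text' | 'ocr'

// Hypothetical threshold: pages with fewer visible characters fall back to OCR.
const MIN_TEXT_CHARS = 50

// Decide whether the extracted text is usable or the page should go through OCR.
function choosePageSource(extractedText: string, minChars: number = MIN_TEXT_CHARS): PageSource {
  const visibleChars = extractedText.replace(/\s+/g, '').length
  return visibleChars >= minChars ? 'text' : 'ocr'
}

// Plan the processing step for every page of a document.
function planPages(pageTexts: string[]): PageSource[] {
  return pageTexts.map((text) => choosePageSource(text))
}

const examplePlan = planPages([
  '',                                   // scanned page, no text layer
  'Quarterly revenue grew. '.repeat(10), // normal text page
  '  \n 12 '                            // only a page number survived extraction
])
```

Counting non-whitespace characters keeps pages that contain only layout whitespace or a page number from being mistaken for real text.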

Report Agent

Purpose: Generate structured reports from conversation or documents.

Features

Structured report generation
Multiple report formats
Summary + detailed sections
Markdown output

Slide Assistant Agent

Purpose: Generate HTML presentations from natural language instructions.

Features

Multiple themes (default, corporate, creative)
Template system
Interactive HTML slides
Export-ready presentations

Tools

generateHtmlPresentationTool: Create HTML slides from outline

Example

User: Create a presentation about ExtendedLM with 5 slides
Agent: [Generates HTML presentation with title, features, architecture, use cases, conclusion]

Browser Agent

Purpose: Web automation using Playwright.

Features

Page navigation
Element interaction (click, type, select)
Screenshot capture
Data extraction

Tools

playwrightTools: Browser automation primitives

Image Creator Agent

Purpose: AI image generation from text prompts.

Features

Text-to-image generation
Multiple styles and formats
Image enhancement
Prompt optimization

Multi-Agent Orchestration

File: src/server/chat/orchestrator.ts

Orchestrator Responsibilities

Route messages to appropriate agents
Handle tool execution results
Manage conversation persistence
Extract and store citations
Handle errors and recovery
Stream responses via SSE

Agent Selection

The orchestrator selects agents based on:

User's explicit agent choice (UI selection)
Conversation context
Required tools/capabilities
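
One way these three signals could be combined is shown below. The priority order (explicit choice, then required tools, then conversation context), the fallback to the Standard Agent, and all names are assumptions; the tool-to-agent pairs follow the tool lists earlier on this page.

```typescript
type AgentId = 'standard' | 'rag' | 'weather' | 'browser'

interface SelectionInput {
  explicitAgent?: AgentId            // user's choice in the UI, if any
  hasConversationDocuments: boolean  // simplified stand-in for conversation context
  requiredTools: string[]            // tools the request is known to need
}

// Which agent owns which tool, per the tool lists in this document.
const TOOL_OWNERS = new Map<string, AgentId>([
  ['vectorQueryTool', 'rag'],
  ['weatherTool', 'weather'],
  ['playwrightTools', 'browser']
])

// Hypothetical selection logic with an assumed priority order.
function selectAgent(input: SelectionInput): AgentId {
  if (input.explicitAgent !== undefined) return input.explicitAgent
  for (const toolName of input.requiredTools) {
    const owner = TOOL_OWNERS.get(toolName)
    if (owner !== undefined) return owner
  }
  if (input.hasConversationDocuments) return 'rag'
  return 'standard'
}
```

Putting the explicit choice first means a user's selection in the UI is never overridden by inference.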

Creating Custom Agents

Step 1: Create Agent File

Create a new agent file:

// my-custom-agent.ts
import { BaseAgent } from '../executor/BaseAgent'

export class MyCustomAgent extends BaseAgent {
  async execute(messages, runtimeContext, tools, memory) {
    // Your agent logic here
    return {
      type: 'text',
      content: 'Agent response'
    }
  }
}

Step 2: Register Agent

Register your agent:

import { MyCustomAgent } from './my-custom-agent'

export const agents = {
  standard: standardAgent,
  rag: ragAgent,
  myCustom: new MyCustomAgent() // Add here
}

Step 3: Add UI Option

Update the agent selector in the chat UI to include your new agent.

Step 4: Create Tools (Optional)

If your agent needs tools, create custom tool definitions:

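// Assumption: `tool` comes from the Vercel AI SDK ('ai') and `z` from zod
import { tool } from 'ai'
import { z } from 'zod'

export const myCustomTool = tool({
  description: 'Does something useful',
  parameters: z.object({
    input: z.string()
  }),
  execute: async ({ input }) => {
    // Tool logic: replace this placeholder with real work
    const result = { output: input }
    return result
  }
})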