Agent System
ExtendedLM includes 12 specialized agents. Each agent is an AI assistant built on top of an LLM and optimized for a particular task, from document search to browser automation, with access to its own set of tools.
Available Agents
- Standard Agent: General Q&A
- RAG Agent: Document search
- Global RAG: Cross-user search
- Translation: EN/JP translation
- Weather: Weather info
- Computer Use: System automation
- MCP Agent: Dynamic tools
- PDF Reading: PDF analysis
- Report: Report generation
- Slide Assistant: Presentations
- Browser: Web automation
- Image Creator: Image generation
Agent Architecture
Base Agent Class
All agents extend the abstract BaseAgent class and implement its execute method:
abstract class BaseAgent {
  abstract execute(
    messages: Message[],
    runtimeContext: RuntimeContext,
    tools?: Tool[],
    memory?: Memory
  ): Promise<AgentResponse>
}
Runtime Context
Agents receive a runtime context with:
- modelKey: Selected LLM model
- userId: Current user ID
- conversationId: Active conversation
- abortSignal: Cancellation signal
- metadata: Additional context
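The fields above could be modelled as a TypeScript interface. This is a minimal sketch: the field types, the optionality of abortSignal and metadata, and the isCancelled helper are assumptions for illustration, not taken from the ExtendedLM source.

```typescript
// Hypothetical shape of the runtime context; field types are assumed.
interface RuntimeContext {
  modelKey: string;                    // e.g. 'openai:gpt-4o'
  userId: string;
  conversationId: string;
  abortSignal?: AbortSignal;           // present when the request can be cancelled
  metadata?: Record<string, unknown>;  // free-form additional context
}

// Illustrative helper: lets an agent stop early once the caller aborts.
function isCancelled(ctx: RuntimeContext): boolean {
  return ctx.abortSignal?.aborted ?? false;
}

const controller = new AbortController();
const ctx: RuntimeContext = {
  modelKey: 'openai:gpt-4o',
  userId: 'user-123',
  conversationId: 'conv-456',
  abortSignal: controller.signal,
};
```

An agent written this way would check isCancelled(ctx) between expensive steps, such as before each tool call, and return early when it is true.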
Agent Tools
Agents use tools to perform specific actions:
- vectorQueryTool: RAG document search
- weatherTool: Weather API queries
- playwrightTools: Browser automation
- generateHtmlPresentationTool: Slide creation
Agent Communication Protocol
Message Flow
- User sends message via chat interface
- Frontend sends request to backend
- Orchestrator selects appropriate agent
- Agent processes with LLM + tools
- Response streams back in real-time
Streaming Responses
Agents support real-time streaming using Server-Sent Events (SSE):
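// Streaming response example: re-emit each LLM chunk as a text-delta event.
// The function name is illustrative; yield is only valid inside a generator.
async function* streamTextDeltas(stream: AsyncIterable<{ content: string }>) {
  for await (const chunk of stream) {
    yield {
      type: 'text-delta',
      textDelta: chunk.content
    }
  }
}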
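On the wire, each SSE event is a text frame made of a "data:" line followed by a blank line. The sketch below shows one way text-delta chunks might be serialized into such frames. The helper names toSseFrame and streamAsSse are hypothetical, and the JSON payload shape is assumed from the example above; the exact wire format ExtendedLM uses is not documented here.

```typescript
// Shape of a streamed chunk, mirroring the example above.
type TextDelta = { type: 'text-delta'; textDelta: string };

// Serialize one chunk as a Server-Sent Events frame.
// JSON.stringify escapes newlines, so the payload always fits on one data line.
function toSseFrame(chunk: TextDelta): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Turn any async source of text into SSE frames (hypothetical helper).
async function* streamAsSse(source: AsyncIterable<string>): AsyncGenerator<string> {
  for await (const text of source) {
    yield toSseFrame({ type: 'text-delta', textDelta: text });
  }
}
```

Encoding the payload as JSON keeps multi-line text inside a single frame, which avoids having to split it across several data lines.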
Standard Agent
Purpose: General-purpose conversational AI for Q&A without external tools.
Features
- Direct LLM interaction
- No document search or external APIs
- Model-agnostic (uses runtime context model)
- Fast response times
System Prompt
You are a helpful assistant. Provide direct, clear answers to user questions.
Use Cases
- General Q&A
- Code generation
- Text summarization
- Creative writing
RAG Agent
Purpose: Document search and retrieval-augmented generation using vector similarity.
Features
- Vector similarity search for semantic retrieval
- Conversation-scoped document search
- Citation tracking (##n$$ format)
- Chunk reference system
Tools
- vectorQueryTool: Search documents by semantic similarity
Usage Example
const response = await ragAgent.execute(
  messages,
  {
    userId: 'user-123',
    conversationId: 'conv-456',
    modelKey: 'openai:gpt-4o'
  },
  [vectorQueryTool]
)
Citation Format
The RAG agent includes citations in responses:
According to the documentation ##1$$, ExtendedLM supports multiple LLM providers.
Citations link to specific chunks in the source documents.
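To resolve citations on the client or in the orchestrator, the ##n$$ markers first have to be pulled out of the response text. A minimal sketch follows, assuming n is a decimal chunk number. The function name extractCitations is hypothetical, and how a number maps back to a stored chunk is not specified in this document.

```typescript
// Return the distinct chunk numbers cited in a response, in order of first appearance.
function extractCitations(text: string): number[] {
  const pattern = /##(\d+)\$\$/g; // fresh regex per call, so lastIndex starts at 0
  const cited: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const n = Number(match[1]);
    if (cited.indexOf(n) === -1) cited.push(n); // skip repeated citations
  }
  return cited;
}

const answer =
  'According to the documentation ##1$$, ExtendedLM supports multiple LLM providers ##2$$ ##1$$.';
const cited = extractCitations(answer);
```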
Global RAG Agent
Purpose: Search across all user documents, not only those in the current conversation.
Features
- User-wide document search
- Valkey-based caching (configurable TTL)
- Hybrid search (vector + text)
- Configurable in global-rag-settings.json
Configuration
{
  "ragModelKey": "openai:gpt-4o",
  "embedding": {
    "active": "ollama",
    "providers": {
      "ollama": {
        "modelName": "mxbai-embed-large:latest"
      }
    }
  }
}
Caching
Global RAG uses a Valkey cache with a default TTL of 600 seconds. Cache keys are derived from a hash of the query.
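A query-hash cache key could be built as in the sketch below. The key layout (a namespace prefix, the user ID, then a SHA-256 digest), the query normalization, and the name cacheKey are all assumptions; the source only states that keys are based on a query hash.

```typescript
import { createHash } from 'node:crypto';

const DEFAULT_TTL_SECONDS = 600; // matches the documented default TTL

// Hypothetical key layout: namespace, user ID, SHA-256 of the normalized query.
// Including the user ID is an assumption to keep one user's results separate from another's.
function cacheKey(userId: string, query: string): string {
  const normalized = query.trim().toLowerCase();
  const digest = createHash('sha256').update(normalized).digest('hex');
  return `global-rag:${userId}:${digest}`;
}

const key = cacheKey('user-123', 'What is ExtendedLM?');
```

The key and DEFAULT_TTL_SECONDS would then be passed to the Valkey client, for example through a SET command with an EX expiry.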
Translation Agent
Purpose: English ↔ Japanese translation using the PLaMo-2-translate model.
Features
- Dedicated translation model (PLaMo-2)
- Auto language detection
- Low temperature (0.0) for deterministic results
- Gateway llama.cpp integration
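How the agent detects the source language is not described here. One simple heuristic, shown purely as an illustration, is to look for Japanese script in the input and pick the translation direction from that. The Direction type and detectDirection function are hypothetical names.

```typescript
type Direction = 'en-to-ja' | 'ja-to-en';

// Hiragana, katakana, and CJK unified ideographs.
const JAPANESE_CHARS = /[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/;

// Any Japanese character means the input is treated as Japanese.
function detectDirection(text: string): Direction {
  return JAPANESE_CHARS.test(text) ? 'ja-to-en' : 'en-to-ja';
}

const direction = detectDirection('Hello, how are you?');
```

A heuristic like this cannot tell Japanese from Chinese text written only in ideographs, which is acceptable only because the agent handles a single language pair.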
Model Configuration
{
  "key": "gateway:plamo-2-translate",
  "provider": "gateway",
  "modelName": "PLaMo-2-translate-Q4_0",
  "temperature": 0.0
}
Usage
User: Translate to Japanese: Hello, how are you?
Agent: こんにちは、お元気ですか?
Weather Agent
Purpose: Provide current weather information for any location.
Features
- Open-Meteo API integration (no API key required)
- Temperature, humidity, wind, precipitation
- Location translation support
- Natural language queries
Tools
- weatherTool: Query weather by location name or coordinates
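The internals of weatherTool are not shown in this document. As a rough sketch of a coordinate-based lookup, the function below builds a request URL for the Open-Meteo forecast endpoint. The name buildOpenMeteoUrl and the chosen set of current-weather variables are assumptions.

```typescript
// Build an Open-Meteo forecast URL for current conditions at the given coordinates.
function buildOpenMeteoUrl(latitude: number, longitude: number): string {
  const params = new URLSearchParams({
    latitude: String(latitude),
    longitude: String(longitude),
    // Temperature, humidity, precipitation, and wind, as listed under Features
    current: 'temperature_2m,relative_humidity_2m,precipitation,wind_speed_10m',
  });
  return `https://api.open-meteo.com/v1/forecast?${params.toString()}`;
}

// Approximate coordinates for Tokyo
const tokyoUrl = buildOpenMeteoUrl(35.6762, 139.6503);
```

The resulting URL can be passed to fetch(). A query by location name would first need a geocoding step to turn the name into coordinates.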
Example Query
User: What's the weather in Tokyo?
Agent: Currently in Tokyo:
- Temperature: 15°C
- Conditions: Partly cloudy
- Humidity: 65%
- Wind: 12 km/h NE
Computer Use Agent
Purpose: AI-powered computer automation via the Mate backend.
Features
- Browser automation (Playwright)
- File operations (read/write/execute)
- Shell command execution
- VNC real-time monitoring
- Secure sandbox environment (Docker)
Architecture
The Computer Use agent communicates with Mate, a Python FastAPI backend:
- Frontend: ExtendedLM chat UI
- Backend: Mate (port 8000)
- Sandbox: Docker containers per session
- Browser: Chrome, driven over CDP (Chrome DevTools Protocol)
Capabilities
- Navigate websites
- Fill forms and click buttons
- Extract data from pages
- Execute Python scripts
- Perform web searches (Bing)
MCP Agent
Purpose: Model Context Protocol integration for dynamic tool loading.
Features
- Dynamic tool discovery from MCP servers
- Multi-server support (SwitchBot, cal2prompt, jgrants)
- Natural language → MCP tool invocation
- Extensible architecture
Configured MCP Servers
- cal2prompt: Calendar integration
- SwitchBot: Smart home device control
- jgrants: Grant management system
Example Usage
User: Turn on the living room light
Agent: [Uses SwitchBot MCP server to control smart light]
Response: Living room light has been turned on.
PDF Reading Agent
Purpose: PDF document analysis with text extraction and OCR.
Features
- PDF text extraction (pdf-parse)
- OCR support (Tesseract.js)
- Page-by-page processing
- Image extraction from PDFs
Processing Pipeline
- Extract text from PDF using pdf-parse
- If text is sparse, perform OCR on page images
- Extract images and analyze with vision LLM
- Combine text + image descriptions
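The OCR fallback in the second step depends on deciding when extracted text counts as sparse. A minimal sketch of that decision follows. The 50-character threshold and the name pagesNeedingOcr are hypothetical, since the actual criterion is not documented here.

```typescript
// Hypothetical threshold: pages with fewer extracted characters are treated as scanned.
const MIN_CHARS_PER_PAGE = 50;

// Return the 1-based numbers of pages whose extracted text is too sparse.
function pagesNeedingOcr(
  pageTexts: string[],
  minChars: number = MIN_CHARS_PER_PAGE
): number[] {
  const pages: number[] = [];
  pageTexts.forEach((text, index) => {
    // Whitespace does not count as content
    if (text.trim().length < minChars) pages.push(index + 1);
  });
  return pages;
}

// Page 1 has real text; pages 2 and 3 look like scanned images.
const ocrPages = pagesNeedingOcr(['x'.repeat(200), '   ', 'short']);
```

Only the pages returned here would be rendered to images and sent through OCR, which keeps the slower step off pages that already have usable text.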
Report Agent
Purpose: Generate structured reports from conversation or documents.
Features
- Structured report generation
- Multiple report formats
- Summary + detailed sections
- Markdown output
Slide Assistant Agent
Purpose: Generate HTML presentations from natural language instructions.
Features
- Multiple themes (default, corporate, creative)
- Template system
- Interactive HTML slides
- Export-ready presentations
Tools
- generateHtmlPresentationTool: Create HTML slides from outline
Example
User: Create a presentation about ExtendedLM with 5 slides
Agent: [Generates HTML presentation with title, features, architecture, use cases, conclusion]
Browser Agent
Purpose: Web automation using Playwright.
Features
- Page navigation
- Element interaction (click, type, select)
- Screenshot capture
- Data extraction
Tools
- playwrightTools: Browser automation primitives
Image Creator Agent
Purpose: AI image generation from text prompts.
Features
- Text-to-image generation
- Multiple styles and formats
- Image enhancement
- Prompt optimization
Multi-Agent Orchestration
File: src/server/chat/orchestrator.ts
Orchestrator Responsibilities
- Route messages to appropriate agents
- Handle tool execution results
- Manage conversation persistence
- Extract and store citations
- Error handling and recovery
- Stream responses via SSE
Agent Selection
The orchestrator selects agents based on:
- User's explicit agent choice (UI selection)
- Conversation context
- Required tools/capabilities
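The three criteria above could be combined as in the sketch below. The precedence used here (explicit choice first, then required tools, then conversation context) is an assumption, as are the SelectionInput shape, the TOOL_OWNERS mapping, and the selectAgent function. The real logic lives in the orchestrator and is not reproduced in this document.

```typescript
type AgentName = 'standard' | 'rag' | 'weather' | 'browser';

interface SelectionInput {
  explicitChoice?: AgentName;         // agent picked in the UI, if any
  hasConversationDocuments: boolean;  // simplified stand-in for conversation context
  requiredTools: string[];            // tool names the request needs
}

// Hypothetical mapping from a tool name to the agent that provides it.
const TOOL_OWNERS: Record<string, AgentName | undefined> = {
  vectorQueryTool: 'rag',
  weatherTool: 'weather',
  playwrightTools: 'browser',
};

function selectAgent(input: SelectionInput): AgentName {
  // 1. An explicit UI selection always wins.
  if (input.explicitChoice) return input.explicitChoice;
  // 2. Otherwise pick the first agent that owns a required tool.
  for (const toolName of input.requiredTools) {
    const owner = TOOL_OWNERS[toolName];
    if (owner) return owner;
  }
  // 3. Fall back on context: documents suggest RAG, else the Standard Agent.
  return input.hasConversationDocuments ? 'rag' : 'standard';
}
```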
Creating Custom Agents
Step 1: Create Agent File
Create a new agent file:
// my-custom-agent.ts
import { BaseAgent } from '../executor/BaseAgent'

export class MyCustomAgent extends BaseAgent {
  async execute(messages, runtimeContext, tools, memory) {
    // Your agent logic here
    return {
      type: 'text',
      content: 'Agent response'
    }
  }
}
Step 2: Register Agent
Register your agent:
import { MyCustomAgent } from './my-custom-agent'

export const agents = {
  standard: standardAgent,
  rag: ragAgent,
  myCustom: new MyCustomAgent() // Add here
}
Step 3: Add UI Option
Update agent selector in chat UI to include your new agent.
Step 4: Create Tools (Optional)
If your agent needs tools, create custom tool definitions:
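// Assumption: tool() comes from the Vercel AI SDK ('ai') and z from zod.
// Newer SDK versions may name the schema field differently; adjust to your setup.
import { tool } from 'ai'
import { z } from 'zod'

export const myCustomTool = tool({
  description: 'Does something useful',
  parameters: z.object({
    input: z.string()
  }),
  execute: async ({ input }) => {
    // Tool logic: replace with real work; this placeholder echoes the input
    const result = { output: input }
    return result
  }
})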