Computer Use & Automation
ExtendedLM Computer Use provides AI-powered browser automation, file operations, and shell execution via the Mate backend.
Computer Use lets AI agents control a computer: browse the web, execute shell commands, read and write files, and carry out complex multi-step automation tasks. Everything runs in a secure Docker sandbox for safety.
Key Features
- Browser Automation: Playwright-based web navigation and interaction
- File Operations: Read, write, and execute files
- Shell Commands: Execute bash commands in sandbox
- VNC Monitoring: Real-time screen viewing
- Python Execution: Run Python scripts in sandbox
- Web Search: Bing search integration (CAPTCHA-resistant)
- Secure Sandbox: Docker containers per session
- AI Planning: GPT-4 agent planning and execution
Use Cases
- Web scraping and data extraction
- Form filling and submission
- E2E testing automation
- Research and information gathering
- Report generation from web sources
- Automated workflows with external services
Architecture
Components
- ExtendedLM Frontend: Chat UI with Computer Use panel
- Computer Use Agent: ExtendedLM agent (computer-use-agent.ts)
- Mate Backend: Python FastAPI server
- Docker Sandbox: Isolated container per session
- Chrome Browser: Chromium + CDP protocol
- VNC Server: XVFB + x11vnc for screen sharing
Data Flow
1. User sends a task to the Computer Use agent in ExtendedLM
2. The agent creates a session with Mate
3. Mate spawns a Docker sandbox container
4. The agent sends commands to the Mate API
5. Mate executes them in the sandbox (browser/shell/file ops)
6. Results stream back to the agent via SSE
7. The VNC stream shows the screen in real time
8. The agent formats the results for the user
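The flow above can be sketched as a minimal client. The endpoint paths match the Mate API reference later in this document, but the base URL and payload shapes are assumptions; the helpers below only build the requests rather than send them:

```python
import json

MATE_BASE = "http://localhost:8000"  # assumed default Mate port

def create_session_request(user_id: str):
    # Steps 2-3: ask Mate for a session; Mate spawns the sandbox container.
    return ("POST", f"{MATE_BASE}/api/computer-use/session", {"user_id": user_id})

def chat_request(session_id: str, message: str):
    # Step 4: send a task; results stream back over SSE (steps 5-6).
    return ("POST", f"{MATE_BASE}/api/computer-use/chat",
            {"session_id": session_id, "message": message})

def destroy_session_request(session_id: str):
    # Teardown: Mate destroys the sandbox when the session ends.
    return ("DELETE", f"{MATE_BASE}/api/computer-use/session",
            {"session_id": session_id})

method, url, body = create_session_request("user_123")
print(method, url, json.dumps(body))
```

Any HTTP client (e.g. requests or httpx) can then send these tuples; keeping request construction separate makes the protocol steps easy to test in isolation.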
Directory Structure
apps/Mate/
├── backend/ # Python FastAPI backend
│ ├── main.py # API server entry point
│ ├── agent.py # AI agent logic
│ ├── browser.py # Playwright automation
│ ├── filesystem.py # File operations
│ ├── shell.py # Shell command execution
│ └── vnc.py # VNC server management
├── docker-compose-supabase.yml
├── Dockerfile
├── start.sh # Startup script
└── .env.supabase # Configuration
Browser Automation
Overview
Playwright-based browser automation with Chrome/Chromium.
Capabilities
- Navigation: Visit URLs, follow links
- Interaction: Click buttons, fill forms, select dropdowns
- Data Extraction: Extract text, tables, lists from pages
- Screenshots: Capture full page or elements
- Waiting: Wait for elements, network idle, timeouts
- Multiple Pages: Handle tabs and popups
Example Tasks
User: Go to example.com and extract all article titles
Agent:
1. Navigate to example.com
2. Wait for page load
3. Find all article title elements
4. Extract text from each
5. Return list of titles
Response:
Found 10 articles:
- "Introduction to AI"
- "Machine Learning Basics"
...
Browser Actions
# Navigate to URL
await page.goto('https://example.com')
# Click element
await page.click('button[type="submit"]')
# Fill input
await page.fill('input[name="email"]', 'user@example.com')
# Extract text
title = await page.text_content('h1')
# Screenshot
await page.screenshot(path='screenshot.png')
Selector Strategies
- CSS: .class-name, #id, button[type="submit"]
- Text: text="Click here"
- XPath: //button[@class="submit"]
- Role: role=button[name="Submit"]
File Operations
Overview
Read, write, and execute files within the sandbox.
Capabilities
- Read Files: Read text files, JSON, CSV, etc.
- Write Files: Create and write files
- List Directory: List files and subdirectories
- Delete Files: Remove files
- Execute Scripts: Run Python, bash scripts
File System Structure
/home/agent/
├── workspace/ # Working directory
│ ├── downloads/ # Downloaded files
│ ├── scripts/ # User scripts
│ └── data/ # Data files
└── output/ # Output files
Example Operations
import json
import os
import subprocess

# Read file
with open('/home/agent/workspace/data.txt', 'r') as f:
    content = f.read()

# Write file
with open('/home/agent/workspace/output.json', 'w') as f:
    json.dump(data, f)

# List directory
files = os.listdir('/home/agent/workspace')

# Execute script
result = subprocess.run(['python', 'script.py'], capture_output=True)
File Upload/Download
- Upload files from ExtendedLM to sandbox
- Download files from sandbox to ExtendedLM
- View files in Computer Use panel
Shell Command Execution
Overview
Execute bash commands within the sandbox container.
Capabilities
- Run any bash command
- Pipe commands
- Environment variables
- Background processes
Example Commands
# List files
ls -la /home/agent/workspace
# Install packages
pip install requests
# Run Python script
python script.py --input data.json
# Pipe commands
cat data.txt | grep "error" | wc -l
# Network request
curl https://api.example.com/data
Security Restrictions
- No sudo access
- Network isolated (except allowed domains)
- Resource limits (CPU, memory, disk)
- Command whitelist/blacklist (configurable)
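A command blacklist can be as simple as a substring check against the BLOCKED_COMMANDS list from .env.supabase. This is a sketch of one plausible filter, not Mate's actual implementation:

```python
BLOCKED_COMMANDS = ["rm -rf /", "shutdown"]  # mirrors BLOCKED_COMMANDS in .env.supabase

def is_blocked(command: str) -> bool:
    # Naive substring match; a real filter would also normalize
    # whitespace and resolve aliases before checking.
    return any(blocked in command for blocked in BLOCKED_COMMANDS)

print(is_blocked("sudo shutdown -h now"))  # True
print(is_blocked("ls -la /home/agent"))    # False
```

Substring blacklists are easy to bypass (e.g. via quoting or variables), which is why the sandbox also relies on container isolation and the absence of root access rather than filtering alone.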
Command Output
All command output (stdout/stderr) is captured and returned to the agent:
{
  "stdout": "command output...",
  "stderr": "",
  "exit_code": 0,
  "execution_time_ms": 123
}
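An agent consuming this payload typically branches on exit_code. A minimal sketch, assuming the JSON shape shown above (the contents are illustrative):

```python
import json

# Payload in the shape Mate returns for a shell command (illustrative values)
raw = '{"stdout": "3\\n", "stderr": "", "exit_code": 0, "execution_time_ms": 123}'
result = json.loads(raw)

if result["exit_code"] == 0:
    output = result["stdout"].strip()
else:
    # Surface stderr so the agent can decide whether to retry
    output = f"command failed ({result['exit_code']}): {result['stderr']}"

print(output)
```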
VNC Monitoring
Overview
Real-time screen viewing via VNC protocol.
Setup
- Display Server: XVFB (virtual framebuffer)
- VNC Server: x11vnc
- Resolution: 1920x1080 (configurable)
- Frame Rate: 10-30 FPS (adjustable)
Access VNC
VNC connection info is provided when a session starts:
{
  "vnc_url": "vnc://localhost:5900",
  "vnc_password": "session_password",
  "http_viewer": "http://localhost:6080/vnc.html"
}
VNC Viewer
ExtendedLM includes a built-in VNC viewer in the Computer Use panel:
- Real-time screen updates
- Mouse/keyboard interaction (optional)
- Screenshot capture
- Recording (future)
Configuration
# .env.supabase
VNC_DISPLAY=:99
VNC_RESOLUTION=1920x1080
VNC_COLOR_DEPTH=24
Python Execution
Overview
Execute Python scripts within the sandbox.
Installed Packages
- Standard Library (full)
- requests, httpx (HTTP clients)
- beautifulsoup4, lxml (HTML parsing)
- pandas, numpy (data processing)
- playwright (browser automation)
- pillow (image processing)
Example: Web Scraping Script
import json

import requests
from bs4 import BeautifulSoup

# Fetch page
response = requests.get('https://example.com')
soup = BeautifulSoup(response.text, 'html.parser')

# Extract data
articles = []
for article in soup.find_all('article'):
    title = article.find('h2').text
    link = article.find('a')['href']
    articles.append({'title': title, 'link': link})

# Save results
with open('articles.json', 'w') as f:
    json.dump(articles, f, indent=2)
print(f"Extracted {len(articles)} articles")
Example: Data Processing
import pandas as pd
# Load data
df = pd.read_csv('data.csv')
# Process
df['total'] = df['price'] * df['quantity']
summary = df.groupby('category')['total'].sum()
# Save
summary.to_csv('summary.csv')
print(summary)
Install Additional Packages
pip install package-name
Installation
Prerequisites
- Docker and Docker Compose
- Python 3.11+
- 4GB+ available RAM
Setup Mate
# Navigate to Mate directory
cd apps/Mate
# Copy environment template
cp .env.example .env.supabase
# Edit configuration
nano .env.supabase
# Start services
./setup/start.sh
Verify Installation
# Check Mate status
curl http://localhost:8000/health
# Expected response:
{"status": "ok", "version": "1.0.0"}
Docker Services
The start.sh script starts the following services:
- backend: FastAPI server (port 8000)
- supabase-db: PostgreSQL (port 5432)
- storage: Supabase Storage (port 5000)
- valkey: Redis-compatible cache (port 6379)
Configuration
Mate Configuration
File: apps/Mate/.env.supabase
# Server
PORT=8000
HOST=0.0.0.0
# OpenAI API (for AI agent)
OPENAI_API_KEY=sk-proj-...
# Supabase
SUPABASE_URL=http://supabase-db:5432
SUPABASE_SERVICE_KEY=...
# Sandbox
SANDBOX_TIMEOUT_SECONDS=300
SANDBOX_MAX_MEMORY_MB=2048
SANDBOX_MAX_CPU_CORES=2
# VNC
VNC_DISPLAY=:99
VNC_RESOLUTION=1920x1080
VNC_PASSWORD=changeme
# Browser
BROWSER_HEADLESS=false
BROWSER_TIMEOUT_MS=30000
# Security
ALLOWED_DOMAINS=["example.com", "api.example.com"]
BLOCKED_COMMANDS=["rm -rf /", "shutdown"]
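The ALLOWED_DOMAINS setting suggests a host-based allowlist. One plausible interpretation (exact match or subdomain) is sketched below; Mate's actual enforcement may differ:

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = ["example.com", "api.example.com"]  # as in ALLOWED_DOMAINS above

def is_allowed(url: str) -> bool:
    # Compare the URL's hostname, never the raw string, so that
    # "https://evil.com/example.com" does not slip through.
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

print(is_allowed("https://api.example.com/data"))   # True
print(is_allowed("https://evil.com/example.com"))   # False
```

Matching on the parsed hostname rather than the full URL is the important detail; naive substring checks are a common source of allowlist bypasses.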
Resource Limits
# docker-compose-supabase.yml
services:
  backend:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
Security Considerations
Sandbox Isolation
- Each session runs in isolated Docker container
- No access to host file system
- Network restricted to allowed domains
- Resource limits prevent DoS
Command Filtering
- Blacklist dangerous commands
- Whitelist allowed operations
- No sudo/root access
- Command logging for audit
Network Security
- Outbound connections to allowed domains only
- No inbound connections to sandbox
- VNC password protected
- Session timeout (5 minutes default)
Data Protection
- Sandbox destroyed after session ends
- No persistent storage between sessions
- Sensitive data should not be stored in sandbox
Best Practices
- Use unique API keys per environment
- Limit session duration
- Monitor resource usage
- Review logs regularly
- Keep Docker images updated
Mate API Reference
Create Session
Endpoint: POST /api/computer-use/session
{
  "user_id": "user_123"
}
Response:
{
  "session_id": "sess_abc",
  "vnc_url": "http://localhost:6080",
  "vnc_password": "pass123"
}
Send Message
Endpoint: POST /api/computer-use/chat
{
  "session_id": "sess_abc",
  "message": "Go to example.com and extract all links"
}
Response: SSE stream of agent actions and results
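Consuming the SSE response amounts to reading data: lines and decoding their JSON payloads. A minimal parser sketch; the event fields (type, detail) are assumptions, not Mate's documented schema:

```python
import json

def parse_sse(lines):
    # Yield the decoded JSON payload of each SSE "data:" line,
    # skipping blank keep-alives and other SSE fields.
    for line in lines:
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])

# Illustrative frames; real Mate events may use different fields.
stream = [
    'data: {"type": "action", "detail": "navigating to example.com"}',
    '',
    'data: {"type": "result", "detail": "extracted 10 links"}',
]
for event in parse_sse(stream):
    print(event["type"], "-", event["detail"])
```

With requests, the same parser can be fed from `response.iter_lines(decode_unicode=True)` on a streaming POST to the chat endpoint.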
Get VNC Info
Endpoint: GET /api/computer-use/vnc-info
{
  "session_id": "sess_abc",
  "vnc_url": "http://localhost:6080",
  "display": ":99"
}
File Operations
Upload: POST /api/computer-use/filesystem/upload
Download: GET /api/computer-use/filesystem/download
List: GET /api/computer-use/filesystem/list
Execute Command
Endpoint: POST /api/computer-use/shell
{
  "session_id": "sess_abc",
  "command": "ls -la"
}
Destroy Session
Endpoint: DELETE /api/computer-use/session
{
  "session_id": "sess_abc"
}