Computer Use & Automation
ExtendedLM Computer Use provides AI-powered browser automation, file operations, and shell execution via the Mate backend.
Computer Use lets AI agents control a computer: browse the web, execute shell commands, read and write files, and carry out complex multi-step automation tasks. Everything runs in a secure Docker sandbox for safety.
Key Features
- Browser Automation: Playwright-based web navigation and interaction
- File Operations: Read, write, and execute files
- Shell Commands: Execute bash commands in sandbox
- VNC Monitoring: Real-time screen viewing
- Python Execution: Run Python scripts in sandbox
- Web Search: Bing search integration (CAPTCHA-resistant)
- Secure Sandbox: Docker containers per session
- AI Planning: GPT-4 agent planning and execution
Use Cases
- Web scraping and data extraction
- Form filling and submission
- E2E testing automation
- Research and information gathering
- Report generation from web sources
- Automated workflows with external services
Architecture
Components
- ExtendedLM Frontend: Chat UI with Computer Use panel
- Computer Use Agent: ExtendedLM agent (computer-use-agent.ts)
- Mate Backend: Python FastAPI server
- Docker Sandbox: Isolated container per session
- Chrome Browser: Chromium + CDP protocol
- VNC Server: XVFB + x11vnc for screen sharing
Data Flow
1. User sends a task to the Computer Use agent in ExtendedLM
2. The agent creates a session with Mate
3. Mate spawns a Docker sandbox container
4. The agent sends commands to the Mate API
5. Mate executes them in the sandbox (browser/shell/file ops)
6. Results stream back to the agent via SSE
7. The VNC stream shows the screen in real time
8. The agent formats the results for the user
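The flow above can be sketched as a minimal client. The endpoint paths match the Mate API reference later in this document, but the base URL and payload shapes are assumptions; the helpers below only build the requests rather than send them:

```python
import json

MATE_BASE = "http://localhost:8000"  # assumed default Mate port

def create_session_request(user_id: str):
    # Steps 2-3: ask Mate for a session; Mate spawns the sandbox container.
    return ("POST", f"{MATE_BASE}/api/computer-use/session", {"user_id": user_id})

def chat_request(session_id: str, message: str):
    # Step 4: send a task; results stream back over SSE (steps 5-6).
    return ("POST", f"{MATE_BASE}/api/computer-use/chat",
            {"session_id": session_id, "message": message})

def destroy_session_request(session_id: str):
    # Teardown: Mate destroys the sandbox when the session ends.
    return ("DELETE", f"{MATE_BASE}/api/computer-use/session",
            {"session_id": session_id})

method, url, body = create_session_request("user_123")
print(method, url, json.dumps(body))
```

Any HTTP client (e.g. requests or httpx) can then send these tuples; keeping request construction separate makes the protocol steps easy to test in isolation.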
Directory Structure
apps/Mate/
├── backend/ # Python FastAPI backend
│ ├── main.py # API server entry point
│ ├── agent.py # AI agent logic
│ ├── browser.py # Playwright automation
│ ├── filesystem.py # File operations
│ ├── shell.py # Shell command execution
│ └── vnc.py # VNC server management
├── docker-compose-supabase.yml
├── Dockerfile
├── start.sh # Startup script
└── .env.supabase # Configuration
Browser Automation
Overview
Playwright-based browser automation with Chrome/Chromium.
Capabilities
- Navigation: Visit URLs, follow links
- Interaction: Click buttons, fill forms, select dropdowns
- Data Extraction: Extract text, tables, lists from pages
- Screenshots: Capture full page or elements
- Waiting: Wait for elements, network idle, timeouts
- Multiple Pages: Handle tabs and popups
Example Tasks
User: Go to example.com and extract all article titles
Agent:
1. Navigate to example.com
2. Wait for page load
3. Find all article title elements
4. Extract text from each
5. Return list of titles
Response:
Found 10 articles:
- "Introduction to AI"
- "Machine Learning Basics"
...
Browser Actions
# Navigate to URL
await page.goto('https://example.com')
# Click element
await page.click('button[type="submit"]')
# Fill input
await page.fill('input[name="email"]', 'user@example.com')
# Extract text
title = await page.text_content('h1')
# Screenshot
await page.screenshot(path='screenshot.png')
Selector Strategies
- CSS: .class-name, #id, button[type="submit"]
- Text: text="Click here"
- XPath: //button[@class="submit"]
- Role: role=button[name="Submit"]
File Operations
Overview
Read, write, and execute files within the sandbox.
Capabilities
- Read Files: Read text files, JSON, CSV, etc.
- Write Files: Create and write files
- List Directory: List files and subdirectories
- Delete Files: Remove files
- Execute Scripts: Run Python, bash scripts
File System Structure
/home/agent/
├── workspace/ # Working directory
│ ├── downloads/ # Downloaded files
│ ├── scripts/ # User scripts
│ └── data/ # Data files
└── output/ # Output files
Example Operations
import json
import os
import subprocess

# Read file
with open('/home/agent/workspace/data.txt', 'r') as f:
    content = f.read()

# Write file
with open('/home/agent/workspace/output.json', 'w') as f:
    json.dump(data, f)

# List directory
files = os.listdir('/home/agent/workspace')

# Execute script
result = subprocess.run(['python', 'script.py'], capture_output=True)
File Upload/Download
- Upload files from ExtendedLM to sandbox
- Download files from sandbox to ExtendedLM
- View files in Computer Use panel
Shell Command Execution
Overview
Execute bash commands within the sandbox container.
Capabilities
- Run any bash command
- Pipe commands
- Environment variables
- Background processes
Example Commands
# List files
ls -la /home/agent/workspace
# Install packages
pip install requests
# Run Python script
python script.py --input data.json
# Pipe commands
cat data.txt | grep "error" | wc -l
# Network request
curl https://api.example.com/data
Security Restrictions
- No sudo access
- Network isolated (except allowed domains)
- Resource limits (CPU, memory, disk)
- Command whitelist/blacklist (configurable)
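A command blacklist can be as simple as a substring check against the BLOCKED_COMMANDS list from .env.supabase. This is a sketch of one plausible filter, not Mate's actual implementation:

```python
BLOCKED_COMMANDS = ["rm -rf /", "shutdown"]  # mirrors BLOCKED_COMMANDS in .env.supabase

def is_blocked(command: str) -> bool:
    # Naive substring match; a real filter would also normalize
    # whitespace and resolve aliases before checking.
    return any(blocked in command for blocked in BLOCKED_COMMANDS)

print(is_blocked("sudo shutdown -h now"))  # True
print(is_blocked("ls -la /home/agent"))    # False
```

Substring blacklists are easy to bypass (e.g. via quoting or variables), which is why the sandbox also relies on container isolation and the absence of root access rather than filtering alone.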
Command Output
All command output (stdout/stderr) is captured and returned to the agent:
{
  "stdout": "command output...",
  "stderr": "",
  "exit_code": 0,
  "execution_time_ms": 123
}
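An agent consuming this payload typically branches on exit_code. A minimal sketch, assuming the JSON shape shown above (the contents are illustrative):

```python
import json

# Payload in the shape Mate returns for a shell command (illustrative values)
raw = '{"stdout": "3\\n", "stderr": "", "exit_code": 0, "execution_time_ms": 123}'
result = json.loads(raw)

if result["exit_code"] == 0:
    output = result["stdout"].strip()
else:
    # Surface stderr so the agent can decide whether to retry
    output = f"command failed ({result['exit_code']}): {result['stderr']}"

print(output)
```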
VNC Monitoring
Overview
Real-time screen viewing via VNC protocol.
Setup
- Display Server: XVFB (virtual framebuffer)
- VNC Server: x11vnc
- Resolution: 1920x1080 (configurable)
- Frame Rate: 10-30 FPS (adjustable)
Access VNC
VNC connection info is provided when a session starts:
{
  "vnc_url": "vnc://localhost:5900",
  "vnc_password": "session_password",
  "http_viewer": "http://localhost:6080/vnc.html"
}
VNC Viewer
ExtendedLM includes a built-in VNC viewer in the Computer Use panel:
- Real-time screen updates
- Mouse/keyboard interaction (optional)
- Screenshot capture
- Recording (future)
Configuration
# .env.supabase
VNC_DISPLAY=:99
VNC_RESOLUTION=1920x1080
VNC_COLOR_DEPTH=24
Python Execution
Overview
Execute Python scripts within the sandbox.
Installed Packages
- Standard Library (full)
- requests, httpx (HTTP clients)
- beautifulsoup4, lxml (HTML parsing)
- pandas, numpy (data processing)
- playwright (browser automation)
- pillow (image processing)
Example: Web Scraping Script
import json

import requests
from bs4 import BeautifulSoup

# Fetch page
response = requests.get('https://example.com')
soup = BeautifulSoup(response.text, 'html.parser')

# Extract data
articles = []
for article in soup.find_all('article'):
    title = article.find('h2').text
    link = article.find('a')['href']
    articles.append({'title': title, 'link': link})

# Save results
with open('articles.json', 'w') as f:
    json.dump(articles, f, indent=2)
print(f"Extracted {len(articles)} articles")
Example: Data Processing
import pandas as pd
# Load data
df = pd.read_csv('data.csv')
# Process
df['total'] = df['price'] * df['quantity']
summary = df.groupby('category')['total'].sum()
# Save
summary.to_csv('summary.csv')
print(summary)
Install Additional Packages
pip install package-name
Installation
Prerequisites
- Docker and Docker Compose
- Python 3.11+
- 4GB+ available RAM
Setup Mate
# Navigate to Mate directory
cd apps/Mate
# Copy environment template
cp .env.example .env.supabase
# Edit configuration
nano .env.supabase
# Start services
./setup/start.sh
Verify Installation
# Check Mate status
curl http://localhost:8000/health
# Expected response:
{"status": "ok", "version": "1.0.0"}
Docker Services
The start.sh script starts the following services:
- backend: FastAPI server (port 8000)
- supabase-db: PostgreSQL (port 5432)
- storage: Supabase Storage (port 5000)
- valkey: Redis-compatible cache (port 6379)
Configuration
Mate Configuration
File: apps/Mate/.env.supabase
# Server
PORT=8000
HOST=0.0.0.0
# OpenAI API (for AI agent)
OPENAI_API_KEY=sk-proj-...
# Supabase
SUPABASE_URL=http://supabase-db:5432
SUPABASE_SERVICE_KEY=...
# Sandbox
SANDBOX_TIMEOUT_SECONDS=300
SANDBOX_MAX_MEMORY_MB=2048
SANDBOX_MAX_CPU_CORES=2
# VNC
VNC_DISPLAY=:99
VNC_RESOLUTION=1920x1080
VNC_PASSWORD=changeme
# Browser
BROWSER_HEADLESS=false
BROWSER_TIMEOUT_MS=30000
# Security
ALLOWED_DOMAINS=["example.com", "api.example.com"]
BLOCKED_COMMANDS=["rm -rf /", "shutdown"]
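The ALLOWED_DOMAINS setting suggests a host-based allowlist. One plausible interpretation (exact match or subdomain) is sketched below; Mate's actual enforcement may differ:

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = ["example.com", "api.example.com"]  # as in ALLOWED_DOMAINS above

def is_allowed(url: str) -> bool:
    # Compare the URL's hostname, never the raw string, so that
    # "https://evil.com/example.com" does not slip through.
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

print(is_allowed("https://api.example.com/data"))   # True
print(is_allowed("https://evil.com/example.com"))   # False
```

Matching on the parsed hostname rather than the full URL is the important detail; naive substring checks are a common source of allowlist bypasses.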
Resource Limits
# docker-compose-supabase.yml
services:
  backend:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
Security Considerations
Sandbox Isolation
- Each session runs in isolated Docker container
- No access to host file system
- Network restricted to allowed domains
- Resource limits prevent DoS
Command Filtering
- Blacklist dangerous commands
- Whitelist allowed operations
- No sudo/root access
- Command logging for audit
Network Security
- Outbound connections to allowed domains only
- No inbound connections to sandbox
- VNC password protected
- Session timeout (5 minutes default)
Data Protection
- Sandbox destroyed after session ends
- No persistent storage between sessions
- Sensitive data should not be stored in sandbox
Best Practices
- Use unique API keys per environment
- Limit session duration
- Monitor resource usage
- Review logs regularly
- Keep Docker images updated
Mate API Reference
Create Session
Endpoint: POST /api/computer-use/session
{
  "user_id": "user_123"
}
Response:
{
  "session_id": "sess_abc",
  "vnc_url": "http://localhost:6080",
  "vnc_password": "pass123"
}
Send Message
Endpoint: POST /api/computer-use/chat
{
  "session_id": "sess_abc",
  "message": "Go to example.com and extract all links"
}
Response: SSE stream of agent actions and results
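Consuming the SSE response amounts to reading data: lines and decoding their JSON payloads. A minimal parser sketch; the event fields (type, detail) are assumptions, not Mate's documented schema:

```python
import json

def parse_sse(lines):
    # Yield the decoded JSON payload of each SSE "data:" line,
    # skipping blank keep-alives and other SSE fields.
    for line in lines:
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])

# Illustrative frames; real Mate events may use different fields.
stream = [
    'data: {"type": "action", "detail": "navigating to example.com"}',
    '',
    'data: {"type": "result", "detail": "extracted 10 links"}',
]
for event in parse_sse(stream):
    print(event["type"], "-", event["detail"])
```

With requests, the same parser can be fed from `response.iter_lines(decode_unicode=True)` on a streaming POST to the chat endpoint.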
Get VNC Info
Endpoint: GET /api/computer-use/vnc-info
{
  "session_id": "sess_abc",
  "vnc_url": "http://localhost:6080",
  "display": ":99"
}
File Operations
Upload: POST /api/computer-use/filesystem/upload
Download: GET /api/computer-use/filesystem/download
List: GET /api/computer-use/filesystem/list
Execute Command
Endpoint: POST /api/computer-use/shell
{
  "session_id": "sess_abc",
  "command": "ls -la"
}
Destroy Session
Endpoint: DELETE /api/computer-use/session
{
  "session_id": "sess_abc"
}