Computer Use & Automation

ExtendedLM Computer Use provides AI-powered browser automation, file operations, and shell execution via the Mate backend.

What is Computer Use?

Computer Use lets AI agents drive a web browser, run shell commands, read and write files, and carry out complex multi-step automation tasks. Every session runs in its own isolated Docker sandbox container.

Key Features

  • Browser Automation: Playwright-based web navigation and interaction
  • File Operations: Read, write, and execute files
  • Shell Commands: Execute bash commands in sandbox
  • VNC Monitoring: Real-time screen viewing
  • Python Execution: Run Python scripts in sandbox
  • Web Search: Bing search integration (CAPTCHA-resistant)
  • Secure Sandbox: Docker containers per session
  • AI Planning: GPT-4 agent planning and execution

Use Cases

  • Web scraping and data extraction
  • Form filling and submission
  • E2E testing automation
  • Research and information gathering
  • Report generation from web sources
  • Automated workflows with external services

Architecture

Components

  • ExtendedLM Frontend: Chat UI with Computer Use panel
  • Computer Use Agent: ExtendedLM agent (computer-use-agent.ts)
  • Mate Backend: Python FastAPI server
  • Docker Sandbox: Isolated container per session
  • Chrome Browser: Chromium, controlled via the Chrome DevTools Protocol (CDP)
  • VNC Server: Xvfb + x11vnc for screen sharing

Data Flow

  1. User sends task to Computer Use agent in ExtendedLM
  2. Agent creates session with Mate
  3. Mate spawns Docker sandbox container
  4. Agent sends commands to Mate API
  5. Mate executes in sandbox (browser/shell/file ops)
  6. Results stream back to agent via SSE
  7. VNC stream shows real-time screen
  8. Agent formats results for user

Directory Structure

apps/Mate/
├── backend/                    # Python FastAPI backend
│   ├── main.py                # API server entry point
│   ├── agent.py               # AI agent logic
│   ├── browser.py             # Playwright automation
│   ├── filesystem.py          # File operations
│   ├── shell.py               # Shell command execution
│   └── vnc.py                 # VNC server management
├── docker-compose-supabase.yml
├── Dockerfile
├── start.sh                   # Startup script
└── .env.supabase              # Configuration

Browser Automation

Overview

Playwright-based browser automation with Chrome/Chromium.

Capabilities

  • Navigation: Visit URLs, follow links
  • Interaction: Click buttons, fill forms, select dropdowns
  • Data Extraction: Extract text, tables, lists from pages
  • Screenshots: Capture full page or elements
  • Waiting: Wait for elements, network idle, timeouts
  • Multiple Pages: Handle tabs and popups

Example Tasks

User: Go to example.com and extract all article titles

Agent:
1. Navigate to example.com
2. Wait for page load
3. Find all article title elements
4. Extract text from each
5. Return list of titles

Response:
Found 10 articles:
- "Introduction to AI"
- "Machine Learning Basics"
...

Browser Actions

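import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        # Navigate to URL
        await page.goto('https://example.com')

        # Fill input (placeholder selector: example.com has no form, use a real page)
        await page.fill('input[name="email"]', 'user@example.com')

        # Click element (submit after the form is filled)
        await page.click('button[type="submit"]')

        # Extract text
        title = await page.text_content('h1')

        # Screenshot
        await page.screenshot(path='screenshot.png')
        await browser.close()

asyncio.run(main())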

Selector Strategies

  • CSS: .class-name, #id, button[type="submit"]
  • Text: text="Click here"
  • XPath: //button[@class="submit"]
  • Role: role=button[name="Submit"]

File Operations

Overview

Read, write, and execute files within the sandbox.

Capabilities

  • Read Files: Read text files, JSON, CSV, etc.
  • Write Files: Create and write files
  • List Directory: List files and subdirectories
  • Delete Files: Remove files
  • Execute Scripts: Run Python, bash scripts

File System Structure

/home/agent/
├── workspace/          # Working directory
│   ├── downloads/     # Downloaded files
│   ├── scripts/       # User scripts
│   └── data/          # Data files
└── output/            # Output files
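Because file operations are confined to /home/agent, every path an agent requests has to be checked before it is opened. The sketch below shows one way to do that with pathlib; the resolve_in_sandbox name, the rule that relative paths start from workspace/, and the root parameter are illustrative assumptions, not the actual code in filesystem.py.

```python
from pathlib import Path

SANDBOX_ROOT = Path("/home/agent")  # root taken from the layout above


def resolve_in_sandbox(requested: str, root: Path = SANDBOX_ROOT) -> Path:
    """Resolve a requested path and reject anything outside root.

    Hypothetical helper: relative paths are interpreted relative to
    root/workspace, and '..' segments or symlinks that escape root raise.
    """
    root = root.resolve()
    candidate = Path(requested)
    if not candidate.is_absolute():
        candidate = root / "workspace" / candidate
    # resolve() collapses '..' and follows symlinks before the check
    resolved = candidate.resolve()
    if resolved != root and root not in resolved.parents:
        raise PermissionError(f"path escapes sandbox: {requested}")
    return resolved
```

Resolving before the containment check is what defeats `../` sequences and symlinks that point outside the root; a plain string-prefix comparison on the unresolved path would not.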

Example Operations

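import json
import os
import subprocess

data = {'status': 'ok'}  # example payload to write

# Read file
with open('/home/agent/workspace/data.txt', 'r') as f:
    content = f.read()

# Write file
with open('/home/agent/workspace/output.json', 'w') as f:
    json.dump(data, f)

# List directory
files = os.listdir('/home/agent/workspace')

# Execute script (text=True decodes stdout/stderr to str)
result = subprocess.run(['python', 'script.py'], capture_output=True, text=True)
print(result.returncode, result.stdout)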

File Upload/Download

  • Upload files from ExtendedLM to sandbox
  • Download files from sandbox to ExtendedLM
  • View files in Computer Use panel

Shell Command Execution

Overview

Execute bash commands within the sandbox container.

Capabilities

  • Run any bash command
  • Pipe commands
  • Environment variables
  • Background processes

Example Commands

# List files
ls -la /home/agent/workspace

# Install packages
pip install requests

# Run Python script
python script.py --input data.json

# Pipe commands
cat data.txt | grep "error" | wc -l

# Network request
curl https://api.example.com/data

Security Restrictions

  • No sudo access
  • Network isolated (except allowed domains)
  • Resource limits (CPU, memory, disk)
  • Command whitelist/blacklist (configurable)

Command Output

All command output (stdout and stderr) is captured and returned to the agent as JSON:

{
  "stdout": "command output...",
  "stderr": "",
  "exit_code": 0,
  "execution_time_ms": 123
}
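The output shape above can be produced with subprocess.run. This is a minimal sketch, not shell.py itself: running the command through `bash -c` (so pipes and redirects work) and reusing the 300-second sandbox timeout as the default are assumptions.

```python
import subprocess
import time
from typing import Any, Dict, Optional


def run_command(command: str, cwd: Optional[str] = None, timeout: float = 300) -> Dict[str, Any]:
    """Run a bash command and return stdout, stderr, exit code and timing.

    Raises subprocess.TimeoutExpired if the command outlives timeout.
    """
    start = time.monotonic()
    proc = subprocess.run(
        ["bash", "-c", command],  # bash -c so pipes and redirects work
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "execution_time_ms": int((time.monotonic() - start) * 1000),
    }
```

A non-zero exit code is reported in `exit_code` rather than raised, so the agent can read stderr and decide how to recover.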

VNC Monitoring

Overview

Real-time screen viewing via VNC protocol.

Setup

  • Display Server: Xvfb (X virtual framebuffer)
  • VNC Server: x11vnc
  • Resolution: 1920x1080 (configurable)
  • Frame Rate: 10-30 FPS (adjustable)

Access VNC

VNC connection info is returned when a session starts:

{
  "vnc_url": "vnc://localhost:5900",
  "vnc_password": "session_password",
  "http_viewer": "http://localhost:6080/vnc.html"
}

VNC Viewer

ExtendedLM includes a built-in VNC viewer in the Computer Use panel:

  • Real-time screen updates
  • Mouse/keyboard interaction (optional)
  • Screenshot capture
  • Recording (future)

Configuration

# .env.supabase
VNC_DISPLAY=:99
VNC_RESOLUTION=1920x1080
VNC_COLOR_DEPTH=24
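The VNC variables map directly onto the Xvfb and x11vnc command lines. The sketch below builds those argument lists from an environment mapping; the build_vnc_commands helper, port 5900 (taken from the vnc_url shown earlier), and the -forever/-shared flags are illustrative choices, and vnc.py may start the servers differently.

```python
import os
from typing import List, Mapping, Optional, Tuple


def build_vnc_commands(env: Optional[Mapping[str, str]] = None) -> Tuple[List[str], List[str]]:
    """Map VNC_* settings onto Xvfb and x11vnc argument lists (hypothetical helper)."""
    env = os.environ if env is None else env
    display = env.get("VNC_DISPLAY", ":99")
    resolution = env.get("VNC_RESOLUTION", "1920x1080")
    depth = env.get("VNC_COLOR_DEPTH", "24")

    # Xvfb takes the screen geometry as WIDTHxHEIGHTxDEPTH
    xvfb = ["Xvfb", display, "-screen", "0", f"{resolution}x{depth}"]

    # x11vnc attaches to that display; -forever keeps serving after a
    # client disconnects, -shared allows several viewers at once
    x11vnc = ["x11vnc", "-display", display, "-rfbport", "5900", "-forever", "-shared"]
    password = env.get("VNC_PASSWORD")
    if password:
        x11vnc += ["-passwd", password]
    return xvfb, x11vnc
```

Each list can be passed to subprocess.Popen, starting Xvfb first. Note that x11vnc's `-passwd` exposes the password in the process list; x11vnc can read it from a password file via `-rfbauth` instead.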

Python Execution

Overview

Execute Python scripts within the sandbox.

Installed Packages

  • Standard Library (full)
  • requests, httpx (HTTP clients)
  • beautifulsoup4, lxml (HTML parsing)
  • pandas, numpy (data processing)
  • playwright (browser automation)
  • pillow (image processing)

Example: Web Scraping Script

import json

import requests
from bs4 import BeautifulSoup

# Fetch page (fail fast on network errors and HTTP error codes)
response = requests.get('https://example.com', timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Extract data, skipping articles that lack a heading or a link
articles = []
for article in soup.find_all('article'):
    heading = article.find('h2')
    anchor = article.find('a', href=True)
    if heading is None or anchor is None:
        continue
    articles.append({'title': heading.get_text(strip=True), 'link': anchor['href']})

# Save results
with open('articles.json', 'w') as f:
    json.dump(articles, f, indent=2)

print(f"Extracted {len(articles)} articles")

Example: Data Processing

import pandas as pd

# Load data
df = pd.read_csv('data.csv')

# Process
df['total'] = df['price'] * df['quantity']
summary = df.groupby('category')['total'].sum()

# Save
summary.to_csv('summary.csv')
print(summary)

Install Additional Packages

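pip install package-name

Packages installed this way last only for the current session, since the sandbox is destroyed when the session ends.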

Installation

Prerequisites

  • Docker and Docker Compose
  • Python 3.11+
  • 4GB+ available RAM

Setup Mate

# Navigate to Mate directory
cd apps/Mate

# Copy environment template
cp .env.example .env.supabase

# Edit configuration
nano .env.supabase

# Start services
./setup/start.sh

Verify Installation

# Check Mate status
curl http://localhost:8000/health

# Expected response:
# {"status": "ok", "version": "1.0.0"}
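The backend may take a few seconds to come up after the start script runs, so anything that depends on Mate can poll the health endpoint until it answers. A minimal standard-library sketch; the wait_for_health name, the 60-second default, and the one-second polling interval are assumptions.

```python
import json
import time
import urllib.request
from typing import Any, Dict


def wait_for_health(
    url: str = "http://localhost:8000/health",
    timeout: float = 60.0,
    interval: float = 1.0,
) -> Dict[str, Any]:
    """Poll url until it returns JSON with status "ok", or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    last_error = "no attempt made"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = json.loads(resp.read().decode("utf-8"))
            if isinstance(body, dict) and body.get("status") == "ok":
                return body
            last_error = f"unexpected body: {body!r}"
        except (OSError, ValueError) as exc:
            # OSError covers URLError, HTTPError and refused connections;
            # ValueError covers a body that is not valid JSON
            last_error = str(exc)
        time.sleep(interval)
    raise TimeoutError(f"{url} not healthy after {timeout}s: {last_error}")
```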

Docker Services

The start.sh script starts:

  • backend: FastAPI server (port 8000)
  • supabase-db: PostgreSQL (port 5432)
  • storage: Supabase Storage (port 5000)
  • valkey: Redis-compatible cache (port 6379)

Configuration

Mate Configuration

File: apps/Mate/.env.supabase

# Server
PORT=8000
HOST=0.0.0.0

# OpenAI API (for AI agent)
OPENAI_API_KEY=sk-proj-...

# Supabase
SUPABASE_URL=http://supabase-db:5432
SUPABASE_SERVICE_KEY=...

# Sandbox
SANDBOX_TIMEOUT_SECONDS=300
SANDBOX_MAX_MEMORY_MB=2048
SANDBOX_MAX_CPU_CORES=2

# VNC
VNC_DISPLAY=:99
VNC_RESOLUTION=1920x1080
VNC_PASSWORD=changeme

# Browser
BROWSER_HEADLESS=false
BROWSER_TIMEOUT_MS=30000

# Security
ALLOWED_DOMAINS=["example.com", "api.example.com"]
BLOCKED_COMMANDS=["rm -rf /", "shutdown"]
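ALLOWED_DOMAINS and BLOCKED_COMMANDS are written as JSON arrays, so they have to be decoded rather than read as plain strings. A sketch of how a settings loader might parse them, assuming JSON encoding and the defaults shown in the file above; load_security_settings is a hypothetical name, not Mate's actual configuration code.

```python
import json
import os
from typing import Any, Dict, List, Mapping, Optional


def load_security_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Parse list-valued security settings and numeric sandbox limits."""
    env = os.environ if env is None else env

    def json_list(name: str) -> List[str]:
        raw = env.get(name, "[]")
        value = json.loads(raw)
        if not isinstance(value, list):
            raise ValueError(f"{name} must be a JSON array, got {raw!r}")
        return [str(item) for item in value]

    return {
        "allowed_domains": json_list("ALLOWED_DOMAINS"),
        "blocked_commands": json_list("BLOCKED_COMMANDS"),
        "sandbox_timeout_seconds": int(env.get("SANDBOX_TIMEOUT_SECONDS", "300")),
        "sandbox_max_memory_mb": int(env.get("SANDBOX_MAX_MEMORY_MB", "2048")),
        # Environment values are strings, so compare explicitly
        "browser_headless": env.get("BROWSER_HEADLESS", "false").lower() == "true",
    }
```

Failing loudly on a non-array value is deliberate: a malformed allowlist that silently parsed as empty would change the security posture without any warning.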

Resource Limits

# docker-compose-supabase.yml
services:
  backend:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G

Security Considerations

Sandbox Isolation

  • Each session runs in an isolated Docker container
  • No access to the host file system
  • Network restricted to allowed domains
  • Resource limits guard against denial of service

Command Filtering

  • Blacklist dangerous commands
  • Whitelist allowed operations
  • No sudo/root access
  • Command logging for audit
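A minimal sketch of the blacklist/whitelist check, assuming the blacklist is a list of substrings (as in BLOCKED_COMMANDS) and the whitelist is a set of program names; check_command and allowed_programs are hypothetical.

```python
import shlex
from typing import Iterable, Optional


def check_command(
    command: str,
    blocked: Iterable[str],
    allowed_programs: Optional[Iterable[str]] = None,
) -> None:
    """Raise PermissionError if the command matches the blacklist or
    (when a whitelist is given) starts with a program not on it."""
    # Collapse runs of whitespace so "rm  -rf   /" still matches "rm -rf /"
    normalized = " ".join(command.split())
    for pattern in blocked:
        if pattern in normalized:
            raise PermissionError(f"blocked pattern {pattern!r} in {command!r}")

    if allowed_programs is not None:
        tokens = shlex.split(command)
        program = tokens[0] if tokens else ""
        if program not in set(allowed_programs):
            raise PermissionError(f"program {program!r} is not whitelisted")
```

A substring match is easy to evade (`rm -r -f /` slips past the `rm -rf /` pattern) and can also over-block (it rejects `rm -rf /home/agent/workspace/tmp`), so filtering of this kind complements container isolation and the non-root user rather than replacing them.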

Network Security

  • Outbound connections to allowed domains only
  • No inbound connections to sandbox
  • VNC password protected
  • Session timeout (5 minutes default)
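The domain allowlist is ultimately enforced at the network layer, but the same rule can be checked in application code before the browser or an HTTP client is pointed at a URL. A sketch assuming that subdomains of an allowed domain are also allowed; Mate may require exact matches instead, and is_url_allowed is a hypothetical name.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_url_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    # hostname is lowercased by urlsplit and is None for strings without a host
    host = (urlsplit(url).hostname or "").rstrip(".")
    if not host:
        return False
    for domain in allowed_domains:
        domain = domain.lower().rstrip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Comparing the parsed hostname, rather than a prefix of the raw URL string, is what rejects look-alikes such as example.com.evil.net.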

Data Protection

  • Sandbox is destroyed when the session ends
  • No persistent storage between sessions
  • Do not store sensitive data in the sandbox

Best Practices

  • Use unique API keys per environment
  • Limit session duration
  • Monitor resource usage
  • Review logs regularly
  • Keep Docker images updated

Mate API Reference

Create Session

Endpoint: POST /api/computer-use/session

{
  "user_id": "user_123"
}

Response:

{
  "session_id": "sess_abc",
  "vnc_url": "http://localhost:6080",
  "vnc_password": "pass123"
}

Send Message

Endpoint: POST /api/computer-use/chat

{
  "session_id": "sess_abc",
  "message": "Go to example.com and extract all links"
}

Response: SSE stream of agent actions and results
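A client consuming that stream has to split it into events according to the Server-Sent Events framing rules: `data:` lines accumulate, a blank line dispatches the event, and lines starting with a colon are comments. The sketch below implements that framing; decoding each payload as JSON when possible is an assumption, since the event schema Mate emits is not documented here.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield {"event": ..., "data": ...} dicts from text/event-stream lines."""
    event_type: Optional[str] = None
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if any
            if data_lines:
                payload = "\n".join(data_lines)
                try:
                    data: Any = json.loads(payload)
                except ValueError:
                    data = payload  # not JSON: keep the raw text
                yield {"event": event_type or "message", "data": data}
            event_type, data_lines = None, []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
```

It accepts any iterable of text lines, for example `resp.iter_lines(decode_unicode=True)` from a `requests.post(..., stream=True)` call.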

Get VNC Info

Endpoint: GET /api/computer-use/vnc-info

Response:

{
  "session_id": "sess_abc",
  "vnc_url": "http://localhost:6080",
  "display": ":99"
}

File Operations

Upload: POST /api/computer-use/filesystem/upload

Download: GET /api/computer-use/filesystem/download

List: GET /api/computer-use/filesystem/list

Execute Command

Endpoint: POST /api/computer-use/shell

{
  "session_id": "sess_abc",
  "command": "ls -la"
}

Destroy Session

Endpoint: DELETE /api/computer-use/session

{
  "session_id": "sess_abc"
}
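The JSON endpoints above can be wrapped in a small client. A sketch using only urllib; the MateClient class, its method names, the base URL http://localhost:8000 (from the Installation section), and the treatment of an empty response body as {} are assumptions. The chat endpoint is left out because it returns an SSE stream rather than a single JSON object.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(
    method: str,
    path: str,
    payload: Optional[Dict[str, Any]] = None,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a JSON request for a Mate endpoint without sending it."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


class MateClient:
    """Thin wrapper over the documented JSON endpoints (hypothetical class)."""

    def __init__(self, base_url: str = "http://localhost:8000") -> None:
        self.base_url = base_url.rstrip("/")

    def _call(self, method: str, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        req = build_request(method, path, payload, self.base_url)
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read().decode("utf-8")
        # Assumes JSON responses; an empty body (e.g. from DELETE) becomes {}
        return json.loads(body) if body else {}

    def create_session(self, user_id: str) -> Dict[str, Any]:
        return self._call("POST", "/api/computer-use/session", {"user_id": user_id})

    def run_shell(self, session_id: str, command: str) -> Dict[str, Any]:
        return self._call("POST", "/api/computer-use/shell",
                          {"session_id": session_id, "command": command})

    def destroy_session(self, session_id: str) -> Dict[str, Any]:
        return self._call("DELETE", "/api/computer-use/session", {"session_id": session_id})
```

A typical run creates a session, reads `session_id` from the response, issues commands, and calls destroy_session in a `finally` block so the sandbox is torn down even if a step fails.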