BClawPool Agent · Pool Management Console

🖥 Tenants Management

Hosts (Group)
shared skill(s) · scope per tenant with Skill Groups Manage →
Tenants
ID | Name | Status | Group | Tags | vCPU | Memory | Disk | Guest IP | Port | Rootfs | VM Health | Gateway | Actions

⚙ Agent Configuration

Config Templates

OpenClaw configuration templates for different LLM providers and models.

No custom templates. Tenants use the built-in default config.

MCP Tools (via AgentCore Gateway)

All tenant VMs auto-connect to the AgentCore Gateway and gain these MCP tools. Tool definitions live in deploy/lambda/agentcore_tools/ and deploy/stack.py.

AgentCore not enabled, or no tools registered yet. Set agentcore.enabled: true in config.yml and redeploy.

Skill Groups

Groups bundle skills together so a tenant can subscribe via group: "team-sre" instead of listing every skill. A tenant's effective skill set = tenant.skills ∪ group.skills. Tenants without scoping fields get every skill (legacy broadcast).
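
The resolution rule above can be sketched in a few lines of Python. This is only an illustration, not the console's actual code: the dict shapes, the `effective_skills` helper, and the sample skill names are assumptions. Only the `skills` and `group` field names and the union rule come from the description above.

```python
def effective_skills(tenant: dict, groups: dict, all_skills: set) -> set:
    """Resolve a tenant's effective skill set (hypothetical helper)."""
    own = tenant.get("skills")
    group_name = tenant.get("group")
    # Legacy broadcast: a tenant with no scoping fields gets every shared skill.
    if own is None and group_name is None:
        return set(all_skills)
    # Otherwise: tenant.skills ∪ group.skills (an unknown group contributes nothing).
    group_skills = set(groups.get(group_name, ())) if group_name else set()
    return set(own or ()) | group_skills


# Sample data is made up for illustration.
groups = {"team-sre": {"runbook", "pager"}}
all_skills = {"runbook", "pager", "sql", "deploy"}

scoped = effective_skills({"group": "team-sre", "skills": ["sql"]}, groups, all_skills)
legacy = effective_skills({}, groups, all_skills)
```

Note that in this sketch a tenant with an explicit empty `skills` list and no group resolves to an empty set rather than falling back to broadcast; whether the real control plane behaves that way is not stated here.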

No groups defined. Click + New Group to bundle skills for a team.

Shared Skills

Skills are shared across all tenants. They're plain markdown files in s3://${ASSETS_BUCKET}/skills/<name>/SKILL.md and are synced to every host every 5 minutes via cron, then injected into VMs at launch. Click a row to view / edit. Use Groups (above) to scope skills to specific tenants.

No skills configured. Click + New Skill above (or upload to s3://${ASSETS_BUCKET}/skills/<name>/ directly).

Observability

Status

Prometheus (AMP)
Grafana (AMG) Workspace
SNS Notifications
💡 To enable: set metrics.enabled: true in config.yml and redeploy. The stack will provision Amazon Managed Service for Prometheus (AMP) and Amazon Managed Grafana (AMG), and the ADOT collectors on each host start scraping about 3 minutes after rollout.

Grafana Data Source

In Grafana → Connections → Data sources → Add data source → Prometheus, fill in the values below. The Grafana workspace's IAM role already has read access to AMP, so no static keys are needed.

Prometheus server URL:
SigV4 auth: On
Authentication Provider: AWS SDK Default ("Workspace IAM Role" and "Access & secret key" both return 403)
Default Region:
⚠️ Use the server URL above (workspace root) — not the remote_write URL. Grafana appends /api/v1/query itself.
ADOT remote_write:

Per-VM Metrics

Each host's host-agent exposes these gauges on :8899/metrics. ADOT scrapes every 30s and remote-writes to AMP via SigV4 (no static credentials).

Metric | Type | Labels | Description
openclaw_vm_health | gauge (0/1) | tenant | 1 if VM responded to ping, else 0
openclaw_vm_cpu_pct | gauge | tenant | Per-VM CPU usage (percent of allocated vCPUs)
openclaw_vm_memory_used_mb | gauge | tenant | Per-VM memory in active use (MB, from VmRSS)
openclaw_vm_memory_balloon_mib | gauge | tenant | Balloon size held by the host (MiB)
openclaw_vm_disk_used_mb | gauge | tenant | Per-VM data disk used (MB)
openclaw_vm_disk_total_mb | gauge | tenant | Per-VM data disk capacity (MB)
openclaw_vm_disk_used_pct | gauge | tenant | Per-VM data disk used (percent)
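
To spot-check a host without going through Grafana, the gauges can be read straight from the :8899/metrics endpoint. The sketch below parses the Prometheus text exposition format for these gauges, assuming each carries a single `tenant` label. The `SAMPLE` payload, the tenant names, and the `parse_gauges` helper are illustrative assumptions, not output captured from a real host-agent.

```python
import re

# Hypothetical payload in Prometheus text exposition format.
SAMPLE = """\
# HELP openclaw_vm_health 1 if VM responded to ping, else 0
# TYPE openclaw_vm_health gauge
openclaw_vm_health{tenant="acme"} 1
openclaw_vm_health{tenant="globex"} 0
openclaw_vm_disk_used_pct{tenant="acme"} 93.5
openclaw_vm_disk_used_pct{tenant="globex"} 41
"""

# Matches lines such as: metric_name{tenant="x"} 12.5
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{tenant="(?P<tenant>[^"]*)"\}\s+(?P<value>\S+)$'
)


def parse_gauges(text: str) -> dict:
    """Return {metric_name: {tenant: value}} for single-label gauge lines."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = LINE.match(line)
        if m is None:
            continue  # ignore lines with a different label set
        out.setdefault(m["name"], {})[m["tenant"]] = float(m["value"])
    return out


metrics = parse_gauges(SAMPLE)
unhealthy = sorted(t for t, v in metrics["openclaw_vm_health"].items() if v == 0)
over_90 = sorted(t for t, v in metrics["openclaw_vm_disk_used_pct"].items() if v > 90)
```

On a real host, the text could be fetched with `urllib.request.urlopen("http://<host>:8899/metrics")` and passed to the same function.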

Sample PromQL

Copy into Grafana → Explore → AMP datasource.

Memory used by all running VMs of a tenant
sum by (tenant) (openclaw_vm_memory_used_mb)
VMs that were unhealthy at least once in the last minute (one series per tenant VM)
min_over_time(openclaw_vm_health[1m]) == 0
Tenants over 90% disk usage
openclaw_vm_disk_used_pct > 90

💾 Backups

Tenant Source Status Backup Time Size Actions

Backups are retained for 7 days (S3 lifecycle). Orphan backups belong to tenants that have been deleted; restoring one creates a new tenant with the backup's data volume.

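🛠 Pool Ops: Cluster Scheduling Hub

Batch scheduling operations for the entire openclaw pool are gathered here. All of them call real control-plane endpoints (no mocks): batch lifecycle (POST /batch/tenants), rolling image rebuild (POST /hosts/refresh-rootfs), and capacity ledger reconciliation. Requires the operator or admin role.

Batch Lifecycle

Rolling Image Rebuild

Pulls the latest golden image (rootfs + data template) to all hosts. Newly created or rebuilt nodes inherit the latest image.

Capacity Ledger Reconciliation

Compares the ledger's vm_count against the actual number of running/creating VMs. Zombies stuck in creating for more than 15 minutes have their capacity reclaimed automatically by the health_check reaper.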
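
The capacity reconciliation check (ledger vm_count versus VMs actually in running/creating, with a 15-minute cutoff for VMs stuck in creating) can be sketched as follows. This is an illustration under stated assumptions: the record shape (`id`, `status`, `created_at`) and the `reconcile` helper are hypothetical, and the real health_check reaper may work differently.

```python
from datetime import datetime, timedelta, timezone

# VMs stuck in "creating" longer than this count as zombies (15 min, per the text above).
STUCK_AFTER = timedelta(minutes=15)


def reconcile(ledger_vm_count: int, vms: list, now: datetime) -> dict:
    """Compare the ledger count against live VMs and list zombie candidates."""
    live = [v for v in vms if v["status"] in ("running", "creating")]
    zombies = [
        v["id"]
        for v in vms
        if v["status"] == "creating" and now - v["created_at"] > STUCK_AFTER
    ]
    return {
        "ledger": ledger_vm_count,
        "actual": len(live),
        "drift": ledger_vm_count - len(live),  # > 0 means the ledger over-counts
        "zombies": zombies,
    }


# Made-up records for illustration.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
vms = [
    {"id": "a", "status": "running", "created_at": now - timedelta(hours=2)},
    {"id": "b", "status": "creating", "created_at": now - timedelta(minutes=5)},
    {"id": "c", "status": "creating", "created_at": now - timedelta(minutes=20)},
    {"id": "d", "status": "stopped", "created_at": now - timedelta(days=1)},
]
report = reconcile(4, vms, now)
```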

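🔥 Load Test: End-to-End Control Plane Load Test

Fires N concurrent POST /tenants requests to measure the real p50/p99 latency of the control-plane registration API and the creating→running time-to-ready. Use it to verify how long N simultaneously launched tenants take to become usable, and to expose bottlenecks in host capacity, overcommit, and scheduling contention. Load-test node names are prefixed with lt- and can be cleaned up with one click after the run.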

POST succeeded / total
POST p50
POST p99
POST max
Time until all running
running / succeeded
Failed
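
A minimal sketch of how the POST latency figures could be collected: run N calls concurrently, time each one, then take nearest-rank percentiles. `fake_post_tenant` is a hypothetical stand-in for a real POST /tenants request, and `run_load_test` and `percentile` are illustrative helpers rather than the console's implementation. Only the lt- name prefix comes from this page.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fake_post_tenant(name: str) -> bool:
    """Stand-in for POST /tenants; swap in a real HTTP call to test a deployment."""
    time.sleep(0.01)
    return True


def run_load_test(n: int, post=fake_post_tenant) -> dict:
    def timed(i: int):
        start = time.perf_counter()
        try:
            ok = post(f"lt-{i:04d}")  # load-test tenants use the lt- prefix
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    # One worker per request so all N calls are in flight together.
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(timed, range(n)))

    latencies = [lat for _, lat in results]
    return {
        "ok": sum(1 for ok, _ in results if ok),
        "total": n,
        "p50": percentile(latencies, 50),
        "p99": percentile(latencies, 99),
        "max": max(latencies),
    }


stats = run_load_test(8)
```

Measuring the creating→running time would additionally require polling each tenant's status after the POST returns, which this sketch leaves out.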

    

🔧 Settings

API Connection

Infrastructure

Optional features and their current status. Toggle in config.yml and re-run ./setup.sh.

Multi-AZ HA
Prometheus + Grafana
AWS WAF
Console Login (Cognito)
RBAC (role-gating)
SNS Lifecycle Events
Per-tenant Quotas
AgentCore

Host Overcommit Ratios

Allocatable resources = physical × ratio. Tune in config.yml under host:.

CPU overcommit:
Memory overcommit:
Default per tenant:
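
As a worked example of the allocatable = physical × ratio rule, the sketch below computes a host's allocatable capacity and how many default-sized tenants would fit. All numbers (a 16-vCPU, 64 GiB host, ratios of 4.0 and 1.5, a 2 vCPU / 4096 MiB tenant) and both helper names are assumptions chosen for illustration, not defaults from config.yml.

```python
def allocatable(physical_vcpus: int, physical_mem_mib: int,
                cpu_ratio: float, mem_ratio: float) -> dict:
    """Allocatable resources = physical × overcommit ratio."""
    return {
        "vcpu": physical_vcpus * cpu_ratio,
        "memory_mib": physical_mem_mib * mem_ratio,
    }


def max_tenants(alloc: dict, tenant_vcpus: int, tenant_mem_mib: int) -> int:
    """Tenants that fit, limited by whichever resource runs out first."""
    return int(min(alloc["vcpu"] // tenant_vcpus,
                   alloc["memory_mib"] // tenant_mem_mib))


# Hypothetical host and ratios.
alloc = allocatable(16, 65536, cpu_ratio=4.0, mem_ratio=1.5)
capacity = max_tenants(alloc, tenant_vcpus=2, tenant_mem_mib=4096)
```

With these made-up values the host offers 64 allocatable vCPUs and 98304 MiB of memory, so memory is the limiting resource at 24 tenants.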

Fleet by AZ

Live distribution of registered hosts and their tenants across Availability Zones. Set multi_az.enabled: true in config.yml to spread the ASG.

Availability Zone | Hosts | VMs | vCPU used / total

System

Site URL:
GitHub: aws-samples/sample-multi-tenant-openclaw-on-firecracker