🖥 Tenants Management
| ID | Name | Status | Group | Tags | vCPU | Memory | Disk | Guest IP | Port | Rootfs | VM Health | Gateway | Actions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No tenants | | | | | | | | | | | | | |
OpenClaw configuration templates for different LLM providers and models.
All tenant VMs auto-connect to the AgentCore Gateway and gain these
MCP tools. Tool definitions live in deploy/lambda/agentcore_tools/ and
deploy/stack.py.
Set agentcore.enabled: true in config.yml and redeploy.
Groups bundle skills together so a tenant can subscribe via
group: "team-sre"
instead of listing every skill. A tenant's effective skill set =
tenant.skills ∪ group.skills. Tenants without scoping
fields get every skill (legacy broadcast).
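The resolution rule above can be sketched in a few lines of Python. This is only an illustration, not the console's actual code: the function name `effective_skills`, the dict shapes, and the assumption that the "scoping fields" are exactly `skills` and `group` are all hypothetical.

```python
from typing import Iterable


def effective_skills(
    tenant: dict,
    groups: dict[str, list[str]],
    all_skills: Iterable[str],
) -> set[str]:
    """Resolve a tenant's skills as tenant.skills ∪ group.skills.

    Assumption: a tenant is "scoped" if it carries a `skills` or `group` key.
    Unscoped tenants fall back to every skill (legacy broadcast).
    """
    if "skills" not in tenant and "group" not in tenant:
        return set(all_skills)
    own = set(tenant.get("skills") or [])
    from_group = set(groups.get(tenant.get("group"), []))
    return own | from_group


# Hypothetical data for illustration only.
groups = {"team-sre": ["oncall", "runbook"]}
catalog = ["deploy", "oncall", "runbook"]
scoped = effective_skills({"group": "team-sre", "skills": ["deploy"]}, groups, catalog)
legacy = effective_skills({}, groups, catalog)
```

Here `scoped` combines the tenant's own skill with the group's skills, while `legacy` receives the whole catalog because it has no scoping fields.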
Skills are shared across all tenants. They're plain markdown files in
s3://${ASSETS_BUCKET}/skills/<name>/SKILL.md and
are synced to every host every 5 minutes via cron, then injected into
VMs at launch.
Click a row to view / edit. Use Groups (above) to scope skills to
specific tenants.
s3://${ASSETS_BUCKET}/skills/<name>/ directly).
Set metrics.enabled: true in config.yml and redeploy. The stack
provisions Amazon Managed Prometheus and Grafana, and the ADOT collectors
on each host start scraping about 3 minutes after rollout.
In Grafana → Connections → Data sources → Add data source → Prometheus, fill in the values below. The Grafana workspace's IAM role already has read access to AMP, so no static keys are needed.
| Setting | Value |
|---|---|
| Prometheus server URL | |
| SigV4 auth | On |
| Authentication Provider | AWS SDK Default ("Workspace IAM Role" or "Access & secret key" return 403) |
| Default Region | |
remote_write URL. Grafana appends
/api/v1/query itself.
Each host's host-agent exposes these gauges on
:8899/metrics. ADOT scrapes every 30s and remote-writes
to AMP via SigV4 (no static credentials).
| Metric | Type | Labels | Description |
|---|---|---|---|
| openclaw_vm_health | gauge (0/1) | tenant | 1 if VM responded to ping, else 0 |
| openclaw_vm_cpu_pct | gauge | tenant | Per-VM CPU usage (percent of allocated vCPUs) |
| openclaw_vm_memory_used_mb | gauge | tenant | Per-VM memory in active use (MB, from VmRSS) |
| openclaw_vm_memory_balloon_mib | gauge | tenant | Balloon size held by the host (MiB) |
| openclaw_vm_disk_used_mb | gauge | tenant | Per-VM data disk used (MB) |
| openclaw_vm_disk_total_mb | gauge | tenant | Per-VM data disk capacity (MB) |
| openclaw_vm_disk_used_pct | gauge | tenant | Per-VM data disk used (percent) |
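For readers unfamiliar with what a scraper sees on `:8899/metrics`, the following is a minimal sketch of rendering such gauges in the Prometheus text exposition format. It is not the host-agent's implementation; the function `render_gauges` and its input shape are assumptions made for illustration.

```python
def _escape_label(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(samples: dict[str, dict[str, float]]) -> str:
    """Render {metric: {tenant: value}} as Prometheus text exposition.

    Hypothetical helper: the real host-agent may build its output differently.
    """
    lines = []
    for metric in sorted(samples):
        lines.append(f"# TYPE {metric} gauge")
        for tenant in sorted(samples[metric]):
            value = samples[metric][tenant]
            lines.append(f'{metric}{{tenant="{_escape_label(tenant)}"}} {value}')
    return "\n".join(lines) + "\n"


# Hypothetical tenants for illustration only.
body = render_gauges({"openclaw_vm_health": {"t1": 1, "t2": 0}})
```

Each metric gets one `# TYPE` line followed by one sample line per tenant, which is the shape ADOT would scrape.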
Copy into Grafana → Explore → AMP datasource.
sum by (tenant) (openclaw_vm_memory_used_mb)
min_over_time(openclaw_vm_health[1m]) == 0
openclaw_vm_disk_used_pct > 90
| Tenant | Source Status | Backup Time | Size | Actions |
|---|---|---|---|---|
| ↳ previous backup | | | | |
Backups are retained for 7 days (S3 lifecycle). Orphan backups belong to tenants that have been deleted; restoring one creates a new tenant with the backup's data volume.
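As a rough illustration of the retention and orphan rules, the sketch below computes the earliest time a backup becomes eligible for expiry and flags backups whose tenant no longer exists. The function names and the idea of comparing against a set of live tenant IDs are assumptions; S3 applies lifecycle expiration asynchronously, so an object can linger somewhat past the computed time.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # from the 7-day S3 lifecycle rule


def earliest_expiry(backup_time: datetime) -> datetime:
    """Earliest moment the backup becomes eligible for lifecycle expiry."""
    return backup_time + RETENTION


def is_orphan(backup_tenant_id: str, live_tenant_ids: set[str]) -> bool:
    """Assumption: a backup is an orphan when its tenant ID is no longer registered."""
    return backup_tenant_id not in live_tenant_ids


# Hypothetical values for illustration only.
taken = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = earliest_expiry(taken)
orphaned = is_orphan("t-9", {"t-1", "t-2"})
```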
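Batch scheduling operations for the whole openclaw pool are gathered here, and all of them hit real control-plane endpoints (no
mocks): batch lifecycle (POST /batch/tenants), rolling image rebuild (POST /hosts/refresh-rootfs), and capacity ledger reconciliation. Requires the operator/admin role.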
Pull the latest golden image (rootfs + data template) to all hosts; newly created or rebuilt nodes inherit the latest image.
Booked vm_count vs. actual running/creating. Zombies stuck in creating for more than 15 minutes have their capacity reclaimed automatically by the health_check reaper.
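The reconciliation described above can be sketched as follows. This is a simplified model under stated assumptions, not the control plane's code: the `Vm` record, the `reconcile` function, and the `age_s` field (seconds spent in the current status) are hypothetical names.

```python
from dataclasses import dataclass

ZOMBIE_AFTER_S = 15 * 60  # stuck in "creating" longer than 15 minutes


@dataclass
class Vm:
    tenant: str
    status: str   # e.g. "running", "creating", "stopped"
    age_s: float  # seconds spent in the current status (assumed field)


def reconcile(booked_vm_count: int, vms: list[Vm]) -> dict:
    """Compare the booked count with VMs that are actually running or creating."""
    live = [v for v in vms if v.status in ("running", "creating")]
    zombies = [
        v.tenant
        for v in live
        if v.status == "creating" and v.age_s > ZOMBIE_AFTER_S
    ]
    return {
        "booked": booked_vm_count,
        "actual": len(live),
        "drift": booked_vm_count - len(live),
        "zombies": zombies,
    }


# Hypothetical fleet for illustration only.
report = reconcile(4, [
    Vm("a", "running", 100),
    Vm("b", "creating", 1000),
    Vm("c", "creating", 60),
    Vm("d", "stopped", 5),
])
```

A non-zero `drift` means the ledger and the real fleet disagree; `zombies` lists the tenants a reaper would be expected to reclaim.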
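Fire N concurrent POST /tenants requests to measure the real p50/p99 latency of the control-plane registration API
and the creating→running time-to-available. Use it to verify how long N simultaneous launches take to become usable and to expose
host capacity, oversubscription, and scheduling-contention bottlenecks. Load-test nodes are named with the
lt- prefix and can be cleaned up in one click after the run.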
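To make the p50/p99 figures concrete, here is a small sketch of summarizing collected latencies with the nearest-rank percentile method. The load tester may use a different percentile definition; `percentile` and `summarize` are hypothetical helpers, and the sample values are made up.

```python
import math


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize one latency series, e.g. API latency or creating→running time."""
    return {"p50": percentile(latencies_ms, 50), "p99": percentile(latencies_ms, 99)}


# Made-up latencies (1..100 ms) for illustration only.
summary = summarize([float(ms) for ms in range(1, 101)])
```

The same `summarize` call can be applied separately to the registration-API latencies and to the creating→running durations.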
Optional features and their current status. Toggle in
config.yml and re-run ./setup.sh.
Allocatable resources = physical × ratio. Tune in
config.yml under host:.
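The formula above amounts to a per-resource multiplication. The sketch below illustrates it; the key names (`vcpu`, `memory_mb`) and the default ratio of 1.0 for a resource with no configured ratio are assumptions, not the actual keys under host: in config.yml.

```python
def allocatable(physical: dict[str, float], ratio: dict[str, float]) -> dict[str, float]:
    """Allocatable = physical × ratio, per resource.

    Assumption: a resource without a configured ratio is not oversubscribed (ratio 1.0).
    """
    return {name: amount * ratio.get(name, 1.0) for name, amount in physical.items()}


# Hypothetical host: 8 physical vCPUs oversubscribed 2x, memory capped at 50%.
capacity = allocatable({"vcpu": 8, "memory_mb": 16384}, {"vcpu": 2.0, "memory_mb": 0.5})
```

A ratio above 1 oversubscribes a resource, while a ratio below 1 holds some of it back for the host.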
Live distribution of registered hosts and their tenants across
Availability Zones. Set multi_az.enabled: true in
config.yml to spread the ASG.
| Availability Zone | Hosts | VMs | vCPU used / total |
|---|---|---|---|