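Monitoring and Alerting System

Set up a complete monitoring and alerting system covering metric collection, alert rules, and notification channels.

Last updated: 2026-03-11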

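This article describes OpenClaw's monitoring and alerting system: the metrics it tracks, the Prometheus scrape configuration, a Grafana dashboard, alert rules, and notification channels.

Metrics

System metrics

Metric	Description
cpu_usage	CPU utilization
memory_usage	Memory utilization
disk_usage	Disk utilization
network_io	Network I/O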
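
As a rough illustration of how one of these values could be sampled, the sketch below computes a disk usage ratio using only the Python standard library. The function name `disk_usage_ratio` is hypothetical, and reporting a 0 to 1 ratio rather than a percentage is an assumption, chosen because it lines up with the `> 0.9` threshold in the alert rules. It is not OpenClaw's documented collector.

```python
import os
import shutil


def disk_usage_ratio(path: str = os.sep) -> float:
    """Return used/total for the filesystem containing `path`, in [0, 1]."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


if __name__ == "__main__":
    # Print in a "name value" form similar to a Prometheus sample line.
    print(f"disk_usage {disk_usage_ratio():.3f}")
```

CPU, memory, and network figures generally need platform-specific sources or a third-party library, so they are left out of this sketch.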

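Application metrics

Metric	Description
requests_total	Total number of requests
requests_duration	Request response time
active_sessions	Number of active sessions
errors_total	Total number of errors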

Prometheus configuration

Metrics endpoint

# prometheus.yml
scrape_configs:
  - job_name: 'openclaw'
    static_configs:
      - targets: ['localhost:18789']
    metrics_path: '/metrics'

Custom metrics

from prometheus_client import Counter, Histogram

requests_total = Counter(
    'openclaw_requests_total',
    'Total requests',
    ['channel', 'status']
)

request_duration = Histogram(
    'openclaw_request_duration_seconds',
    'Request duration',
    ['channel']
)
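
The definitions above only declare the metrics. A minimal sketch of how they might be updated from request-handling code follows. The `handle_request` function, the `telegram` channel value, and the private `CollectorRegistry` are assumptions made for illustration; only the metric names and label sets come from the definitions above.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A private registry keeps this sketch isolated from the process-wide default.
registry = CollectorRegistry()

requests_total = Counter(
    'openclaw_requests_total', 'Total requests',
    ['channel', 'status'], registry=registry,
)
request_duration = Histogram(
    'openclaw_request_duration_seconds', 'Request duration',
    ['channel'], registry=registry,
)


def handle_request(channel: str) -> str:
    """Hypothetical handler: time the work, then count the outcome."""
    status = 'error'
    # The timer observes the elapsed seconds when the `with` block exits.
    with request_duration.labels(channel=channel).time():
        try:
            time.sleep(0.01)  # stand-in for real request handling
            status = 'ok'
        finally:
            requests_total.labels(channel=channel, status=status).inc()
    return status


handle_request('telegram')  # 'telegram' is an example label value only

# Text in the Prometheus exposition format, as a /metrics endpoint would serve it.
exposition = generate_latest(registry).decode()
```

Note that the histogram is exported as several series, including `openclaw_request_duration_seconds_bucket`, `_sum`, and `_count`; dashboard and alert queries need to reference those full names.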

Grafana dashboard

Dashboard configuration

{
  "dashboard": {
    "title": "OpenClaw Monitoring",
    "panels": [
      {
        "title": "Request rate",
        "targets": [
          {
            "expr": "rate(openclaw_requests_total[5m])"
          }
        ]
      },
      {
        "title": "Response time P95",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum by (le) (rate(openclaw_request_duration_seconds_bucket[5m])))"
          }
          }
        ]
      }
    ]
  }
}
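
To make the P95 panel easier to reason about, here is a simplified sketch of how a quantile can be estimated from cumulative histogram buckets by linear interpolation, which is the general approach Prometheus' `histogram_quantile` takes. The bucket bounds and counts are made-up sample data, and the function omits edge cases that the real implementation handles.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket,
    mirroring the `le` label of a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# Made-up data: 100 requests, cumulative counts per upper bound in seconds.
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, sample_buckets)  # roughly 0.81 seconds
```

Because the estimate is interpolated within a bucket, its accuracy depends on how finely the bucket boundaries are spaced around the latencies of interest.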

Alerting configuration

Alert rules

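groups:
  - name: openclaw
    rules:
      # Fires when errors exceed 0.1 per second, averaged over 5 minutes.
      - alert: HighErrorRate
        expr: rate(openclaw_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate too high"

      # Assumes openclaw_memory_usage is reported as a 0-1 ratio (0.9 = 90%).
      - alert: HighMemory
        expr: openclaw_memory_usage > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage too high"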
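
The `for: 5m` clause means a rule does not fire the moment its expression becomes true; the alert first sits in a pending state and fires only after the condition has held continuously for the full duration. The sketch below models that behavior in plain Python. The function name `alert_states` and the sample values are assumptions for illustration; this is not Prometheus code.

```python
from datetime import datetime, timedelta


def alert_states(samples, threshold, hold=timedelta(minutes=5)):
    """Yield (timestamp, state) for each evaluation of `value > threshold`.

    `samples` is an iterable of (timestamp, value) pairs in time order.
    """
    active_since = None
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts  # condition just became true
            state = "firing" if ts - active_since >= hold else "pending"
        else:
            active_since = None  # any false evaluation resets the timer
            state = "inactive"
        yield ts, state


# One evaluation per minute: seven readings above 0.9, then a recovery.
start = datetime(2026, 1, 1, 12, 0)
values = [0.95] * 7 + [0.5]
samples = [(start + timedelta(minutes=i), v) for i, v in enumerate(values)]
states = [state for _, state in alert_states(samples, threshold=0.9)]
```

In this run the alert is pending for the first five evaluations, firing for the next two, and inactive once the value drops, which is why short spikes do not trigger notifications.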

Notification channels

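route:
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'critical'

receivers:
  # Email delivery also needs SMTP settings (e.g. smtp_smarthost, smtp_from) in the global section.
  - name: 'default'
    email_configs:
      - to: 'admin@example.com'

  # Alertmanager does not expand environment variables; substitute ${SLACK_WEBHOOK} before loading this file.
  - name: 'critical'
    slack_configs:
      - api_url: '${SLACK_WEBHOOK}'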
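
The routing tree above can be read as "try each child route in order and fall back to the parent's receiver". A minimal Python sketch of that lookup follows, using a dictionary that mirrors the `route` section. The function name `pick_receiver` is hypothetical, and the sketch deliberately ignores features such as `continue`, regex matchers, and grouping.

```python
def pick_receiver(labels, route, inherited=None):
    """Return the receiver name for an alert with the given labels."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        # A child matches when every label in its `match` block is equal.
        if all(labels.get(k) == v for k, v in child.get("match", {}).items()):
            return pick_receiver(labels, child, receiver)
    return receiver


# Mirrors the route section of the configuration above.
route = {
    "receiver": "default",
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "critical"},
    ],
}

critical_target = pick_receiver({"alertname": "HighErrorRate", "severity": "critical"}, route)
warning_target = pick_receiver({"alertname": "HighMemory", "severity": "warning"}, route)
```

With the two alert rules from this page, `HighErrorRate` is routed to the `critical` receiver and `HighMemory` falls through to `default`.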
