Grafana Dashboards: Multiple Data Sources and Alert Configuration
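Grafana Architecture Overview
Grafana is an open-source data visualization platform. It connects to many kinds of data sources, offers a wide range of panel types, and includes built-in alerting, which makes it the visualization layer of an observability stack.
Grafana architecture: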
┌─────────────────────────────────────────────────┐
│                 Grafana Server                  │
├─────────────┬─────────────┬─────────────────────┤
│ Dashboards  │   Panels    │   Alerting engine   │
├─────────────┴─────────────┴─────────────────────┤
│                 Data Source API                 │
├───────┬───────┬───────┬───────┬─────────────────┤
│Prometh│ Loki  │Jaeger │ MySQL │  Elasticsearch  │
│  eus  │       │       │       │                 │
└───────┴───────┴───────┴───────┴─────────────────┘
Configuring Multiple Data Sources
Data source configuration example
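# provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus              # fixed uid so alert rules can reference it via datasourceUid
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      timeInterval: '15s'        # should match the Prometheus scrape interval
      httpMethod: POST
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        # Turn "trace_id=..." in a log line into a link to the matching Jaeger trace
        - datasourceUid: jaeger
          matcherRegex: "trace_id=(\\w+)"
          name: TraceID
          url: "$${__value.raw}"   # "$$" escapes "$" so provisioning does not expand it as an env var
  - name: Jaeger
    type: jaeger
    uid: jaeger                  # must match datasourceUid in the Loki derived field
    access: proxy
    url: http://jaeger:16686
  - name: MySQL
    type: mysql
    url: mysql:3306
    database: grafana
    user: grafana
    secureJsonData:
      password: 'password'       # placeholder value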
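The Loki derived field relies on matcherRegex to pull a trace ID out of each log line. The following is a minimal Python sketch of what that pattern captures. The sample log lines are made up for illustration, and Grafana applies the regex with its own engine rather than Python's `re`, so treat this as a way to sanity-check the pattern, not as Grafana's implementation.

```python
import re
from typing import Optional

# Same pattern as matcherRegex above, after YAML unescapes "\\w" to "\w".
MATCHER = re.compile(r"trace_id=(\w+)")


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the first captured trace ID, or None if the line has none."""
    match = MATCHER.search(log_line)
    return match.group(1) if match else None


# Hypothetical log lines, for illustration only.
line = 'level=error msg="upstream timeout" trace_id=4bf92f3577b34da6 status=504'
print(extract_trace_id(line))
print(extract_trace_id("level=info msg=ok"))
```

Note that `\w` does not match hyphens, so a dashed ID such as `trace_id=ab-cd` would be captured only up to the first hyphen. If trace IDs in your logs contain dashes, the pattern would need to be widened accordingly.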
Dashboard Design
Dashboard JSON model
{
  "dashboard": {
    "title": "Service Overview",
    "tags": ["microservice", "production"],
    "timezone": "browser",
    "refresh": "30s",
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "templating": {
      "list": [
        {
          "name": "service",
          "type": "query",
          "query": "label_values(http_requests_total, service)",
          "refresh": 2,
          "multi": true,
          "includeAll": true
        },
        {
          "name": "interval",
          "type": "interval",
          "query": "1m,5m,15m,30m,1h",
          "auto": true,
          "auto_min": "1m"
        }
      ]
    },
    "panels": [
      {
        "title": "Request Rate",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total{service=~\"$service\"}[$interval])) by (status)",
            "legendFormat": "{{status}}"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "reqps",
            "custom": {
              "drawStyle": "line",
              "fillOpacity": 20
            }
          }
        }
      }
    ]
  }
}
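Because the `service` variable is multi-value, it can expand to several services at once, so the query has to match it with the regex operator `=~` rather than `=`. The sketch below imitates that expansion in Python. It is a simplified stand-in for Grafana's variable interpolation (which also escapes regex metacharacters in the values), and the service names are hypothetical.

```python
import re
from typing import Dict, List


def interpolate(expr: str, variables: Dict[str, List[str]]) -> str:
    """Replace $name with its selected values.

    A single value is inserted as-is; several values become a regex
    alternation such as (a|b), which only works with the =~ matcher.
    """
    def substitute(match):
        values = variables.get(match.group(1))
        if values is None:
            return match.group(0)  # unknown variable: leave it untouched
        if len(values) == 1:
            return values[0]
        return "(" + "|".join(values) + ")"

    return re.sub(r"\$(\w+)", substitute, expr)


expr = 'sum(rate(http_requests_total{service=~"$service"}[$interval])) by (status)'
# Hypothetical selection: two services and a 5-minute interval.
selected = {"service": ["cart", "checkout"], "interval": ["5m"]}
print(interpolate(expr, selected))
# sum(rate(http_requests_total{service=~"(cart|checkout)"}[5m])) by (status)
```

With `service="$service"` the same selection would produce an exact-match comparison against the literal string `(cart|checkout)`, which matches no series.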
Common panel types
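panel_types:
  timeseries:
    description: "Time series line chart"
    use_case: "Trend analysis, monitoring curves"
    features: ["multiple series", "area fill", "gradients"]
  stat:
    description: "Single-value statistic"
    use_case: "Current value, status indicator"
    features: ["color mapping", "thresholds", "sparkline"]
  gauge:
    description: "Gauge (dial)"
    use_case: "Percentages, progress"
    features: ["threshold ranges", "gradient colors"]
  barchart:
    description: "Bar chart"
    use_case: "Comparative analysis"
    features: ["stacking", "horizontal/vertical orientation"]
  table:
    description: "Table"
    use_case: "Detailed data"
    features: ["sorting", "filtering", "pagination"]
  heatmap:
    description: "Heatmap"
    use_case: "Distribution analysis"
    features: ["histogram", "color mapping"]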
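To make the stat panel's threshold feature concrete, here is a small Python sketch that builds a stat panel definition as a dictionary ready to be placed in a dashboard's `panels` list. The helper name `stat_panel`, the example query, and the threshold values are assumptions chosen for illustration; the threshold steps follow the dashboard JSON layout, where the first step has a `null` value and acts as the base color.

```python
import json


def stat_panel(title, expr, unit, warn, crit):
    """Build a stat panel whose color changes at the warn and crit thresholds."""
    return {
        "title": title,
        "type": "stat",
        "targets": [{"expr": expr}],
        "fieldConfig": {
            "defaults": {
                "unit": unit,
                "thresholds": {
                    "mode": "absolute",
                    "steps": [
                        {"color": "green", "value": None},   # base color
                        {"color": "yellow", "value": warn},  # warning and above
                        {"color": "red", "value": crit},     # critical and above
                    ],
                },
            }
        },
    }


# Hypothetical example: error ratio, yellow from 1%, red from 5%.
panel = stat_panel(
    title="Error Ratio",
    expr='sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))',
    unit="percentunit",
    warn=0.01,
    crit=0.05,
)
print(json.dumps(panel, indent=2))
```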
Grafana Alerting Configuration
Alert rules
# Alert rule configuration
apiVersion: 1
groups:
  - orgId: 1
    name: Service Alerts
    folder: Production
    interval: 1m
    rules:
      - uid: high-error-rate
        title: High Error Rate
        condition: C
        data:
          # A: share of requests that returned 5xx over the last 5 minutes
          - refId: A
            relativeTimeRange:     # query window in seconds, relative to now
              from: 600
              to: 0
            datasourceUid: prometheus
            model:
              expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
              instant: true
          # B: reduce A to a single number
          - refId: B
            datasourceUid: __expr__
            model:
              type: reduce
              expression: A
              reducer: last
          # C: condition is met when B is greater than 0.05 (5%)
          - refId: C
            datasourceUid: __expr__
            model:
              type: threshold
              expression: B
              conditions:
                - evaluator:
                    type: gt
                    params: [0.05]
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5%"
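The rule chains three steps (query A, reduce B, threshold C) and only fires once the condition has held for the `for` duration. The Python sketch below walks through that logic on made-up samples. It is a simplified model under stated assumptions: one evaluation per minute, a single series, and the state names Normal, Pending, and Alerting. It is not Grafana's evaluation engine, and it treats a zero request rate as a ratio of 0 where PromQL would yield NaN.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    minute: int    # evaluation time, in minutes
    errors: float  # rate of 5xx requests (numerator of query A)
    total: float   # rate of all requests (denominator of query A)


def evaluate(samples: List[Sample], threshold: float = 0.05, for_minutes: int = 5) -> List[str]:
    """Return the alert state after each evaluation."""
    states = []
    breach_start = None
    for s in samples:
        # A + B: the error ratio, already a single value per evaluation.
        ratio = s.errors / s.total if s.total else 0.0
        # C: threshold check (gt 0.05).
        if ratio > threshold:
            if breach_start is None:
                breach_start = s.minute
            held_long_enough = s.minute - breach_start >= for_minutes
            states.append("Alerting" if held_long_enough else "Pending")
        else:
            breach_start = None  # condition cleared: reset the pending timer
            states.append("Normal")
    return states


# Hypothetical data: 1% errors at minute 0, then 10% errors from minute 1 on.
samples = [Sample(0, 1, 100)] + [Sample(m, 10, 100) for m in range(1, 8)]
print(evaluate(samples))
# ['Normal', 'Pending', 'Pending', 'Pending', 'Pending', 'Pending', 'Alerting', 'Alerting']
```

The pending period is what keeps a short error spike from paging anyone: in this model the breach starts at minute 1 and the rule only reaches Alerting at minute 6, five minutes later.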
Notification channels
# Contact point configuration
apiVersion: 1
contactPoints:
  - orgId: 1
    name: Slack
    receivers:
      - uid: slack-alerts          # example value; each receiver needs a unique uid
        type: slack
        settings:
          url: https://hooks.slack.com/services/xxx
          recipient: "#alerts"
          title: |
            [{{ len .Alerts.Firing }}] {{ .GroupLabels.alertname }}
          # The example rule above sets only a "summary" annotation
          text: |
            {{ range .Alerts }}
            *Instance:* {{ .Labels.instance }}
            *Summary:* {{ .Annotations.summary }}
            {{ end }}
  - orgId: 1
    name: Webhook
    receivers:
      - uid: webhook-alerts        # example value
        type: webhook
        settings:
          url: http://alert-handler:8080/alert
          httpMethod: POST
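On the receiving end, the Webhook contact point needs a service listening at `http://alert-handler:8080/alert`. The following is a minimal Python sketch of such a receiver using only the standard library. It assumes the webhook body is JSON with an `alerts` list whose items carry `status`, `labels`, and `annotations`; the function and class names are hypothetical, and a production handler would add authentication and proper logging.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import List


def summarize(payload: dict) -> List[str]:
    """Turn a webhook payload into one human-readable line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            "[{}] {} severity={}: {}".format(
                alert.get("status", "unknown"),
                labels.get("alertname", "unnamed"),
                labels.get("severity", "none"),
                annotations.get("summary", ""),
            )
        )
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    """Accepts POST /alert and prints a summary of each alert."""

    def do_POST(self):
        if self.path != "/alert":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "expected a JSON object")
            return
        for line in summarize(payload):
            print(line)
        self.send_response(200)
        self.end_headers()


# Hypothetical payload, shaped like the assumption described above.
example = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "High Error Rate", "severity": "critical"},
            "annotations": {"summary": "Error rate above 5%"},
        }
    ],
}
print(summarize(example))
```

To serve it, one could run `HTTPServer(("", 8080), AlertHandler).serve_forever()` with `HTTPServer` imported from `http.server`; the call is left out of the block so the example can be imported and tested without starting a server.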
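Best Practices
- Layered dashboards: drill down level by level, Overview → Service → Instance
- Variable-driven: use template variables to make dashboards reusable
- Consistent colors: red = error, yellow = warning, green = healthy
- Tiered alerts: classify alerts as Critical/Warning/Info to avoid alert fatigue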
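Tiered alerting only reduces fatigue if each tier is delivered differently. The Python sketch below illustrates one possible mapping from the `severity` label to the Slack and Webhook contact points defined earlier. The mapping itself is an assumption made for illustration; in Grafana this kind of routing is configured through notification policies rather than application code.

```python
from typing import Dict, List

# Hypothetical routing table: which contact points each tier notifies.
ROUTES: Dict[str, List[str]] = {
    "critical": ["Slack", "Webhook"],  # page and post to chat
    "warning": ["Slack"],              # chat only
    "info": [],                        # visible on dashboards, no notification
}

# Alerts without a recognized severity still reach a human.
DEFAULT_ROUTE = ["Slack"]


def route(labels: Dict[str, str]) -> List[str]:
    """Return the contact points that should receive an alert with these labels."""
    severity = labels.get("severity", "").lower()
    return ROUTES.get(severity, DEFAULT_ROUTE)


print(route({"severity": "critical"}))  # ['Slack', 'Webhook']
print(route({"severity": "info"}))      # []
print(route({"team": "payments"}))      # ['Slack']
```

Falling back to a default route for unlabeled alerts is a deliberate choice in this sketch: a missing label should be noisy rather than silently dropped.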