Prometheus Monitoring System
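Introduction to Prometheus
Prometheus is an open-source monitoring and alerting system. It is built around a multidimensional data model, in which every time series is identified by a metric name plus a set of key-value labels, and a powerful query language, PromQL.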
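To make the multidimensional data model concrete, here is a minimal Python sketch. It is not Prometheus code, only an illustration of the idea that a series is identified by its metric name plus its label set, and that a selector such as `http_requests_total{method="GET"}` picks every series whose labels match. The metric name, labels, and sample values are made up.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus its full label set.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, **labels: str) -> SeriesKey:
    """Build a hashable identity for one time series."""
    return (name, frozenset(labels.items()))


def select(store: Dict[SeriesKey, List[Tuple[float, float]]],
           name: str, **matchers: str) -> List[SeriesKey]:
    """Return keys whose name matches and whose labels include every matcher."""
    wanted = set(matchers.items())
    return sorted(
        (key for key in store if key[0] == name and wanted <= key[1]),
        key=lambda key: sorted(key[1]),
    )


# Hypothetical samples: (unix timestamp, value) pairs per series.
store = {
    series_key("http_requests_total", method="GET", status="200"): [(1700000000.0, 1027.0)],
    series_key("http_requests_total", method="GET", status="500"): [(1700000000.0, 3.0)],
    series_key("http_requests_total", method="POST", status="200"): [(1700000000.0, 88.0)],
}

# Equality matching on one label returns both GET series.
get_series = select(store, "http_requests_total", method="GET")
```

Adding a label value creates a new series rather than a new metric, which is why labels with many distinct values tend to inflate storage.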
Architecture
Prometheus Server
├── Metric collection (pull model)
├── Time-series database storage
├── PromQL queries
└── Alerting rules
Exporters:
├── node-exporter (system metrics)
├── mysql-exporter (MySQL metrics)
├── nginx-exporter (Nginx metrics)
└── Custom exporters
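A custom exporter is, at its core, an HTTP endpoint that returns metrics in the Prometheus text exposition format. The sketch below uses only the Python standard library to show the shape of such an endpoint. The metric `app_queue_depth`, the `read_queue_depths` data source, and port 9200 are all hypothetical, and label values are not escaped here. In practice the official `prometheus_client` library is the more common choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict


def render_metrics(queue_depths: Dict[str, float]) -> str:
    """Render gauges in the Prometheus text exposition format."""
    lines = [
        "# HELP app_queue_depth Number of jobs waiting in each queue.",
        "# TYPE app_queue_depth gauge",
    ]
    for queue, depth in sorted(queue_depths.items()):
        lines.append(f'app_queue_depth{{queue="{queue}"}} {depth}')
    return "\n".join(lines) + "\n"


def read_queue_depths() -> Dict[str, float]:
    """Placeholder data source; a real exporter would query the application."""
    return {"email": 4, "reports": 0}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Prometheus scrapes /metrics by default.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_queue_depths()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9200) -> None:
    """Serve /metrics until interrupted (blocks the calling thread)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus can then scrape it through an additional `scrape_configs` job whose target points at that host and port.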
Installing Prometheus
Deploying with Docker
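# Bind-mount the config by absolute path; a bare file name would be
# interpreted as a named volume instead of a file in the current directory.
docker run -d --name prometheus \
  -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus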
Docker Compose
version: '3.8'

services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # Needed so that rule_files: "rules/*.yml" resolves inside the container
      - ./rules:/etc/prometheus/rules
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    environment:
      # Default password for local testing only
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana

  node-exporter:
    image: prom/node-exporter
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'

volumes:
  prometheus_data:
  grafana_data:
Prometheus Configuration
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'mysql'
    static_configs:
      - targets: ['mysql-exporter:9104']
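Once the scrape jobs are configured, it helps to confirm that each target is actually being scraped. Prometheus reports this through its `/api/v1/targets` HTTP endpoint. The sketch below, which assumes the server is reachable at `http://localhost:9090`, summarizes unhealthy targets; the `sample` payload is hand-written to resemble the API response and is not real output.

```python
import json
import urllib.request
from typing import Dict, List


def unhealthy_targets(payload: Dict) -> List[str]:
    """List 'job/instance: error' strings for active targets that are not up."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target["health"] != "up":
            labels = target["labels"]
            problems.append(
                f'{labels["job"]}/{labels["instance"]}: {target.get("lastError", "")}'
            )
    return problems


def fetch_targets(base_url: str = "http://localhost:9090") -> Dict:
    """Fetch the targets API; base_url is an assumption about the deployment."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hand-written sample shaped like the API response, for illustration only.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node", "instance": "node-exporter:9100"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "mysql", "instance": "mysql-exporter:9104"},
             "health": "down", "lastError": "connection refused"},
        ]
    },
}

problems = unhealthy_targets(sample)
```

Against a running server, `unhealthy_targets(fetch_targets())` performs the same check on live data. A `mysql` job like the one above would stay down until a mysql-exporter container is actually added to the stack.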
Alerting Rules
# rules/alerts.yml
groups:
  - name: system
    rules:
      - alert: HighCPUUsage
        # rate() is used here instead of irate(): it averages over the whole
        # window, which makes it less jumpy and better suited to alerting.
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for 5 minutes"

      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 85
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
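The `for` field in these rules means the condition must hold at every evaluation for that long before the alert fires; until then the alert is only pending, and a single evaluation where the condition is false resets the timer. The following Python sketch simulates that behavior under simplifying assumptions (one series, fixed evaluation interval). The function name and the shortened 60-second hold are illustrative, not part of Prometheus.

```python
from typing import List, Optional


def alert_states(condition_true: List[bool], interval_s: int, for_s: int) -> List[str]:
    """Simulate the alert state after each rule evaluation.

    condition_true[i] says whether the expression returned a result at
    evaluation i; evaluations are interval_s seconds apart.
    """
    states = []
    active_since: Optional[int] = None  # time the condition first became true
    for i, is_true in enumerate(condition_true):
        now = i * interval_s
        if not is_true:
            active_since = None  # any gap resets the pending timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# 15s evaluations with a 60s hold (shortened from 5m to keep the trace small).
trace = alert_states([True, True, False, True, True, True, True, True], 15, 60)
```

In this trace the dip at the third evaluation discards the first two pending evaluations, so the alert only fires at the last step, 60 seconds after the condition became true again.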
PromQL Queries
# CPU usage (%)
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage (%)
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Disk usage of the root filesystem (%)
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100
# Inbound network traffic (bits per second; bytes * 8)
irate(node_network_receive_bytes_total[5m]) * 8
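The arithmetic behind the CPU query can be worked through by hand. `node_cpu_seconds_total{mode="idle"}` is a counter of seconds spent idle per core, so its per-second rate is the idle fraction of that core. The sketch below uses made-up samples and simplified versions of `irate` (last two samples only) and an averaged rate (first to last sample); it ignores counter resets and the window extrapolation that real PromQL performs.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def irate(samples: List[Sample]) -> float:
    """Per-second rate computed from the last two samples in the window."""
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    return (v_last - v_prev) / (t_last - t_prev)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second rate between the first and last sample."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Made-up idle-CPU counters (seconds spent idle) for two cores, 15s apart.
idle_by_cpu = {
    "0": [(0.0, 1000.0), (15.0, 1012.0), (30.0, 1018.0)],
    "1": [(0.0, 2000.0), (15.0, 2013.5), (30.0, 2024.0)],
}

# avg by(instance): average the idle fraction across cores, then invert it.
idle_fraction = sum(irate(s) for s in idle_by_cpu.values()) / len(idle_by_cpu)
cpu_usage_percent = 100 - idle_fraction * 100
```

With these numbers the two cores are idle 40% and 70% of the most recent interval, giving an average CPU usage of about 45%. Because `irate` looks only at the last two samples it reacts quickly to spikes, while the averaged rate for core 0 over the full 30 seconds is 0.6.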
Grafana Dashboards
# Import these community dashboards by their grafana.com ID
# Node Exporter Full: ID 1860
# MySQL Overview: ID 7362
# Nginx: ID 12708
Hands-On: A Complete Monitoring Stack
# 1. Start the monitoring stack
docker-compose up -d
# 2. Open Prometheus
# http://localhost:9090
# 3. Open Grafana
# http://localhost:3000
# Username: admin, password: admin
# 4. Add the data source
# Prometheus: http://prometheus:9090
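Step 4 can also be scripted instead of clicked through. The sketch below builds the request for Grafana's `POST /api/datasources` HTTP API using the URLs and the admin/admin credentials from the steps above. The function name is hypothetical, and the request is only constructed here, not sent, so the block runs without a live stack.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, user: str, password: str,
                             prometheus_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend makes the requests
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_datasource_request(
    "http://localhost:3000", "admin", "admin", "http://prometheus:9090"
)
# To send it against a running stack: urllib.request.urlopen(request, timeout=5)
```

The data source URL uses the Compose service name `prometheus` rather than `localhost`, because the request to Prometheus is made from inside the Grafana container.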
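Summary
Prometheus is the de facto standard monitoring system of the cloud-native ecosystem. Combining Prometheus, Grafana, and alerting rules as configured above gives you comprehensive system monitoring.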