Microservice Monitoring and Observability
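Overview
Monitoring and observability are essential to a microservice architecture. This tutorial shows how to implement logs, metrics, and traces.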
1. Spring Boot Actuator
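# application.yml
management:
  endpoints:
    web:
      exposure:
        # Actuator endpoints exposed over HTTP under /actuator
        include: health,info,metrics,prometheus
  endpoint:
    health:
      # Shows component details to every caller; restrict this in production
      show-details: always
  metrics:
    export:
      prometheus:
        # Spring Boot 2.x property; in Spring Boot 3.x use
        # management.prometheus.metrics.export.enabled instead
        enabled: true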
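Once the health endpoint is exposed, an external checker can poll it. The sketch below is one minimal way to do that with the JDK HTTP client. It relies on Actuator's default status mapping (HTTP 200 for UP, 503 for DOWN or OUT_OF_SERVICE); the class name `HealthProbe` and the base URL `http://localhost:8080` are assumptions made for illustration, not part of the original setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: polls /actuator/health and reports whether the service is up.
final class HealthProbe {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    // With Actuator's default mapping, UP is served as 200 and DOWN as 503.
    static boolean isHealthy(int statusCode) {
        return statusCode == 200;
    }

    // Returns false when the service is unreachable or reports a non-200 status.
    static boolean probe(String baseUrl) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/health"))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            return isHealthy(response.statusCode());
        } catch (IOException e) {
            return false; // connection refused, timeout, and similar failures
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default address; pass a different base URL as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        System.out.println(baseUrl + " healthy: " + probe(baseUrl));
    }
}
```

In practice an orchestrator probe or Prometheus itself usually plays this role; the sketch only shows what such a check amounts to.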
2. Prometheus Metrics
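import io.micrometer.core.annotation.Timed;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class MetricsService {

    private final Counter requestCounter;
    private final Counter errorCounter;

    public MetricsService(MeterRegistry registry) {
        // Exported to Prometheus as http_requests_total{service="user-service"}
        this.requestCounter = Counter.builder("http.requests.total")
                .description("Total HTTP requests")
                .tag("service", "user-service")
                .register(registry);
        // Exported to Prometheus as http_errors_total{service="user-service"}
        this.errorCounter = Counter.builder("http.errors.total")
                .description("Total HTTP errors")
                .tag("service", "user-service")
                .register(registry);
    }

    // @Timed on an ordinary bean method only takes effect if a TimedAspect bean is registered.
    @Timed(value = "user.service.get", description = "Time taken to get user")
    public User getUser(Long id) {
        requestCounter.increment();
        try {
            return loadUser(id);
        } catch (RuntimeException e) {
            // Count the failure, then let the caller handle it.
            errorCounter.increment();
            throw e;
        }
    }

    // Hypothetical placeholder for the business logic; User is the application's own domain class.
    private User loadUser(Long id) {
        throw new UnsupportedOperationException("business logic goes here");
    }
}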
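The dotted meter names used above are not the names you query in Prometheus. The following sketch is a simplified illustration of the naming convention Micrometer applies on export (dots become underscores, counters end in `_total`, timers are reported in seconds). It is not Micrometer's actual code, and the helper class `PrometheusNames` is hypothetical.

```java
// Simplified illustration of how Micrometer meter names appear in Prometheus.
// This is a hand-written approximation, not the library's implementation.
final class PrometheusNames {

    // Dots are replaced with underscores to form a snake_case name.
    static String snakeCase(String meterName) {
        return meterName.replace('.', '_');
    }

    // Counters carry a _total suffix; it is not duplicated if already present.
    static String counterName(String meterName) {
        String name = snakeCase(meterName);
        return name.endsWith("_total") ? name : name + "_total";
    }

    // Timers are exported in seconds; the series appear as <name>_count, <name>_sum, <name>_max.
    static String timerName(String meterName) {
        String name = snakeCase(meterName);
        return name.endsWith("_seconds") ? name : name + "_seconds";
    }

    public static void main(String[] args) {
        System.out.println(counterName("http.requests.total")); // http_requests_total
        System.out.println(counterName("http.errors.total"));   // http_errors_total
        System.out.println(timerName("user.service.get"));      // user_service_get_seconds
    }
}
```

This is why an alert rule refers to `http_errors_total` even though the counter is registered as `http.errors.total`.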
3. Practical Examples
Distributed Tracing
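import brave.Tracing;
import brave.sampler.Sampler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import zipkin2.Span;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.okhttp3.OkHttpSender;

@Configuration
public class TracingConfig {

    @Bean
    public Tracing tracing() {
        // Send spans over HTTP to a Zipkin collector running locally.
        OkHttpSender sender = OkHttpSender.create("http://localhost:9411/api/v2/spans");
        // Buffer spans and report them asynchronously, off the request thread.
        AsyncReporter<Span> reporter = AsyncReporter.builder(sender).build();
        return Tracing.newBuilder()
                .localServiceName("user-service")
                .spanReporter(reporter)
                // Samples every request; lower the rate in production to limit overhead.
                .sampler(Sampler.ALWAYS_SAMPLE)
                .build();
    }
}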
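For a trace to span several services, each outgoing call has to carry the trace context. In its default configuration Brave does this with B3 headers. The sketch below shows, in plain Java and without Brave, what those headers contain and how a child span relates to its parent. The class `B3Headers` and its methods are hypothetical and exist only for illustration; in a real application Brave's instrumentation sets these headers for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: builds B3 propagation headers by hand.
final class B3Headers {

    // Random non-zero 64-bit ID rendered as 16 lower-hex characters.
    static String newId() {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0L);
        return String.format("%016x", id);
    }

    // Headers for the first span of a new, sampled trace.
    static Map<String, String> rootSpan() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", newId());
        headers.put("X-B3-SpanId", newId());
        headers.put("X-B3-Sampled", "1");
        return headers;
    }

    // Headers for an outgoing call made while handling the incoming span:
    // same trace ID, a fresh span ID, and the incoming span as parent.
    static Map<String, String> childOf(Map<String, String> incoming) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", incoming.get("X-B3-TraceId"));
        headers.put("X-B3-ParentSpanId", incoming.get("X-B3-SpanId"));
        headers.put("X-B3-SpanId", newId());
        if (incoming.containsKey("X-B3-Sampled")) {
            // The sampling decision is made once and passed downstream unchanged.
            headers.put("X-B3-Sampled", incoming.get("X-B3-Sampled"));
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> root = rootSpan();
        Map<String, String> child = childOf(root);
        System.out.println("root:  " + root);
        System.out.println("child: " + child);
    }
}
```

Because every span in a request shares one trace ID, Zipkin can reassemble the spans reported by different services into a single trace.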
Alerting Configuration
# Alerting rules file, loaded through rule_files in prometheus.yml
groups:
  - name: java-apps
    rules:
      - alert: HighErrorRate
        expr: rate(http_errors_total{service="user-service"}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} per second"
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "95th percentile latency is {{ $value }} seconds"
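To make the two expressions above concrete, here is a simplified Java sketch of the arithmetic behind them: a per-second rate from two counter samples, and a quantile estimated by linear interpolation inside cumulative histogram buckets. It is an approximation under stated assumptions, not Prometheus's implementation: the real `rate()` also handles counter resets and extrapolates to the window edges. The class `AlertMath` and all sample values are invented for illustration.

```java
// Simplified arithmetic behind rate() and histogram_quantile(); sample data is made up.
final class AlertMath {

    // Average per-second increase of a counter between two samples.
    static double perSecondRate(double first, double last, double windowSeconds) {
        return (last - first) / windowSeconds;
    }

    // Estimates quantile q (0 < q <= 1) from cumulative buckets.
    // upperBounds must be ascending and end with +Infinity; at least two buckets are assumed.
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        double total = cumulativeCounts[n - 1];
        if (total == 0) {
            return Double.NaN; // no observations
        }
        double rank = q * total;
        int i = 0;
        while (i < n - 1 && cumulativeCounts[i] < rank) {
            i++;
        }
        if (i == n - 1) {
            // The quantile falls in the +Inf bucket: report the highest finite bound.
            return upperBounds[n - 2];
        }
        double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
        double below = (i == 0) ? 0.0 : cumulativeCounts[i - 1];
        double inBucket = cumulativeCounts[i] - below;
        // Assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[i] - lower) * ((rank - below) / inBucket);
    }

    public static void main(String[] args) {
        // 60 errors over a 5-minute window is 0.2 errors per second, above the 0.1 threshold.
        double errorRate = perSecondRate(100, 160, 300);
        System.out.println("error rate: " + errorRate + ", fires: " + (errorRate > 0.1));

        double[] bounds = {0.1, 0.5, 1.0, 2.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 80, 90, 98, 100};
        // Rank 95 lies in the (1.0, 2.0] bucket: 1.0 + 1.0 * (95 - 90) / 8 = 1.625 seconds.
        double p95 = quantile(0.95, bounds, counts);
        System.out.println("p95: " + p95 + ", fires: " + (p95 > 1.0));
    }
}
```

The interpolation step shows why bucket boundaries matter: the estimate can never be more precise than the width of the bucket the quantile falls into.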
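4. Best Practices
- Use standard metrics: follow the Prometheus naming conventions
- Set up alerting rules: detect problems promptly
- Distributed tracing: use Jaeger or Zipkin
- Log aggregation: use ELK or Loki
- Visual monitoring: use Grafana dashboards
Summary
Monitoring and observability are essential to a microservice architecture. With logs, metrics, and traces in place, you can build microservice systems that are maintainable and reliable.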