Distributed Tracing and Observability
The Three Pillars of Observability
| Pillar | Description | Tools |
|---|---|---|
| Metrics | Numeric measurements | Prometheus |
| Logs | Log events | ELK Stack |
| Traces | Request tracing | Jaeger/Zipkin |
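Distributed Tracing Concepts
Core Concepts
- Trace: the complete record of a single request as it travels through the system
- Span: a single unit of work within a trace
- SpanContext: the context propagated across service boundaries
Trace Data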
Trace ID: abc123
└── Span 1: API Gateway (100ms)
    ├── Span 2: Auth Service (30ms)
    └── Span 3: User Service (70ms)
        └── Span 4: Database Query (50ms)
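To make the parent/child structure of the trace above concrete, here is a minimal sketch in plain Python. The `Span` dataclass and the helpers `build_tree`, `self_time_ms`, and `render` are hypothetical names for illustration only, not the OpenTelemetry SDK types, and the self-time calculation assumes child spans run one after another.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Illustrative span record (not the OpenTelemetry SDK class)."""
    span_id: str
    name: str
    duration_ms: int
    parent_id: Optional[str] = None
    children: List["Span"] = field(default_factory=list)


def build_tree(spans: List[Span]) -> Span:
    """Attach each span to its parent and return the root span."""
    by_id = {s.span_id: s for s in spans}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            by_id[s.parent_id].children.append(s)
    if root is None:
        raise ValueError("trace has no root span")
    return root


def self_time_ms(span: Span) -> int:
    """Time spent in the span itself, excluding its direct children.

    Assumes the children run sequentially, as in the example trace.
    """
    return span.duration_ms - sum(c.duration_ms for c in span.children)


def render(span: Span, depth: int = 0) -> List[str]:
    """Return one indented line per span, depth-first."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms}ms, self {self_time_ms(span)}ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# The four spans from the example trace above
spans = [
    Span("1", "API Gateway", 100),
    Span("2", "Auth Service", 30, parent_id="1"),
    Span("3", "User Service", 70, parent_id="1"),
    Span("4", "Database Query", 50, parent_id="3"),
]
root = build_tree(spans)
print("\n".join(render(root)))
```

Subtracting child durations from a span's own duration is a quick way to see where time is actually spent: in this trace the User Service spends 20ms of its 70ms outside the database query.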
OpenTelemetry
Installing the SDK
# Node.js
npm install @opentelemetry/sdk-node
npm install @opentelemetry/exporter-trace-otlp-http
npm install @opentelemetry/auto-instrumentations-node
# Python
pip install opentelemetry-sdk
pip install opentelemetry-exporter-otlp
Node.js Configuration
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

// Export traces over OTLP/HTTP to Jaeger and enable auto-instrumentation
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://jaeger:4318/v1/traces'
  }),
  instrumentations: [
    getNodeAutoInstrumentations()
  ]
});

// Start the SDK before the application loads the modules to be instrumented
sdk.start();
Python Configuration
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Configure the tracer provider with a batching OTLP/gRPC exporter
provider = TracerProvider()
processor = BatchSpanProcessor(
    # Plaintext gRPC: the Jaeger container in this guide is not set up with TLS
    OTLPSpanExporter(endpoint="http://jaeger:4317", insecure=True)
)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Usage
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("my-operation") as span:
    span.set_attribute("user.id", "12345")
    # Business logic goes here
Jaeger
Installing Jaeger
# docker-compose.yml
version: '3.8'
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # UI
      - "14268:14268" # Collector HTTP
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true
  app:
    image: myapp
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
Configuring Sampling
# Configuration file (trace pipeline and batching settings)
service:
  extensions: [jaeger_query]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
extensions:
  jaeger_query:
    num_traces: 1000
processors:
  batch:
    timeout: 1s
    send_batch_size: 100
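The configuration above covers the trace pipeline and batching. The sampling decision itself can be sketched separately. The following is a minimal illustration of ratio-based head sampling in plain Python, modeled on the general idea behind trace-ID-ratio samplers: the decision is derived from the trace ID, so every service that sees the same trace reaches the same verdict. The function names `sampling_bound` and `should_sample` are hypothetical, and a real SDK sampler should be preferred in production.

```python
import random

TRACE_ID_BITS_USED = 64  # only the low 64 bits of the 128-bit trace ID are compared


def sampling_bound(ratio: float) -> int:
    """Translate a ratio in [0, 1] into an upper bound on the low trace-ID bits."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return round(ratio * (1 << TRACE_ID_BITS_USED))


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep the trace if its low 64 bits fall below the bound for this ratio."""
    low_bits = trace_id & ((1 << TRACE_ID_BITS_USED) - 1)
    return low_bits < sampling_bound(ratio)


# Simulate 10,000 random 128-bit trace IDs and keep roughly 10% of them
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

With the OpenTelemetry SDKs, the same kind of policy is usually selected through configuration rather than custom code, for example by setting the environment variables `OTEL_TRACES_SAMPLER=parentbased_traceidratio` and `OTEL_TRACES_SAMPLER_ARG=0.1` on the application container.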
Hands-On: A Complete Tracing Stack
# docker-compose.yml
version: '3.8'
services:
  jaeger:
    image: jaegertracing/all-in-one
    ports:
      - "16686:16686"
      - "4317:4317"
      - "4318:4318"
  prometheus:
    image: prom/prometheus
    volumes:
      # Expects a prometheus.yml next to this file
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
      - OTEL_SERVICE_NAME=myapp
Span Attributes
Semantic Attributes
// HTTP attributes (older semantic-convention names; newer stable conventions
// use names such as http.request.method and http.response.status_code)
span.setAttribute('http.method', 'GET');
span.setAttribute('http.url', 'https://api.example.com/api/users'); // full URL; the https scheme is assumed here
span.setAttribute('http.status_code', 200);
span.setAttribute('http.target', '/api/users'); // path and query only
span.setAttribute('http.host', 'api.example.com');
// Database attributes
span.setAttribute('db.system', 'mysql');
span.setAttribute('db.statement', 'SELECT * FROM users WHERE id = ?');
span.setAttribute('db.user', 'root');
// Message queue attributes
span.setAttribute('messaging.system', 'kafka');
span.setAttribute('messaging.destination', 'user-events');
Custom Attributes
span.setAttribute('user.id', userId);
span.setAttribute('order.id', orderId);
span.setAttribute('feature.flag', 'new-checkout');
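Span attribute values are limited to strings, booleans, numbers, and homogeneous lists of those, so business objects usually need to be flattened before they are attached. The sketch below shows one way to do that in plain Python. The helpers `to_attribute_value` and `sanitize_attributes` are hypothetical and not part of any OpenTelemetry package, and dropping `None` values is a design choice made for this example.

```python
from decimal import Decimal
from typing import Any, Dict

PRIMITIVES = (str, bool, int, float)


def to_attribute_value(value: Any) -> Any:
    """Coerce a value into something a span attribute can hold."""
    if isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, (list, tuple)):
        items = list(value)
        # Keep the list as-is only if every element is a primitive of one type
        same_type = all(type(i) is type(items[0]) for i in items) if items else True
        if same_type and all(isinstance(i, PRIMITIVES) for i in items):
            return items
        return [str(i) for i in items]
    # Fall back to the string form for anything else (Decimal, UUID, ...)
    return str(value)


def sanitize_attributes(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Drop unset values and coerce the rest into valid attribute values."""
    return {key: to_attribute_value(value) for key, value in raw.items() if value is not None}


attrs = sanitize_attributes({
    "user.id": 12345,
    "order.id": "A-1001",
    "order.total": Decimal("19.90"),
    "feature.flags": ["new-checkout", "dark-mode"],
    "coupon.code": None,
})
print(attrs)
```

The resulting dictionary can then be applied one key at a time with `span.set_attribute(key, value)` in Python or `span.setAttribute(key, value)` in Node.js.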
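Trace Context Propagation
// W3C Trace Context
const { context, propagation } = require('@opentelemetry/api');
// Inject the active context into outgoing request headers
// (a no-op unless a propagator has been registered, e.g. by starting the SDK)
const headers = {};
propagation.inject(context.active(), headers);
// Extract the context from incoming request headers
const extractedContext = propagation.extract(context.active(), headers);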
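Under the W3C Trace Context format, injection and extraction come down to writing and reading a `traceparent` header of the form `version-traceid-parentid-flags`. The sketch below illustrates that wire format in plain Python. The `SpanContext` tuple and the `inject` and `extract` functions are simplified, hypothetical stand-ins for the real propagator: they handle only four-field headers, ignore `tracestate`, and treat the header name as case-sensitive.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    """Illustrative context: the pieces that travel between services."""
    trace_id: str
    span_id: str
    sampled: bool


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the context into outgoing headers as a version-00 traceparent."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Read the context from incoming headers; return None if absent or invalid."""
    match = TRACEPARENT_RE.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


# Round trip: the caller injects, the callee extracts the same context
outgoing = SpanContext(secrets.token_hex(16), secrets.token_hex(8), sampled=True)
headers: Dict[str, str] = {}
inject(outgoing, headers)
incoming = extract(headers)
print(headers["traceparent"], incoming == outgoing)
```

The receiving service keeps the extracted trace ID, generates a new span ID for its own work, and records the extracted span ID as the parent. That is how spans from different processes end up in one trace.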
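Best Practices
- Add tracing to critical request paths
- Use meaningful span names
- Attach relevant business attributes
- Propagate context across every service boundary
- Configure a sensible sampling rate
Summary
Distributed tracing is a key part of microservice observability. With OpenTelemetry and Jaeger, you can trace requests across services and analyze where time is spent.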