Jaeger Distributed Tracing: Tracing Request Paths Across Microservices
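What Is Jaeger
Jaeger is a distributed tracing system open-sourced by Uber for monitoring and troubleshooting request paths in microservice architectures. It is built on the OpenTracing standard and helps developers understand how services call one another and pinpoint performance bottlenecks.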
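Architecture Components
Jaeger architecture:
├── Agent: receives trace data reported by applications
├── Collector: processes and stores trace data
├── Query: searches and retrieves trace data
└── UI: visualizes trace data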
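The trace data that flows through these components can be pictured as a tree of spans, where each span records one operation and points at its parent. The sketch below is a simplified model for illustration only: the `Span` fields, `build_tree`, and `render` are hypothetical names, not Jaeger's actual storage schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """Simplified span record; real Jaeger spans carry more fields."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    operation: str
    duration_us: int
    children: List["Span"] = field(default_factory=list)


def build_tree(spans: List[Span]) -> Span:
    """Link the spans of one trace into a tree and return the root span."""
    by_id: Dict[str, Span] = {s.span_id: s for s in spans}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            by_id[s.parent_id].children.append(s)
    if root is None:
        raise ValueError("trace has no root span")
    return root


def render(span: Span, depth: int = 0) -> List[str]:
    """Return one indented line per span, parents before children."""
    lines = [f"{'  ' * depth}{span.service}: {span.operation} ({span.duration_us} us)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# Made-up spans for a single request crossing three services
spans = [
    Span("t1", "a", None, "gateway", "GET /orders", 1200),
    Span("t1", "b", "a", "orders", "list-orders", 900),
    Span("t1", "c", "b", "orders-db", "SELECT orders", 700),
]
tree_lines = render(build_tree(spans))
print("\n".join(tree_lines))
```

Reading the tree top-down shows which service called which, and comparing a parent's duration with its children's shows where the time went.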
Installing Jaeger
One-Command Docker Deployment
# All-in-one mode (development environments)
docker run -d \
  --name jaeger \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest
# Open the UI
# http://localhost:16686
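Production Deployment
# Deploy with Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  cassandra:
    image: cassandra:4.0
    environment:
      - CASSANDRA_CLUSTER_NAME=jaeger
    volumes:
      - cassandra-data:/var/lib/cassandra
  jaeger-collector:
    image: jaegertracing/jaeger-collector:latest
    environment:
      # The jaeger_v1 keyspace and its schema must exist before the collector starts
      - SPAN_STORAGE_TYPE=cassandra
      - CASSANDRA_SERVERS=cassandra
      - CASSANDRA_KEYSPACE=jaeger_v1
    ports:
      - "14269:14269"  # admin port (health check and metrics)
    depends_on:
      - cassandra
  jaeger-query:
    image: jaegertracing/jaeger-query:latest
    environment:
      - SPAN_STORAGE_TYPE=cassandra
      - CASSANDRA_SERVERS=cassandra
      - CASSANDRA_KEYSPACE=jaeger_v1
    ports:
      - "16686:16686"
    depends_on:
      - cassandra
  jaeger-agent:
    image: jaegertracing/jaeger-agent:latest
    # Forward received spans to the collector over gRPC
    command: ["--reporter.grpc.host-port=jaeger-collector:14250"]
    ports:
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5775:5775/udp"
    depends_on:
      - jaeger-collector
# Named volume referenced by the cassandra service
volumes:
  cassandra-data:
EOF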
Application Integration
Go Integration
package main

import (
	"io"
	"log"

	"github.com/opentracing/opentracing-go"
	otlog "github.com/opentracing/opentracing-go/log"
	"github.com/uber/jaeger-client-go"
	"github.com/uber/jaeger-client-go/config"
)

// initTracer builds a tracer that samples every trace and reports to the agent over UDP.
func initTracer(serviceName string) (opentracing.Tracer, io.Closer, error) {
	cfg := config.Configuration{
		ServiceName: serviceName,
		Sampler: &config.SamplerConfig{
			Type:  "const",
			Param: 1,
		},
		Reporter: &config.ReporterConfig{
			LogSpans:           true,
			LocalAgentHostPort: "jaeger-agent:6831",
		},
	}
	return cfg.NewTracer(
		config.Logger(jaeger.NullLogger),
	)
}

func main() {
	tracer, closer, err := initTracer("my-service")
	if err != nil {
		log.Fatalf("failed to initialize tracer: %v", err)
	}
	defer closer.Close()
	opentracing.SetGlobalTracer(tracer)

	// Create the root span
	span := opentracing.StartSpan("main-operation")
	defer span.Finish()

	// Add a tag and a structured log entry
	span.SetTag("user.id", "12345")
	span.LogFields(
		otlog.String("event", "request-start"),
	)

	// Create a child span
	childSpan := opentracing.StartSpan(
		"db-query",
		opentracing.ChildOf(span.Context()),
	)
	defer childSpan.Finish()
	// Run the database operation...
}
Python Integration
import opentracing
from jaeger_client import Config
from opentracing_instrumentation import traced_function


def init_tracer(service_name):
    config = Config(
        config={
            'sampler': {
                'type': 'const',
                'param': 1,
            },
            'logging': True,
            'local_agent': {
                'reporting_host': 'jaeger-agent',
                'reporting_port': 6831,
            },
        },
        service_name=service_name,
    )
    return config.initialize_tracer()


tracer = init_tracer('my-python-service')
opentracing.set_global_tracer(tracer)


# Trace the function automatically with a decorator
@traced_function
def process_request(request):
    with tracer.start_span('process-data') as span:
        span.set_tag('request.id', request.id)
        # Replace this placeholder with the real processing logic
        result = None
    return result
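To make the decorator approach less opaque, here is a minimal standard-library sketch of what a tracing decorator does conceptually: it wraps the call, times it, and records the outcome. The names `traced` and `recorded_spans` are hypothetical, and this is not how `jaeger_client` or `opentracing_instrumentation` is implemented; a real decorator would start and finish a span on the tracer instead of appending to a list.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink standing in for a real span reporter
recorded_spans: List[Dict[str, Any]] = []


def traced(operation_name: str) -> Callable:
    """Decorator factory that records one span-like entry per call."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            error = False
            try:
                return func(*args, **kwargs)
            except Exception:
                error = True
                raise
            finally:
                # Runs on both success and failure, like span.finish()
                recorded_spans.append({
                    "operation": operation_name,
                    "duration_s": time.perf_counter() - start,
                    "error": error,
                })
        return wrapper
    return decorator


@traced("process-data")
def process_data(items: List[int]) -> List[int]:
    return [item * 2 for item in items]


result = process_data([1, 2, 3])
print(result, recorded_spans[0]["operation"])
```

The `finally` block is the important design choice: the entry is recorded even when the wrapped function raises, so failed calls still show up in the trace.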
Node.js Integration
const { initTracer } = require('jaeger-client');
const opentracing = require('opentracing');

const config = {
  serviceName: 'my-node-service',
  sampler: {
    type: 'const',
    param: 1,
  },
  reporter: {
    logSpans: true,
    agentHost: 'jaeger-agent',
    agentPort: 6831,
  },
};

const options = {
  logger: {
    info(msg) {
      console.log('INFO:', msg);
    },
    error(msg) {
      console.error('ERROR:', msg);
    },
  },
};

const tracer = initTracer(config, options);
// Register the tracer as the global OpenTracing tracer
opentracing.initGlobalTracer(tracer);

// Express middleware: one span per incoming HTTP request
function tracingMiddleware(req, res, next) {
  const span = tracer.startSpan('http_request');
  span.setTag('http.method', req.method);
  span.setTag('http.url', req.url);
  // Finish the span once the response has been sent
  res.on('finish', () => {
    span.setTag('http.status_code', res.statusCode);
    span.finish();
  });
  next();
}
Kubernetes Deployment
# jaeger-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:latest
          ports:
            - containerPort: 16686
            - containerPort: 6831
              protocol: UDP
            - containerPort: 5775
              protocol: UDP
          env:
            # all-in-one defaults to in-memory storage; select Cassandra explicitly
            - name: SPAN_STORAGE_TYPE
              value: "cassandra"
            - name: CASSANDRA_SERVERS
              value: "cassandra"
            - name: COLLECTOR_ZIPKIN_HOST_PORT
              value: ":9411"
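Trace Analysis
Using the Query API
# List traces for a service
curl "http://localhost:16686/api/traces?service=my-service&limit=100"
# Filter by operation name (the space must be URL-encoded)
curl "http://localhost:16686/api/traces?service=my-service&operation=GET%20/api/users"
# Filter by tag (-G sends the --data-urlencode values as query parameters)
curl -G "http://localhost:16686/api/traces" --data-urlencode "service=my-service" --data-urlencode 'tags={"http.status_code":"500"}'
# Fetch the details of a specific trace
curl "http://localhost:16686/api/traces/<trace-id>"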
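Hand-writing these query strings is error-prone because operation names contain spaces and the `tags` value is JSON. The sketch below builds the same URLs with the Python standard library so the encoding is handled automatically. `build_traces_url` is a hypothetical helper; the endpoint and parameter names are taken from the curl examples above.

```python
import json
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_traces_url(
    base_url: str,
    service: str,
    operation: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
    limit: Optional[int] = None,
    min_duration: Optional[str] = None,
) -> str:
    """Build a URL-encoded trace search URL for the query service."""
    params: Dict[str, str] = {"service": service}
    if operation is not None:
        params["operation"] = operation
    if tags:
        # The tags filter is passed as a compact JSON object
        params["tags"] = json.dumps(tags, separators=(",", ":"))
    if limit is not None:
        params["limit"] = str(limit)
    if min_duration is not None:
        params["minDuration"] = min_duration
    return f"{base_url}/api/traces?{urlencode(params)}"


by_operation = build_traces_url(
    "http://localhost:16686", "my-service", operation="GET /api/users"
)
by_tag = build_traces_url(
    "http://localhost:16686", "my-service", tags={"http.status_code": "500"}
)
print(by_operation)
print(by_tag)
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, once a Jaeger query service is reachable at the assumed address.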
Finding Performance Bottlenecks
# Find slow requests
curl "http://localhost:16686/api/traces?service=my-service&minDuration=100ms"
# Find failed requests
curl -G "http://localhost:16686/api/traces" --data-urlencode "service=my-service" --data-urlencode 'tags={"error":"true"}'
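Once a search returns, the next step is usually to rank spans by duration. The following sketch does that for an already-parsed JSON response. It assumes the response shape the Jaeger UI consumes (a `data` list of traces, each with `spans` and `processes`, and span `duration` in microseconds); this HTTP API is internal to the UI, so the shape may differ between versions. The `slowest_spans` helper and the sample payload are made up for illustration.

```python
from typing import Any, Dict, List


def has_error(span: Dict[str, Any]) -> bool:
    """True if the span carries an error tag set to true."""
    return any(
        tag.get("key") == "error" and tag.get("value") in (True, "true")
        for tag in span.get("tags", [])
    )


def slowest_spans(response: Dict[str, Any], top_n: int = 3) -> List[Dict[str, Any]]:
    """Return the top_n longest spans across all traces in the response."""
    rows: List[Dict[str, Any]] = []
    for trace in response.get("data", []):
        processes = trace.get("processes", {})
        for span in trace.get("spans", []):
            process = processes.get(span.get("processID"), {})
            rows.append({
                "service": process.get("serviceName", "unknown"),
                "operation": span["operationName"],
                "duration_ms": span["duration"] / 1000.0,  # microseconds to ms
                "error": has_error(span),
            })
    rows.sort(key=lambda row: row["duration_ms"], reverse=True)
    return rows[:top_n]


# Made-up payload in the assumed response shape
sample_response = {
    "data": [{
        "traceID": "abc123",
        "spans": [
            {"spanID": "1", "operationName": "GET /api/users", "duration": 250000,
             "processID": "p1",
             "tags": [{"key": "error", "type": "bool", "value": True}]},
            {"spanID": "2", "operationName": "db-query", "duration": 180000,
             "processID": "p2", "tags": []},
            {"spanID": "3", "operationName": "cache-lookup", "duration": 2000,
             "processID": "p1", "tags": []},
        ],
        "processes": {
            "p1": {"serviceName": "my-service"},
            "p2": {"serviceName": "user-db"},
        },
    }]
}

report = slowest_spans(sample_response, top_n=2)
for row in report:
    print(row)
```

In this sample the database span accounts for most of the request's 250 ms, which is the kind of signal that points at the bottleneck.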
Environment Variable Configuration
# Agent configuration
JAEGER_AGENT_HOST=jaeger-agent
JAEGER_AGENT_PORT=6831
JAEGER_SERVICE_NAME=my-service
JAEGER_SAMPLER_TYPE=const
JAEGER_SAMPLER_PARAM=1
JAEGER_REPORTER_LOG_SPANS=true
JAEGER_REPORTER_FLUSH_INTERVAL=1s
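Jaeger client libraries generally read these variables themselves, so the sketch below is only meant to show how the values map onto the configuration structure used in the Python example earlier. `config_from_env` and its fallback defaults are assumptions for illustration, not part of any Jaeger library.

```python
import os
from typing import Any, Dict, Mapping


def config_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Translate JAEGER_* variables into keyword arguments for a tracer config.

    The fallback values are assumptions chosen for this sketch.
    """
    return {
        "service_name": env.get("JAEGER_SERVICE_NAME", "unknown-service"),
        "config": {
            "sampler": {
                "type": env.get("JAEGER_SAMPLER_TYPE", "const"),
                "param": float(env.get("JAEGER_SAMPLER_PARAM", "1")),
            },
            "local_agent": {
                "reporting_host": env.get("JAEGER_AGENT_HOST", "localhost"),
                "reporting_port": int(env.get("JAEGER_AGENT_PORT", "6831")),
            },
            "logging": env.get("JAEGER_REPORTER_LOG_SPANS", "false").lower() == "true",
        },
    }


# Same values as the variable listing above, passed explicitly for clarity
settings = config_from_env({
    "JAEGER_AGENT_HOST": "jaeger-agent",
    "JAEGER_AGENT_PORT": "6831",
    "JAEGER_SERVICE_NAME": "my-service",
    "JAEGER_SAMPLER_TYPE": "const",
    "JAEGER_SAMPLER_PARAM": "1",
    "JAEGER_REPORTER_LOG_SPANS": "true",
})
print(settings)
```

Passing the environment as a parameter rather than reading `os.environ` inside the function keeps the mapping easy to test without touching the real process environment.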
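Best Practices
- Sampling strategy: use probabilistic sampling in production and constant sampling in development
- Span naming: use meaningful operation names and avoid high cardinality, for example GET /api/users/{id} rather than a name that embeds the actual ID
- Tag usage: add metadata tags that carry real diagnostic value
- Error handling: record error details and stack traces on the span
- Performance: report trace data asynchronously so it does not block the main request path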
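Two of the practices above, probabilistic sampling and low-cardinality span names, can be sketched in a few lines. This is an illustration under stated assumptions rather than Jaeger's own code: `should_sample` makes a deterministic decision from a 64-bit trace ID so that every service reaches the same verdict for the same trace, and `span_name` replaces numeric or UUID path segments with a placeholder. Both function names are hypothetical.

```python
import re

MAX_TRACE_ID = (1 << 64) - 1  # assumes 64-bit trace IDs

# Path segments that look like numeric IDs or UUIDs
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def should_sample(trace_id: int, rate: float) -> bool:
    """Keep roughly `rate` of all traces, decided by the trace ID alone."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return (trace_id & MAX_TRACE_ID) < rate * (MAX_TRACE_ID + 1)


def span_name(method: str, path: str) -> str:
    """Build a low-cardinality operation name from an HTTP method and path."""
    segments = path.split("?")[0].split("/")
    normalized = ["{id}" if _ID_SEGMENT.match(s) else s for s in segments]
    return f"{method.upper()} {'/'.join(normalized)}"


print(should_sample(12345, 0.5))
print(span_name("get", "/api/users/12345?verbose=1"))
```

Because the sampling decision depends only on the trace ID, a trace is either kept in full or dropped in full, and normalized names keep the operation list in the UI short enough to search.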