
Distributed Tracing and Observability



The Three Pillars of Observability

Pillar    Description            Tools
Metrics   Numeric measurements   Prometheus
Logs      Log events             ELK Stack
Traces    Request traces         Jaeger / Zipkin

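Distributed Tracing Concepts

Core Concepts

Trace Data

A trace, identified by a trace ID, is a tree of spans; each span records one operation and how long it took.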

Trace ID: abc123
└── Span 1: API Gateway (100ms)
    ├── Span 2: Auth Service (30ms)
    └── Span 3: User Service (70ms)
        └── Span 4: Database Query (50ms)
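The tree above can be modeled in a few lines of Python to show how span durations relate to each other. This is only an illustrative sketch: the `Span` class, `self_time_ms`, and `render` are hypothetical names, not part of the OpenTelemetry API, and the self-time calculation assumes child spans run one after another without overlapping.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """One timed operation in a trace (hypothetical model, not the OpenTelemetry API)."""
    name: str
    duration_ms: int
    children: list[Span] = field(default_factory=list)

    def self_time_ms(self) -> int:
        # Time spent in this span itself, excluding its direct children.
        # Assumes the children run sequentially and do not overlap.
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def render(span: Span, depth: int = 0) -> list[str]:
    """Return one indented line per span, walking the tree depth-first."""
    indent = "  " * depth
    lines = [f"{indent}{span.name}: {span.duration_ms}ms (self {span.self_time_ms()}ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# The trace from the diagram above (trace ID abc123).
trace_abc123 = Span("API Gateway", 100, [
    Span("Auth Service", 30),
    Span("User Service", 70, [Span("Database Query", 50)]),
])

if __name__ == "__main__":
    print("\n".join(render(trace_abc123)))
```

Under these assumptions the User Service span spends 20ms of its 70ms outside the database query, which is the kind of breakdown a tracing UI derives from span timings.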

OpenTelemetry

Installing the SDK

# Node.js
npm install @opentelemetry/sdk-node
npm install @opentelemetry/exporter-trace-otlp-http
npm install @opentelemetry/auto-instrumentations-node

# Python
pip install opentelemetry-sdk
pip install opentelemetry-exporter-otlp

Node.js Configuration

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://jaeger:4318/v1/traces'
  }),
  instrumentations: [
    getNodeAutoInstrumentations()
  ]
});

sdk.start();

Python Configuration

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

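# Configuration
provider = TracerProvider()
# insecure=True sends plaintext gRPC, which a local Jaeger all-in-one container expects;
# without it the exporter attempts TLS
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://jaeger:4317", insecure=True))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Usage
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("my-operation") as span:
    span.set_attribute("user.id", "12345")
    # Business logic goes here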

Jaeger

Installing Jaeger

# docker-compose.yml
version: '3.8'

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # UI
      - "14268:14268"  # Jaeger Thrift over HTTP (collector)
      - "4317:4317"    # OTLP gRPC
      - "4318:4318"    # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true

  app:
    image: myapp
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318

Configuring Sampling

# Configuration file
service:
  extensions: [jaeger_query]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]

extensions:
  jaeger_query:
    num_traces: 1000

processors:
  batch:
    timeout: 1s
    send_batch_size: 100
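The file above sets up a trace pipeline and batching; the sampling rate itself is a separate decision, usually made by a sampler. A common approach is ratio-based sampling on the trace ID, so that every service reaches the same keep-or-drop decision for a given trace. The sketch below illustrates that idea only: `should_sample` and `TRACE_ID_MASK` are hypothetical names, and it does not claim to match what any particular SDK does internally. OpenTelemetry SDKs ship a sampler of this kind (`TraceIdRatioBased`), which is what you would normally use.

```python
TRACE_ID_MASK = (1 << 64) - 1  # look only at the low 64 bits of the 128-bit trace ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministically keep roughly `ratio` of all traces (illustrative sketch)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Trace IDs are assumed to be uniformly random, so comparing the low bits
    # against a threshold keeps about `ratio` of them.
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_MASK) < bound


if __name__ == "__main__":
    example_trace_id = int("4bf92f3577b34da6a3ce929d0e0e4736", 16)
    print(should_sample(example_trace_id, 1.0))
```

Because the decision depends only on the trace ID, two services that see the same trace agree without coordinating, which keeps sampled traces complete.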

In Practice: A Complete Tracing System

# docker-compose.yml
version: '3.8'

services:
  jaeger:
    image: jaegertracing/all-in-one
    ports:
      - "16686:16686"
      - "4317:4317"
      - "4318:4318"

  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
      - OTEL_SERVICE_NAME=myapp
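The `app` service above sets only a base endpoint (`http://jaeger:4318`), while the Node.js example earlier posts to `/v1/traces`. As I understand the OTLP exporter specification, an OTLP/HTTP exporter appends `/v1/traces` to `OTEL_EXPORTER_OTLP_ENDPOINT`, whereas the signal-specific `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is used as-is. The sketch below mirrors that rule for illustration; `resolve_traces_endpoint` is a hypothetical helper, and real SDKs perform this resolution themselves.

```python
from typing import Mapping

DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318"


def resolve_traces_endpoint(env: Mapping[str, str]) -> str:
    """Return the URL an OTLP/HTTP trace exporter would post to (illustrative sketch)."""
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        # The signal-specific variable is taken as the full URL.
        return specific
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_OTLP_HTTP_ENDPOINT)
    # The generic variable is a base URL; the traces path is appended to it.
    return base.rstrip("/") + "/v1/traces"


if __name__ == "__main__":
    print(resolve_traces_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://jaeger:4318"}))
```

This is why the compose file can use the bare host and port: the path is added by the exporter, not by the configuration.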

Span Attributes

Semantic Attributes

// HTTP attributes
span.setAttribute('http.method', 'GET');
span.setAttribute('http.url', 'https://api.example.com/api/users');  // full URL (scheme assumed here)
span.setAttribute('http.status_code', 200);
span.setAttribute('http.target', '/api/users');
span.setAttribute('http.host', 'api.example.com');

// Database attributes
span.setAttribute('db.system', 'mysql');
span.setAttribute('db.statement', 'SELECT * FROM users WHERE id = ?');
span.setAttribute('db.user', 'root');

// Message queue attributes
span.setAttribute('messaging.system', 'kafka');
span.setAttribute('messaging.destination', 'user-events');

Custom Attributes

span.setAttribute('user.id', userId);
span.setAttribute('order.id', orderId);
span.setAttribute('feature.flag', 'new-checkout');

Trace Context Propagation

// W3C Trace Context
const { context, propagation } = require('@opentelemetry/api');

// Inject the active context into outgoing headers
const headers = {};
propagation.inject(context.active(), headers);

// Extract the context from incoming headers
const extractedContext = propagation.extract(context.active(), headers);
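With W3C Trace Context, what `inject` writes and `extract` reads is a `traceparent` header of the form `version-traceid-parentid-flags`, all lowercase hex. The sketch below formats and parses that header by hand to show what travels between services. `format_traceparent` and `parse_traceparent` are hypothetical helpers that handle only version `00`; in real code the propagation API does this work.

```python
import re

# version (2 hex) - trace ID (32 hex) - parent span ID (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def format_traceparent(trace_id: int, span_id: int, sampled: bool) -> str:
    """Render a version-00 traceparent header value."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(header: str):
    """Return the header's fields as a dict, or None if it is not a valid version-00 value."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    if version != "00":
        return None  # this sketch deliberately handles version 00 only
    if int(trace_id, 16) == 0 or int(span_id, 16) == 0:
        return None  # all-zero IDs are invalid
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


if __name__ == "__main__":
    print(parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
```

A downstream service that receives this header starts its own span with the same trace ID and uses the received span ID as the parent, which is how spans from different processes end up in one tree.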

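Best Practices

  1. Add tracing to critical paths
  2. Use meaningful span names
  3. Attach relevant business attributes
  4. Implement context propagation
  5. Configure a reasonable sampling rate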
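For the second point, a meaningful span name is usually one that groups similar requests together: `GET /api/users/{id}` rather than `GET /api/users/12345`. Putting identifiers in attributes instead of the name keeps the set of names small. The sketch below shows one way to normalize a path; `span_name` and the `{id}` placeholder are hypothetical choices, and it treats only numeric and UUID-shaped segments as identifiers.

```python
import re

# A path segment counts as an identifier if it is all digits or UUID-shaped.
_ID_SEGMENT = re.compile(
    r"\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def span_name(method: str, path: str) -> str:
    """Build a low-cardinality span name such as 'GET /api/users/{id}' (illustrative sketch)."""
    path_only = path.split("?", 1)[0]  # drop the query string
    segments = [
        "{id}" if _ID_SEGMENT.fullmatch(segment) else segment
        for segment in path_only.split("/")
    ]
    return method.upper() + " " + "/".join(segments)


if __name__ == "__main__":
    print(span_name("get", "/api/users/12345"))
```

The concrete ID is not lost: it belongs in an attribute such as the `user.id` example shown earlier, where it can still be searched without multiplying span names. Where the web framework exposes the matched route template, using that directly is generally more reliable than pattern matching.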

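Summary

Distributed tracing is key to observability in microservices. With OpenTelemetry and Jaeger, you can trace requests across services and analyze their performance.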