LLM Logging Systems
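---
title: "LLM Logging Systems"
description: "An in-depth look at designing and implementing logging for large language model systems, covering structured logging, ELK Stack deployment, and end-to-end log analysis and management."
tags: ["LLM logging", "structured logging", "ELK Stack", "log management"]
category: "llm"
icon: "🧠"
---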
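What Makes LLM Logs Different
Logs from large language model systems differ markedly from those of a traditional web application. Beyond the request and response, they need to capture the prompt content, model parameters, the generated token sequence, and quality evaluation results. The resulting data is large and structurally complex, so it needs a purpose-built design to stay manageable.
Key Log Types
- Request logs: user input, system prompt, model configuration
- Response logs: generated content, token usage, stop reason
- Quality logs: human annotations, automated scores, user feedback
- Performance logs: latency breakdown, resource consumption, queue wait time
Structured Log Design
Log Schema Definition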
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class LogLevel(str, Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


class LLMLogEntry(BaseModel):
    """One record per model call: request, response, config, and timing."""

    timestamp: datetime
    request_id: str
    user_id: Optional[str] = None
    model_name: str
    prompt: str
    response: Optional[str] = None
    system_prompt: Optional[str] = None
    temperature: float
    max_tokens: int
    input_tokens: int
    output_tokens: int
    latency_ms: float
    first_token_latency_ms: Optional[float] = None  # time to first token
    stop_reason: str
    level: LogLevel
    metadata: Dict[str, Any] = Field(default_factory=dict)


class QualityLogEntry(BaseModel):
    """Quality signal attached to an earlier call via request_id."""

    request_id: str
    rating: Optional[int] = None
    feedback: Optional[str] = None
    annotations: List[str] = Field(default_factory=list)
    reviewer_id: Optional[str] = None
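Because a quality entry carries only a `request_id`, any per-model quality analysis has to join it back to the request log. The following is a minimal sketch of that join, assuming both logs are available as plain dictionaries with the field names from the schema above (for example after dumping the Pydantic models to dicts). The function name `average_rating_by_model` and the model name `model-b` are illustrative, not part of the original design.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List


def average_rating_by_model(
    request_logs: Iterable[Dict[str, Any]],
    quality_logs: Iterable[Dict[str, Any]],
) -> Dict[str, float]:
    """Join quality entries to request entries on request_id and average ratings per model."""
    model_by_request = {r["request_id"]: r["model_name"] for r in request_logs}
    ratings: Dict[str, List[int]] = defaultdict(list)
    for q in quality_logs:
        model = model_by_request.get(q["request_id"])
        # Skip entries without a rating or without a matching request.
        if model is not None and q.get("rating") is not None:
            ratings[model].append(q["rating"])
    return {model: mean(values) for model, values in ratings.items()}


requests = [
    {"request_id": "r1", "model_name": "gpt-4"},
    {"request_id": "r2", "model_name": "gpt-4"},
    {"request_id": "r3", "model_name": "model-b"},  # hypothetical second model
]
quality = [
    {"request_id": "r1", "rating": 5},
    {"request_id": "r2", "rating": 3},
    {"request_id": "r3", "rating": None},  # feedback without a score
    {"request_id": "r9", "rating": 4},     # no matching request log
]
summary = average_rating_by_model(requests, quality)
print(summary)
```

Entries with no rating or no matching request are dropped rather than counted as zero, so a model with only unrated feedback does not appear in the result.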
Logger Implementation
import json
import logging
import uuid
from datetime import datetime, timezone


class StructuredLogger:
    """Emits one JSON object per line so downstream shippers can parse it."""

    def __init__(self, service_name: str):
        self.service_name = service_name
        self.logger = logging.getLogger(service_name)
        # getLogger returns a shared instance; attach the handler only once
        # so repeated construction does not duplicate every log line.
        if not self.logger.handlers:
            handler = logging.StreamHandler()
            handler.setFormatter(logging.Formatter('%(message)s'))
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    def log_request(self, entry: LLMLogEntry):
        log_record = {
            "timestamp": entry.timestamp.isoformat(),
            "service": self.service_name,
            "level": entry.level.value,
            "request_id": entry.request_id,
            "model": entry.model_name,
            "metrics": {
                "input_tokens": entry.input_tokens,
                "output_tokens": entry.output_tokens,
                "latency_ms": entry.latency_ms,
                "ttft_ms": entry.first_token_latency_ms
            },
            "config": {
                "temperature": entry.temperature,
                "max_tokens": entry.max_tokens
            },
            "prompt": entry.prompt[:1000],  # truncate to keep log lines small
            "stop_reason": entry.stop_reason
        }
        # Emit at the entry's own severity instead of always using INFO.
        severity = getattr(logging, entry.level.name)
        self.logger.log(severity, json.dumps(log_record, ensure_ascii=False))


# Usage example
logger = StructuredLogger("llm-inference")
entry = LLMLogEntry(
    timestamp=datetime.now(timezone.utc),  # timezone-aware, unambiguous downstream
    request_id=str(uuid.uuid4()),
    user_id="user_123",
    model_name="gpt-4",
    prompt="Explain the basic concepts of machine learning",
    response="Machine learning is a subfield of artificial intelligence...",
    system_prompt="You are an AI assistant",
    temperature=0.7,
    max_tokens=2048,
    input_tokens=15,
    output_tokens=256,
    latency_ms=2340.5,
    first_token_latency_ms=180.2,
    stop_reason="stop",
    level=LogLevel.INFO
)
logger.log_request(entry)
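The usage example hard-codes `latency_ms` and `first_token_latency_ms`. One way to measure them in practice is to wrap the streaming response in a small timer. The sketch below assumes the model client exposes its output as an iterable of text chunks. `StreamTimer` and `fake_stream` are hypothetical names introduced here for illustration, and the wrapper counts chunks, which are not necessarily tokens.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


class StreamTimer:
    """Wraps a chunk stream and records total latency and time to first chunk."""

    def __init__(self, stream: Iterable[str],
                 clock: Callable[[], float] = time.perf_counter):
        self._stream = stream
        self._clock = clock
        self.latency_ms: Optional[float] = None
        self.first_token_latency_ms: Optional[float] = None
        self.output_chunks = 0

    def __iter__(self) -> Iterator[str]:
        start = self._clock()
        try:
            for chunk in self._stream:
                if self.first_token_latency_ms is None:
                    self.first_token_latency_ms = (self._clock() - start) * 1000
                self.output_chunks += 1
                yield chunk
        finally:
            # Runs on normal exhaustion and when the consumer stops early.
            self.latency_ms = (self._clock() - start) * 1000


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming model client."""
    for piece in ["Machine ", "learning ", "is..."]:
        time.sleep(0.01)
        yield piece


timer = StreamTimer(fake_stream())
text = "".join(timer)
print(text, timer.first_token_latency_ms, timer.latency_ms)
```

After the stream is consumed, `timer.latency_ms` and `timer.first_token_latency_ms` can be passed straight into the corresponding `LLMLogEntry` fields.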
ELK Stack Architecture
Architecture Components
User request → LLM service → Logstash          → Elasticsearch → Kibana
                             (parse/transform)   (store/index)   (visualize)
Logstash Configuration
# logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse the JSON log line into a nested [llm] object
  json {
    source => "message"
    target => "llm"
  }

  date {
    match => ["[llm][timestamp]", "ISO8601"]
    target => "@timestamp"
  }

  mutate {
    # Promote the fields that the index template and queries use at top level
    copy => {
      "[llm][request_id]" => "request_id"
      "[llm][model]" => "model"
      "[llm][stop_reason]" => "stop_reason"
      "[llm][level]" => "level"
    }
    add_field => {
      "service" => "%{[llm][service]}"
    }
    remove_field => ["[llm][prompt]"]  # filter out sensitive content
  }

  # Extract key metrics
  ruby {
    code => "
      metrics = event.get('[llm][metrics]')
      if metrics
        event.set('input_tokens', metrics['input_tokens'])
        event.set('output_tokens', metrics['output_tokens'])
        event.set('latency_ms', metrics['latency_ms'])
      end
    "
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "llm-logs-%{+YYYY.MM.dd}"
    template_name => "llm-logs"
  }
}
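The `index` setting writes each event to a daily index named after the event's `@timestamp`, which Logstash formats in UTC. When a tool needs to target specific days rather than the `llm-logs-*` wildcard, the index names can be derived in code. The following is a small sketch of that naming rule. The helper names `daily_index` and `indices_between` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def daily_index(ts: datetime, prefix: str = "llm-logs") -> str:
    """Return the daily index name for an event timestamp, computed in UTC."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


def indices_between(start: datetime, end: datetime,
                    prefix: str = "llm-logs") -> List[str]:
    """List the daily indices touched by the closed interval [start, end]."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"{prefix}-{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names


# 23:30 at UTC-5 is already the next day in UTC.
late_evening = datetime(2024, 1, 15, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
print(daily_index(late_evening))
```

The example shows why local dates are a poor guide here: a request logged late in the evening in a western timezone lands in the next day's index.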
Elasticsearch Index Template
{
  "index_patterns": ["llm-logs-*"],
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s"
  },
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "request_id": {"type": "keyword"},
      "model": {"type": "keyword"},
      "input_tokens": {"type": "integer"},
      "output_tokens": {"type": "integer"},
      "latency_ms": {"type": "float"},
      "stop_reason": {"type": "keyword"},
      "level": {"type": "keyword"}
    }
  }
}
Log Analysis and Querying
Kibana Query Example
// Find high-latency requests
{
  "query": {
    "bool": {
      "must": [
        {"range": {"latency_ms": {"gte": 5000}}},
        {"term": {"level": "info"}}
      ]
    }
  },
  "aggs": {
    "avg_latency_by_model": {
      "terms": {"field": "model"},
      "aggs": {
        "avg_latency": {"avg": {"field": "latency_ms"}}
      }
    }
  }
}
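The same request body can be built in application code when the latency threshold needs to vary, for instance in a scheduled report. The sketch below only constructs and serializes the body, which can then be sent to the `_search` endpoint of the `llm-logs-*` indices with any HTTP or Elasticsearch client. The function name and its parameters are hypothetical.

```python
import json
from typing import Any, Dict


def build_slow_request_query(min_latency_ms: float = 5000,
                             level: str = "info") -> Dict[str, Any]:
    """Build the high-latency search body with a configurable threshold and level."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"range": {"latency_ms": {"gte": min_latency_ms}}},
                    {"term": {"level": level}},
                ]
            }
        },
        "aggs": {
            "avg_latency_by_model": {
                "terms": {"field": "model"},
                "aggs": {
                    "avg_latency": {"avg": {"field": "latency_ms"}}
                },
            }
        },
    }


body = build_slow_request_query(min_latency_ms=8000)
print(json.dumps(body, indent=2))
```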
Log Alerting Rules
# Log-based alerts
- alert: HighErrorRate
  condition: count(errors) / count(all) > 0.05
  window: 5m
  severity: critical
- alert: SlowInference
  condition: percentile(latency_ms, 95) > 10000
  window: 10m
  severity: warning
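To make the two conditions concrete, here is a rough sketch of how they could be evaluated over log records that have already been restricted to the rule's time window. It assumes each record is a dictionary with the `level` and `latency_ms` fields used earlier, and it uses the nearest-rank definition of a percentile, which may differ slightly from the approximation an alerting backend applies. All function names are hypothetical.

```python
import math
from typing import Any, Dict, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n) in sorted order."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def high_error_rate(records: Sequence[Dict[str, Any]], threshold: float = 0.05) -> bool:
    """HighErrorRate: share of error-level records in the window exceeds the threshold."""
    if not records:
        return False
    errors = sum(1 for r in records if r["level"] == "error")
    return errors / len(records) > threshold


def slow_inference(records: Sequence[Dict[str, Any]], limit_ms: float = 10000) -> bool:
    """SlowInference: 95th percentile latency in the window exceeds the limit."""
    if not records:
        return False
    return percentile([r["latency_ms"] for r in records], 95) > limit_ms


# Illustrative window: 18 fast successful calls and 2 slow failures.
window = (
    [{"level": "info", "latency_ms": 1000.0} for _ in range(18)]
    + [{"level": "error", "latency_ms": 12000.0} for _ in range(2)]
)
print(high_error_rate(window), slow_inference(window))
```

An empty window is treated as healthy in this sketch. A production rule might instead alert on missing data.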
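Log Management Best Practices
- Privacy protection: redact user input and remove PII before it is stored
- Sampling: sample successful requests, but record every failed request in full
- Lifecycle: define a retention policy, such as 30 days for hot data, 90 days for warm data, and archival for cold data
- Compression: gzip log files to reduce storage cost
- Real-time analysis: aggregate key metrics in real time with stream processing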
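The first two practices can be sketched in a few lines. The example below is a simplified illustration and not a complete PII solution: the two regular expressions only catch common email and phone formats, and the 10% sample rate is an arbitrary assumption. Sampling is keyed on a hash of the `request_id` so that every service handling the same request makes the same keep-or-drop decision. `redact` and `should_log` are hypothetical helper names.

```python
import hashlib
import re

# Deliberately simple patterns; real PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3,4}[-.\s]?\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def should_log(request_id: str, level: str, success_sample_rate: float = 0.1) -> bool:
    """Keep every non-info record; keep a deterministic sample of successful ones."""
    if level != "info":
        return True
    if success_sample_rate >= 1.0:
        return True
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < success_sample_rate


print(redact("Contact alice@example.com or 555-123-4567"))
print(should_log("req-42", "error"))
```

Redaction would run before the prompt is written to the log record, and `should_log` would gate the call to the logger.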
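A well-built LLM logging system underpins troubleshooting, performance tuning, and quality assurance, and it merits a serious engineering investment.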