Production Readiness Checklist
Checklist
Reliability
- Health checks configured (liveness, readiness, startup)
- Graceful termination handling
- Resource limits set
- Autoscaling configured
- Failover mechanism
- Data backup strategy
Security
- Secret management
- RBAC configuration
- Network policies
- Image security scanning
- Dependency vulnerability scanning
- Audit logging
Observability
- Metrics collection
- Logging configuration
- Distributed tracing
- Alerting rules
- Dashboard configuration
Performance
- Load testing
- Performance baseline
- Caching strategy
- Database optimization
- CDN configuration
Operations
- CI/CD pipeline
- Rollback strategy
- Monitoring and alerting
- Complete documentation
- Runbook
Kubernetes Checks
```yaml
# Production-ready Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: myapp
    spec:
      serviceAccountName: myapp
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
        - name: myapp
          image: myapp:v1
          ports:
            - containerPort: 8080
          envFrom:
            - configMapRef:
                name: myapp-config
            - secretRef:
                name: myapp-secrets
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
```
Automated Check Script
```bash
#!/bin/bash
# Production-readiness checks for the myapp Deployment.
# Every query uses the same namespace so the checks cannot silently inspect different objects.
NAMESPACE="${NAMESPACE:-production}"
echo "=== Production readiness check ==="
# 1. Health checks
echo "1. Checking health checks..."
if kubectl get deployment myapp -n "$NAMESPACE" -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}' | grep -q "httpGet"; then
  echo " ✓ Liveness probe configured"
else
  echo " ✗ Liveness probe missing"
fi
# 2. Resource limits
echo "2. Checking resource limits..."
if kubectl get deployment myapp -n "$NAMESPACE" -o jsonpath='{.spec.template.spec.containers[0].resources.limits}' | grep -q "cpu"; then
  echo " ✓ Resource limits configured"
else
  echo " ✗ Resource limits missing"
fi
# 3. Network policy
echo "3. Checking network policy..."
if kubectl get networkpolicy -n "$NAMESPACE" | grep -q "myapp"; then
  echo " ✓ Network policy configured"
else
  echo " ✗ Network policy missing"
fi
# 4. HPA
echo "4. Checking HPA..."
if kubectl get hpa -n "$NAMESPACE" | grep -q "myapp"; then
  echo " ✓ HPA configured"
else
  echo " ✗ HPA missing"
fi
# 5. PDB
echo "5. Checking PDB..."
if kubectl get pdb -n "$NAMESPACE" | grep -q "myapp"; then
  echo " ✓ PDB configured"
else
  echo " ✗ PDB missing"
fi
```
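As an added example that is not from the original page, here is an offline Python counterpart to the script above: it checks the Deployment manifest itself (the dict below is constructed from the page's Deployment) so the same gates can run in CI before anything is applied to the cluster; the check names, the `WEAK_DEPLOYMENT` counterexample and the image-tag check are my own additions, not part of the page.

```python
# Constructed sample: the page's Deployment as a Python dict, so the check
# needs neither a cluster nor a YAML library.
DEPLOYMENT = {
    "spec": {
        "replicas": 3,
        "strategy": {"type": "RollingUpdate",
                     "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0}},
        "template": {"spec": {
            "serviceAccountName": "myapp",
            "securityContext": {"runAsNonRoot": True, "runAsUser": 1000},
            "containers": [{
                "name": "myapp", "image": "myapp:v1",
                "resources": {"requests": {"cpu": "100m", "memory": "128Mi"},
                              "limits": {"cpu": "500m", "memory": "512Mi"}},
                "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
                "readinessProbe": {"httpGet": {"path": "/ready", "port": 8080}},
                "startupProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
            }]}},
    },
}

# Constructed counterexample: a bare manifest that should fail every check.
WEAK_DEPLOYMENT = {"spec": {"replicas": 1, "template": {"spec": {
    "containers": [{"name": "web", "image": "web:latest"}]}}}}

def check_deployment(manifest):
    """Return {check name: passed} for the Deployment and each of its containers."""
    spec = manifest["spec"]
    pod = spec["template"]["spec"]
    rolling = spec.get("strategy", {}).get("rollingUpdate", {})
    results = {
        "replicas>=2": spec.get("replicas", 1) >= 2,
        "zero-downtime rollout": rolling.get("maxUnavailable") == 0,
        "runAsNonRoot": pod.get("securityContext", {}).get("runAsNonRoot") is True,
        "dedicated serviceAccount": pod.get("serviceAccountName", "default") != "default",
    }
    for c in pod.get("containers", []):
        name, image = c.get("name", "?"), c.get("image", "")
        for probe in ("livenessProbe", "readinessProbe", "startupProbe"):
            results[f"{name}: {probe}"] = probe in c
        limits = c.get("resources", {}).get("limits", {})
        results[f"{name}: cpu+memory limits"] = {"cpu", "memory"} <= set(limits)
        # A mutable or missing tag defeats rollback and makes image scan results meaningless.
        results[f"{name}: pinned image tag"] = ":" in image and not image.endswith(":latest")
    return results

def failed_checks(manifest):
    return sorted(k for k, ok in check_deployment(manifest).items() if not ok)
```

Feed it a manifest loaded from your repository and fail the pipeline when `failed_checks` returns anything.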
Release Process
```yaml
# Release process definition
steps:
  - name: Code review
    owner: development
    checklist:
      - Code review completed
      - Unit tests passed
      - Integration tests passed
  - name: Security review
    owner: security
    checklist:
      - Dependency vulnerability scan
      - Image security scan
      - Security configuration review
  - name: Performance testing
    owner: sre
    checklist:
      - Load test completed
      - Performance baseline established
      - Capacity assessment completed
  - name: Deployment
    owner: sre
    checklist:
      - Deployment scripts tested
      - Rollback plan prepared
      - Monitoring and alerting configured
  - name: Verification
    owner: sre
    checklist:
      - Functional verification
      - Performance verification
      - Monitoring verification
```
Best Practices
- Automate the checks
- Progressive rollout
- Monitoring first
- Prepare for rollback
- Complete documentation
Summary
Production readiness checks are key to launching an application stably. A systematic checklist combined with automated tooling effectively lowers release risk.