AWS LLM服务
--- title: "AWS LLM服务" description: "全面介绍AWS平台上的大语言模型服务,包括Bedrock、SageMaker和Lambda集成方案" tags: ["AWS", "LLM服务", "Bedrock", "SageMaker", "Lambda"] category: "llm" icon: "🧠"
AWS LLM服务
Amazon Web Services(AWS)提供了完整的大语言模型(LLM)服务生态,从模型托管到推理部署再到应用集成,覆盖了LLM应用开发的全生命周期。本文将深入介绍AWS Bedrock、SageMaker和Lambda三大核心服务的LLM集成方案。
AWS Bedrock:托管式LLM服务
AWS Bedrock是全托管的基础模型服务,提供来自Anthropic、AI21 Labs、Meta等厂商的预训练模型API,无需管理基础设施即可使用LLM能力。
import boto3
import json
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
def call_claude(prompt, max_tokens=1024):
body = json.dumps({
"prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
"max_tokens_to_sample": max_tokens,
"temperature": 0.7,
"top_p": 0.9
})
response = bedrock.invoke_model(
modelId="anthropic.claude-v2",
contentType="application/json",
accept="application/json",
body=body
)
result = json.loads(response["body"].read())
return result["completion"]
def call_titan_embedding(text):
body = json.dumps({"inputText": text})
response = bedrock.invoke_model(
modelId="amazon.titan-embed-text-v1",
contentType="application/json",
accept="application/json",
body=body
)
result = json.loads(response["body"].read())
return result["embedding"]
Bedrock的优势在于统一的API接口,可以轻松切换不同厂商的模型。Titan Embedding模型则提供了文本向量化能力,支持构建RAG(检索增强生成)应用。
SageMaker LLM部署
SageMaker提供了完整的机器学习平台,支持自定义模型的训练、微调和部署。对于需要自定义模型的场景,SageMaker是理想选择。
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
def deploy_custom_llm(model_s3_uri, role_arn):
huggingface_model = HuggingFaceModel(
model_data=model_s3_uri,
role=role_arn,
transformers_version="4.35.0",
pytorch_version="2.1.0",
py_version="py310",
model_server_workers=2
)
predictor = huggingface_model.deploy(
initial_instance_count=1,
instance_type="ml.g5.2xlarge",
endpoint_name="custom-llm-endpoint"
)
return predictor
def invoke_sagemaker_endpoint(endpoint_name, prompt):
runtime = boto3.client("sagemaker-runtime")
payload = json.dumps({
"inputs": prompt,
"parameters": {
"max_new_tokens": 512,
"temperature": 0.7,
"top_p": 0.95
}
})
response = runtime.invoke_endpoint(
EndpointName=endpoint_name,
ContentType="application/json",
Body=payload
)
return json.loads(response["Body"].read())
SageMaker适合部署自定义微调的模型。通过HuggingFace容器,可以直接部署Transformers兼容的模型。ml.g5实例配备了NVIDIA A10G GPU,为LLM推理提供了良好的性价比。
Lambda无服务器LLM应用
AWS Lambda可以构建无服务器的LLM应用,实现按需调用、自动扩缩容。结合API Gateway,可以快速构建LLM API服务。
import json
import boto3
import os
bedrock = boto3.client("bedrock-runtime")
def lambda_handler(event, context):
try:
body = json.loads(event["body"])
prompt = body.get("prompt", "")
model = body.get("model", "anthropic.claude-v2")
if not prompt:
return {"statusCode": 400, "body": json.dumps({"error": "Prompt is required"})}
bedrock_body = json.dumps({
"prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
"max_tokens_to_sample": body.get("max_tokens", 1024),
"temperature": body.get("temperature", 0.7)
})
response = bedrock.invoke_model(
modelId=model,
contentType="application/json",
accept="application/json",
body=bedrock_body
)
result = json.loads(response["body"].read())
return {
"statusCode": 200,
"headers": {"Content-Type": "application/json"},
"body": json.dumps({
"response": result["completion"],
"model": model
})
}
except Exception as e:
return {"statusCode": 500, "body": json.dumps({"error": str(e)})}
def stream_lambda_handler(event, context):
body = json.loads(event["body"])
bedrock_body = json.dumps({
"prompt": f"\n\nHuman: {body['prompt']}\n\nAssistant:",
"max_tokens_to_sample": body.get("max_tokens", 1024),
"temperature": body.get("temperature", 0.7)
})
response = bedrock.invoke_model_with_response_stream(
modelId="anthropic.claude-v2",
contentType="application/json",
accept="application/json",
body=bedrock_body
)
for event in response["body"]:
chunk = json.loads(event["chunk"]["bytes"])
if "completion" in chunk:
print(chunk["completion"], end="", flush=True)
Lambda函数提供了两种调用模式:同步调用适合简单的问答场景,流式调用适合需要实时输出的交互场景。配合API Gateway的CORS配置,可以轻松构建前端可用的LLM API。
Bedrock Agents:构建智能代理
Bedrock Agents是AWS提供的智能代理框架,支持LLM与工具调用的结合,可以执行复杂的多步骤任务。
import boto3
bedrock_agent = boto3.client("bedrock-agent-runtime")
def invoke_agent(agent_id, session_id, user_input):
response = bedrock_agent.invoke_agent(
agentId=agent_id,
sessionId=session_id,
inputText=user_input,
enableTrace=True
)
result = ""
for event in response["completion"]:
if "chunk" in event:
result += event["chunk"]["bytes"].decode()
return result
def create_agent_with_tools():
agent_config = {
"agentName": "data-analyst-agent",
"roleArn": "arn:aws:iam::role/bedrock-agent-role",
"foundationModel": "anthropic.claude-v2",
"instruction": "你是一个数据分析助手,可以查询数据库、生成图表和编写报告。",
"agentResourceRoleArn": "arn:aws:iam::role/bedrock-agent-role",
"actionGroups": [{
"actionGroupName": "data-operations",
"actionGroupExecutor": {
"lambdaArn": "arn:aws:lambda:us-east-1:function/data-processor"
},
"functionSchema": {
"functions": [{
"name": "query_database",
"description": "查询数据库获取数据",
"parameters": {
"query": {"description": "SQL查询语句", "required": True, "type": "string"}
}
}]
}
}]
}
return agent_config
Bedrock Agents将LLM与外部工具连接,实现自主决策和任务执行。通过定义Action Group和Lambda函数,Agent可以执行数据库查询、API调用和文件操作等操作。
成本优化与监控
LLM服务的成本管理至关重要。AWS提供了多种工具来监控和优化LLM使用成本。
import boto3
from datetime import datetime, timedelta
cloudwatch = boto3.client("cloudwatch")
def monitor_bedrock_costs():
end_time = datetime.utcnow()
start_time = end_time - timedelta(days=1)
response = cloudwatch.get_metric_statistics(
Namespace="AWS/Bedrock",
MetricName="Invocations",
Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-v2"}],
StartTime=start_time,
EndTime=end_time,
Period=3600,
Statistics=["Sum"]
)
total_calls = sum(dp["Sum"] for dp in response["Datapoints"])
estimated_cost = total_calls * 0.008
if total_calls > 1000:
send_alert(f"Bedrock调用量异常:{total_calls}次,预估成本${estimated_cost:.2f}")
return {"total_calls": total_calls, "estimated_cost": estimated_cost}
def send_alert(message):
sns = boto3.client("sns")
sns.publish(
TopicArn=os.environ["ALERT_TOPIC_ARN"],
Message=message,
Subject="LLM服务成本预警"
)
通过CloudWatch监控Bedrock的调用指标,可以及时发现异常使用模式。结合SNS告警机制,可以在成本超出预算前及时干预。
总结
AWS提供了完整的LLM服务生态,从Bedrock的托管式API到SageMaker的自定义部署,再到Lambda的无服务器应用,覆盖了各种使用场景。开发者可以根据需求选择合适的服务组合,构建高效、可扩展的LLM应用。