AWS LLM Services


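Amazon Web Services (AWS) offers a full set of services for large language models (LLMs), covering model hosting, inference deployment, and application integration across the whole lifecycle of an LLM application. This article walks through LLM integration with three core services: Amazon Bedrock, SageMaker, and Lambda.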

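Amazon Bedrock: Managed LLM Service

Amazon Bedrock is a fully managed foundation model service. It exposes pretrained models from providers such as Anthropic, AI21 Labs, and Meta through an API, so you can use LLM capabilities without managing any infrastructure.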

import boto3
import json

# Runtime client used for model inference.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def call_claude(prompt, max_tokens=1024):
    """Call Claude v2 using the legacy Text Completions request format."""
    # Claude v2 expects the "\n\nHuman: ...\n\nAssistant:" prompt framing.
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9
    })

    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2",
        contentType="application/json",
        accept="application/json",
        body=body
    )

    # The response body is a stream: read it, then parse the JSON payload.
    result = json.loads(response["body"].read())
    return result["completion"]

def call_titan_embedding(text):
    """Return the embedding vector for the given text from Titan Embeddings."""
    body = json.dumps({"inputText": text})

    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        contentType="application/json",
        accept="application/json",
        body=body
    )

    result = json.loads(response["body"].read())
    return result["embedding"]

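Bedrock's main advantage is a single invocation API (invoke_model) across providers, so switching models mostly comes down to changing the modelId. The JSON request and response bodies still differ by provider. The Titan Embeddings model converts text into vectors, which supports building retrieval-augmented generation (RAG) applications.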
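To make the model-switching point concrete, here is a minimal sketch of a helper that picks the request body and the response field by provider prefix. The helper names build_request_body and extract_text are hypothetical, and the Titan text and Llama body shapes are included as I understand them from the Bedrock model parameter documentation; check them against the current docs before relying on them.

```python
import json


def build_request_body(model_id: str, prompt: str, max_tokens: int = 512,
                       temperature: float = 0.7) -> str:
    """Return the JSON body for invoke_model, chosen by provider prefix.

    Assumes plain model IDs such as "anthropic.claude-v2"; prefixed
    inference-profile IDs would need extra handling.
    """
    provider = model_id.split(".", 1)[0]
    if provider == "anthropic":
        # Legacy Text Completions format, as used with claude-v2 in this article.
        payload = {
            "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
            "max_tokens_to_sample": max_tokens,
            "temperature": temperature,
        }
    elif provider == "amazon":
        # Titan text-generation models (not the embedding models).
        payload = {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_tokens,
                "temperature": temperature,
            },
        }
    elif provider == "meta":
        payload = {
            "prompt": prompt,
            "max_gen_len": max_tokens,
            "temperature": temperature,
        }
    else:
        raise ValueError(f"no body template for provider: {provider}")
    return json.dumps(payload)


def extract_text(model_id: str, result: dict) -> str:
    """Pull the generated text out of the parsed response for each provider."""
    provider = model_id.split(".", 1)[0]
    if provider == "anthropic":
        return result["completion"]
    if provider == "amazon":
        return result["results"][0]["outputText"]
    if provider == "meta":
        return result["generation"]
    raise ValueError(f"no response parser for provider: {provider}")
```

With these two helpers, the invoke_model call itself stays identical across models: pass build_request_body(model_id, prompt) as the body and run extract_text(model_id, parsed_response) on the result.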

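Deploying LLMs on SageMaker

SageMaker is a full machine learning platform that supports training, fine-tuning, and deploying custom models. For scenarios that call for a custom model, SageMaker is a good fit.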

import json

import boto3
from sagemaker.huggingface import HuggingFaceModel

def deploy_custom_llm(model_s3_uri, role_arn):
    """Deploy a Transformers-compatible model archive from S3 to a real-time endpoint."""
    huggingface_model = HuggingFaceModel(
        model_data=model_s3_uri,
        role=role_arn,
        # These versions must match a Hugging Face container combination
        # supported by the installed SageMaker SDK.
        transformers_version="4.35.0",
        pytorch_version="2.1.0",
        py_version="py310",
        model_server_workers=2
    )

    # The endpoint is billed while it is running; call
    # predictor.delete_endpoint() when it is no longer needed.
    predictor = huggingface_model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",
        endpoint_name="custom-llm-endpoint"
    )

    return predictor

def invoke_sagemaker_endpoint(endpoint_name, prompt):
    """Send a text-generation request to a deployed endpoint and return the parsed JSON."""
    runtime = boto3.client("sagemaker-runtime")

    payload = json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 512,
            "temperature": 0.7,
            "top_p": 0.95
        }
    })

    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload
    )

    return json.loads(response["Body"].read())

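SageMaker is well suited to deploying custom fine-tuned models. With the Hugging Face containers, you can deploy Transformers-compatible models directly. ml.g5 instances come with NVIDIA A10G GPUs and offer good price-performance for LLM inference.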
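The endpoint call above returns parsed JSON rather than a plain string. As a small sketch, assuming the endpoint runs a text-generation pipeline that answers with a list such as [{"generated_text": "..."}] and may echo the prompt at the start of the output, the hypothetical helpers below extract just the newly generated text. Other tasks or custom inference scripts can return different shapes.

```python
from typing import Any


def extract_generated_text(result: Any) -> str:
    """Return the generated string from a text-generation style response."""
    # Assumed common shape: a list with one dict per input.
    if isinstance(result, list) and result:
        result = result[0]
    if isinstance(result, dict) and "generated_text" in result:
        return result["generated_text"]
    raise ValueError(f"unexpected endpoint response shape: {result!r}")


def strip_prompt(prompt: str, generated: str) -> str:
    """Drop the echoed prompt if the output starts with it."""
    if generated.startswith(prompt):
        return generated[len(prompt):].lstrip()
    return generated
```

A caller could combine them as strip_prompt(prompt, extract_generated_text(invoke_sagemaker_endpoint(name, prompt))).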

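Serverless LLM Applications with Lambda

AWS Lambda lets you build serverless LLM applications that are invoked on demand and scale automatically. Combined with API Gateway, it is a quick way to stand up an LLM API.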

import json
import boto3
import os

bedrock = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    """Synchronous handler: API Gateway proxy event in, full completion out."""
    try:
        # Proxy integration delivers the request body as a JSON string (or None).
        body = json.loads(event.get("body") or "{}")
        prompt = body.get("prompt", "")
        # The prompt format below is Claude-specific, so "model" should be
        # an Anthropic text-completion model ID.
        model = body.get("model", "anthropic.claude-v2")

        if not prompt:
            return {"statusCode": 400, "body": json.dumps({"error": "Prompt is required"})}

        bedrock_body = json.dumps({
            "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
            "max_tokens_to_sample": body.get("max_tokens", 1024),
            "temperature": body.get("temperature", 0.7)
        })

        response = bedrock.invoke_model(
            modelId=model,
            contentType="application/json",
            accept="application/json",
            body=bedrock_body
        )

        result = json.loads(response["body"].read())

        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({
                "response": result["completion"],
                "model": model
            })
        }

    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "Request body must be valid JSON"})}
    except Exception as e:
        return {"statusCode": 500, "body": json.dumps({"error": str(e)})}

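def stream_lambda_handler(event, context):
    """Streaming variant: consume the model output chunk by chunk."""
    body = json.loads(event["body"])

    bedrock_body = json.dumps({
        "prompt": f"\n\nHuman: {body['prompt']}\n\nAssistant:",
        "max_tokens_to_sample": body.get("max_tokens", 1024),
        "temperature": body.get("temperature", 0.7)
    })

    response = bedrock.invoke_model_with_response_stream(
        modelId="anthropic.claude-v2",
        contentType="application/json",
        accept="application/json",
        body=bedrock_body
    )

    chunks = []
    # Use a distinct loop variable so the handler's "event" argument is not shadowed.
    for stream_event in response["body"]:
        if "chunk" not in stream_event:
            continue
        chunk = json.loads(stream_event["chunk"]["bytes"])
        if "completion" in chunk:
            # print() writes to the function's logs as tokens arrive.
            print(chunk["completion"], end="", flush=True)
            chunks.append(chunk["completion"])

    # Return the assembled text. Forwarding chunks to the caller as they arrive
    # would also require Lambda response streaming, which is not set up here.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"response": "".join(chunks)})
    }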

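These handlers show two invocation modes: synchronous invocation suits simple question-and-answer scenarios, and streaming invocation suits interactive scenarios that need real-time output. With CORS configured on API Gateway, the result is an LLM API that a browser front end can call.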
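With Lambda proxy integration, the function itself usually has to return the CORS headers, because API Gateway passes its response through unchanged. The sketch below shows one way to do that. The origin https://app.example.com and the helper names are hypothetical, and the preflight check covers both the REST API event field (httpMethod) and the HTTP API field (requestContext.http.method) on the assumption that either could front the function.

```python
import json
from typing import Optional

# Hypothetical front-end origin; replace with the site that calls the API.
ALLOWED_ORIGIN = "https://app.example.com"


def cors_headers(origin: str = ALLOWED_ORIGIN) -> dict:
    """Headers a browser needs to accept a cross-origin JSON response."""
    return {
        "Content-Type": "application/json",
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "POST,OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
    }


def make_response(status: int, payload: dict) -> dict:
    """Build a proxy-integration response with CORS headers attached."""
    return {
        "statusCode": status,
        "headers": cors_headers(),
        "body": json.dumps(payload),
    }


def handle_preflight(event: dict) -> Optional[dict]:
    """Answer an OPTIONS preflight request; return None for other methods."""
    method = event.get("httpMethod") or (
        event.get("requestContext", {}).get("http", {}).get("method")
    )
    if method == "OPTIONS":
        return {"statusCode": 204, "headers": cors_headers(), "body": ""}
    return None
```

A handler would call handle_preflight(event) first and return its result when it is not None, then use make_response for every other return path so error responses carry the same headers.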

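Bedrock Agents: Building Intelligent Agents

Bedrock Agents is AWS's agent framework. It combines an LLM with tool calls so that an agent can carry out complex, multi-step tasks.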

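import boto3

bedrock_agent = boto3.client("bedrock-agent-runtime")

def invoke_agent(agent_id, agent_alias_id, session_id, user_input):
    """Send one user turn to a deployed agent and return its full text reply."""
    response = bedrock_agent.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,  # required: alias of the agent version to call
        sessionId=session_id,         # reuse the same ID to continue a conversation
        inputText=user_input,
        enableTrace=True
    )

    # The reply arrives as an event stream; text is carried in "chunk" events.
    result = ""
    for event in response["completion"]:
        if "chunk" in event:
            result += event["chunk"]["bytes"].decode()

    return result

def create_agent_with_tools():
    """Return an illustrative agent definition; this function does not call the API."""
    # Replace <account-id> with your AWS account ID. In the boto3 "bedrock-agent"
    # client, the agent fields are passed to create_agent and each action group
    # is attached separately with create_agent_action_group.
    agent_config = {
        "agentName": "data-analyst-agent",
        "foundationModel": "anthropic.claude-v2",
        "instruction": "You are a data analysis assistant that can query databases, generate charts, and write reports.",
        "agentResourceRoleArn": "arn:aws:iam::<account-id>:role/bedrock-agent-role",
        "actionGroups": [{
            "actionGroupName": "data-operations",
            "actionGroupExecutor": {
                "lambda": "arn:aws:lambda:us-east-1:<account-id>:function:data-processor"
            },
            "functionSchema": {
                "functions": [{
                    "name": "query_database",
                    "description": "Query the database to retrieve data",
                    "parameters": {
                        "query": {"description": "SQL query statement", "required": True, "type": "string"}
                    }
                }]
            }
        }]
    }
    return agent_config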

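Bedrock Agents connects an LLM to external tools so that it can make decisions and execute tasks autonomously. By defining action groups backed by Lambda functions, an agent can run database queries, call APIs, and operate on files.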
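The Lambda function behind an action group receives the function name and its parameters from the agent and must reply in a specific envelope. The sketch below handles the query_database function defined above. It is an illustration under stated assumptions: an in-memory SQLite connection stands in for the real database, the extra conn argument is a hypothetical convenience for testing (a real handler takes only event and context), and the event and response field names follow the function-details format as I understand it from the Bedrock Agents documentation.

```python
import json
import sqlite3


def run_readonly_query(conn: sqlite3.Connection, sql: str) -> list:
    """Run one SELECT statement and return the rows as dicts."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def action_group_handler(event: dict, context, conn: sqlite3.Connection) -> dict:
    """Handle an agent function call and wrap the result in the reply envelope."""
    # Parameters arrive as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    try:
        rows = run_readonly_query(conn, params["query"])
        function_response = {"responseBody": {"TEXT": {"body": json.dumps(rows)}}}
    except (KeyError, ValueError, sqlite3.Error) as exc:
        # Report the failure to the agent instead of raising.
        function_response = {
            "responseState": "FAILURE",
            "responseBody": {"TEXT": {"body": f"query failed: {exc}"}},
        }
    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": function_response,
        },
    }
```

The SELECT-only check is a deliberately crude guard for the sketch. Letting a model write SQL against a real database would call for a read-only database role and stricter validation.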

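Cost Optimization and Monitoring

Cost management is critical for LLM services. AWS provides several tools for monitoring and optimizing the cost of LLM usage.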

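import os
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Illustrative flat cost per call. Bedrock on-demand pricing is per token,
# so treat the result as a rough estimate only.
ESTIMATED_COST_PER_CALL = 0.008

def monitor_bedrock_costs():
    """Sum the last 24 hours of Claude v2 invocations and alert on high volume."""
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=1)

    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Bedrock",
        MetricName="Invocations",
        Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-v2"}],
        StartTime=start_time,
        EndTime=end_time,
        Period=3600,  # one datapoint per hour
        Statistics=["Sum"]
    )

    total_calls = sum(dp["Sum"] for dp in response["Datapoints"])
    estimated_cost = total_calls * ESTIMATED_COST_PER_CALL

    if total_calls > 1000:
        send_alert(f"Unusual Bedrock call volume: {total_calls:.0f} calls, estimated cost ${estimated_cost:.2f}")

    return {"total_calls": total_calls, "estimated_cost": estimated_cost}

def send_alert(message):
    """Publish the alert to the SNS topic named in ALERT_TOPIC_ARN."""
    sns = boto3.client("sns")
    sns.publish(
        TopicArn=os.environ["ALERT_TOPIC_ARN"],
        Message=message,
        Subject="LLM service cost alert"  # SNS subjects must be ASCII text
    )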

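Monitoring Bedrock invocation metrics in CloudWatch helps you spot unusual usage patterns early. Combined with SNS alerts, it lets you step in before costs exceed the budget.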
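Because on-demand Bedrock usage is billed by token rather than by call, an estimate based on token counts tracks the bill more closely than a per-call figure. The sketch below shows the arithmetic. The function names, the prices, and the 80% alert threshold are all hypothetical; the token totals could come from the InputTokenCount and OutputTokenCount metrics in the AWS/Bedrock namespace, and the per-1,000-token prices should be taken from the current pricing page for the model in use.

```python
def estimate_token_cost(input_tokens: int, output_tokens: int,
                        input_price_per_1k: float,
                        output_price_per_1k: float) -> float:
    """Estimate cost from token totals and per-1,000-token prices."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


def over_budget(cost: float, daily_budget: float, threshold: float = 0.8) -> bool:
    """Return True once cost reaches the given fraction of the daily budget."""
    return cost >= daily_budget * threshold
```

For example, 2,000 input tokens and 500 output tokens at hypothetical prices of 0.01 and 0.03 per 1,000 tokens come to 2 × 0.01 + 0.5 × 0.03 = 0.035. Alerting at a fraction of the budget rather than at the full amount leaves time to react before the limit is reached.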

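Summary

AWS offers a complete set of LLM services, from Bedrock's managed API to custom deployment on SageMaker to serverless applications on Lambda, and together they cover a wide range of use cases. Developers can pick the combination that fits their needs to build efficient, scalable LLM applications.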