Building serverless generative AI applications allows you to scale automatically, pay only for what you use, and focus on business logic rather than infrastructure management.
Serverless Architecture for GenAI
```mermaid
flowchart LR
    subgraph Client["Client Layer"]
        A["Web/Mobile App"]
    end
    subgraph API["API Layer"]
        B["API Gateway"]
        C["Lambda Authorizer"]
    end
    subgraph Compute["Compute Layer"]
        D["Lambda Function"]
        E["Step Functions"]
    end
    subgraph AI["AI Layer"]
        F["Amazon Bedrock"]
        G["Knowledge Bases"]
    end
    A --> B
    B --> C
    C --> D
    D --> F
    D --> G
    E --> D
    style F fill:#8b5cf6,color:#fff
    style D fill:#f59e0b,color:#fff
```
Lambda Integration with Bedrock
Basic Lambda Function
```python
import json
import boto3

bedrock = boto3.client('bedrock-runtime')

def lambda_handler(event, context):
    # Parse the request body from the API Gateway proxy event
    body = json.loads(event.get('body') or '{}')
    prompt = body.get('prompt', '')

    if not prompt:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': 'Prompt is required'})
        }

    # Call Bedrock through the Converse API
    response = bedrock.converse(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
        inferenceConfig={'maxTokens': 1024, 'temperature': 0.7}
    )

    result = response['output']['message']['content'][0]['text']

    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({'response': result})
    }
```
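A quick way to sanity-check the handler before deploying is to call it with a minimal API Gateway-style event. The snippet below is a sketch appended to the same module; a real proxy event carries many more fields than the one assumed here.

```python
# Minimal local smoke test; a real API Gateway proxy event has more fields.
if __name__ == '__main__':
    test_event = {'body': json.dumps({'prompt': 'Explain serverless in one sentence.'})}
    print(lambda_handler(test_event, None))
```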
Lambda Configuration
| Setting | Recommended Value | Reason |
|---------|-------------------|--------|
| Memory  | 512-1024 MB       | Sufficient for API calls |
| Timeout | 30-60 seconds     | Allow for model inference |
| Runtime | Python 3.12       | Latest boto3 support |
IAM Policy for Lambda
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:Retrieve",
        "bedrock:RetrieveAndGenerate"
      ],
      "Resource": "arn:aws:bedrock:*:*:knowledge-base/*"
    }
  ]
}
```
API Gateway Integration
REST API Setup
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  GenAIApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Auth:
        ApiKeyRequired: true

  ChatFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      Runtime: python3.12
      MemorySize: 1024
      Timeout: 60
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - bedrock:InvokeModel
              Resource: '*'
      Events:
        ChatApi:
          Type: Api
          Properties:
            RestApiId: !Ref GenAIApi
            Path: /chat
            Method: post
```
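Once deployed, the endpoint can be exercised with any HTTP client. The sketch below uses Python's requests library; the invoke URL and API key are placeholders, not values produced by the template above.

```python
import requests

# Hypothetical values; substitute the invoke URL and API key from your deployed stack.
API_URL = 'https://abc123.execute-api.us-east-1.amazonaws.com/prod/chat'
API_KEY = 'your-api-key'

resp = requests.post(
    API_URL,
    headers={'x-api-key': API_KEY, 'Content-Type': 'application/json'},
    json={'prompt': 'Summarize the benefits of serverless architectures.'},
    timeout=65,  # slightly above the Lambda timeout
)
print(resp.json()['response'])
```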
Streaming with Lambda Function URLs
```python
import json
import boto3

bedrock = boto3.client('bedrock-runtime')

def lambda_handler(event, context):
    body = json.loads(event.get('body') or '{}')
    prompt = body.get('prompt', '')

    # Stream tokens from Bedrock as the model generates them
    response = bedrock.converse_stream(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
        inferenceConfig={'maxTokens': 1024}
    )

    def generate():
        for chunk in response['stream']:
            if 'contentBlockDelta' in chunk:
                text = chunk['contentBlockDelta']['delta'].get('text', '')
                yield f"data: {json.dumps({'text': text})}\n\n"
        yield "data: [DONE]\n\n"

    # Note: joining the generator buffers the entire response before returning it.
    # True incremental streaming over a Function URL requires the RESPONSE_STREAM
    # invoke mode and a runtime or adapter that supports response streaming.
    return {
        'statusCode': 200,
        'headers': {
            'Content-Type': 'text/event-stream',
            'Cache-Control': 'no-cache'
        },
        'body': ''.join(generate()),
        'isBase64Encoded': False
    }
```
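On the client side, the SSE-formatted body can be read line by line. A minimal sketch, assuming a hypothetical Function URL for the streaming handler above:

```python
import json
import requests

# Hypothetical Function URL; replace with the URL created for the streaming handler.
FUNCTION_URL = 'https://xyz.lambda-url.us-east-1.on.aws/'

with requests.post(FUNCTION_URL, json={'prompt': 'Tell me a short story.'}, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith('data: '):
            continue
        payload = line[len('data: '):]
        if payload == '[DONE]':
            break
        print(json.loads(payload)['text'], end='', flush=True)
```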
Step Functions for Complex Workflows
Document Processing Pipeline
```json
{
  "Comment": "Document processing with GenAI",
  "StartAt": "ExtractText",
  "States": {
    "ExtractText": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:ExtractText",
      "Next": "AnalyzeContent"
    },
    "AnalyzeContent": {
      "Type": "Task",
      "Resource": "arn:aws:states:::bedrock:invokeModel",
      "Parameters": {
        "ModelId": "anthropic.claude-3-sonnet-20240229-v1:0",
        "Body": {
          "anthropic_version": "bedrock-2023-05-31",
          "max_tokens": 2048,
          "messages": [
            {
              "role": "user",
              "content.$": "States.Format('Analyze this document: {}', $.extractedText)"
            }
          ]
        }
      },
      "ResultPath": "$.analysis",
      "Next": "GenerateSummary"
    },
    "GenerateSummary": {
      "Type": "Task",
      "Resource": "arn:aws:states:::bedrock:invokeModel",
      "Parameters": {
        "ModelId": "anthropic.claude-3-haiku-20240307-v1:0",
        "Body": {
          "anthropic_version": "bedrock-2023-05-31",
          "max_tokens": 512,
          "messages": [
            {
              "role": "user",
              "content.$": "States.Format('Summarize: {}', $.analysis.Body.content[0].text)"
            }
          ]
        }
      },
      "End": true
    }
  }
}
```
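Starting the pipeline from application code is a single boto3 call. A minimal sketch, assuming a hypothetical state machine ARN and an input key for the document to process:

```python
import json
import boto3

sfn = boto3.client('stepfunctions')

# Hypothetical ARN for the state machine defined above.
STATE_MACHINE_ARN = 'arn:aws:states:us-east-1:123456789012:stateMachine:DocumentPipeline'

execution = sfn.start_execution(
    stateMachineArn=STATE_MACHINE_ARN,
    input=json.dumps({'documentKey': 's3://my-bucket/report.pdf'})
)
print(execution['executionArn'])
```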
Parallel Processing
```json
{
  "StartAt": "ParallelAnalysis",
  "States": {
    "ParallelAnalysis": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "SentimentAnalysis",
          "States": {
            "SentimentAnalysis": {
              "Type": "Task",
              "Resource": "arn:aws:states:::bedrock:invokeModel",
              "Parameters": {
                "ModelId": "anthropic.claude-3-haiku-20240307-v1:0",
                "Body": {
                  "anthropic_version": "bedrock-2023-05-31",
                  "max_tokens": 100,
                  "messages": [{"role": "user", "content.$": "States.Format('Sentiment of: {}', $.text)"}]
                }
              },
              "End": true
            }
          }
        },
        {
          "StartAt": "KeywordExtraction",
          "States": {
            "KeywordExtraction": {
              "Type": "Task",
              "Resource": "arn:aws:states:::bedrock:invokeModel",
              "Parameters": {
                "ModelId": "anthropic.claude-3-haiku-20240307-v1:0",
                "Body": {
                  "anthropic_version": "bedrock-2023-05-31",
                  "max_tokens": 200,
                  "messages": [{"role": "user", "content.$": "States.Format('Extract keywords: {}', $.text)"}]
                }
              },
              "End": true
            }
          }
        }
      ],
      "Next": "CombineResults"
    },
    "CombineResults": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:CombineResults",
      "End": true
    }
  }
}
```
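A Parallel state hands the downstream task an array with one element per branch, so the CombineResults function mostly reshapes that array. A minimal sketch; the response parsing assumes the output shape produced by the Bedrock invokeModel integration above.

```python
def lambda_handler(event, context):
    # event is a two-element list: [sentiment branch output, keyword branch output]
    sentiment_out, keywords_out = event

    def model_text(branch_output):
        # The Bedrock invokeModel task places the model response under Body
        return branch_output['Body']['content'][0]['text']

    return {
        'sentiment': model_text(sentiment_out),
        'keywords': model_text(keywords_out)
    }
```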
DynamoDB for Conversation History
```python
import json
import uuid
import boto3
from datetime import datetime

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ConversationHistory')
bedrock = boto3.client('bedrock-runtime')

def lambda_handler(event, context):
    body = json.loads(event.get('body') or '{}')
    conversation_id = body.get('conversation_id', str(uuid.uuid4()))
    user_message = body.get('message', '')

    # Load prior turns in chronological order
    response = table.query(
        KeyConditionExpression='conversation_id = :cid',
        ExpressionAttributeValues={':cid': conversation_id},
        ScanIndexForward=True
    )

    messages = []
    for item in response.get('Items', []):
        messages.append({'role': item['role'], 'content': [{'text': item['content']}]})
    messages.append({'role': 'user', 'content': [{'text': user_message}]})

    ai_response = bedrock.converse(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        messages=messages,
        inferenceConfig={'maxTokens': 1024}
    )
    assistant_message = ai_response['output']['message']['content'][0]['text']

    # Persist both sides of the exchange
    timestamp = datetime.utcnow().isoformat()
    table.put_item(Item={
        'conversation_id': conversation_id,
        'timestamp': f"{timestamp}_user",
        'role': 'user',
        'content': user_message
    })
    table.put_item(Item={
        'conversation_id': conversation_id,
        'timestamp': f"{timestamp}_assistant",
        'role': 'assistant',
        'content': assistant_message
    })

    return {
        'statusCode': 200,
        'body': json.dumps({
            'conversation_id': conversation_id,
            'response': assistant_message
        })
    }
```
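Because the full history is replayed to the model on every turn, long conversations grow the prompt (and the cost) without bound. One simple mitigation, sketched below, is to keep only the most recent turns; the window size of 20 is an arbitrary assumption, not a recommendation.

```python
MAX_TURNS = 20  # arbitrary window; tune for context length and cost

def trim_history(messages):
    """Keep only the most recent turns before sending them to the model."""
    return messages[-MAX_TURNS:]

# In the handler above: messages = trim_history(messages) before calling bedrock.converse
```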
SQS for Async Processing
```python
import json
import uuid
import boto3
import requests  # bundle with the deployment package; not in the base runtime

sqs = boto3.client('sqs')
bedrock = boto3.client('bedrock-runtime')

def submit_handler(event, context):
    """Submit a task to the queue and return immediately."""
    body = json.loads(event.get('body') or '{}')
    task_id = str(uuid.uuid4())

    sqs.send_message(
        QueueUrl='https://sqs.REGION.amazonaws.com/ACCOUNT/genai-tasks',
        MessageBody=json.dumps({
            'task_id': task_id,
            'prompt': body.get('prompt'),
            'callback_url': body.get('callback_url')
        })
    )
    return {
        'statusCode': 202,
        'body': json.dumps({'task_id': task_id, 'status': 'queued'})
    }

def process_handler(event, context):
    """Process tasks from the queue."""
    for record in event['Records']:
        task = json.loads(record['body'])
        response = bedrock.converse(
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            messages=[{'role': 'user', 'content': [{'text': task['prompt']}]}]
        )
        result = response['output']['message']['content'][0]['text']

        # Notify the caller if a callback URL was provided
        if task.get('callback_url'):
            requests.post(task['callback_url'], json={
                'task_id': task['task_id'],
                'result': result
            })
```
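With an SQS event source mapping, a single failing record otherwise forces the whole batch to be retried. Enabling partial batch responses lets the function report only the failed message IDs. A minimal sketch of that variant of the processor; it assumes `ReportBatchItemFailures` is enabled on the event source mapping.

```python
import json
import boto3

bedrock = boto3.client('bedrock-runtime')

def process_handler(event, context):
    """Variant of the processor that reports partial batch failures to SQS."""
    failures = []
    for record in event['Records']:
        try:
            task = json.loads(record['body'])
            response = bedrock.converse(
                modelId='anthropic.claude-3-sonnet-20240229-v1:0',
                messages=[{'role': 'user', 'content': [{'text': task['prompt']}]}]
            )
            result = response['output']['message']['content'][0]['text']
            # ... deliver `result` via the callback as in the handler above ...
        except Exception:
            # Only this message is retried; the rest of the batch is removed from the queue.
            failures.append({'itemIdentifier': record['messageId']})
    return {'batchItemFailures': failures}
```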
Complete Serverless Architecture
```yaml
service: genai-app

provider:
  name: aws
  runtime: python3.12
  region: us-east-1
  iam:
    role:
      statements:
        - Effect: Allow
          Action:
            - bedrock:InvokeModel
            - bedrock:InvokeModelWithResponseStream
          Resource: '*'
        - Effect: Allow
          Action:
            - dynamodb:PutItem
            - dynamodb:Query
          Resource: !GetAtt ConversationTable.Arn

functions:
  chat:
    handler: handlers/chat.handler
    memorySize: 1024
    timeout: 60
    events:
      - http:
          path: /chat
          method: post
          cors: true
  stream:
    handler: handlers/stream.handler
    memorySize: 1024
    timeout: 60
    url:
      invokeMode: RESPONSE_STREAM

resources:
  Resources:
    ConversationTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: ${self:service}-conversations
        BillingMode: PAY_PER_REQUEST
        AttributeDefinitions:
          - AttributeName: conversation_id
            AttributeType: S
          - AttributeName: timestamp
            AttributeType: S
        KeySchema:
          - AttributeName: conversation_id
            KeyType: HASH
          - AttributeName: timestamp
            KeyType: RANGE
```
Cost Optimization
| Strategy | Implementation |
|----------|----------------|
| Model selection | Use Haiku for simple tasks, Sonnet for complex |
| Caching | Cache frequent responses in ElastiCache |
| Batching | Process multiple requests together |
| Timeouts | Set appropriate Lambda timeouts |
| Reserved concurrency | Limit parallel executions |
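The first row is the biggest lever: route requests to a cheaper model whenever the task allows it. A minimal sketch of such a router, using prompt length as a stand-in for task complexity; the threshold and heuristic are assumptions, not a recommendation.

```python
import boto3

bedrock = boto3.client('bedrock-runtime')

# Hypothetical heuristic: short prompts go to Haiku, longer ones to Sonnet.
HAIKU = 'anthropic.claude-3-haiku-20240307-v1:0'
SONNET = 'anthropic.claude-3-sonnet-20240229-v1:0'

def pick_model(prompt: str) -> str:
    return HAIKU if len(prompt) < 500 else SONNET

def invoke(prompt: str) -> str:
    response = bedrock.converse(
        modelId=pick_model(prompt),
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
        inferenceConfig={'maxTokens': 1024}
    )
    return response['output']['message']['content'][0]['text']
```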
Best Practices
| Practice | Recommendation |
|----------|----------------|
| Error handling | Implement retries with exponential backoff |
| Logging | Use structured logging for debugging |
| Monitoring | Set up CloudWatch alarms |
| Security | Use VPC endpoints for Bedrock |
| Testing | Mock Bedrock responses in tests |
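Throttling is the most common failure mode when many Lambdas call Bedrock concurrently, so the error-handling row deserves a concrete shape. A minimal sketch of retries with exponential backoff and jitter around the Converse call; the retry count and delays are assumptions to tune for your workload.

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

bedrock = boto3.client('bedrock-runtime')

def converse_with_retry(messages, model_id, max_retries=4):
    """Retry throttled Bedrock calls with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return bedrock.converse(
                modelId=model_id,
                messages=messages,
                inferenceConfig={'maxTokens': 1024}
            )
        except ClientError as err:
            code = err.response['Error']['Code']
            if code != 'ThrottlingException' or attempt == max_retries:
                raise
            # Backoff grows 1s, 2s, 4s, 8s, plus jitter to avoid synchronized retries
            time.sleep(2 ** attempt + random.random())
```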
Key Takeaways
- Lambda + Bedrock: simple integration via the Converse API
- API Gateway: REST or HTTP APIs for client access
- Step Functions: orchestrate complex GenAI workflows
- DynamoDB: store conversation history efficiently
- Async processing: use SQS for long-running tasks