Amazon Bedrock Knowledge Bases: Building RAG Applications

Amazon Bedrock Knowledge Bases is a fully managed retrieval-augmented generation (RAG) service. It handles document ingestion, embedding generation, vector storage, and retrieval, so you can focus on your application.

Knowledge Bases Architecture

flowchart TB
    subgraph Ingestion["Data Ingestion"]
        S3["S3 Bucket"] --> Parser["Document Parser"]
        Parser --> Chunker["Chunking"]
        Chunker --> Embed["Embedding"]
    end

    subgraph Storage["Vector Storage"]
        Embed --> VS["Vector Store"]
    end

    subgraph Query["Query Flow"]
        Q["User Query"] --> QE["Query Embedding"]
        QE --> Retrieve["Retrieve"]
        VS --> Retrieve
        Retrieve --> Augment["Augment Prompt"]
        Augment --> FM["Foundation Model"]
        FM --> Response["Response"]
    end

    style VS fill:#3b82f6,color:#fff
    style FM fill:#8b5cf6,color:#fff
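
The query flow in the diagram can be sketched in plain Python to make the steps concrete. This is only an illustration: the three-dimensional vectors, the `retrieve` and `augment_prompt` helpers, and the prompt wording are hypothetical, and the managed service performs these steps for you with a real embedding model and vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, top_k=2):
    """Return the top_k (text, score) pairs ranked by similarity to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def augment_prompt(question, chunks):
    """Place the retrieved chunks ahead of the question."""
    context = "\n".join(f"- {text}" for text in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical toy "embeddings"; real embedding models emit far more dimensions.
store = [
    ("Returns are accepted within 30 days.", [0.9, 0.1, 0.0]),
    ("Shipping takes 3-5 business days.", [0.1, 0.9, 0.0]),
    ("Gift cards never expire.", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.0, 0.0]  # stands in for the embedded user query

top = retrieve(query_vec, store, top_k=1)
prompt = augment_prompt("What is the return policy?", [text for text, _ in top])
print(prompt)
```

The augmented prompt is what would be sent to the foundation model in the final step of the diagram.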

Creating a Knowledge Base

Console Setup

1. Navigate to Amazon Bedrock → Knowledge Bases
2. Click "Create knowledge base"
3. Configure:
   - Name and description
   - IAM role (auto-created or existing)
   - Data source (S3)
   - Embedding model
   - Vector store

Using AWS CLI

# Create knowledge base
aws bedrock-agent create-knowledge-base \
    --name "product-docs-kb" \
    --role-arn "arn:aws:iam::123456789012:role/BedrockKBRole" \
    --knowledge-base-configuration '{
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1"
        }
    }' \
    --storage-configuration '{
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/abc123",
            "vectorIndexName": "bedrock-kb-index",
            "fieldMapping": {
                "vectorField": "embedding",
                "textField": "text",
                "metadataField": "metadata"
            }
        }
    }'
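
The same request can be assembled in Python for use with boto3's `create_knowledge_base`. The sketch below only builds the keyword arguments, so it runs without AWS credentials; `build_kb_request` is a hypothetical helper name, and the ARNs are the placeholder values from the CLI example.

```python
def build_kb_request(name, role_arn, embedding_model_arn, collection_arn, index_name):
    """Assemble keyword arguments mirroring the CLI call above."""
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embedding_model_arn,
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }


request = build_kb_request(
    name="product-docs-kb",
    role_arn="arn:aws:iam::123456789012:role/BedrockKBRole",
    embedding_model_arn="arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1",
    collection_arn="arn:aws:aoss:us-east-1:123456789012:collection/abc123",
    index_name="bedrock-kb-index",
)

# Sending the request needs boto3 and AWS credentials:
# import boto3
# response = boto3.client("bedrock-agent").create_knowledge_base(**request)
# kb_id = response["knowledgeBase"]["knowledgeBaseId"]
```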

Data Sources

S3 Data Source

import boto3

client = boto3.client('bedrock-agent')

# Create data source
response = client.create_data_source(
    knowledgeBaseId='KB_ID',
    name='product-docs',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['docs/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 512,
                'overlapPercentage': 20
            }
        }
    }
)

Supported File Types

Format	Extension
Text	.txt
PDF	.pdf
HTML	.html
Markdown	.md
Word	.doc, .docx
CSV	.csv
Excel	.xls, .xlsx

Chunking Strategies

Fixed Size Chunking

'chunkingConfiguration': {
    'chunkingStrategy': 'FIXED_SIZE',
    'fixedSizeChunkingConfiguration': {
        'maxTokens': 512,
        'overlapPercentage': 20
    }
}
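
To see what `maxTokens` and `overlapPercentage` mean in practice, here is a rough sketch of fixed-size chunking over a list of tokens. It assumes the overlap is a percentage of `maxTokens` and that tokens are already split; the tokenizer and exact boundary handling used by the service are not described here, so treat `fixed_size_chunks` as a hypothetical illustration rather than the service's algorithm.

```python
def fixed_size_chunks(tokens, max_tokens=512, overlap_percentage=20):
    """Split a token list into windows of max_tokens that share an overlap."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in [0, 100)")

    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap  # always >= 1 because overlap < max_tokens

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks


# Ten "tokens", windows of 5, 20% overlap (1 token shared between neighbors)
example = fixed_size_chunks(list(range(10)), max_tokens=5, overlap_percentage=20)
print(example)  # [[0, 1, 2, 3, 4], [4, 5, 6, 7, 8], [8, 9]]
```

Each chunk repeats the tail of the previous one, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.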

Semantic Chunking

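'chunkingConfiguration': {
    'chunkingStrategy': 'SEMANTIC',
    'semanticChunkingConfiguration': {
        'maxTokens': 512,                     # Upper bound on chunk size
        'bufferSize': 0,                      # Surrounding sentences included when comparing (0 = none)
        'breakpointPercentileThreshold': 95   # Split where dissimilarity exceeds this percentile
    }
}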

Hierarchical Chunking

'chunkingConfiguration': {
    'chunkingStrategy': 'HIERARCHICAL',
    'hierarchicalChunkingConfiguration': {
        'levelConfigurations': [
            {'maxTokens': 1500},  # Parent chunks
            {'maxTokens': 300}    # Child chunks
        ],
        'overlapTokens': 60
    }
}
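
The two-level idea can be sketched as follows: small child chunks are used for matching, and each child points back to a larger parent that supplies the surrounding context. This sketch leaves out `overlapTokens` for brevity, and `hierarchical_chunks` is a hypothetical helper, not the service's implementation.

```python
def hierarchical_chunks(tokens, parent_max=1500, child_max=300):
    """Group tokens into parent chunks, then split each parent into children."""
    if parent_max <= 0 or child_max <= 0:
        raise ValueError("chunk sizes must be positive")

    result = []
    for p_start in range(0, len(tokens), parent_max):
        parent = tokens[p_start:p_start + parent_max]
        children = [
            parent[c_start:c_start + child_max]
            for c_start in range(0, len(parent), child_max)
        ]
        result.append({"parent": parent, "children": children})
    return result


# Ten "tokens", parents of 6, children of 3
tree = hierarchical_chunks(list(range(10)), parent_max=6, child_max=3)
for node in tree:
    print(node["parent"], "->", node["children"])
```

With the sizes from the configuration above, each 1500-token parent would hold up to five 300-token children.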

Chunking Strategy Comparison

Strategy	Best For	Characteristics
Fixed Size	General use	Predictable, simple
Semantic	Natural boundaries	Better context preservation
Hierarchical	Long documents	Multi-level retrieval

Syncing Data

# Start ingestion job
response = client.start_ingestion_job(
    knowledgeBaseId='KB_ID',
    dataSourceId='DS_ID'
)

job_id = response['ingestionJob']['ingestionJobId']

# Check status
status_response = client.get_ingestion_job(
    knowledgeBaseId='KB_ID',
    dataSourceId='DS_ID',
    ingestionJobId=job_id
)

print(f"Status: {status_response['ingestionJob']['status']}")
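
A single status check rarely catches the job in its final state, so a polling loop is usually needed. The sketch below takes the client as a parameter; `wait_for_ingestion`, the poll interval, and the timeout are assumptions for illustration, and the terminal status names should be checked against the current API reference.

```python
import time

# Statuses after which the job will not change again (assumed; verify in the API docs)
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(client, kb_id, ds_id, job_id,
                       poll_seconds=10.0, timeout_seconds=1800.0,
                       sleep=time.sleep):
    """Poll get_ingestion_job until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=ds_id,
            ingestionJobId=job_id,
        )["ingestionJob"]

        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Ingestion job {job_id} still {job['status']}")
        sleep(poll_seconds)


# Usage with the 'bedrock-agent' client and job_id from the example above:
# job = wait_for_ingestion(client, 'KB_ID', 'DS_ID', job_id)
# print(job['status'])
```

The `sleep` parameter exists only so the loop can be exercised in tests without waiting.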

Querying Knowledge Bases

Retrieve API

Fetch the most relevant chunks without generating a response:

# Queries go through the runtime client, not the 'bedrock-agent' client used above
client = boto3.client('bedrock-agent-runtime')

response = client.retrieve(
    knowledgeBaseId='KB_ID',
    retrievalQuery={
        'text': 'What is the return policy?'
    },
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5
        }
    }
)

for result in response['retrievalResults']:
    print(f"Score: {result['score']}")
    print(f"Content: {result['content']['text']}")
    print(f"Source: {result['location']['s3Location']['uri']}")
    print("---")
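
The loop above assumes every result has a score and an S3 location. A more defensive sketch, which also drops low-scoring results, might look like this; `summarize_results`, the `min_score` cutoff, and the sample response are hypothetical and only mimic the shape of a Retrieve response.

```python
def summarize_results(response, min_score=0.0):
    """Flatten retrieval results, skipping those scored below min_score."""
    rows = []
    for result in response.get("retrievalResults", []):
        score = result.get("score", 0.0)
        if score < min_score:
            continue
        # Non-S3 sources have no 's3Location', so fall back to None
        uri = result.get("location", {}).get("s3Location", {}).get("uri")
        rows.append({
            "score": score,
            "text": result.get("content", {}).get("text", ""),
            "source": uri,
        })
    return rows


# Hypothetical response for illustration only
sample_response = {
    "retrievalResults": [
        {"score": 0.82,
         "content": {"text": "Returns are accepted within 30 days."},
         "location": {"type": "S3", "s3Location": {"uri": "s3://my-docs-bucket/docs/returns.md"}}},
        {"score": 0.31,
         "content": {"text": "Gift cards never expire."},
         "location": {"type": "S3", "s3Location": {"uri": "s3://my-docs-bucket/docs/gift-cards.md"}}},
        {"content": {"text": "Chunk from a non-S3 source"},
         "location": {"type": "WEB"}},
    ]
}

rows = summarize_results(sample_response, min_score=0.5)
for row in rows:
    print(row["score"], row["source"])
```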

RetrieveAndGenerate API

Retrieve documents and generate a response in one call:

response = client.retrieve_and_generate(
    input={
        'text': 'What is the return policy?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB_ID',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5
                }
            }
        }
    }
)

print(response['output']['text'])

# Show citations
for citation in response.get('citations', []):
    for ref in citation.get('retrievedReferences', []):
        print(f"Source: {ref['location']['s3Location']['uri']}")
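
Each citation also records which part of the generated answer it supports. Assuming the response carries that text under `generatedResponsePart.textResponsePart.text`, a sketch that pairs each passage with its unique sources could look like the following; `citations_by_passage` and the sample response are hypothetical.

```python
def citations_by_passage(response):
    """Pair each cited passage of the answer with its de-duplicated source URIs."""
    pairs = []
    for citation in response.get("citations", []):
        part = citation.get("generatedResponsePart", {}).get("textResponsePart", {})
        sources = []
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
        pairs.append((part.get("text", ""), sources))
    return pairs


# Hypothetical response for illustration only
sample = {
    "output": {"text": "Returns are accepted within 30 days."},
    "citations": [
        {
            "generatedResponsePart": {
                "textResponsePart": {"text": "Returns are accepted within 30 days."}
            },
            "retrievedReferences": [
                {"location": {"s3Location": {"uri": "s3://my-docs-bucket/docs/returns.md"}}},
                {"location": {"s3Location": {"uri": "s3://my-docs-bucket/docs/returns.md"}}},
            ],
        }
    ],
}

pairs = citations_by_passage(sample)
for passage, sources in pairs:
    print(passage, "<-", sources)
```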

Session Management

Pass the sessionId returned by the first call to keep conversation context across turns:

# First query
response = client.retrieve_and_generate(
    input={'text': 'What products do you offer?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB_ID',
            'modelArn': 'model-arn'
        }
    }
)

session_id = response['sessionId']

# Follow-up query with session
response = client.retrieve_and_generate(
    sessionId=session_id,
    input={'text': 'What are their prices?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB_ID',
            'modelArn': 'model-arn'
        }
    }
)

Advanced Configuration

Metadata Filtering

response = client.retrieve(
    knowledgeBaseId='KB_ID',
    retrievalQuery={'text': 'pricing information'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'filter': {
                'equals': {
                    'key': 'category',
                    'value': 'pricing'
                }
            }
        }
    }
)
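
Filters can also be combined. The sketch below builds a compound filter with the `andAll` and `greaterThan` operators; the helper names and the `year` attribute are hypothetical, and the result is a plain dictionary that would be passed as `retrievalConfiguration`.

```python
def equals(key, value):
    """Match documents whose metadata attribute equals value."""
    return {"equals": {"key": key, "value": value}}


def greater_than(key, value):
    """Match documents whose numeric metadata attribute exceeds value."""
    return {"greaterThan": {"key": key, "value": value}}


def and_all(*conditions):
    """Require every condition to match (at least two conditions)."""
    if len(conditions) < 2:
        raise ValueError("andAll needs at least two conditions")
    return {"andAll": list(conditions)}


def build_retrieval_config(filter_expr, number_of_results=5):
    """Wrap a filter in the structure expected under retrievalConfiguration."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            "filter": filter_expr,
        }
    }


# 'year' is a hypothetical metadata attribute
config = build_retrieval_config(
    and_all(equals("category", "pricing"), greater_than("year", 2023))
)
print(config)
```

Filtering only works on attributes that were attached to the documents as metadata at ingestion time.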

Hybrid Search

'vectorSearchConfiguration': {
    'numberOfResults': 5,
    'overrideSearchType': 'HYBRID'  # Combines vector and keyword search; 'SEMANTIC' is vector-only
}

Vector Store Options

Store	Setup	Best For
OpenSearch Serverless	Auto-managed	Most use cases
Aurora PostgreSQL	Self-managed	Existing PostgreSQL
Pinecone	External	Existing Pinecone
Redis Enterprise Cloud	External	Low latency
MongoDB Atlas	External	Existing MongoDB

Complete RAG Application

import boto3
from typing import Optional

class KnowledgeBaseRAG:
    def __init__(self, kb_id: str, model_arn: str):
        self.client = boto3.client('bedrock-agent-runtime')
        self.kb_id = kb_id
        self.model_arn = model_arn
        self.session_id: Optional[str] = None

    def query(self, question: str, new_session: bool = False) -> dict:
        if new_session:
            self.session_id = None

        kwargs = {
            'input': {'text': question},
            'retrieveAndGenerateConfiguration': {
                'type': 'KNOWLEDGE_BASE',
                'knowledgeBaseConfiguration': {
                    'knowledgeBaseId': self.kb_id,
                    'modelArn': self.model_arn,
                    'retrievalConfiguration': {
                        'vectorSearchConfiguration': {
                            'numberOfResults': 5
                        }
                    }
                }
            }
        }

        if self.session_id:
            kwargs['sessionId'] = self.session_id

        response = self.client.retrieve_and_generate(**kwargs)
        self.session_id = response.get('sessionId')

        return {
            'answer': response['output']['text'],
            'citations': [
                ref['location']['s3Location']['uri']
                for citation in response.get('citations', [])
                for ref in citation.get('retrievedReferences', [])
            ],
            'session_id': self.session_id
        }

# Usage
rag = KnowledgeBaseRAG(
    kb_id='YOUR_KB_ID',
    model_arn='arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
)

result = rag.query("What is your return policy?")
print(f"Answer: {result['answer']}")
print(f"Sources: {result['citations']}")

Best Practices

Practice	Recommendation
Chunk size	256-512 tokens for most cases
Overlap	10-20% to preserve context
Update frequency	Sync when source data changes
Model selection	Match model to query complexity
Metadata	Add filters for better relevance
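
Chunk size and overlap also determine how many chunks, and therefore how many embeddings, a document produces. A back-of-the-envelope estimate, assuming fixed-size chunking where the overlap is a percentage of the chunk size, is sketched below; `estimate_chunk_count` is a hypothetical helper and the 10,000-token document is made up.

```python
import math


def estimate_chunk_count(doc_tokens, max_tokens, overlap_percentage):
    """Estimate how many fixed-size chunks a document of doc_tokens yields."""
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= max_tokens:
        return 1
    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap  # new tokens covered by each additional chunk
    return math.ceil((doc_tokens - max_tokens) / step) + 1


# A hypothetical 10,000-token document at both ends of the recommended range
print(estimate_chunk_count(10_000, 512, 20))  # 25
print(estimate_chunk_count(10_000, 256, 10))  # 44
```

Smaller chunks give more precise matches but produce more vectors to store and search, which is the trade-off behind the 256-512 token recommendation.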

Key Takeaways

Fully managed RAG - No infrastructure to manage
Multiple chunking strategies - Choose based on document type
Built-in citations - Track source documents
Session support - Multi-turn conversations
Flexible storage - Multiple vector store options