Mastering Prompt Optimization: Amazon Bedrock's Advanced Tool for Model Migration and Performance Boosts

Amazon Bedrock's new Advanced Prompt Optimization feature refines prompts for any model on the platform. It also supports migration between models and lets you compare performance across up to five models at once. Whether you want better output from your current model or a switch to a new one without regression, it uses a metric-driven feedback loop, so every change to a prompt is scored against the metric you choose. Below, we answer key questions about the tool.

What is Amazon Bedrock Advanced Prompt Optimization?

Amazon Bedrock Advanced Prompt Optimization is a new tool that automatically refines your prompt templates to improve model responses. It takes your original prompts, generates optimized versions, and evaluates them against your chosen metrics. You can test up to five models simultaneously (your baseline plus up to four others) to see which prompt performs best. The tool supports multimodal inputs (PNG, JPG, PDF), custom evaluation methods via AWS Lambda or LLM-as-a-judge, and outputs detailed scores, cost estimates, and latency data. This makes it suitable both for getting better results from a model you already use and for migrating to a new one without sacrificing performance.

Source: aws.amazon.com

How does the prompt optimization process work?

The optimization operates in a metric-driven feedback loop. You provide a prompt template, example user inputs for variable values, ground truth answers, and an evaluation metric. The tool then iteratively adjusts the prompt to maximize performance against that metric. You can guide optimization using a natural language description, an AWS Lambda function, or an LLM-as-a-judge rubric. The process compares the original and optimized prompts, outputting both templates along with evaluation scores, cost estimates, and latency. To start, simply choose Create prompt optimization on the Amazon Bedrock console's Advanced Prompt Optimization page.
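The feedback loop described above can be pictured with a toy sketch. This is not Bedrock's algorithm, which the article does not describe. It is a generic greedy search under stated assumptions: `optimize`, `propose_variants`, `score_prompt`, the `HINTS` list, and the `{{document}}` placeholder syntax are all hypothetical stand-ins for the service's internal prompt rewriting and your chosen metric.

```python
from typing import Callable, List, Tuple


def optimize(template: str,
             propose_variants: Callable[[str], List[str]],
             score_prompt: Callable[[str], float],
             rounds: int = 3) -> Tuple[str, float]:
    """Greedy loop: keep whichever candidate scores highest on the metric."""
    best, best_score = template, score_prompt(template)
    for _ in range(rounds):
        improved = False
        for candidate in propose_variants(best):
            candidate_score = score_prompt(candidate)
            if candidate_score > best_score:
                best, best_score, improved = candidate, candidate_score, True
        if not improved:
            break  # no candidate beat the current best, so stop early
    return best, best_score


# Hypothetical stand-ins: a real metric would call a model and compare its
# answers with the ground truth; this one only checks for two instructions.
HINTS = ["Answer concisely.", "Cite the source document."]


def propose_variants(template: str) -> List[str]:
    return [template + " " + hint for hint in HINTS if hint not in template]


def score_prompt(template: str) -> float:
    return sum(hint in template for hint in HINTS) / len(HINTS)


best, score = optimize("Summarize {{document}}.", propose_variants, score_prompt)
print(best)
print(score)
```

The point of the sketch is the shape of the loop: propose variants, score each one against the same metric, and keep the original for comparison.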

What inputs are required for prompt optimization?

You need to prepare your prompt templates in JSONL format. Each JSON object must be on a single line and include: a version field (fixed value bedrock-2026-05-14), a template ID, the prompt template string, optional steering criteria, and evaluation samples. Each evaluation sample contains input variables (key-value pairs) and a reference response. Additionally, you must specify an evaluation metric—either by providing a custom label with an LLM-as-a-judge prompt and model ID, or by supplying an AWS Lambda function ARN. For multimodal tasks, input variables can include PNG, JPG, or PDF file references.
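As a rough illustration, one input record could be assembled and serialized like this. Only the version value `bedrock-2026-05-14` and the names `customEvaluationMetricLabel` and `customLLMJConfig` come from this article. Every other key name, the nesting, the placement of the metric fields, the `{{ticket}}` placeholder syntax, and all values are assumptions, so check the official schema before using it.

```python
import json

record = {
    "version": "bedrock-2026-05-14",            # fixed value given in the article
    "templateId": "support-summary-v1",         # hypothetical key name and value
    "promptTemplate": "Summarize this ticket: {{ticket}}",  # hypothetical key and placeholder syntax
    "steeringCriteria": ["Keep the summary under 50 words."],  # optional; hypothetical key
    "customEvaluationMetricLabel": "summary_quality",  # key named in the article; placement assumed
    "customLLMJConfig": {                       # key named in the article; inner keys are hypothetical
        "judgePrompt": "Rate the summary from 0 to 1 for accuracy.",
        "judgeModelId": "<judge-model-id>",
    },
    "evaluationSamples": [                      # hypothetical key names
        {
            "inputVariables": {"ticket": "Customer cannot reset password."},
            "referenceResponse": "User is locked out; password reset fails.",
        }
    ],
}


def to_jsonl_line(obj: dict) -> str:
    """Serialize one record as a single line, as JSONL requires."""
    line = json.dumps(obj, ensure_ascii=False)
    if "\n" in line:
        raise ValueError("a JSONL record must not span multiple lines")
    return line


# One record per line; join several records with "\n" to build the file body.
jsonl_body = "\n".join(to_jsonl_line(r) for r in [record])
print(jsonl_body)
```

`json.dumps` without an `indent` argument already emits a single line and escapes newlines inside strings, which is why it suits JSONL.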

Can I use the tool for multimodal inputs like images and PDFs?

Yes, Amazon Bedrock Advanced Prompt Optimization supports multimodal inputs, including PNG, JPG, and PDF files. This allows you to optimize prompts for tasks such as document analysis, image captioning, or visual question answering. When defining your prompt template variables, you can reference these file types so the optimizer generates variations that work with visual or textual data. The tool treats multimodal inputs as part of the prompt context, enabling model comparisons on diverse use cases. This feature is particularly valuable for enterprises dealing with scanned documents, diagrams, or photo-based workflows.
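Before submitting a multimodal dataset, it can help to check that every referenced file has one of the three types named above. The sketch below is a local pre-flight check, not part of the Bedrock API. The function name and the example paths are hypothetical, and whether variants such as `.jpeg` are accepted is not stated in the article, so they are treated as unsupported here.

```python
from pathlib import PurePosixPath

# File types the article lists as supported.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".pdf"}


def is_supported_file(reference: str) -> bool:
    """Return True if the file reference ends in a supported extension."""
    return PurePosixPath(reference).suffix.lower() in SUPPORTED_EXTENSIONS


# Hypothetical file references used as input variable values.
references = ["s3://my-bucket/scans/invoice.PDF", "diagram.tiff", "photo.jpg"]
unsupported = [ref for ref in references if not is_supported_file(ref)]
print(unsupported)
```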

How can I compare performance across multiple models simultaneously?

The tool lets you select up to five inference models to test your prompts against. If you're migrating to a new model, set your current model as a baseline and choose up to four target models. If you're only improving your existing setup, select just your current model to see before-and-after optimization. The optimizer runs the same prompt variations on all selected models, then presents side-by-side evaluation scores. This makes it easy to identify which model yields the best results for your specific tasks, while also detecting any regressions on known use cases.

What are the benefits for migrating to a new model?

Migrating to a new model often risks degrading performance due to differences in prompt sensitivity. Advanced Prompt Optimization mitigates this by optimizing prompts specifically for the target model. You include your current model as a baseline and up to four others; the tool generates prompts that maximize scores for each model individually. It then highlights any regressions on known use cases and suggests improvements for underperforming tasks. The output includes cost estimates and latency comparisons, helping you choose the most cost-effective model without sacrificing quality. This streamlined approach reduces the trial-and-error typical of model migration.
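To make the regression check concrete, here is a minimal sketch of how per-sample scores for a baseline and a target model could be compared once you have them. The function, the sample IDs, the tolerance, and the scores are all invented for illustration. They are not Bedrock output, and the article does not specify the report format.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, float],
                     candidate: Dict[str, float],
                     tolerance: float = 0.0) -> List[str]:
    """Return IDs of samples where the candidate scores below the baseline
    by more than `tolerance`."""
    return sorted(
        sample_id
        for sample_id, base_score in baseline.items()
        if sample_id in candidate and candidate[sample_id] < base_score - tolerance
    )


# Placeholder scores, invented for illustration only.
baseline_scores = {"invoice-1": 0.90, "invoice-2": 0.70, "invoice-3": 0.80}
target_scores = {"invoice-1": 0.95, "invoice-2": 0.60, "invoice-3": 0.80}

regressions = find_regressions(baseline_scores, target_scores, tolerance=0.05)
print(regressions)
```

A small tolerance keeps noise in the metric from being reported as a regression. How large it should be depends on how stable your evaluation is.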

How do I provide custom evaluation metrics using Lambda or LLM-as-a-judge?

If you need evaluation beyond simple metrics, you can supply either an AWS Lambda function or an LLM-as-a-judge rubric. For Lambda, provide the function ARN in the evaluationMetricLambdaArn field; the function should return a numeric score. For LLM-as-a-judge, include a custom prompt and model ID in the customLLMJConfig object. You must also define a customEvaluationMetricLabel. The optimizer uses your chosen method as the guide in its feedback loop, ensuring the optimized prompt aligns with your specific quality criteria. This flexibility allows you to tailor evaluations to your domain, whether that's accuracy, tone, or adherence to guidelines.
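A custom Lambda metric might look like the sketch below, which scores a response by token-overlap F1 against the reference. The article only says the function should return a numeric score. The event keys `modelResponse` and `referenceResponse`, and returning the bare number, are assumptions about the contract rather than the documented interface.

```python
from collections import Counter


def token_f1(predicted: str, reference: str) -> float:
    """Token-overlap F1 between two strings, case-insensitive."""
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def lambda_handler(event, context):
    """Hypothetical event shape: {"modelResponse": str, "referenceResponse": str}."""
    predicted = event.get("modelResponse", "")
    reference = event.get("referenceResponse", "")
    return token_f1(predicted, reference)


# Local smoke test with a hand-built event (no AWS call involved).
print(lambda_handler({"modelResponse": "Paris", "referenceResponse": "paris"}, None))
```

Token F1 is only one possible metric. The same handler shape would work for exact match, a regex check, or a domain-specific validator.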

What outputs does the tool provide?

After optimization, the tool delivers a comprehensive report containing both the original and optimized prompt templates, along with evaluation scores for each model. It also provides cost estimates per inference and latency measurements. These outputs help you make data-driven decisions: you can compare the performance uplift, see if the optimized prompt reduces costs or speeds up responses, and assess whether the trade-offs are acceptable. The results are presented in the console and can be exported for further analysis. This visibility ensures you understand exactly how your prompts change and what impact they have on model behavior.
