hidekazu-konishi.com

Using Amazon Nova Pro Vision Capabilities on Amazon Bedrock to Verify, Regenerate, and Automate Image Generation with Amazon Nova Canvas

First Published: 2024-12-23
Last Updated: 2024-12-23

In a previous article, I introduced an example of using Amazon Bedrock to verify and regenerate images generated by Amazon Titan Image Generator G1 utilizing the image understanding and analysis capabilities of Anthropic Claude 3.5 Sonnet.

Using Claude 3.5 Sonnet Vision Capabilities on Amazon Bedrock to Verify, Regenerate, and Automate Image Generation with Amazon Titan Image Generator G1

In this article, I will introduce an example of using Amazon Bedrock to verify and regenerate images generated by Amazon Nova Canvas utilizing the image understanding and analysis capabilities of Amazon Nova Pro.
Similar to the previous article, this attempt also aims to reduce the amount of human visual inspection work by automatically determining whether generated images meet requirements.

* The source code published in this article and other articles by this author was developed as part of independent research and is provided 'as is' without any warranty of operability or fitness for a particular purpose. Please use it at your own risk. The code may be modified without prior notice.
* This article uses AWS services on an AWS account registered individually for writing.
* The Amazon Bedrock Models used for writing this article were executed on 2024-12-23 (JST) and are based on the following End user license agreement (EULA) at that time:
Amazon Nova Pro (amazon.nova-pro-v1:0): End user license agreement (EULA) (AWS Customer Agreement and Service Terms)
Amazon Nova Canvas (amazon.nova-canvas-v1:0): End user license agreement (EULA) (AWS Customer Agreement and Service Terms)

Architecture Diagram and Process Flow

The architecture for this workflow is shown in the following diagram:

[Figure: Architecture diagram for using Amazon Nova Pro on Amazon Bedrock to verify and regenerate images generated by Amazon Nova Canvas]

Here's a detailed explanation of this process flow:

1. Input an event containing prompts and parameters.
2-1. Execute the Nova Canvas model on Amazon Bedrock with the input prompt instructing image creation.
2-2. Save the generated image to Amazon S3.
2-3. Execute the Nova Pro model on Amazon Bedrock for the image saved in Amazon S3 to verify if it meets the requirements of the prompt that instructed image creation.
   * If it's not deemed suitable for the requirements of the prompt that instructed image creation, repeat processes 2-1 to 2-3 for the specified number of executions with the same prompt.
   * If it's deemed suitable for the requirements of the prompt that instructed image creation, output that image as the result.
3. If every image generated with the same prompt is deemed unsuitable and the specified number of executions with that prompt has been used up, and the number of modified prompt executions has not been exceeded, execute the Nova Pro model on Amazon Bedrock to modify the prompt instructing image creation into one that is more likely to meet the requirements. Restart the process from 2-1 with this new prompt.
   * If the number of modified prompt executions is exceeded, end the process as an error.

The key point in this process flow is the modification of the prompt instructing image creation by the Nova Pro model.
If the prompt instructing image creation is easily understandable to AI, there's a high possibility that an image meeting the requirements will be output after several executions.
However, if the prompt instructing image creation is difficult for AI to understand, it's possible that an image meeting the requirements may not be output.
Therefore, when the specified number of executions with the same prompt is exceeded, I included a process to execute the Nova Pro model on Amazon Bedrock and modify the prompt instructing image creation to an optimized one.
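
The process flow above can be sketched as a pair of nested loops. This is a minimal sketch rather than the implementation used in this article: `generate_until_valid` and its callable parameters (`generate`, `validate`, `revise`) are hypothetical names standing in for the Nova Canvas and Nova Pro calls, so the control flow can be read and exercised without AWS access.

```python
from typing import Callable, Optional


def generate_until_valid(
    initial_prompt: str,
    generate: Callable[[str], bytes],        # stand-in for the Nova Canvas call
    validate: Callable[[bytes, str], bool],  # stand-in for the Nova Pro check
    revise: Callable[[str], str],            # stand-in for the Nova Pro rewrite
    attempts_per_prompt: int = 5,
    prompt_rounds: int = 3,
) -> Optional[bytes]:
    """Return the first image that passes validation, or None if all tries fail."""
    prompt = initial_prompt
    for round_index in range(prompt_rounds):
        for _ in range(attempts_per_prompt):
            image = generate(prompt)
            # Validate against the original requirement, not the rewritten prompt.
            if validate(image, initial_prompt):
                return image
        # Rewrite the prompt only if another round is still available.
        if round_index < prompt_rounds - 1:
            prompt = revise(initial_prompt)
    return None
```

In this sketch the rewritten prompt is always derived from the initial prompt, and validation always compares the image with the initial prompt, so a rewrite can change the wording given to the image model without changing the requirement being checked.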

Implementation Example

Format of the Input Event

{
    "prompt": "[Initial prompt for image generation]",
    "max_retry_attempts": [Maximum number of attempts to generate an image for each prompt],
    "max_prompt_revisions": [Maximum number of times to revise the prompt],
    "output_s3_bucket_name": "[Name of the S3 bucket to store generated images]",
    "output_s3_key_prefix": "[Prefix for the S3 key of generated images]",
    "nova_pro_validate_temperature": [Temperature parameter for Nova Pro model during image validation (0.0 to 1.0)],
    "nova_pro_validate_top_p": [Top-p parameter for Nova Pro model during image validation (0.0 to 1.0)],
    "nova_pro_validate_top_k": [Top-k parameter for Nova Pro model during image validation],
    "nova_pro_validate_max_tokens": [Maximum number of tokens generated by Nova Pro model during image validation],
    "nova_pro_revise_temperature": [Temperature parameter for Nova Pro model during prompt revision (0.0 to 1.0)],
    "nova_pro_revise_top_p": [Top-p parameter for Nova Pro model during prompt revision (0.0 to 1.0)],
    "nova_pro_revise_top_k": [Top-k parameter for Nova Pro model during prompt revision],
    "nova_pro_revise_max_tokens": [Maximum number of tokens generated by Nova Pro model during prompt revision],
    "nova_canvas_cfg_scale": [CFG scale for Nova Canvas model],
    "nova_canvas_width": [Width of the image generated by Nova Canvas model (in pixels)],
    "nova_canvas_height": [Height of the image generated by Nova Canvas model (in pixels)],
    "nova_canvas_number_of_images": [Number of images to generate with Nova Canvas model], 
    "nova_canvas_seed": [Random seed used by Nova Canvas model (for reproducibility, random if not specified)]
}

Example of Input Event

{
    "prompt": "A serene landscape with mountains and a lake",
    "max_retry_attempts": 5,
    "max_prompt_revisions": 3,
    "output_s3_bucket_name": "your-output-bucket-name",
    "output_s3_key_prefix": "generated-images-nova",
    "nova_pro_validate_temperature": 1.0,
    "nova_pro_validate_top_p": 0.9,
    "nova_pro_validate_top_k": 50,
    "nova_pro_validate_max_tokens": 5120,
    "nova_pro_revise_temperature": 1.0,
    "nova_pro_revise_top_p": 0.9,
    "nova_pro_revise_top_k": 50,
    "nova_pro_revise_max_tokens": 5120,
    "nova_canvas_cfg_scale": 10.0,
    "nova_canvas_width": 1024,
    "nova_canvas_height": 1024,
    "nova_canvas_number_of_images": 1, 
    "nova_canvas_seed": 0
}
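
Only `prompt` and `output_s3_bucket_name` are mandatory in the implementation in this article; every other field falls back to a default when omitted. As a sketch of that behaviour, the hypothetical helper `normalize_event` below merges an event with the same fallback values the handler uses. The helper name and the `DEFAULTS` and `REQUIRED` constants are assumptions introduced for illustration and are not part of the published code.

```python
# Fallback values copied from the event.get(...) calls in the article's handler.
DEFAULTS = {
    "max_retry_attempts": 5,
    "max_prompt_revisions": 3,
    "output_s3_key_prefix": "generated-images",
    "nova_pro_validate_temperature": 0.7,
    "nova_pro_validate_top_p": 0.9,
    "nova_pro_validate_top_k": 50,
    "nova_pro_validate_max_tokens": 5120,
    "nova_pro_revise_temperature": 0.7,
    "nova_pro_revise_top_p": 0.9,
    "nova_pro_revise_top_k": 50,
    "nova_pro_revise_max_tokens": 5120,
    "nova_canvas_cfg_scale": 8,
    "nova_canvas_width": 1024,
    "nova_canvas_height": 1024,
    "nova_canvas_number_of_images": 1,
    "nova_canvas_seed": 0,
}
REQUIRED = ("prompt", "output_s3_bucket_name")


def normalize_event(event: dict) -> dict:
    """Check the mandatory fields and fill in defaults for the optional ones."""
    missing = [key for key in REQUIRED if not event.get(key)]
    if missing:
        raise ValueError(f"Missing required event fields: {missing}")
    # Values supplied in the event take precedence over the defaults.
    return {**DEFAULTS, **event}
```

Calling `normalize_event({"prompt": "...", "output_s3_bucket_name": "your-output-bucket-name"})` would return a complete parameter set, which makes it easier to log exactly which values a run used.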

Source Code

The source code implemented this time is as follows:

  # #Event Sample
  # {
  #     "prompt": "A serene landscape with mountains and a lake",
  #     "max_retry_attempts": 5,
  #     "max_prompt_revisions": 3,
  #     "output_s3_bucket_name": "your-output-bucket-name",
  #     "output_s3_key_prefix": "generated-images-nova",
  #     "nova_pro_validate_temperature": 1.0,
  #     "nova_pro_validate_top_p": 0.9,
  #     "nova_pro_validate_top_k": 50,
  #     "nova_pro_validate_max_tokens": 5120,
  #     "nova_pro_revise_temperature": 1.0,
  #     "nova_pro_revise_top_p": 0.9,
  #     "nova_pro_revise_top_k": 50,
  #     "nova_pro_revise_max_tokens": 5120,
  #     "nova_canvas_cfg_scale": 10.0,
  #     "nova_canvas_width": 1024,
  #     "nova_canvas_height": 1024,
  #     "nova_canvas_number_of_images": 1, 
  #     "nova_canvas_seed": 0
  # }
  
  import boto3
  import json
  import base64
  import os
  import sys
  from io import BytesIO
  import datetime
  import random
  
  region = os.environ.get('AWS_REGION')
  bedrock_runtime_client = boto3.client('bedrock-runtime', region_name=region)
  s3_client = boto3.client('s3', region_name=region)
  
  def nova_pro_invoke_model(input_prompt, image_media_format=None, image_data_base64=None, model_params={}):
      messages = [
          {
              "role": "user",
              "content": [
                  {
                      "text": input_prompt
                  }
              ]
          }
      ]
  
      if image_media_format and image_data_base64:
          messages[0]["content"].insert(0, {
              "image": {
                  "format": image_media_format, 
                  "source": { 
                      "bytes": image_data_base64 
                  }
              }
          })
  
      body = {
          "messages": messages,
          "inferenceConfig": { # all Optional
              "max_new_tokens": model_params.get('max_tokens', 5120), # greater than 0, equal or less than 5k (default: dynamic*)
              "temperature": model_params.get('temperature', 0.7), # greater than 0 and less than 1.0 (default: 0.7)
              "top_p": model_params.get('top_p', 0.9), # greater than 0, equal or less than 1.0 (default: 0.9)
              "top_k": model_params.get('top_k', 50) # 0 or greater (default: 50)
          }
      }
  
      response = bedrock_runtime_client.invoke_model(
          modelId='amazon.nova-pro-v1:0',
          contentType='application/json',
          accept='application/json',
          body=json.dumps(body)
      )
  
      response_body = json.loads(response.get('body').read())
      response_text = response_body["output"]["message"]["content"][0]['text']
      return response_text
  
  def nova_canvas_invoke_model(prompt, model_params={}):
      seed = model_params.get('seed', 0)
      if seed == 0:
          seed = random.randint(0, 858993459)
      
      optimized_prompt = truncate_to_1024(prompt)
  
      body = {
          "taskType": "TEXT_IMAGE",
          "textToImageParams": {
              "text": optimized_prompt
          },
          "imageGenerationConfig": {
              "numberOfImages": model_params.get('img_number_of_images', 1),
              "height": model_params.get('height', 1024),
              "width": model_params.get('width', 1024),
              "cfgScale": model_params.get('cfg_scale', 8),
              "seed": seed
          }
      }
      
      print(f"Nova Canvas model parameters: {body}")
      
      response = bedrock_runtime_client.invoke_model(
          body=json.dumps(body),
          modelId="amazon.nova-canvas-v1:0",
          contentType="application/json",
          accept="application/json"
      )
      
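      response_body = json.loads(response['body'].read())
  
      # Report any error returned by the model before touching the image payload
      finish_reason = response_body.get("error")
      if finish_reason is not None:
          print(f"Image generation error. Error is {finish_reason}")
  
      # Fail with a clear message if no image came back, instead of an IndexError or TypeError
      images = response_body.get("images") or []
      if not images:
          raise RuntimeError(f"Nova Canvas returned no image. Error is {finish_reason}")
  
      image_data = base64.b64decode(images[0].encode('ascii'))
      print(f"Image generated successfully with seed: {seed}")
      
      return image_data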
  
  def truncate_to_1024(text):
      if len(text) <= 1024:
          return text
      
      truncated = text[:1024]
      last_period = truncated.rfind('.')
      last_comma = truncated.rfind(',')
      last_break = max(last_period, last_comma)
      
      if last_break > 256:  # Only if the last sentence or phrase is not too long
          return truncated[:last_break + 1]
      else:
          return truncated
  
  def save_image_to_s3(image_data, bucket, key):
      s3_client.put_object(
          Bucket=bucket,
          Key=key,
          Body=image_data
      )
      print(f"Image saved to S3: s3://{bucket}/{key}")
  
  def validate_image(image_data, prompt, nova_pro_validate_params):
      image_base64 = base64.b64encode(image_data).decode('utf-8')
      
      input_prompt = f"""Does this image match the following prompt? Prompt: {prompt}. 
      Please answer in the following JSON format:
      {{"result":"", "reason":""}}
      Set "result" to "YES" if the image matches the prompt, or "NO" if it does not.
      Ensure your response can be parsed as valid JSON. Do not include any explanations, comments, or additional text outside of the JSON structure."""
  
      validation_result = nova_pro_invoke_model(input_prompt, "png", image_base64, nova_pro_validate_params)
      
      try:
          print(f"validation Result: {validation_result}")
          parsed_result = json.loads(validation_result)
          # Case-insensitive check; anything other than YES counts as a failed validation
          is_valid = str(parsed_result['result']).strip().upper() == 'YES'
          print(f"Image validation result: {is_valid}")
          print(f"Validation reason: {parsed_result['reason']}")
          return is_valid
      except (json.JSONDecodeError, KeyError, TypeError):
          # Treat unparsable or incomplete responses as a failed validation so the retry loop continues
          print(f"Error parsing validation result: {validation_result}")
          return False
  
  def revise_prompt(original_prompt, nova_pro_revise_params):
      input_prompt = f"""Revise the following image generation prompt to optimize it for Amazon Nova Canvas, incorporating best practices:
  
      {original_prompt}
  
      Please consider the following guidelines in your revision:
      1. Frame the prompt as an image caption rather than a command, focusing on describing what's in the image.
      2. Include clear descriptions of:
         - The main subject
         - The environment/setting
         - Position or pose of subjects (if relevant)
         - Lighting conditions (if relevant)
         - Camera position/framing (if relevant)
         - Visual style or medium (if relevant)
      3. Avoid using negation words (no, not, without, etc.) in the main prompt.
      4. Place most important elements at the beginning of the prompt.
      5. Separate different elements with clear, descriptive phrases.
      6. Use specific, descriptive adjectives to convey the desired mood and style.
      7. If the original prompt is not in English, translate it to English.
  
      Your goal is to create a clear, detailed prompt that will result in a high-quality image generation with Nova Canvas, while staying within the 1024-character limit.
      
      Please provide your response in the following JSON format:
      {{"revised_prompt":""}}
      Ensure your response can be parsed as valid JSON. Do not include any explanations, comments, or additional text outside of the JSON structure."""
  
      revised_prompt_json = nova_pro_invoke_model(input_prompt, model_params=nova_pro_revise_params)
      print(f"Original prompt: {original_prompt}")
      print(f"Revised prompt JSON: {revised_prompt_json.strip()}")
      
      try:
          parsed_result = json.loads(revised_prompt_json)
          revised_prompt = parsed_result['revised_prompt']
          print(f"Parsed revised prompt: {revised_prompt}")
          return revised_prompt
      except (json.JSONDecodeError, KeyError, TypeError):
          # Fall back to the original prompt if the response is not the expected JSON object
          print(f"Error parsing revised prompt result: {revised_prompt_json}")
          return original_prompt
  
  def lambda_handler(event, context):
      try:
          initial_prompt = event['prompt']
          prompt = initial_prompt
          max_retry_attempts = max(0, event.get('max_retry_attempts', 5) - 1)
          max_prompt_revisions = max(0, event.get('max_prompt_revisions', 3) - 1)
          output_s3_bucket_name = event['output_s3_bucket_name']
          output_s3_key_prefix = event.get('output_s3_key_prefix', 'generated-images')
  
          print(f"Initial prompt: {initial_prompt}")
          print(f"Max retry attempts: {max_retry_attempts}")
          print(f"Max prompt revisions: {max_prompt_revisions}")
  
          # Model parameters
          nova_pro_validate_params = {
              'temperature': event.get('nova_pro_validate_temperature', 0.7),
              'top_p': event.get('nova_pro_validate_top_p', 0.9),
              'top_k': event.get('nova_pro_validate_top_k', 50),
              'max_tokens': event.get('nova_pro_validate_max_tokens', 5120)
          }
          nova_pro_revise_params = {
              'temperature': event.get('nova_pro_revise_temperature', 0.7),
              'top_p': event.get('nova_pro_revise_top_p', 0.9),
              'top_k': event.get('nova_pro_revise_top_k', 50),
              'max_tokens': event.get('nova_pro_revise_max_tokens', 5120)
          }
          nova_canvas_params = {
              'cfg_scale': event.get('nova_canvas_cfg_scale', 8),
              "width": event.get('nova_canvas_width', 1024),
              "height": event.get('nova_canvas_height', 1024),
              'img_number_of_images': event.get('nova_canvas_number_of_images', 1),
              "seed": event.get('nova_canvas_seed', 0)
          }
  
          print(f"Nova Pro validate params: {nova_pro_validate_params}")
          print(f"Nova Pro revise params: {nova_pro_revise_params}")
          print(f"Nova Canvas params: {nova_canvas_params}")
  
          # Generate start timestamp and S3 key
          start_timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
          for revision in range(max_prompt_revisions + 1):
              print(f"Starting revision {revision}")
              for attempt in range(max_retry_attempts + 1):
                  print(f"Attempt {attempt} for generating image")
                  
                  # Generate image with Nova Canvas
                  image_data = nova_canvas_invoke_model(prompt, nova_canvas_params)
  
                  image_key = f"{output_s3_key_prefix}-{start_timestamp}-{revision:03d}-{attempt:03d}.png"
  
                  # Save image to S3
                  save_image_to_s3(image_data, output_s3_bucket_name, image_key)
  
                  # Validate image with Amazon Nova Pro
                  is_valid = validate_image(image_data, initial_prompt, nova_pro_validate_params)
  
                  if is_valid:
                      print("Valid image generated successfully")
                      return {
                          'statusCode': 200,
                          'body': json.dumps({
                              'status': 'SUCCESS',
                              'message': 'Image generated successfully',
                              'output_s3_bucket_url': f'https://s3.console.aws.amazon.com/s3/buckets/{output_s3_bucket_name}',
                              'output_s3_object_url': f'https://s3.console.aws.amazon.com/s3/object/{output_s3_bucket_name}?region={region}&bucketType=general&prefix={image_key}'
                          })
                      }
  
              # If max retry attempts reached and not the last revision, revise prompt
              if revision < max_prompt_revisions:
                  print("Revising prompt")
                  prompt = revise_prompt(initial_prompt, nova_pro_revise_params)
  
          print("Failed to generate a valid image after all attempts and revisions")
          return {
              'statusCode': 400,
              'body': json.dumps({
                  'status': 'FAIL',
                  'error': 'Failed to generate a valid image after all attempts and revisions'
              })
          }
  
      except Exception as ex:
          print(f'Exception: {ex}')
          tb = sys.exc_info()[2]
          err_message = f'Exception: {str(ex.with_traceback(tb))}'
          print(err_message)
          return {
              'statusCode': 500,
              'body': json.dumps({
                  'status': 'FAIL',
                  'error': err_message
              })
          }
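
Both `validate_image` and `revise_prompt` pass the model response straight to `json.loads`, so a response that wraps the JSON in a Markdown code fence or adds a sentence around it would be treated as a parse failure. A more tolerant parser could look like the sketch below. The helper name `extract_json_object` is hypothetical and not part of the published code, and the approach (falling back to the outermost `{...}` span) is one simple option under the assumption that the response contains a single JSON object.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Best-effort parse of a single JSON object embedded in model output."""
    try:
        parsed = json.loads(text)
        return parsed if isinstance(parsed, dict) else None
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span, skipping code fences or stray prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

With a helper like this, the callers would check for `None` and for the expected key instead of relying only on `json.JSONDecodeError`.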

The key design points of this source code include the following:

Implemented a mechanism to automate the cycle of image generation and validation, repeating until requirements are met
Used Nova Pro for validating generated images and revising prompts
Used Nova Canvas for high-quality image generation
Included the recommendations listed in the Amazon Nova Canvas prompting best practices in the prompt revision instructions
Made image generation parameters (cfgScale, width, height, seed) customizable
Made Nova Pro invocation parameters (temperature, top_p, top_k, max_tokens) adjustable
Automatically saved generated images to S3 bucket and returned the result URL
Implemented appropriate error handling and logging to facilitate troubleshooting
Used JSON format to structure dialogues with Amazon Nova Pro, making result parsing easier
Made maximum retry attempts and maximum prompt revisions configurable to prevent infinite loops
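
Because both limits are finite, the worst-case number of model calls for one invocation can be estimated up front. The hypothetical helper `worst_case_calls` below mirrors the loop bounds in `lambda_handler` (each setting is reduced by one, floored at zero, and then iterated with `range(n + 1)`). The function name and the keys of the returned dictionary are assumptions introduced for illustration.

```python
def worst_case_calls(max_retry_attempts: int, max_prompt_revisions: int) -> dict:
    """Upper bound on model calls when no image ever passes validation."""
    # Same arithmetic as the handler: values below 1 behave like 1.
    attempts = max(0, max_retry_attempts - 1) + 1
    rounds = max(0, max_prompt_revisions - 1) + 1
    images = attempts * rounds
    return {
        "nova_canvas_calls": images,
        "nova_pro_validation_calls": images,
        # The prompt is rewritten between rounds, so one fewer time than rounds.
        "nova_pro_revision_calls": rounds - 1,
    }
```

For the example event (`max_retry_attempts` of 5 and `max_prompt_revisions` of 3), this gives at most 15 Nova Canvas calls, 15 validation calls, and 2 prompt rewrites. In other words, `max_prompt_revisions` counts prompt rounds including the original prompt, which is worth keeping in mind when estimating cost and Lambda timeout.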

Execution Details and Results

An Example of Execution: Input Parameters

{
    "prompt": "自然の中から見た夜景で、空にはオーロラと月と流星群があり、地上には海が広がって流氷が流れ、地平線から太陽が出ている無人の写真。",
    "max_retry_attempts": 5,
    "max_prompt_revisions": 5,
    "output_s3_bucket_name": "ho2k.com",
    "output_s3_key_prefix": "generated-images-nova",
    "nova_pro_validate_temperature": 1.0,
    "nova_pro_validate_top_p": 0.9,
    "nova_pro_validate_top_k": 50,
    "nova_pro_validate_max_tokens": 5120,
    "nova_pro_revise_temperature": 1.0,
    "nova_pro_revise_top_p": 0.9,
    "nova_pro_revise_top_k": 50,
    "nova_pro_revise_max_tokens": 5120,
    "nova_canvas_cfg_scale": 10.0,
    "nova_canvas_width": 1024,
    "nova_canvas_height": 1024,
    "nova_canvas_number_of_images": 1, 
    "nova_canvas_seed": 0
}

* The Japanese text set in the prompt above translates to the following meaning in English:
"A night view from nature, with aurora, moon, and meteor shower in the sky, the sea spreading on the ground with drifting ice, and the sun rising from the horizon in an uninhabited photograph."
In this execution, the instructions are given as Japanese sentences that are not optimized as a prompt for Amazon Nova Canvas, and I attempt to optimize them through prompt modification by Nova Pro.

The input parameters for this execution example include the following considerations:

max_retry_attempts is set to 5 to increase the success rate of image generation.
max_prompt_revisions is set to 5, providing more opportunities to improve the prompt if needed.
The Amazon Nova Pro parameters for image validation and prompt revision (temperature, top_p, top_k, max_tokens) are set explicitly rather than left at their defaults.
nova_canvas_cfg_scale is set to 10 to increase fidelity to the prompt.
nova_canvas_seed is set to 0, which the source code treats as a request for a random seed, so that a different image is generated each time.
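
The seed handling can be isolated as a small function. The sketch below follows the rule used in `nova_canvas_invoke_model` (a seed of 0 means "pick one at random"); the name `resolve_seed`, the `rng` parameter, and the constant `NOVA_CANVAS_MAX_SEED` are hypothetical and exist only to make the behaviour testable.

```python
import random
from typing import Optional

# Upper bound taken from the random.randint call in the article's code.
NOVA_CANVAS_MAX_SEED = 858_993_459


def resolve_seed(requested_seed: int = 0, rng: Optional[random.Random] = None) -> int:
    """Return the requested seed, or draw a random one when it is 0."""
    if requested_seed != 0:
        return requested_seed
    rng = rng or random.Random()
    return rng.randint(0, NOVA_CANVAS_MAX_SEED)
```

One consequence of this rule is that a fixed non-zero seed is reused for every attempt, so retries with an unchanged prompt would presumably reproduce very similar images. Leaving the seed at 0 is what gives the retry loop a chance to produce a different result each time.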

An Example of Execution: Results

Generated Image

The final image that met the prompt requirements and passed verification in this trial is shown below.
This image meets almost all the requirements of the prompt "自然の中から見た夜景で、空にはオーロラと月と流星群があり、地上には海が広がって流氷が流れ、地平線から太陽が出ている無人の写真。" (meaning: "A night view from nature, with aurora, moon, and meteor shower in the sky, the sea spreading on the ground with drifting ice, and the sun rising from the horizon in an uninhabited photograph.").
The number of meteors in the meteor shower is limited, but the contradictory scenery of the moon together with the sun on the horizon, the meteor shower, and the drifting ice are all clearly expressed.
Also, compared to other images generated earlier (see "List of Generated Images" below), I confirmed that the final image that passed verification satisfied more of the specified requirements.

Image that met prompt requirements and passed verification

Here is a list of images generated during this trial run.
Each row of images in this "List of Generated Images" was generated from different modified prompts.
The images generated repeatedly from the initial Japanese text prompt deviated slightly from the requirements. After the first prompt modification, the first attempt produced an image very close to the requirements, and the second attempt produced an image that fully met them.
What is particularly noteworthy about Amazon Nova Canvas, however, is that unlike Amazon Titan Image Generator G1, it can understand the content of Japanese language prompts and generate images that closely match the requirements.

Changes in Modified Prompts

Each row of images in the "List of Generated Images" shown above was generated from different modified prompts.
Specifically, the image in the first row of the "List of Generated Images" was generated from the "0th modification" prompt below, while the image in the last row was generated from the "1st modification" prompt below.
Let's look at the content of the modified image generation prompts for each number of prompt modifications.

0th modification

自然の中から見た夜景で、空にはオーロラと月と流星群があり、地上には海が広がって流氷が流れ、地平線から太陽が出ている無人の写真。

* The meaning is "A night view from nature, with aurora, moon, and meteor shower in the sky, the sea spreading on the ground with drifting ice, and the sun rising from the horizon in an uninhabited photograph."

1st modification

An awe-inspiring nighttime seascape featuring vibrant auroras, a luminous moon, and a spectacular meteor shower. The tranquil ocean is dotted with drifting icebergs, creating a serene and mystical atmosphere. The horizon glows with the warm light of the rising sun, casting a golden hue across the scene. The image is captured from a vantage point within nature, providing a panoramic view of this breathtaking natural phenomenon. The visual style is realistic, with a focus on capturing the vivid colors and dynamic lighting of the auroras and meteors against the calm, icy waters.

When viewed in connection with the "List of Generated Images" shown above, the initial Japanese text prompt was not optimized for image generation, but still produced images that were close to the requirements.
On the other hand, when the prompt was optimized for image generation by Amazon Nova Pro in the first revision, the subsequent execution produced images that were very close to the requirements, and the second attempt produced images that met the requirements.
In this way, the images changed with each prompt modification and generation execution, and ultimately, an image that met the prompt requirements passed verification.

References:
Tech Blog with curated related content
AWS Documentation(Amazon Bedrock)
AWS Documentation(Amazon Nova)
Amazon Nova Canvas prompting best practices

Summary

In this article, I introduced an example of using Amazon Bedrock to verify and regenerate images generated by Amazon Nova Canvas utilizing the image understanding and analysis capabilities of Amazon Nova Pro.
Through this attempt, I confirmed that Nova Pro's image recognition capabilities can recognize the content and expression of images, and can be used to verify requirement fulfillment.
Furthermore, I found that Nova Pro can be used for prompt optimization for Nova Canvas, and it has a high ability to translate Japanese prompts into English and modify them into a format suitable for image generation.
And most importantly, by automating the cycle of image generation and verification, I was able to significantly reduce the amount of human visual inspection work.
A notable point is that, as shown by the previous example using Amazon Titan Image Generator G1 and this example using Amazon Nova Canvas, tailoring the prompt modification instructions to the best practices of each image generation AI allows the same prompt optimization and image generation automation to be applied to various other image generation AIs.

In this way, Nova Pro brings new possibilities to the control of image generation AI (such as Nova Canvas) and processes that were previously difficult to automate.
I will continue to watch for the evolution of AI models provided by Amazon Bedrock and new implementation methods utilizing them, exploring further expansion of application areas.

Written by Hidekazu Konishi