hidekazu-konishi.com

Validating and Regenerating Videos Using Amazon Nova Pro Vision Model on Amazon Bedrock (Amazon Nova Reel Edition)

First Published: 2025-02-04
Last Updated: 2025-03-31

In previous articles, I introduced methods for automating media generation, verification, and regeneration using combinations of Vision understanding models and AI generation models on Amazon Bedrock, such as Amazon Nova Pro with Amazon Nova Canvas, Anthropic Claude with Amazon Titan Image Generator G1, and Anthropic Claude with Stable Diffusion XL.

Using Amazon Nova Pro Vision Capabilities on Amazon Bedrock to Verify, Regenerate, and Automate Image Generation with Amazon Nova Canvas
Using Claude 3.5 Sonnet Vision Capabilities on Amazon Bedrock to Verify, Regenerate, and Automate Image Generation with Amazon Titan Image Generator G1
Using Amazon Bedrock to repeatedly generate images with Stable Diffusion XL via Claude 3.5 Sonnet until requirements are met

In this article, I will explore how to utilize Amazon Nova Pro's video understanding and analysis capabilities on Amazon Bedrock to automate the verification and regeneration of videos created with Amazon Nova Reel.
Similar to the previous articles, this attempt aims to reduce manual visual inspection workload by automatically determining if generated videos meet requirements.

* The source code published in this article and other articles by this author was developed as part of independent research and is provided 'as is' without any warranty of operability or fitness for a particular purpose. Please use it at your own risk. The code may be modified without prior notice.
* This article uses AWS services on an AWS account registered individually for writing.
* The Amazon Bedrock Models used for writing this article were executed on 2025-02-02 (JST) and are based on the following End user license agreement (EULA) at that time:
Amazon Nova Pro (amazon.nova-pro-v1:0): End user license agreement (EULA) (AWS Customer Agreement and Service Terms)
Amazon Nova Reel (amazon.nova-reel-v1:0): End user license agreement (EULA) (AWS Customer Agreement and Service Terms)

Architecture Diagram and Process Flow

The architecture diagram to implement the theme of this article is as follows.
In this case, we are using AWS Step Functions and AWS Lambda to process the operations.

Validating and Regenerating Videos Using Amazon Nova Pro Vision Model on Amazon Bedrock (Amazon Nova Reel Edition)

Here's a detailed explanation of this process flow:

1. Input an event containing prompts and parameters.

2-1. Execute the Nova Reel model on Amazon Bedrock with the input video creation prompt.

2-2. Save the generated video to Amazon S3.

2-3. Execute the Nova Pro model on Amazon Bedrock to verify if the video stored in Amazon S3 meets the requirements specified in the video creation prompt.
   * If determined not to meet the requirements of the video creation prompt, repeat steps `2-1.` to `2-3.` for the specified number of same prompt executions.
   * If determined to meet the requirements of the video creation prompt, use that video as the output result.

3. If the revision prompt execution count has not been exceeded and the number of times determined not to meet the video creation prompt requirements exceeds the same prompt execution count, execute the Nova Pro model on Amazon Bedrock to modify the video creation prompt to one with a higher likelihood of meeting requirements. Restart processing from `2-1.` with this new video creation prompt.
   * If the revision prompt execution count is exceeded, end processing with an error.

The key point in this process flow is the modification of the video creation prompt by the Nova Pro model.
If the video creation prompt is easy for AI to understand, there's a high probability that a video meeting the requirements will be generated after several attempts.
However, if the video creation prompt is difficult for AI to understand, videos meeting the requirements may not be generated.
Therefore, when the specified number of same prompt executions is exceeded, I included a process to execute Nova Pro on Amazon Bedrock to optimize and modify the video creation prompt.

Implementation Example

AWS CloudFormation Template

AWS CloudFormation Template (Click to expand)

AWS CloudFormation Template Explanation

This AWS CloudFormation template deploys the following AWS resources:

Lambda Functions

InitializeProcessFunction
- Parameter initialization at Step Functions execution start
- Timestamp generation and prompt initial setup
- Nova Reel and Nova Pro parameter settings
- Setting limits for retry count and prompt revision count
- S3 bucket information setup and validation
GenerateVideoFunction
- Generate video using Nova Reel
- Optimize prompt within 512 characters
- Set video generation parameters (FPS, resolution, duration)
- Configure Amazon S3 storage destination
  python s3Uri = f"s3://{bucket_name}/{key_prefix}/{timestamp}/"
- Obtain and return invocation_arn for asynchronous execution
CheckVideoStatusFunction
- Check status of Amazon Bedrock asynchronous execution
- Save generated video to S3 URI
  python video_s3_uri = f"s3://{bucket_name}/{prefix}/{timestamp}/{s3_prefix}/output.mp4"
- Check and return processing status (Completed/InProgress/Failed)
- Error handling
ValidateVideoFunction
- Validate generated video using Nova Pro
- Retrieve video from S3 and input to Nova Pro
- Generate and execute validation prompt
- Return validation results (success/failure) and reasons in JSON format
- Appropriate error handling for exceptions
RevisePromptFunction
- Revise prompt using Nova Pro
- Apply prompt optimization guidelines:
  - Clear description of main subject
  - Environment/setting details
  - Movement and action descriptions
  - Lighting conditions
  - Camera movement
  - Visual style
- Return revised prompt in JSON format
- Optimization within 512 character limit

IAM Roles

LambdaExecutionRole
- Access permissions for Bedrock service
- Read and write permissions for S3 bucket
- Permission to write logs to CloudWatch Logs
StepFunctionsExecutionRole
- Execution permissions for all Lambda functions
- State machine execution management permissions

Step Functions State Machine

Control of workflow sequence from initialization to success/failure
Regular checking of video generation status (10-second intervals)
Transition to prompt revision flow when validation fails
Management of maximum retry count and prompt revision count
Error handling and transition to appropriate end states

Execution Details and Results

From this point forward, I would like to analyze a case study based on an actual implementation of this system.
First, we will start with the input parameters and their significance.

Input Parameters and Their Configuration

{
    "prompt": "自然の中から見た富士山がある無人の夜景で、空には月があってオーロラが動いて流星群が流れており、地上には海が広がって流氷が流れており、水平線からは太陽が登る日の出の映像", # Initial prompt for video generation
    "output_s3_bucket_name": "ho2k.com", # Name of S3 bucket to store generated videos
    "output_s3_key_prefix": "generated-videos", # S3 key prefix for generated videos
    "max_retry_attempts": 1, # Maximum number of video generation attempts per prompt
    "max_prompt_revisions": 5, # Maximum number of prompt revisions
    "nova_pro_validate_temperature": 0.3, # Nova Pro model temperature for video validation
    "nova_pro_validate_top_p": 0.9, # Nova Pro model top-p for video validation
    "nova_pro_validate_top_k": 40, # Nova Pro model top-k for video validation
    "nova_pro_validate_max_tokens": 5120, # Nova Pro model maximum tokens for video validation
    "nova_pro_revise_temperature": 0.7, # Nova Pro model temperature for prompt revision
    "nova_pro_revise_top_p": 0.9, # Nova Pro model top-p for prompt revision
    "nova_pro_revise_top_k": 50, # Nova Pro model top-k for prompt revision
    "nova_pro_revise_max_tokens": 5120, # Nova Pro model maximum tokens for prompt revision
    "nova_reel_duration_seconds": 6, # Generated video duration (seconds)
    "nova_reel_fps": 24, # Generated video frame rate
    "nova_reel_dimension": "1280x720", # Generated video resolution
    "nova_reel_seed": 0 # Seed value for video generation reproducibility (random if 0)
}

In configuring these parameters, the following considerations were made:

Set max_retry_attempts to 1 to improve the prompt at an early stage, after the first retry (second execution).
Set max_prompt_revisions to 5 to increase opportunities for prompt improvement.
Finely tuned Nova Pro model parameters (temperature, top_p, top_k, max_tokens) for video validation and revision.
Specifically set lower values for nova_pro_validate_temperature and nova_pro_validate_top_k to enable more precise validation.
Set nova_reel_duration_seconds to 6 to generate videos of appropriate length.
Set nova_reel_fps to 24 to generate videos with smooth movement.
Set video generation seed to be random to ensure different videos are generated each time.

An Example of Execution: Results

Next, let's examine an example of the execution results using these parameters.

The video that passed verification

Here is the video that ultimately passed validation by meeting the prompt requirements.
This video largely expresses the elements mentioned in the original prompt: "自然の中から見た富士山がある無人の夜景で、空には月があってオーロラが動いて流星群が流れており、地上には海が広がって流氷が流れており、水平線からは太陽が登る日の出の映像"(The meaning is "A video of an uninhabited night scene of Mount Fuji seen from nature, with the moon in the sky, moving aurora, meteor shower, ocean spreading below with drifting sea ice, and the sun rising from the horizon at dawn.")

The following elements were appropriately expressed:

Uninhabited night scene of Mount Fuji seen from nature
Moon in the sky with moving aurora
Two meteors appearing in some scenes
Ocean spreading below with drifting sea ice
Sun visible at the horizon in some scenes

On the other hand, the following elements were not appropriately expressed:

Not enough meteors to be called a meteor shower
Shows a sunset rather than a sunrise

While there were both well-expressed and poorly-expressed elements, Amazon Nova Pro Vision's video validation determined that it satisfied the initial prompt requirements.
It was confirmed that this final video met more of the initial prompt requirements than previously generated videos.
Let's examine the changes in prompt modifications, video generation, and video validation results and reasons throughout this process.

Process Changes (Prompt Modifications, Video Generation, Video Validation Results and Reasons)

Let's look at how the process changed with each prompt modification, including the video generation prompts, generated videos, and validation results and reasons.

0th Modification

[Executed Prompt]

自然の中から見た富士山がある無人の夜景で、空には月があってオーロラが動いて流星群が流れており、地上には海が広がって流氷が流れており、水平線からは太陽が登る日の出の映像

[Generated Video]

[Video Validation Result and Reason]
Result: Failed
Reason:

The video does not depict an aurora, nor does it show a sunrise with the sun rising from the horizon. It primarily shows a night view of Mount Fuji with a full moon, stars, and the reflection of city lights on the sea.

1st Modification

[Executed Prompt]

A breathtaking sunrise video featuring Mount Fuji as the main subject, set against a serene, uninhabited nightscape. The environment includes a tranquil sea with drifting ice floes, a majestic aurora dancing in the sky, and a spectacular meteor shower. The lighting transitions from moonlight to the warm hues of dawn as the sun rises on the horizon. The camera captures the scene with a slow, panoramic movement, emphasizing the grandeur and tranquility of the natural setting.

[Generated Video]

[Video Validation Result and Reason]
Result: Failed
Reason:

The video does not depict a sunrise but rather a night scene with aurora and meteor shower, and there is no transition to dawn or sunrise as described in the prompt.

2nd Modification

[Executed Prompt]

A breathtaking video showcasing Mount Fuji at dawn, set against a serene, uninhabited nightscape. The sky is adorned with a radiant moon, shimmering auroras, and shooting stars. Below, the vast ocean stretches out, with drifting icebergs and the horizon glowing as the sun rises, casting a warm, golden light across the scene. The camera captures the majestic view with a smooth, panning motion, highlighting the tranquil beauty and natural splendor of the moment.

[Generated Video]

* This is the same video as shown in "The video that passed verification" above.

[Video Validation Result and Reason]
Result: Passed
Reason:

The video depicts Mount Fuji at dawn with a serene nightscape, featuring a radiant moon, shimmering auroras, and shooting stars. The ocean below has drifting icebergs, and the horizon glows as the sun rises, casting a warm, golden light. The camera's smooth, panning motion effectively captures the tranquil beauty and natural splendor of the scene.

Key Insights from Process Changes (Prompt Modifications, Video Generation, Video Validation Results and Reasons)

From these processes, the following notable points about Amazon Nova Pro and Amazon Nova Reel can be highlighted:

[About Amazon Nova Pro]

In Nova Pro's prompt improvement, quality enhancement was observed through modification from initial Japanese prompts to more detailed and structured English prompts
In Nova Pro's prompt improvement, modified prompts could be enhanced with technical details such as camera work and light expression
In Nova Pro's video validation, multiple elements (Mount Fuji, aurora, meteors, ocean, ice) could be simultaneously recognized and evaluated
In Nova Pro's video validation, movement elements (aurora motion, ice flow) could also be understood and verified
In Nova Pro's video validation, specific reasons could be provided for validation results
In Nova Pro's video validation, it's challenging to verify content requiring judgment of temporal progression like "sunrise" vs "sunset" based on sun position and background colors

[About Amazon Nova Reel]

In Nova Reel's video generation, basic requirements could be understood and videos generated even with Japanese prompts
In Nova Reel's video generation, complex scenes combining multiple natural phenomena could be generated
In Nova Reel's video generation, dynamic scenes with some temporal changes could be expressed
In Nova Reel's video generation, it's challenging to simultaneously depict multiple events with temporal progression like "sunrise" vs "sunset" along with other elements

References:
Tech Blog with curated related content
AWS Documentation(Amazon Bedrock)
AWS Documentation(Amazon Nova)

Summary

In this article, I introduced an example of using Amazon Bedrock to verify and regenerate videos created with Nova Reel using Nova Pro's video understanding and analysis capabilities.
Through this experiment, the following points about Nova Pro were confirmed:

Nova Pro's video recognition capability can recognize not only objects in videos but also content and expressions including movement and temporal changes, which can be used for requirements validation.
Nova Pro can be used for prompt optimization and translation for other image and video generation models.
Nova Pro can be used for verification to check if videos meet requirements.
By automating cycles of video generation, video verification, prompt optimization, and video regeneration, it's possible to reduce manual visual inspection workload.

However, as with previous examples using image generation AI, it's important to carefully craft prompt modification instructions according to each video generation AI's best practices.
This approach can enable effective prompt optimization and video generation automation to be applied to various other video generation AIs.
Also, since Amazon Nova Pro's video verification capabilities aren't perfect, it's important to avoid excessive expectations or delegation and ensure proper final human verification.

Regarding Nova Reel, the following points were confirmed:

Requirements-compliant videos can be generated with Japanese prompts as they can understand the content of descriptions. In other words, even without optimizing Japanese prompts into English prompts that were implemented for video generation in this attempt, there is a high possibility of generating videos that meet requirements by following best practices in Japanese and making several attempts.

Thus, by combining text generation and understanding models (like Nova Pro), image generation models (like Nova Canvas), and video generation models (like Nova Reel), possibilities expand for various processes and automation using multimodal data.
Particularly with the release of higher-level models or new versions of Nova Pro, more advanced multimodal data processing control and automation may become possible.
I look forward to continuing to monitor the evolution of AI models provided by Amazon Bedrock and new implementation methods utilizing them, exploring further expansion of their application areas.

Written by Hidekazu Konishi