hidekazu-konishi.com

Validating and Regenerating Videos Using Amazon Nova Pro Vision Model on Amazon Bedrock (Amazon Nova Reel Edition)

First Published:
Last Updated:

In previous articles, I introduced methods for automating media generation, verification, and regeneration using combinations of Vision understanding models and AI generation models on Amazon Bedrock, such as Amazon Nova Pro with Amazon Nova Canvas, Anthropic Claude with Amazon Titan Image Generator G1, and Anthropic Claude with Stable Diffusion XL.

Using Amazon Nova Pro Vision Capabilities on Amazon Bedrock to Verify, Regenerate, and Automate Image Generation with Amazon Nova Canvas
Using Claude 3.5 Sonnet Vision Capabilities on Amazon Bedrock to Verify, Regenerate, and Automate Image Generation with Amazon Titan Image Generator G1
Using Amazon Bedrock to repeatedly generate images with Stable Diffusion XL via Claude 3.5 Sonnet until requirements are met

In this article, I will explore how to utilize Amazon Nova Pro's video understanding and analysis capabilities on Amazon Bedrock to automate the verification and regeneration of videos created with Amazon Nova Reel.
As in the previous articles, the goal is to reduce the manual visual inspection workload by automatically determining whether generated videos meet the requirements.

* The source code published in this article and other articles by this author was developed as part of independent research and is provided 'as is' without any warranty of operability or fitness for a particular purpose. Please use it at your own risk. The code may be modified without prior notice.
* This article uses AWS services on an AWS account registered individually for writing.
* The Amazon Bedrock Models used for writing this article were executed on 2025-02-02 (JST) and are based on the following End user license agreement (EULA) at that time:
Amazon Nova Pro (amazon.nova-pro-v1:0): End user license agreement (EULA) (AWS Customer Agreement and Service Terms)
Amazon Nova Reel (amazon.nova-reel-v1:0): End user license agreement (EULA) (AWS Customer Agreement and Service Terms)

Architecture Diagram and Process Flow

The architecture used in this article is shown in the following diagram.
AWS Step Functions orchestrates the workflow, and AWS Lambda functions perform each step.
[Architecture diagram: Validating and Regenerating Videos Using Amazon Nova Pro Vision Model on Amazon Bedrock (Amazon Nova Reel Edition)]
Here's a detailed explanation of this process flow:
1. Input an event containing prompts and parameters.

2-1. Execute the Nova Reel model on Amazon Bedrock with the input video creation prompt.

2-2. Save the generated video to Amazon S3.

2-3. Execute the Nova Pro model on Amazon Bedrock to verify if the video stored in Amazon S3 meets the requirements specified in the video creation prompt.
   * If the video does not meet the requirements of the video creation prompt, repeat steps `2-1.` to `2-3.` with the same prompt, up to the specified number of same-prompt executions.
   * If the video meets the requirements of the video creation prompt, use it as the output.

3. If the number of failed validations exceeds the same-prompt execution count and the prompt revision count has not been exceeded, execute the Nova Pro model on Amazon Bedrock to revise the video creation prompt into one that is more likely to meet the requirements, then restart processing from `2-1.` with the revised prompt.
   * If the prompt revision count is exceeded, end processing with an error.
The key point in this process flow is the modification of the video creation prompt by the Nova Pro model.
If the video creation prompt is easy for AI to understand, there's a high probability that a video meeting the requirements will be generated after several attempts.
However, if the video creation prompt is difficult for AI to understand, videos meeting the requirements may not be generated.
Therefore, when the specified number of same prompt executions is exceeded, I included a process to execute Nova Pro on Amazon Bedrock to optimize and modify the video creation prompt.
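
The control flow described above can be sketched in plain Python, independent of Step Functions. This is a minimal illustration under stated assumptions, not the deployed implementation: `generate`, `validate`, and `revise` are hypothetical callables standing in for the Nova Reel and Nova Pro invocations, and the two limits are interpreted as "attempts per prompt" and "number of prompt versions, including the initial one".

```python
from typing import Callable, Tuple


def generate_until_valid(
    prompt: str,
    generate: Callable[[str], str],        # hypothetical: prompt -> video S3 URI
    validate: Callable[[str, str], bool],  # hypothetical: (video URI, prompt) -> meets requirements?
    revise: Callable[[str], str],          # hypothetical: initial prompt -> revised prompt
    max_retry_attempts: int = 5,
    max_prompt_revisions: int = 5,
) -> Tuple[str, str]:
    """Return (video_uri, prompt_used) for the first video that passes validation."""
    current_prompt = prompt
    for revision in range(max_prompt_revisions):
        # Steps 2-1 to 2-3: retry the same prompt a limited number of times.
        for _ in range(max_retry_attempts):
            video_uri = generate(current_prompt)
            if validate(video_uri, current_prompt):
                return video_uri, current_prompt
        # Step 3: same-prompt attempts are exhausted, so revise the prompt
        # (always starting from the initial prompt) unless no revisions remain.
        if revision < max_prompt_revisions - 1:
            current_prompt = revise(prompt)
    raise RuntimeError("No generated video met the requirements")
```

With `max_retry_attempts=2` and `max_prompt_revisions=2`, for example, the sketch makes at most four generation calls: two with the initial prompt and two with one revised prompt.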

Implementation Example

AWS CloudFormation Template

AWSTemplateFormatVersion: '2010-09-09'
Description: 'amazon-bedrock-nova-pro-vision-automate-nova-reel-video-gen-draft: Video Generation and Validation Workflow with Step Functions and Lambda'

Parameters:
  LambdaLayerArn:
    Type: String
    Default: 'arn:aws:lambda:us-east-1:123456789012:layer:my-layer:1'
    Description: ARN of the Lambda Layer to be used by all functions

Resources:
  # IAM Role for Lambda functions
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: BedrockAccess
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - bedrock:InvokeModel
                  - bedrock:StartAsyncInvoke
                  - bedrock:InvokeModelWithResponseStream
                  - bedrock:CreateModelInvocationJob
                  - bedrock:GetAsyncInvoke
                  - bedrock:GetModelInvoke
                Resource: '*'
        - PolicyName: S3Access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:PutObject
                  - s3:GetObject
                  - s3:ListBucket
                  - s3:PutObjectAcl
                  - s3:GetObjectAcl
                  - s3:AbortMultipartUpload
                  - s3:ListMultipartUploadParts
                Resource: '*'

  # IAM Role for Step Functions
  StepFunctionsExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: states.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: LambdaInvoke
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - lambda:InvokeFunction
                Resource:
                  - !GetAtt InitializeProcessFunction.Arn
                  - !GetAtt GenerateVideoFunction.Arn
                  - !GetAtt CheckVideoStatusFunction.Arn
                  - !GetAtt ValidateVideoFunction.Arn
                  - !GetAtt RevisePromptFunction.Arn

  # Lambda Functions
  InitializeProcessFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Layers:
        - !Ref LambdaLayerArn
      Code:
        ZipFile: |
          import json
          import os

          def lambda_handler(event, context):
              try:
                  start_time = event.get('execution_start_time')
                  timestamp = start_time.replace('T', '_').replace('-', '').replace(':', '').split('.')[0]

                  initial_prompt = event['prompt']
                  max_retry_attempts = max(0, event.get('max_retry_attempts', 5) - 1)
                  max_prompt_revisions = max(0, event.get('max_prompt_revisions', 5) - 1)
                  output_s3_bucket_name = event['output_s3_bucket_name']
                  output_s3_key_prefix = event.get('output_s3_key_prefix', 'generated-videos')

                  return {
                      'timestamp': timestamp,
                      'initial_prompt': initial_prompt,
                      'prompt': initial_prompt,
                      'max_retry_attempts': max_retry_attempts,
                      'max_prompt_revisions': max_prompt_revisions,
                      'output_s3_bucket_name': output_s3_bucket_name,
                      'output_s3_key_prefix': output_s3_key_prefix,
                      'current_revision': 0,
                      'current_attempt': 0,
                      'nova_pro_validate_params': {
                          'temperature': event.get('nova_pro_validate_temperature', 0.7),
                          'top_p': event.get('nova_pro_validate_top_p', 0.9),
                          'top_k': event.get('nova_pro_validate_top_k', 50),
                          'max_tokens': event.get('nova_pro_validate_max_tokens', 5120)
                      },
                      'nova_pro_revise_params': {
                          'temperature': event.get('nova_pro_revise_temperature', 0.7),
                          'top_p': event.get('nova_pro_revise_top_p', 0.9),
                          'top_k': event.get('nova_pro_revise_top_k', 50),
                          'max_tokens': event.get('nova_pro_revise_max_tokens', 5120)
                      },
                      'nova_reel_params': {
                          'duration_seconds': event.get('nova_reel_duration_seconds', 6),
                          'fps': event.get('nova_reel_fps', 24),
                          'dimension': event.get('nova_reel_dimension', "1280x720"),
                          'seed': event.get('nova_reel_seed', 0)
                      }
                  }
              except Exception as e:
                  print(f"Error in InitializeProcess: {str(e)}")
                  raise e
      Runtime: python3.12
      Timeout: 900
      MemorySize: 10240
      Environment:
        Variables:
          AWS_ACCOUNT_ID: !Ref 'AWS::AccountId'

  GenerateVideoFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Layers:
        - !Ref LambdaLayerArn
      Code:
        ZipFile: |
          import boto3
          import json
          import os
          import random
          import datetime

          bedrock_runtime_client = boto3.client('bedrock-runtime')

          def truncate_to_512(text):
              if len(text) <= 512:
                  return text
              
              truncated = text[:512]
              last_period = truncated.rfind('.')
              last_comma = truncated.rfind(',')
              last_break = max(last_period, last_comma)
              
              if last_break > 256:
                  return truncated[:last_break + 1]
              else:
                  return truncated

          def nova_reel_invoke_model(prompt, timestamp, model_params):
              try:
                  seed = model_params.get('seed', 0)
                  if seed == 0:
                      seed = random.randint(0, 858993459)
                  
                  optimized_prompt = truncate_to_512(prompt)

                  model_input = {
                      "taskType": "TEXT_VIDEO",
                      "textToVideoParams": {
                          "text": optimized_prompt
                      },
                      "videoGenerationConfig": {
                          "durationSeconds": model_params.get('duration_seconds', 6),
                          "fps": model_params.get('fps', 24),
                          "dimension": model_params.get('dimension', "1280x720"),
                          "seed": seed
                      }
                  }
                  
                  print(f"Nova Reel model parameters: {model_input}")
                  
                  invocation = bedrock_runtime_client.start_async_invoke(
                      modelId="amazon.nova-reel-v1:0",
                      modelInput=model_input,
                      outputDataConfig={
                          "s3OutputDataConfig": {
                              "s3Uri": f"s3://{model_params['output_s3_bucket_name']}/{model_params['output_s3_key_prefix']}/{timestamp}/"
                          }
                      }
                  )
                  
                  print(f"start_async_invoke invocation: {invocation}")

                  return invocation['invocationArn']
              except Exception as e:
                  print(f"Error in nova_reel_invoke_model: {str(e)}")
                  raise e

          def lambda_handler(event, context):
              try:
                  timestamp = event['timestamp']

                  prompt = event['prompt']
                  nova_reel_params = event['nova_reel_params']
                  nova_reel_params['output_s3_bucket_name'] = event['output_s3_bucket_name']
                  nova_reel_params['output_s3_key_prefix'] = event['output_s3_key_prefix']

                  invocation_arn = nova_reel_invoke_model(prompt, timestamp, nova_reel_params)
                  
                  return {
                      **event,
                      'invocation_arn': invocation_arn,
                      'timestamp': timestamp
                  }
              except Exception as e:
                  print(f"Error in GenerateVideo: {str(e)}")
                  raise e
      Runtime: python3.12
      Timeout: 900
      MemorySize: 10240
      Environment:
        Variables:
          AWS_ACCOUNT_ID: !Ref 'AWS::AccountId'

  CheckVideoStatusFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Layers:
        - !Ref LambdaLayerArn
      Code:
        ZipFile: |
          import boto3
          import json

          bedrock_runtime_client = boto3.client('bedrock-runtime')

          def lambda_handler(event, context):
              try:
                  invocation_arn = event['invocation_arn']
                  timestamp = event['timestamp']
                  s3_prefix = invocation_arn.split('/')[-1]
                  video_s3_uri = f"s3://{event['output_s3_bucket_name']}/{event['output_s3_key_prefix']}/{timestamp}/{s3_prefix}/output.mp4"

                  response = bedrock_runtime_client.get_async_invoke(
                      invocationArn=invocation_arn
                  )
                  
                  status = response['status']
                  
                  if status == 'Completed':
                      return {
                          **event,
                          'video_generated': True,
                          'video_s3_uri': video_s3_uri
                      }
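                  elif status == 'Failed':
                      # Raise so the state machine's Catch routes to FailState
                      # instead of polling a failed job indefinitely.
                      raise Exception(f"Video generation failed: {response.get('failureMessage', 'unknown error')}")
                  else:
                      # 'InProgress' (or any other non-terminal status): keep polling.
                      return {
                          **event,
                          'video_generated': False
                      }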
              except Exception as e:
                  print(f"Error in CheckVideoStatus: {str(e)}")
                  raise e
      Runtime: python3.12
      Timeout: 900
      MemorySize: 10240
      Environment:
        Variables:
          AWS_ACCOUNT_ID: !Ref 'AWS::AccountId'

  ValidateVideoFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Layers:
        - !Ref LambdaLayerArn
      Code:
        ZipFile: |
          import boto3
          import json
          import os

          bedrock_runtime_client = boto3.client('bedrock-runtime')

          def nova_pro_invoke_model(input_prompt, video_media_format, video_s3_uri, model_params):
              try:
                  messages = [
                      {
                          "role": "user",
                          "content": [
                              {
                                  "video": {
                                      "format": video_media_format,
                                      "source": {
                                          "s3Location": {
                                              "uri": video_s3_uri,
                                              "bucketOwner": os.environ['AWS_ACCOUNT_ID']
                                          }
                                      }
                                  }
                              },
                              {
                                  "text": input_prompt
                              }
                          ]
                      }
                  ]

                  body = {
                      "messages": messages,
                      "inferenceConfig": {
                          "max_new_tokens": model_params.get('max_tokens', 5120),
                          "temperature": model_params.get('temperature', 0.7),
                          "top_p": model_params.get('top_p', 0.9),
                          "top_k": model_params.get('top_k', 50)
                      }
                  }

                  response = bedrock_runtime_client.invoke_model(
                      modelId='amazon.nova-pro-v1:0',
                      contentType='application/json',
                      accept='application/json',
                      body=json.dumps(body)
                  )

                  response_body = json.loads(response.get('body').read())
                  response_text = response_body["output"]["message"]["content"][0]['text']
                  return response_text
              except Exception as e:
                  print(f"Error in nova_pro_invoke_model: {str(e)}")
                  raise e

          def validate_video(video_s3_uri, prompt, nova_pro_validate_params):
              try:
                  input_prompt = f"""Does this video match the following prompt? Prompt: {prompt}. 
                  Please answer in the following JSON format:
                  {{"result":"<YES or NO>", "reason":"<Reason for your decision>"}}
                  Ensure your response can be parsed as valid JSON. Do not include any explanations, comments, or additional text outside of the JSON structure."""

                  validation_result = nova_pro_invoke_model(input_prompt, "mp4", video_s3_uri, nova_pro_validate_params)
                  
                  print(f"validation Result: {validation_result}")
                  parsed_result = json.loads(validation_result)
                  is_valid = parsed_result['result'].upper() == 'YES'
                  print(f"Video validation result: {is_valid}")
                  print(f"Validation reason: {parsed_result['reason']}")
                  return is_valid, parsed_result['reason']
              except json.JSONDecodeError:
                  print(f"Error parsing validation result: {validation_result}")
                  return False, "Error parsing validation result"
              except Exception as e:
                  print(f"Error in validate_video: {str(e)}")
                  return False, f"Error during validation: {str(e)}"

          def lambda_handler(event, context):
              try:
                  video_s3_uri = event['video_s3_uri']
                  prompt = event['prompt']
                  nova_pro_validate_params = event['nova_pro_validate_params']
                  
                  is_valid, reason = validate_video(video_s3_uri, prompt, nova_pro_validate_params)
                  
                  return {
                      **event,
                      'is_valid': is_valid,
                      'validation_reason': reason
                  }
              except Exception as e:
                  print(f"Error in ValidateVideo: {str(e)}")
                  raise e
      Runtime: python3.12
      Timeout: 900
      MemorySize: 10240
      Environment:
        Variables:
          AWS_ACCOUNT_ID: !Ref 'AWS::AccountId'

  RevisePromptFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Layers:
        - !Ref LambdaLayerArn
      Code:
        ZipFile: |
          import boto3
          import json

          bedrock_runtime_client = boto3.client('bedrock-runtime')

          def nova_pro_invoke_model(input_prompt, model_params):
              try:
                  messages = [
                      {
                          "role": "user",
                          "content": [
                              {
                                  "text": input_prompt
                              }
                          ]
                      }
                  ]

                  body = {
                      "messages": messages,
                      "inferenceConfig": {
                          "max_new_tokens": model_params.get('max_tokens', 5120),
                          "temperature": model_params.get('temperature', 0.7),
                          "top_p": model_params.get('top_p', 0.9),
                          "top_k": model_params.get('top_k', 50)
                      }
                  }

                  response = bedrock_runtime_client.invoke_model(
                      modelId='amazon.nova-pro-v1:0',
                      contentType='application/json',
                      accept='application/json',
                      body=json.dumps(body)
                  )

                  response_body = json.loads(response.get('body').read())
                  response_text = response_body["output"]["message"]["content"][0]['text']
                  return response_text
              except Exception as e:
                  print(f"Error in nova_pro_invoke_model: {str(e)}")
                  raise e

          def revise_prompt(original_prompt, nova_pro_revise_params):
              try:
                  input_prompt = f"""Revise the following Video Generation prompt to optimize it for Amazon Nova Reel, incorporating best practices:

                  {original_prompt}

                  Please consider the following guidelines in your revision:
                  1. Frame the prompt as a video description rather than a command, focusing on describing what's in the video.
                  2. Include clear descriptions of:
                     - The main subject
                     - The environment/setting
                     - Movement and actions of subjects
                     - Lighting conditions (if relevant)
                     - Camera movement (if relevant)
                     - Visual style or medium (if relevant)
                  3. Avoid using negation words (no, not, without, etc.) in the main prompt.
                  4. Place most important elements at the beginning of the prompt.
                  5. Separate different elements with clear, descriptive phrases.
                  6. Use specific, descriptive adjectives to convey the desired mood and style.
                  7. If the original prompt is not in English, translate it to English.

                  Your goal is to create a clear, detailed prompt that will result in a high-quality Video Generation with Nova Reel, while staying within the 512-character limit.
                  
                  Please provide your response in the following JSON format:
                  {{"revised_prompt":"<Revised Prompt>"}}
                  Ensure your response can be parsed as valid JSON. Do not include any explanations, comments, or additional text outside of the JSON structure."""

                  revised_prompt_json = nova_pro_invoke_model(input_prompt, nova_pro_revise_params)
                  print(f"Original prompt: {original_prompt}")
                  print(f"Revised prompt JSON: {revised_prompt_json.strip()}")
                  
                  parsed_result = json.loads(revised_prompt_json)
                  revised_prompt = parsed_result['revised_prompt']
                  print(f"Parsed revised prompt: {revised_prompt}")
                  return revised_prompt
              except json.JSONDecodeError:
                  print(f"Error parsing revised prompt result: {revised_prompt_json}")
                  return original_prompt
              except Exception as e:
                  print(f"Error in revise_prompt: {str(e)}")
                  return original_prompt

          def lambda_handler(event, context):
              try:
                  original_prompt = event['initial_prompt']
                  nova_pro_revise_params = event['nova_pro_revise_params']
                  
                  revised_prompt = revise_prompt(original_prompt, nova_pro_revise_params)
                  
                  return {
                      **event,
                      'prompt': revised_prompt,
                      'current_revision': event['current_revision'] + 1,
                      'current_attempt': 0
                  }
              except Exception as e:
                  print(f"Error in RevisePrompt: {str(e)}")
                  raise e
      Runtime: python3.12
      Timeout: 900
      MemorySize: 10240
      Environment:
        Variables:
          AWS_ACCOUNT_ID: !Ref 'AWS::AccountId'

  # Step Functions State Machine
  VideoGenerationStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      RoleArn: !GetAtt StepFunctionsExecutionRole.Arn
      DefinitionString: !Sub
        - |-
          {
            "Comment": "Video Generation and Validation Workflow",
            "StartAt": "InitializeProcess",
            "States": {
              "InitializeProcess": {
                "Type": "Task",
                "Resource": "${InitializeProcessFunction}",
                "Parameters": {
                  "execution_start_time.$": "$$.Execution.StartTime",
                  "prompt.$": "$.prompt",
                  "max_retry_attempts.$": "$.max_retry_attempts",
                  "max_prompt_revisions.$": "$.max_prompt_revisions",
                  "output_s3_bucket_name.$": "$.output_s3_bucket_name",
                  "output_s3_key_prefix.$": "$.output_s3_key_prefix",
                  "nova_pro_validate_temperature.$": "$.nova_pro_validate_temperature",
                  "nova_pro_validate_top_p.$": "$.nova_pro_validate_top_p",
                  "nova_pro_validate_top_k.$": "$.nova_pro_validate_top_k",
                  "nova_pro_validate_max_tokens.$": "$.nova_pro_validate_max_tokens",
                  "nova_pro_revise_temperature.$": "$.nova_pro_revise_temperature",
                  "nova_pro_revise_top_p.$": "$.nova_pro_revise_top_p",
                  "nova_pro_revise_top_k.$": "$.nova_pro_revise_top_k",
                  "nova_pro_revise_max_tokens.$": "$.nova_pro_revise_max_tokens",
                  "nova_reel_duration_seconds.$": "$.nova_reel_duration_seconds",
                  "nova_reel_fps.$": "$.nova_reel_fps",
                  "nova_reel_dimension.$": "$.nova_reel_dimension",
                  "nova_reel_seed.$": "$.nova_reel_seed"
                },
                "Next": "GenerateVideo",
                "Catch": [{
                  "ErrorEquals": ["States.ALL"],
                  "Next": "FailState"
                }]
              },
              "GenerateVideo": {
                "Type": "Task",
                "Resource": "${GenerateVideoFunction}",
                "Next": "CheckVideoStatus",
                "Catch": [{
                  "ErrorEquals": ["States.ALL"],
                  "Next": "FailState"
                }]
              },
              "CheckVideoStatus": {
                "Type": "Task",
                "Resource": "${CheckVideoStatusFunction}",
                "Next": "Wait",
                "Catch": [{
                  "ErrorEquals": ["States.ALL"],
                  "Next": "FailState"
                }]
              },
              "Wait": {
                "Type": "Wait",
                "Seconds": 10,
                "Next": "IsVideoGenerated"
              },
              "IsVideoGenerated": {
                "Type": "Choice",
                "Choices": [
                  {
                    "Variable": "$.video_generated",
                    "BooleanEquals": true,
                    "Next": "ValidateVideo"
                  },
                  {
                    "Variable": "$.video_generated",
                    "BooleanEquals": false,
                    "Next": "CheckVideoStatus"
                  }
                ]
              },
              "ValidateVideo": {
                "Type": "Task",
                "Resource": "${ValidateVideoFunction}",
                "Next": "IsVideoValid",
                "Catch": [{
                  "ErrorEquals": ["States.ALL"],
                  "Next": "FailState"
                }]
              },
              "IsVideoValid": {
                "Type": "Choice",
                "Choices": [
                  {
                    "Variable": "$.is_valid",
                    "BooleanEquals": true,
                    "Next": "SuccessState"
                  },
                  {
                    "Variable": "$.is_valid",
                    "BooleanEquals": false,
                    "Next": "CheckRevisionLimit"
                  }
                ]
              },
              "CheckRevisionLimit": {
                "Type": "Choice",
                "Choices": [
                  {
                    "Variable": "$.current_revision",
                    "NumericLessThanPath": "$.max_prompt_revisions",
                    "Next": "RevisePrompt"
                  }
                ],
                "Default": "FailState"
              },
              "RevisePrompt": {
                "Type": "Task",
                "Resource": "${RevisePromptFunction}",
                "Next": "GenerateVideo",
                "Catch": [{
                  "ErrorEquals": ["States.ALL"],
                  "Next": "FailState"
                }]
              },
              "SuccessState": {
                "Type": "Succeed"
              },
              "FailState": {
                "Type": "Fail",
                "Error": "GenerationError",
                "Cause": "Video generation process failed"
              }
            }
          }
        - InitializeProcessFunction: !GetAtt InitializeProcessFunction.Arn
          GenerateVideoFunction: !GetAtt GenerateVideoFunction.Arn
          CheckVideoStatusFunction: !GetAtt CheckVideoStatusFunction.Arn
          ValidateVideoFunction: !GetAtt ValidateVideoFunction.Arn
          RevisePromptFunction: !GetAtt RevisePromptFunction.Arn

Outputs:
  StateMachineArn:
    Description: ARN of the Step Functions state machine
    Value: !Ref VideoGenerationStateMachine
  InitializeProcessFunctionArn:
    Description: ARN of the Initialize Process Lambda function
    Value: !GetAtt InitializeProcessFunction.Arn
  GenerateVideoFunctionArn:
    Description: ARN of the Generate Video Lambda function
    Value: !GetAtt GenerateVideoFunction.Arn
  CheckVideoStatusFunctionArn:
    Description: ARN of the Check Video Status Lambda function
    Value: !GetAtt CheckVideoStatusFunction.Arn
  ValidateVideoFunctionArn:
    Description: ARN of the Validate Video Lambda function
    Value: !GetAtt ValidateVideoFunction.Arn
  RevisePromptFunctionArn:
    Description: ARN of the Revise Prompt Lambda function
    Value: !GetAtt RevisePromptFunction.Arn
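
Because the `InitializeProcess` state's `Parameters` block references every input field by JSONPath (for example `$.nova_reel_seed`), Step Functions fails that state at runtime if a referenced field is missing from the execution input. The following sketch shows one way to build a complete input and start an execution. `build_execution_input` and `start_workflow` are hypothetical helpers, not part of the template; the default values are copied from `InitializeProcessFunction`.

```python
import json

# Defaults copied from InitializeProcessFunction in the template above.
DEFAULT_INPUT = {
    "max_retry_attempts": 5,
    "max_prompt_revisions": 5,
    "output_s3_key_prefix": "generated-videos",
    "nova_pro_validate_temperature": 0.7,
    "nova_pro_validate_top_p": 0.9,
    "nova_pro_validate_top_k": 50,
    "nova_pro_validate_max_tokens": 5120,
    "nova_pro_revise_temperature": 0.7,
    "nova_pro_revise_top_p": 0.9,
    "nova_pro_revise_top_k": 50,
    "nova_pro_revise_max_tokens": 5120,
    "nova_reel_duration_seconds": 6,
    "nova_reel_fps": 24,
    "nova_reel_dimension": "1280x720",
    "nova_reel_seed": 0,  # 0 lets GenerateVideoFunction pick a random seed
}


def build_execution_input(prompt, output_s3_bucket_name, **overrides):
    """Return an input dict containing every field the state machine references."""
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    return {
        "prompt": prompt,
        "output_s3_bucket_name": output_s3_bucket_name,
        **DEFAULT_INPUT,
        **overrides,
    }


def start_workflow(state_machine_arn, execution_input):
    """Start one execution and return its ARN (requires AWS credentials)."""
    import boto3  # imported lazily so build_execution_input works without boto3

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(execution_input),
    )
    return response["executionArn"]
```

A call such as `build_execution_input("A red kite flying over a beach", "example-bucket", nova_reel_seed=42)` produces all 17 fields; the bucket name and prompt here are placeholders.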

AWS CloudFormation Template Explanation

This AWS CloudFormation template deploys the following AWS resources:

Lambda Functions

  • InitializeProcessFunction
    • Parameter initialization at Step Functions execution start
    • Timestamp generation and prompt initial setup
    • Nova Reel and Nova Pro parameter settings
    • Setting limits for retry count and prompt revision count
    • S3 bucket name and key prefix setup
  • GenerateVideoFunction
    • Generate video using Nova Reel
    • Fit the prompt within the 512-character limit
    • Set video generation parameters (FPS, resolution, duration)
    • Configure Amazon S3 storage destination
      s3Uri = f"s3://{bucket_name}/{key_prefix}/{timestamp}/"
    • Obtain and return invocation_arn for asynchronous execution
  • CheckVideoStatusFunction
    • Check status of Amazon Bedrock asynchronous execution
    • Save generated video to S3 URI
      video_s3_uri = f"s3://{bucket_name}/{prefix}/{timestamp}/{s3_prefix}/output.mp4"
    • Check and return processing status (Completed/InProgress/Failed)
    • Error handling
  • ValidateVideoFunction
    • Validate generated video using Nova Pro
    • Retrieve video from S3 and input to Nova Pro
    • Generate and execute validation prompt
    • Return validation results (success/failure) and reasons in JSON format
    • Appropriate error handling for exceptions
  • RevisePromptFunction
    • Revise prompt using Nova Pro
    • Apply prompt optimization guidelines:
      • Clear description of main subject
      • Environment/setting details
      • Movement and action descriptions
      • Lighting conditions
      • Camera movement
      • Visual style
    • Return revised prompt in JSON format
    • Optimization within 512 character limit
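
The small pieces of logic described above for GenerateVideoFunction and CheckVideoStatusFunction can be sketched as follows. This is a minimal illustration under stated assumptions, not the code deployed by the template: the helper names fit_prompt, output_prefix_uri, and output_video_uri are hypothetical, and the sketch assumes that the s3_prefix segment of the video URI is the invocation ID at the end of invocation_arn.

```python
MAX_PROMPT_CHARS = 512  # character limit mentioned in the function list above


def fit_prompt(prompt: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Normalize whitespace and trim the prompt to at most `limit` characters.

    Cuts at the last space inside the limit when there is one, so that
    space-separated text is not cut in the middle of a word.
    """
    prompt = " ".join(prompt.split())
    if len(prompt) <= limit:
        return prompt
    cut = prompt[:limit]
    head, sep, _ = cut.rpartition(" ")
    # Text without spaces (for example Japanese) is cut at the limit itself.
    return head if sep else cut


def output_prefix_uri(bucket_name: str, key_prefix: str, timestamp: str) -> str:
    """Build the S3 destination prefix passed to the video generation job."""
    return f"s3://{bucket_name}/{key_prefix}/{timestamp}/"


def output_video_uri(prefix_uri: str, invocation_arn: str) -> str:
    """Derive the expected location of output.mp4.

    Assumption: the folder under the prefix is named after the last
    segment of the invocation ARN.
    """
    invocation_id = invocation_arn.rsplit("/", 1)[-1]
    return f"{prefix_uri}{invocation_id}/output.mp4"
```

With these helpers, the status check only needs the stored prefix and the invocation_arn returned by the generation step to locate the video once the job reports Completed.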

IAM Roles

  • LambdaExecutionRole
    • Access permissions for Bedrock service
    • Read and write permissions for S3 bucket
    • Permission to write logs to CloudWatch Logs
  • StepFunctionsExecutionRole
    • Execution permissions for all Lambda functions
    • State machine execution management permissions

Step Functions State Machine

  • Control of workflow sequence from initialization to success/failure
  • Regular checking of video generation status (10-second intervals)
  • Transition to prompt revision flow when validation fails
  • Management of maximum retry count and prompt revision count
  • Error handling and transition to appropriate end states
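
To make the control flow easier to follow, the loop that the state machine implements can be simulated in plain Python. This is only a sketch: run_workflow and its callback parameters are hypothetical names, status polling is omitted, and it assumes that each prompt gets one initial attempt plus max_retry_attempts retries before the prompt is revised. The Choice states in the template remain the authoritative definition.

```python
from typing import Callable, Tuple


def run_workflow(
    prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[str, str], Tuple[bool, str]],
    revise: Callable[[str, str], str],
    max_retry_attempts: int,
    max_prompt_revisions: int,
) -> dict:
    """Simulate the generate -> validate -> revise loop with injected steps."""
    history = []
    for revision in range(max_prompt_revisions + 1):
        reason = ""
        # Assumption: one initial attempt plus max_retry_attempts retries per prompt.
        for attempt in range(max_retry_attempts + 1):
            video = generate(prompt)
            passed, reason = validate(video, prompt)
            history.append(
                {"revision": revision, "attempt": attempt,
                 "passed": passed, "reason": reason}
            )
            if passed:
                return {"status": "SUCCEEDED", "prompt": prompt,
                        "video": video, "history": history}
        # Revise the prompt with the latest failure reason, unless the
        # revision budget is already used up.
        if revision < max_prompt_revisions:
            prompt = revise(prompt, reason)
    return {"status": "FAILED", "prompt": prompt, "video": None, "history": history}
```

Passing the steps in as callables keeps the sketch testable with stub functions, in the same way that the state machine only orchestrates the Lambda functions and does not contain their logic.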

Execution Details and Results

The rest of this article walks through an actual execution of this system.
It starts with the input parameters and the reasoning behind them.

Input Parameters and Their Configuration

{
    "prompt": "自然の中から見た富士山がある無人の夜景で、空には月があってオーロラが動いて流星群が流れており、地上には海が広がって流氷が流れており、水平線からは太陽が登る日の出の映像", # Initial prompt for video generation
    "output_s3_bucket_name": "ho2k.com", # Name of S3 bucket to store generated videos
    "output_s3_key_prefix": "generated-videos", # S3 key prefix for generated videos
    "max_retry_attempts": 1, # Maximum number of video generation attempts per prompt
    "max_prompt_revisions": 5, # Maximum number of prompt revisions
    "nova_pro_validate_temperature": 0.3, # Nova Pro model temperature for video validation
    "nova_pro_validate_top_p": 0.9, # Nova Pro model top-p for video validation
    "nova_pro_validate_top_k": 40, # Nova Pro model top-k for video validation
    "nova_pro_validate_max_tokens": 5120, # Nova Pro model maximum tokens for video validation
    "nova_pro_revise_temperature": 0.7, # Nova Pro model temperature for prompt revision
    "nova_pro_revise_top_p": 0.9, # Nova Pro model top-p for prompt revision
    "nova_pro_revise_top_k": 50, # Nova Pro model top-k for prompt revision
    "nova_pro_revise_max_tokens": 5120, # Nova Pro model maximum tokens for prompt revision
    "nova_reel_duration_seconds": 6, # Generated video duration (seconds)
    "nova_reel_fps": 24, # Generated video frame rate
    "nova_reel_dimension": "1280x720", # Generated video resolution
    "nova_reel_seed": 0 # Seed value for video generation reproducibility (random if 0)
}
The # comments in the listing above are annotations for this article. JSON does not allow comments, so remove them before using the listing as execution input.
These parameters were chosen with the following considerations:
  • Set max_retry_attempts to 1 so that prompt revision starts early, right after the first retry (the second execution).
  • Set max_prompt_revisions to 5 to increase the opportunities for prompt improvement.
  • Tuned the Nova Pro inference parameters (temperature, top_p, top_k, max_tokens) separately for video validation and prompt revision.
  • Set nova_pro_validate_temperature and nova_pro_validate_top_k lower than their revision counterparts so that validation output is more consistent and less random.
  • Set nova_reel_duration_seconds to 6 to generate videos of an appropriate length.
  • Set nova_reel_fps to 24 to generate videos with smooth movement.
  • Set nova_reel_seed to 0 so that a random seed is used and a different video is generated each time.
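
Because the input uses one flat JSON object with prefixed keys, each Lambda function has to pick out the settings that belong to it. One way to do that is sketched below. The names group_parameters, resolve_seed, and MAX_SEED are hypothetical, and both the upper bound for the seed and the idea of replacing 0 with a random value inside the function are assumptions based on the "random if 0" comment, not confirmed behavior of the deployed code.

```python
import random
from typing import Optional

# Prefixes used by the input parameters shown above.
PARAMETER_PREFIXES = ("nova_pro_validate_", "nova_pro_revise_", "nova_reel_")

# Assumed upper bound for a Nova Reel seed; check the current documentation.
MAX_SEED = 2_147_483_646


def group_parameters(event: dict) -> dict:
    """Split the flat input into one dict per model role, with prefixes removed."""
    groups = {prefix[:-1]: {} for prefix in PARAMETER_PREFIXES}
    for key, value in event.items():
        for prefix in PARAMETER_PREFIXES:
            if key.startswith(prefix):
                groups[prefix[:-1]][key[len(prefix):]] = value
                break
    return groups


def resolve_seed(seed: int, rng: Optional[random.Random] = None) -> int:
    """Return the given seed, or a random one when the input is 0."""
    if seed != 0:
        return seed
    rng = rng or random.Random()
    return rng.randint(1, MAX_SEED)
```

Keys without one of the three prefixes, such as prompt or output_s3_bucket_name, are left out of the groups and can be read from the event directly.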

An Example of Execution: Results

Next, let's examine an example of the execution results using these parameters.

The video that passed verification

Here is the video that Nova Pro ultimately judged to meet the prompt requirements.
This video largely expresses the elements of the original prompt: "自然の中から見た富士山がある無人の夜景で、空には月があってオーロラが動いて流星群が流れており、地上には海が広がって流氷が流れており、水平線からは太陽が登る日の出の映像" (in English: "A video of an uninhabited night scene with Mount Fuji seen from nature, with the moon in the sky, a moving aurora, and a meteor shower, the ocean spreading below with drifting sea ice, and the sun rising from the horizon at dawn.").


The following elements were appropriately expressed:
  • Uninhabited night scene of Mount Fuji seen from nature
  • Moon in the sky with moving aurora
  • Two meteors appearing in some scenes
  • Ocean spreading below with drifting sea ice
  • Sun visible at the horizon in some scenes
On the other hand, the following elements were not appropriately expressed:
  • Not enough meteors to be called a meteor shower
  • Shows a sunset rather than a sunrise
While there were both well-expressed and poorly-expressed elements, Amazon Nova Pro Vision's video validation determined that it satisfied the initial prompt requirements.
It was confirmed that this final video met more of the initial prompt requirements than previously generated videos.
Let's examine the changes in prompt modifications, video generation, and video validation results and reasons throughout this process.

Process Changes (Prompt Modifications, Video Generation, Video Validation Results and Reasons)

Let's look at how the process changed with each prompt modification, including the video generation prompts, generated videos, and validation results and reasons.
0th Modification
[Executed Prompt]
自然の中から見た富士山がある無人の夜景で、空には月があってオーロラが動いて流星群が流れており、地上には海が広がって流氷が流れており、水平線からは太陽が登る日の出の映像
[Generated Video]

[Video Validation Result and Reason]
Result: Failed
Reason: The video does not depict an aurora, nor does it show a sunrise with the sun rising from the horizon. It primarily shows a night view of Mount Fuji with a full moon, stars, and the reflection of city lights on the sea.
1st Modification
[Executed Prompt]
A breathtaking sunrise video featuring Mount Fuji as the main subject, set against a serene, uninhabited nightscape. The environment includes a tranquil sea with drifting ice floes, a majestic aurora dancing in the sky, and a spectacular meteor shower. The lighting transitions from moonlight to the warm hues of dawn as the sun rises on the horizon. The camera captures the scene with a slow, panoramic movement, emphasizing the grandeur and tranquility of the natural setting.
[Generated Video]

[Video Validation Result and Reason]
Result: Failed
Reason: The video does not depict a sunrise but rather a night scene with aurora and meteor shower, and there is no transition to dawn or sunrise as described in the prompt.
2nd Modification
[Executed Prompt]
A breathtaking video showcasing Mount Fuji at dawn, set against a serene, uninhabited nightscape. The sky is adorned with a radiant moon, shimmering auroras, and shooting stars. Below, the vast ocean stretches out, with drifting icebergs and the horizon glowing as the sun rises, casting a warm, golden light across the scene. The camera captures the majestic view with a smooth, panning motion, highlighting the tranquil beauty and natural splendor of the moment.
[Generated Video]

* This is the same video as shown in "The video that passed verification" above.

[Video Validation Result and Reason]
Result: Passed
Reason: The video depicts Mount Fuji at dawn with a serene nightscape, featuring a radiant moon, shimmering auroras, and shooting stars. The ocean below has drifting icebergs, and the horizon glows as the sun rises, casting a warm, golden light. The camera's smooth, panning motion effectively captures the tranquil beauty and natural splendor of the scene.
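
Each validation result above pairs a pass/fail decision with a reason, which ValidateVideoFunction returns in JSON format. Model replies do not always contain clean JSON, so a defensive parser is useful. The following is a minimal sketch, assuming hypothetical field names valid and reason (the actual keys used by the function may differ). It treats anything it cannot parse as a failed validation.

```python
import json
import re


def parse_validation_reply(text: str) -> dict:
    """Extract {"valid": bool, "reason": str} from a model reply.

    Any reply that cannot be parsed is treated as a failed validation,
    so a malformed answer never lets a video pass by accident.
    """
    # Take everything from the first "{" to the last "}" in the reply.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return {"valid": False, "reason": "No JSON object found in model reply"}
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        return {"valid": False, "reason": f"Unparseable JSON: {exc.msg}"}
    # Only a real boolean true counts as a pass.
    return {"valid": data.get("valid") is True, "reason": str(data.get("reason", ""))}
```

Failing closed means that a formatting problem costs at most one extra generation attempt, which is cheaper than accepting a video that was never actually validated.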
Key Insights from Process Changes (Prompt Modifications, Video Generation, Video Validation Results and Reasons)
From these processes, the following notable points about Amazon Nova Pro and Amazon Nova Reel can be highlighted:

[About Amazon Nova Pro]
  • Prompt revision: rewriting the initial Japanese prompt into a more detailed, structured English prompt improved the quality of the generated videos
  • Prompt revision: Nova Pro added technical details such as camera work and lighting to the revised prompts
  • Video validation: Nova Pro recognized and evaluated multiple elements (Mount Fuji, aurora, meteors, ocean, ice) at the same time
  • Video validation: Nova Pro also understood and verified motion (aurora movement, ice drift)
  • Video validation: Nova Pro gave specific reasons for each validation result
  • Video validation: Nova Pro struggled with judgments that depend on temporal progression, such as telling a sunrise from a sunset based on the sun's position and the background colors
[About Amazon Nova Reel]
  • Nova Reel understood the basic requirements and generated a video even from a Japanese prompt
  • Nova Reel generated complex scenes combining multiple natural phenomena
  • Nova Reel expressed dynamic scenes with some change over time
  • Nova Reel struggled to depict an event with temporal progression, such as a sunrise as opposed to a sunset, together with multiple other elements


Summary

In this article, I introduced an example of using Amazon Bedrock to verify and regenerate videos created with Nova Reel using Nova Pro's video understanding and analysis capabilities.
Through this experiment, the following points about Nova Pro were confirmed:
  • Nova Pro can recognize not only the objects in a video but also motion and changes over time, which makes it usable for validating requirements.
  • Nova Pro can be used for prompt optimization and translation for other image and video generation models.
  • Nova Pro can be used for verification to check if videos meet requirements.
  • By automating cycles of video generation, video verification, prompt optimization, and video regeneration, it's possible to reduce manual visual inspection workload.
However, as with the earlier image generation examples, the prompt revision instructions need to be written carefully to follow the best practices of each video generation model.
With that in place, the same prompt optimization and automation approach can be applied to other video generation models.
Also, because Amazon Nova Pro's video validation is not perfect, avoid relying on it entirely and keep a final human check in the workflow.

Regarding Nova Reel, the following points were confirmed:
  • Nova Reel can understand descriptions written in Japanese and generate requirement-compliant videos from Japanese prompts. In other words, even without rewriting the Japanese prompt into English as was done in this experiment, there is a good chance of generating videos that meet requirements by following prompt best practices in Japanese and making several attempts.
Thus, by combining text generation and understanding models (like Nova Pro), image generation models (like Nova Canvas), and video generation models (like Nova Reel), possibilities expand for various processes and automation using multimodal data.
Particularly with the release of higher-level models or new versions of Nova Pro, more advanced multimodal data processing control and automation may become possible.
I look forward to continuing to monitor the evolution of AI models provided by Amazon Bedrock and new implementation methods utilizing them, exploring further expansion of their application areas.

Written by Hidekazu Konishi


Copyright © Hidekazu Konishi ( hidekazu-konishi.com ) All Rights Reserved.