
Turn Text into Cinematic Video with Hailuo 2.3

Hailuo 2.3 is a text-to-video model built for creators who need footage that looks filmed, not generated. It takes a written scene description or a reference image and returns a short video clip with realistic human motion, expressive facial detail, and a consistent visual style throughout the shot. If you have struggled to visualize a scene before committing to production, or simply need social-ready video without a camera or crew, it closes that gap in minutes.

The model works in two modes. Provide a text prompt and it generates the entire scene from scratch, closely following your style and motion cues. Upload a reference image as the first frame and the model uses it as an anchor, building the rest of the clip around that composition and aspect ratio. Output runs up to 1080p for a crisp, ready-to-use video file, with clips of 6 or 10 seconds depending on the resolution you choose (10-second clips are only available at 768p).

In a real workflow, Hailuo 2.3 sits between the concept stage and the edit. Use it to produce B-roll that matches a specific visual direction, prototype a client scene before committing to a shoot, or turn a mood board image into a moving reference. Set your resolution, write your prompt, and get a finished clip without waiting on render farms or post-production timelines.

Official

Minimax

10.6k runs

Hailuo 2.3

2025-10-26

Commercial Use

Table of contents

  • Overview
  • How It Works
  • Frequently Asked Questions
  • Credit Cost
  • Features
  • Use Cases
  • Examples

Overview

Hailuo 2.3 is a text-to-video model built for creators who need footage that looks filmed, not generated. It converts a written scene description or a reference image into a short, high-fidelity video clip with realistic human motion, expressive character faces, and cinematic visual consistency. On Picasso IA, you access it directly in the browser with no setup required. Whether you are prototyping a client scene, producing B-roll, or turning a concept image into a moving reference, Hailuo 2.3 returns a finished clip in minutes from a single input.

How It Works

  • Write a text description of the scene you want, or upload a photo or illustration to set the opening frame of the video.
  • Set your resolution: 768p for standard quality, or 1080p for full HD output on 6-second clips.
  • Choose a clip duration of 6 seconds or 10 seconds. Note that 10-second clips are only available at 768p.
  • Turn the prompt optimizer on if you want the model to refine and interpret your description automatically before generating.
  • Submit the job and download the video clip once generation completes.
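If you drive a video generation service from a script rather than the browser, the setting rules above (1080p only with 6-second clips, 10-second clips only at 768p, and either a prompt or a first-frame image as input) can be checked before a job is submitted. The following is a minimal Python sketch under stated assumptions: the function `build_request` and the field names `resolution`, `duration`, `prompt_optimizer`, and `first_frame_image` are hypothetical and are not Picasso IA's or MiniMax's actual API.

```python
from typing import Optional

# Hypothetical field names: they mirror the settings described on this page,
# not a documented Picasso IA or MiniMax API.
ALLOWED_DURATIONS = {
    "768p": {6, 10},  # both clip lengths are available at 768p
    "1080p": {6},     # full HD is limited to 6-second clips
}


def build_request(prompt: str, resolution: str = "768p", duration: int = 6,
                  prompt_optimizer: bool = True,
                  first_frame_image: Optional[str] = None) -> dict:
    """Validate a settings combination and return it as a plain dict."""
    # Either a text prompt or a first-frame image must be supplied.
    if not prompt.strip() and first_frame_image is None:
        raise ValueError("provide a text prompt or a first-frame image")
    if resolution not in ALLOWED_DURATIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    # Reject combinations the model does not offer, such as 10 s at 1080p.
    if duration not in ALLOWED_DURATIONS[resolution]:
        raise ValueError(f"{duration}s clips are not available at {resolution}")
    request = {
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
        "prompt_optimizer": prompt_optimizer,
    }
    # Only include the anchor image when one was provided.
    if first_frame_image is not None:
        request["first_frame_image"] = first_frame_image
    return request


if __name__ == "__main__":
    print(build_request("slow dolly shot across a rain-soaked street", "1080p", 6))
    try:
        build_request("slow dolly shot across a rain-soaked street", "1080p", 10)
    except ValueError as err:
        print(err)  # 10s clips are not available at 1080p
```

Checking the resolution and duration pair up front avoids spending credits on a job the model cannot run; the lookup table keeps the rule in one place if the allowed combinations change.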

Frequently Asked Questions

Do I need programming skills or technical knowledge to use this? No. Open Hailuo 2.3 on Picasso IA, adjust the settings you want, and click generate.

Is it free to try? Yes, you can run Hailuo 2.3 without a paid subscription to get started. Check the credits page for details on how many generations are included at each tier.

How long does it take to get results? Most clips finish generating within a couple of minutes. Longer clips and 1080p jobs may take slightly more time depending on current server load.

What output formats are supported? You receive a standard video file compatible with common editing tools, social platforms, and presentation software. No extra conversion is needed after download.

Can I customize the style of the output? Yes. Include specific details in your prompt about lighting, camera movement, character appearance, and mood. The more descriptive your input, the closer the result matches your intent.

What happens if I am not happy with the result? Rewrite your prompt with more specifics about motion, framing, or the look you want, and run it again. You can also toggle the prompt optimizer to see how different interpretations affect the clip.

Can I anchor the video to a specific image? Yes. Upload an image as the first frame and the model generates the rest of the clip to match that visual starting point, including aspect ratio and overall composition.

Credit Cost

Each generation consumes 11 credits, or 55 credits for 5 generations.

Features

Everything this model can do for you

Realistic motion

Renders natural human movement and expressive facial detail that holds up across the full clip.

Dual workflow support

Accepts either a text prompt or a reference image as the starting point for every generation.

1080p output

Delivers full HD clips at 1920x1080 for footage ready to drop into a professional edit.

Flexible duration

Choose between 6-second and 10-second clips depending on your resolution and pacing needs.

Prompt optimizer

Automatically refines your input to improve motion, framing, and visual consistency.

Style adherence

Follows cinematic style cues in your prompt closely, keeping tone and visual treatment consistent throughout.

Image-anchored generation

Pin the first frame to a specific photo or illustration to control how the scene opens.

Ideal for both creative and professional use

Use Cases

Generate a 10-second cinematic scene from a detailed text prompt, including realistic lighting and camera movement

Upload a character portrait as the first frame and get a short video clip with natural facial expressions and head movement

Produce B-roll footage for a video project by describing the environment and mood you want in a single sentence

Create a 1080p product teaser clip by describing the item, background, and motion you want in the prompt

Animate a still illustration or concept art image into a short video that preserves the original style and composition

Prototype a scene for a client presentation by generating a rough video from a scene description before committing to production

Turn a written story beat into a short visual clip to storyboard a film or animation project

Enhance presentations with custom video content

Examples

768p · 6s · 1m 36s · Prompt Optimizer: Yes

slow handheld camera movement: Two firebenders face off in a dark alley as heavy rain pours. One exhales steam. Sparks ignite from soaked fists. They launch into motion kicks, spins, flaming strikes that hiss on contact with water. Explosions reflect in puddles. Fire clashes mid-air, casting harsh orange light. A final blast sends water and embers flying toward camera. Stylized urban fantasy with dramatic lighting and intense motion.

768p · 6s · 1m 53s · Prompt Optimizer: Yes

From a soaring aerial, the rider rockets across a collapsing skyline, rooftops narrowing into jagged ledges. Camera hovers tight above as he angles through curved glass and steel, void yawning between buildings. Each rooftop leap sparks drama, until he threads impossibly across the final span, city blurring below.

768p · 6s · 2m 50s · Prompt Optimizer: Yes

a tiktok dancer is dancing on a small drone, doing flips and tricks

768p · 6s · 1m 37s · Prompt Optimizer: Yes

a tiktok dancer is dancing on a small drone, doing flips and tricks

768p · 6s · 1m 36s · Prompt Optimizer: Yes

a tiktok dancer is dancing on a small drone, doing flips and tricks
