Omni Human 1.5 takes a single photo and an audio clip and turns them into a short, realistic video of the person speaking. It removes the time and cost barrier of producing talking-head content, cutting the entire process down to a photo, an audio file, and a click. The model syncs lips to speech with film-level accuracy, preserving the subject's skin texture, lighting, and facial geometry frame by frame. An optional text prompt gives you direct control over scene composition, camera movement, and character motion, and fast mode lets you trade some fine detail for speed when you need quick iterations.
The model solves a problem that used to demand a full production setup: putting convincing words in a digital subject's mouth without recording any new footage. It fits naturally into content workflows that would otherwise require video recording, studio setup, or motion capture. Open Omni Human 1.5 on Picasso IA, upload your image and audio, and get a ready-to-use video in seconds.
Do I need programming skills or technical knowledge to use this? No, just open Omni Human 1.5 on Picasso IA, adjust the settings you want, and hit generate.
Is it free to try? Omni Human 1.5 runs directly in your browser on Picasso IA, with nothing to download or install. Each generation has a credit cost, shown on the model page, so check it before you start.
What is the audio length limit? Your audio clip must be 35 seconds or shorter. Longer files trigger an error and the generation will not complete, so trim your recording beforehand (a quick way to do that is sketched below).
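If you prefer to trim programmatically rather than in an audio editor, here is a minimal sketch using the pydub library (our choice here, not something Picasso IA requires; the filenames are placeholders):

```python
from pydub import AudioSegment  # pip install pydub (requires ffmpeg on PATH)

MAX_SECONDS = 35  # Omni Human 1.5's audio length limit

clip = AudioSegment.from_file("voiceover.mp3")
if len(clip) > MAX_SECONDS * 1000:      # pydub measures duration in milliseconds
    clip = clip[: MAX_SECONDS * 1000]   # keep only the first 35 seconds
clip.export("voiceover_trimmed.mp3", format="mp3")
```

Upload the trimmed file and the generation will run without hitting the length check.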
What type of image gives the best results? A front-facing photo with the subject's face clearly visible works best. The model also handles stylized illustrations and animated characters, though realistic portraits with good lighting tend to produce the most natural lip sync.
Can I control movement and scene details beyond the lip sync? Yes. The optional prompt field accepts descriptions of the scene, head and body movement, and camera direction. It supports English, Chinese, Japanese, Korean, Spanish, and Indonesian.
What if the output doesn't match what I had in mind? Try making your prompt more specific about the movement or scene you want. Set a fixed seed to reproduce a run exactly, then adjust one variable at a time to isolate what needs changing (see the sketch below).
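To see why a fixed seed makes iteration tractable, consider this toy stand-in for a seeded generator (an illustration of the concept only, not Picasso IA's actual sampler):

```python
import random

def toy_generate(seed: int, prompt: str) -> str:
    # A seeded generator: the seed fixes all randomness, so identical
    # (seed, prompt) inputs always produce identical output.
    rng = random.Random(seed)
    return f"{prompt} -> {rng.getrandbits(32):08x}"

print(toy_generate(42, "slow zoom, subtle head turn"))  # run A
print(toy_generate(42, "slow zoom, subtle head turn"))  # run B: identical to A
print(toy_generate(42, "fast zoom, subtle head turn"))  # only the prompt changed,
                                                        # so any difference comes from the prompt
```

With the seed pinned, any change in the video traces back to the one setting you touched.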
Where can I use the videos I create? The generated video is yours to download and use in social media content, client presentations, creative short films, or any other project you are working on.
Everything this model can do for you
Generates video with realistic facial motion, lighting, and skin texture at production quality.
Works from one photo, portrait, or illustration without video footage or 3D models.
Accepts voiceover in English, Spanish, Japanese, Korean, Chinese, and Indonesian.
Directs scene composition, character movement, and camera angle from an optional text prompt.
Cuts generation time with fast mode when speed matters more than fine detail.
Regenerates the exact same output across multiple runs when you reuse a seed value.
Supports MP3, WAV, and other common audio formats up to 35 seconds long.