Omni Human 1.5: Realistic Lipsync Video from a Photo

Omni Human 1.5 takes a single photo and an audio clip and turns them into a short, realistic video of the person speaking. It removes the time and cost barrier of producing talking-head content, reducing the process to a photo, an audio file, and a click. The model syncs lips to speech with frame-level accuracy while preserving the subject's skin texture, lighting, and facial geometry. An optional text prompt gives you direct control over scene composition, camera movement, and character motion. Fast mode trades some fine detail for speed when you need quick iterations. Omni Human 1.5 fits content workflows that would otherwise require video recording, studio setup, or motion capture. Open it on Picasso IA, upload your inputs, and download a ready-to-use video once rendering finishes.

Official · Bytedance · 32.5k runs · 2025-10-23 · Commercial Use

Table of contents

  • Overview
  • How It Works
  • Frequently Asked Questions
  • Credit Cost
  • Features
  • Use Cases
  • Examples

Overview

Omni Human 1.5 turns a single still photo and a short audio clip into a film-grade talking video, matching lip movement to speech with frame-level accuracy. It solves a problem that used to demand a full production setup: putting convincing words in a digital subject's mouth without recording any new footage. On Picasso IA, you supply the image and the audio, and the model does the rendering. An optional text prompt gives you control over scene context, body motion, and camera behavior, so the output fits naturally into your existing project.

How It Works

  • Upload a clear photo of a human face, illustrated character, or portrait as your base image
  • Add an audio file in MP3 or WAV format that is 35 seconds or shorter (longer clips will cause generation to fail)
  • Write an optional text prompt to specify scene details, body or head movement, or camera framing
  • Choose whether to run in standard mode for full detail, or fast mode for a quicker result at a slight reduction in motion fidelity
  • Download the output video once the model finishes rendering the lip-synced sequence
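The audio constraints in the steps above can be checked locally before uploading. The following is a minimal sketch, not part of Picasso IA: the function names `wav_duration_seconds` and `check_audio` are hypothetical, and only the 35-second limit and the MP3/WAV formats come from this page. It measures duration for WAV files only, because Python's standard library cannot read MP3 length.

```python
import wave
from pathlib import Path

MAX_AUDIO_SECONDS = 35.0               # limit stated on this page
ALLOWED_EXTENSIONS = {".mp3", ".wav"}  # formats named in the steps above


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_audio(path):
    """Return None if the file looks acceptable, else a short problem description.

    Hypothetical pre-upload check: MP3 files pass on extension alone because
    their duration is not measured here.
    """
    path = Path(path)
    ext = path.suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported format: {ext or 'none'}"
    if ext == ".wav":
        duration = wav_duration_seconds(path)
        if duration > MAX_AUDIO_SECONDS:
            return f"too long: {duration:.1f}s (limit {MAX_AUDIO_SECONDS:.0f}s)"
    return None
```

Calling `check_audio("voiceover.wav")` before uploading catches an over-length clip early, so a generation is not spent on a file that would fail.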

Frequently Asked Questions

Do I need programming skills or technical knowledge to use this? No. Open Omni Human 1.5 on Picasso IA, adjust the settings you want, and hit generate.

Is it free to try? Omni Human 1.5 runs directly in your browser on Picasso IA, with nothing to download or install. Each generation consumes credits, so check the credit cost shown on the model page before you start.

What is the audio length limit? Your audio clip must be 35 seconds or shorter. Files longer than that will return an error and the generation will not complete, so trim your recording beforehand.

What type of image gives the best results? A front-facing photo with the subject's face clearly visible works best. The model also handles stylized illustrations and animated characters, though realistic portraits with good lighting tend to produce the most natural lip sync.

Can I control movement and scene details beyond the lip sync? Yes. The optional prompt field accepts descriptions of the scene, head and body movement, and camera direction. It supports English, Chinese, Japanese, Korean, Spanish, and Indonesian.

What if the output doesn't match what I had in mind? Try making your prompt more specific about the movement or scene you want. Set a fixed seed to lock in a run and then adjust one variable at a time to isolate what needs changing.

Where can I use the videos I create? The generated video is yours to download and use in social media content, client presentations, creative short films, or any other project you are working on.
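The advice above about fixing a seed and changing one variable at a time can be planned as a small list of runs. This is only a sketch: the setting names `prompt`, `seed`, and `fast_mode` are illustrative labels for the options this page describes, not confirmed parameter names, and `one_at_a_time` is a hypothetical helper.

```python
def one_at_a_time(baseline, alternatives):
    """Build settings that each differ from the baseline in exactly one field.

    baseline: dict of settings for the reference run (seed held fixed).
    alternatives: dict mapping a setting name to the values to try for it.
    """
    variants = []
    for key, values in alternatives.items():
        for value in values:
            if value == baseline.get(key):
                continue  # identical to the baseline, nothing to compare
            variant = dict(baseline)
            variant[key] = value
            variants.append(variant)
    return variants


# Illustrative values only; the names are assumptions, not the model's API.
baseline = {"prompt": "She speaks calmly to the camera", "seed": 42, "fast_mode": False}
plan = one_at_a_time(
    baseline,
    {
        "prompt": ["She speaks calmly while the camera slowly zooms in"],
        "fast_mode": [True],
    },
)
for settings in plan:
    print(settings)
```

Because every variant keeps the baseline seed, any difference between two outputs can be attributed to the one setting that changed.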

Credit Cost

Each generation consumes 200 credits, so 5 generations cost 1,000 credits.
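The credit arithmetic can be sketched as follows. The 200-credit price comes from this page; the function names are hypothetical and only illustrate the calculation.

```python
CREDITS_PER_GENERATION = 200  # price stated on this page


def credits_needed(generations):
    """Total credits consumed by a given number of generations."""
    return generations * CREDITS_PER_GENERATION


def generations_affordable(balance):
    """Number of whole generations a credit balance covers."""
    return balance // CREDITS_PER_GENERATION


print(credits_needed(5))             # 1000
print(generations_affordable(1150))  # 5
```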

Features

Everything this model can do for you

Film-grade output

Generates video with realistic facial motion, lighting, and skin texture at production quality.

Single-image input

Works from one photo, portrait, or illustration without video footage or 3D models.

Multilingual audio support

Accepts voiceover in English, Spanish, Japanese, Korean, Chinese, and Indonesian.

Text prompt control

Add an optional prompt to direct scene composition, character movement, and camera angle.

Fast mode option

Cut generation time by activating fast mode when speed matters more than fine detail.

Reproducible results

Reuse a seed value with the same inputs and settings to regenerate the same output across multiple runs.

Flexible audio input

Upload MP3, WAV, or other common audio files up to 35 seconds long.

Use Cases

Animate a static portrait photo into a lip-synced video by uploading the image and an audio clip of up to 35 seconds

Create a talking-head video for a social media post by pairing a single photo with a recorded voiceover

Produce a digital spokesperson video for a product page using just a portrait photo and a scripted audio file

Generate a multilingual presentation video from one photo by recording audio in Spanish, Japanese, Korean, or English and letting the model sync the lips automatically

Turn an illustrated character or avatar into a speaking figure by feeding the artwork and a narration clip to the model

Create a personalized video message by uploading a portrait photo and attaching a short audio recording as input

Test dialogue timing for a short film by running a reference still against a scratch audio track

Examples

Example 1 (4m 40s)
Example 2 (6m 10s)
Example 3 (3m 17s), prompt: "A woman sings and strums her guitar"
