Omni Human 1.5 takes a single photo and an audio clip and turns them into a short, realistic video of the person speaking. It removes the time and cost barrier of producing talking-head content, cutting the entire process down to a photo, an audio file, and a click. The model syncs lips to speech with film-level accuracy, preserving the subject's skin texture, lighting, and facial geometry frame by frame. An optional text prompt gives you direct control over scene composition, camera movement, and character motion, and fast mode lets you trade some fine detail for speed when you need quick iterations.
The model solves a problem that used to demand a full production setup: putting convincing words in a digital subject's mouth without recording any new footage. It fits naturally into content workflows that would otherwise require video recording, studio setup, or motion capture. Open Omni Human 1.5 on Picasso IA, upload your image and audio, and get a ready-to-use video in seconds.
Do I need programming skills or technical knowledge to use this? No, just open Omni Human 1.5 on Picasso IA, adjust the settings you want, and hit generate.
Is it free to try? Omni Human 1.5 runs directly in your browser on Picasso IA, with nothing to download or install. Each generation has a credit cost, shown on the model page, so check it before you start.
What is the audio length limit? Your audio clip must be 35 seconds or shorter. Longer files trigger an error and the generation will not complete, so trim your recording beforehand (a quick way to do that is sketched below).
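If you prefer to trim programmatically rather than in an audio editor, here is a minimal sketch using the pydub library (our choice here, not something Picasso IA requires; the filenames are placeholders):

```python
from pydub import AudioSegment  # pip install pydub (requires ffmpeg on PATH)

MAX_SECONDS = 35  # Omni Human 1.5's audio length limit

clip = AudioSegment.from_file("voiceover.mp3")
if len(clip) > MAX_SECONDS * 1000:      # pydub measures duration in milliseconds
    clip = clip[: MAX_SECONDS * 1000]   # keep only the first 35 seconds
clip.export("voiceover_trimmed.mp3", format="mp3")
```

Upload the trimmed file and the generation will run without hitting the length check.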
What type of image gives the best results? A front-facing photo with the subject's face clearly visible works best. The model also handles stylized illustrations and animated characters, though realistic portraits with good lighting tend to produce the most natural lip sync.
Can I control movement and scene details beyond the lip sync? Yes. The optional prompt field accepts descriptions of the scene, head and body movement, and camera direction. It supports English, Chinese, Japanese, Korean, Spanish, and Indonesian.
What if the output doesn't match what I had in mind? Try making your prompt more specific about the movement or scene you want. Set a fixed seed to reproduce a run exactly, then adjust one variable at a time to isolate what needs changing (see the sketch below).
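To see why a fixed seed makes iteration tractable, consider this toy stand-in for a seeded generator (an illustration of the concept only, not Picasso IA's actual sampler):

```python
import random

def toy_generate(seed: int, prompt: str) -> str:
    # A seeded generator: the seed fixes all randomness, so identical
    # (seed, prompt) inputs always produce identical output.
    rng = random.Random(seed)
    return f"{prompt} -> {rng.getrandbits(32):08x}"

print(toy_generate(42, "slow zoom, subtle head turn"))  # run A
print(toy_generate(42, "slow zoom, subtle head turn"))  # run B: identical to A
print(toy_generate(42, "fast zoom, subtle head turn"))  # only the prompt changed,
                                                        # so any difference comes from the prompt
```

With the seed pinned, any change in the video traces back to the one setting you touched.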
Where can I use the videos I create? The generated video is yours to download and use in social media content, client presentations, creative short films, or any other project you are working on.
Everything this model can do for you
Generates video with realistic facial motion, lighting, and skin texture at production quality.
Works from one photo, portrait, or illustration without video footage or 3D models.
Accepts voiceover in English, Spanish, Japanese, Korean, Chinese, and Indonesian.
Directs scene composition, character movement, and camera angle from an optional text prompt.
Cuts generation time with fast mode when speed matters more than fine detail.
Regenerates the exact same output across multiple runs when you reuse a seed value.
Supports MP3, WAV, and other common audio formats up to 35 seconds long.