
Sync Any Voice to Video with Lipsync 2

Lipsync 2 takes a video clip and a separate audio track and produces a new video where the face in the footage matches every word of the speech. It solves a very specific problem: you have the right visuals and the right audio, but they don't match. Whether you've dubbed dialogue into another language, recorded a corrected voiceover, or generated speech with an AI voice tool, this model syncs them together without any manual frame-by-frame editing.

The model gives you several ways to handle a mismatch between audio length and video length. You can loop or bounce the clip, trim the audio at the cut point, pad with silence, or remap the footage to fill the full duration. A temperature control sets how expressive the mouth movement looks, from restrained and natural to more animated. For videos with multiple people in frame, an active speaker setting detects who is talking and applies the sync only to that person.

Lipsync 2 fits naturally into dubbing workflows, social media video production, and AI-generated spokesperson content. You bring the assets and the model handles the rest: drop in your files, set a few options, and generate the output in one step.

Official · Sync · Lipsync 2 · 15.4k runs · 2025-07-15 · Commercial Use

Table of contents

  • Overview
  • How It Works
  • Frequently Asked Questions
  • Credit Cost
  • Features
  • Use Cases
  • Examples

Overview

Lipsync 2 takes a video file and an audio track and produces a new video where the person's mouth matches every word of the speech. It solves a problem that comes up constantly: you have the footage and the audio, but they don't match. Whether you've dubbed a video into another language, re-recorded a narration, or built a voiceover with an AI speech tool, Picasso IA lets you close that gap without editing software or frame-by-frame work. The result is a naturally animated face that moves in sync with every syllable.

How It Works

  • Upload your video (MP4) and audio (WAV) using the input fields on the model page.
  • Choose a sync mode to decide what happens when the audio and video are different lengths: loop repeats the clip, bounce plays it forward and backward, cut-off trims the audio, silence pads the end, or remap stretches the footage to fit.
  • Use the temperature slider to control how expressive the lip animation looks, from subtle to more pronounced.
  • Turn on active speaker detection if your scene has more than one person in frame, so the model applies lipsync only to the person speaking.
  • Hit generate and download the output video with the synced mouth movement applied.
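To make the loop, bounce, and remap descriptions above more concrete, here is a small Python sketch of how each mode could map output frames back to frames of the uploaded clip when the audio runs longer than the video. This is an illustration under our own assumptions, not Lipsync 2's actual implementation: the function `source_frame` and its frame-index formulas are hypothetical. Cut-off and silence are left out because, as described above, they trim or pad the audio rather than reorder the footage.

```python
def source_frame(mode: str, i: int, n_src: int, n_out: int) -> int:
    """Pick which source-video frame to show at output frame i.

    Hypothetical illustration only, not the model's real algorithm.
    n_src: number of frames in the uploaded clip
    n_out: number of output frames needed to cover the audio
    """
    if n_src < 1 or n_out < 1 or not 0 <= i < n_out:
        raise ValueError("need n_src >= 1, n_out >= 1 and 0 <= i < n_out")
    if mode == "loop":
        # Restart from the first frame each time the clip ends.
        return i % n_src
    if mode == "bounce":
        if n_src == 1:
            return 0
        # One cycle is a forward pass then a backward pass,
        # without showing the first or last frame twice in a row.
        period = 2 * (n_src - 1)
        k = i % period
        return k if k < n_src else period - k
    if mode == "remap":
        # Spread the clip evenly across the output,
        # holding frames when stretching and skipping them when compressing.
        return i * n_src // n_out
    raise ValueError(f"unsupported mode in this sketch: {mode!r}")


# A 4-frame clip stretched to cover 8 output frames.
for mode in ("loop", "bounce", "remap"):
    print(mode, [source_frame(mode, i, 4, 8) for i in range(8)])
# loop [0, 1, 2, 3, 0, 1, 2, 3]
# bounce [0, 1, 2, 3, 2, 1, 0, 1]
# remap [0, 0, 1, 1, 2, 2, 3, 3]
```

Bounce uses a period of `2 * (n_src - 1)` so the turnaround frames are not duplicated, which keeps the motion from visibly pausing at each end of the clip.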

Frequently Asked Questions

Do I need programming skills or technical knowledge to use this? No. Just open Lipsync 2 on Picasso IA, adjust the settings you want, and hit generate.

Is it free to try? Yes, you can run Lipsync 2 online for free. No account setup is needed to get started.

How long does it take to get results? Short clips typically process in under a minute. Longer files take more time depending on duration and resolution.

What file formats are supported? The model accepts MP4 video files and WAV audio files. Make sure both files are in these formats before uploading.

Can I control how natural the lip movement looks? Yes. The temperature setting lets you dial between subtle, close-to-realistic mouth motion and more expressive animation.

What happens if my audio is longer than my video? Pick a sync mode before generating. Loop repeats the video to fill the audio, bounce plays it forward and then backward, cut-off ends the audio at the video length, silence adds quiet padding, and remap stretches the footage across the full audio duration.

Where can I use the output videos? The output is a standard video file. Use it in social content, localized product videos, presentations, or any project where you need the face and voice to match.
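Because the right sync mode depends on whether the audio or the video is longer, it can help to check the WAV duration before uploading. The sketch below uses Python's standard `wave` module; the helper names `wav_duration_seconds` and `longer_track` are hypothetical. Note that `wave` only reads PCM-encoded WAV files, so some other WAV encodings may raise `wave.Error` here even if the model itself accepts them.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # getnframes() counts audio sample frames, not video frames.
        return wav.getnframes() / wav.getframerate()


def longer_track(audio_seconds: float, video_seconds: float) -> str:
    """Report which input is longer, i.e. whether a sync mode will matter."""
    if audio_seconds > video_seconds:
        return "audio"
    if video_seconds > audio_seconds:
        return "video"
    return "equal"
```

For example, a 2m 6s voiceover (126 seconds) paired with a 30-second clip would make `longer_track(126.0, 30.0)` return `"audio"`, so the sync mode you pick decides how the mismatch is handled.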

Credit Cost

Each generation consumes 20 credits, or 100 credits for 5 generations.

Features

Everything this model can do for you

Realistic lip sync

Matches mouth movement to speech at the frame level for natural-looking results.

Five sync modes

Handle audio-video length mismatches with loop, bounce, cut-off, silence, or remap options.

Expressiveness control

Dial the temperature between 0 and 1 to get subtle or more animated mouth movement.

Active speaker targeting

Detects who is talking in a multi-person scene and applies the sync to that person only.

Standard format support

Accepts MP4 video and WAV audio, so files already in these common formats upload without conversion.

Browser-based workflow

Run the model from any device without installing software or writing a single line of code.

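Fast and automated processing

Short clips typically process in under a minute, with no manual frame-by-frame editing.

Suitable for different languages and accents

Pair existing footage with a translated or re-recorded voiceover to produce localized versions without reshooting.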

Use Cases

Dub a video into another language by swapping the original audio for a translated voiceover and letting the model regenerate the lip movements to match

Sync a re-recorded narration to existing footage when the new take runs slightly longer or shorter than the original

Apply AI-generated speech to a spokesperson clip where the face needs to match a script that was changed after filming

Animate a talking head video by pairing a short looping face clip with a full-length audio recording using loop or bounce mode

Detect the active speaker in a two-person interview and apply lipsync only to the person who is talking

Produce localized versions of a product demo video by substituting translated audio without reshooting the footage

Refine the expressiveness of lip movement for character animation work by adjusting the temperature setting

Adapt educational videos for new audiences by pairing existing footage with re-recorded or translated narration

Examples

Example 1
Audio: 2m 6s
Sync Mode: loop
Temperature: 0.5

Example 2
Audio: 4m 54s
Sync Mode: loop
Temperature: 0.5
Active Speaker: No
Occlusion Detection: No
