Fabric 1.0 is an image-to-video model that takes a still photo and an audio clip and generates a video in which the subject appears to speak or sing in sync with the sound. It solves a practical production problem: creating a spokesperson or character video without scheduling a film crew, booking studio time, or recording anyone on camera. You supply the assets you already have, and the model handles the animation.

The model outputs video at up to 720p resolution, clean enough for websites, presentations, and social media posts, and it reads the audio track syllable by syllable to keep mouth movements accurate throughout the clip. If you need a smaller file or a faster export, you can choose 480p instead.

This fits naturally into workflows where you already have a face photo and a recorded audio file. A marketer can produce a spokesperson segment from a product photo without a shoot; a course creator can animate a character with their own narration in minutes. Upload your image and audio on Picasso IA, pick a resolution, and your talking video is ready to download.
Fabric 1.0 is an image-to-video model that animates a still photo to match any audio you provide, producing a video where the subject appears to speak or sing in sync with the sound. On Picasso IA, you run it directly from your browser with no software to install and no code to write. The model addresses a real production bottleneck: getting a talking video from a static image without booking a filming session or hiring on-screen talent. A product marketer, a teacher building an online course, or a social media creator can all use assets they already own (a face photo and an audio file) to produce broadcast-ready video in seconds.
Do I need programming skills or technical knowledge to use this? No, just open Fabric 1.0 on Picasso IA, adjust the settings you want, and hit generate.
Is it free to try? Yes, you can run Fabric 1.0 on Picasso IA without a paid subscription. Free-tier usage lets you test the output quality before committing to anything.
How long does it take to get results? Most generations finish within a minute, and many take only seconds. The exact time depends on the length of your audio clip and the resolution you choose.
What output formats are supported? The model returns a downloadable video file compatible with standard editing tools and social media upload requirements.
What kind of photo works best? The model works best with a clear, forward-facing photo where the subject's face is well-lit and fully visible. Heavily obscured or side-profile images may reduce lip-sync accuracy.
Can I use the output videos in commercial projects? Yes, the videos you generate belong to you and can be used in client work, marketing materials, or published content.
What if the lip-sync does not look accurate? Try using a higher-contrast photo with the face clearly centered, and make sure your audio recording is clean with minimal background noise. Small improvements to the source files usually produce noticeably better results.
Everything this model can do for you
Mouth movements track every phoneme in the audio, staying accurate frame by frame.
Export videos at up to 720p resolution, clean and ready for web or social media.
Choose 480p for faster exports or 720p for higher-quality final deliverables.
Produce a talking video from a single still photo with no camera setup.
Works with any audio input, from studio voiceovers to casual voice recordings.
Receive a finished lip-synced video within seconds of submitting your files.
Download the result as a standard video file ready to use in any project.
User-friendly API integration
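For teams that want to script generations rather than use the browser interface, a programmatic call would come down to submitting an image, an audio file, and a resolution choice. The sketch below is a hypothetical illustration only: the function name, field names, and workflow are assumptions, not documented Picasso IA API behavior. It shows how a client might assemble and validate the request fields before uploading.

```python
# Hypothetical sketch of preparing a Fabric 1.0 generation request.
# Field names ("image", "audio", "resolution") and the helper function
# are illustrative assumptions, not a documented API.
from pathlib import Path

# The two resolutions the model is stated to support.
ALLOWED_RESOLUTIONS = {"480p", "720p"}

def build_generation_request(image_path: str, audio_path: str,
                             resolution: str = "720p") -> dict:
    """Assemble the form fields for one image+audio generation job."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(
            f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}"
        )
    return {
        "image": Path(image_path).name,   # still photo of the subject
        "audio": Path(audio_path).name,   # speech or singing track
        "resolution": resolution,         # 480p for speed, 720p for quality
    }

# Example: request a 480p render for a faster, smaller file.
req = build_generation_request("spokesperson.jpg", "narration.mp3", "480p")
```

In practice these fields would be sent as a multipart upload (for example with Python's `requests` library) to whatever endpoint the service exposes; consult the Picasso IA documentation for the actual URL and authentication details.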