ControlVideo is a text-to-video model that restyles existing footage by following the structure of a source video while applying the look and content you describe in a prompt. If you have a clip of someone walking and want it to look like an oil painting, a sketch, or a scene in a different location, you describe it and the model handles the rest. It extracts depth, edge, or pose data from your original video so the new output stays in sync with the motion. These are its three structure modes: depth maps preserve the three-dimensional relationships between objects, Canny edge detection follows silhouettes and contours, and pose estimation tracks the body positions of human subjects. You control how closely the output follows your prompt versus the original structure using the guidance scale, and you can produce longer clips by enabling the hierarchical sampler. It fits into any video content workflow where you need a different visual style without reshooting. Animators can restyle reference footage, marketers can repurpose clips with new aesthetics, and creators can iterate on a single take until the look is right. Open ControlVideo on Picasso IA, paste your prompt, and run it.
ControlVideo lets you restyle an existing video clip by following its structure and applying the visual content you describe in a text prompt. You upload a short clip, write a description of the look you want, and the model generates a new video that matches the original motion while adopting your specified style. Picasso IA runs ControlVideo directly in the browser with no installation needed. A scene of someone jogging can become a watercolor illustration, a pencil sketch, or a detailed fantasy landscape, all from a single run. It works for animation, product visualization, and creative style tests where you want to change what a video looks like without altering how subjects move through the frame.
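Under the hood, every run reduces to a handful of inputs: the source clip, the text prompt, a structure mode, and a few sampling controls. The sketch below shows that parameterization in Python; `run_controlvideo` is a hypothetical stand-in rather than a real Picasso IA API, and the browser UI exposes the same knobs as form controls, so no code is required to use the service.

```python
# Minimal sketch of the knobs a ControlVideo run exposes. This stub is
# illustrative only; names and defaults are assumptions, not a documented API.

def run_controlvideo(
    video_path: str,                           # source clip supplying the motion
    prompt: str,                               # the look and content you want
    condition: str = "depth",                  # "depth", "canny", or "pose"
    guidance_scale: float = 12.5,              # higher = follow the prompt more closely
    video_length: int = 15,                    # clip length in frames
    smoother_steps: list[int] | None = None,   # timesteps for the de-flicker smoother
    is_long_video: bool = False,               # enable the hierarchical sampler
    seed: int | None = None,                   # fix for reproducible output
) -> str:
    """Illustrative stub: a real run would denoise here and return the clip path."""
    return "outputs/result.mp4"  # placeholder path

clip = run_controlvideo(
    video_path="walking.mp4",
    prompt="a person walking through a misty forest, oil painting style",
    condition="depth",
    seed=42,
)
```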
Do I need programming skills or technical knowledge to use this? No, just open ControlVideo on Picasso IA, adjust the settings you want, and hit generate.
Is it free to try? Yes, you can run ControlVideo without a subscription to test it on your own footage.
How long does generation take? A standard 15-frame clip at 50 denoising steps typically takes between 30 seconds and 2 minutes depending on current server load.
Which condition type should I choose? Depth works best for scenes with clear spatial layers between foreground and background. Canny is better for preserving hard edges and object silhouettes. Pose is designed specifically for clips with visible human figures moving on screen.
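Whichever mode you pick, each source frame is first converted into a structure map. As a rough illustration, the snippet below uses the open-source controlnet_aux annotators (a common choice for ControlNet-style pipelines, and an assumption here rather than a documented Picasso IA dependency) to produce all three map types from a single frame:

```python
from PIL import Image
from controlnet_aux import CannyDetector, MidasDetector, OpenposeDetector

frame = Image.open("frame_000.png")  # one frame extracted from the source clip

# Canny: hard edges and object silhouettes; thresholds tune edge sensitivity.
edge_map = CannyDetector()(frame, low_threshold=100, high_threshold=200)

# Depth: per-pixel distance estimates, preserving foreground/background layers.
depth_map = MidasDetector.from_pretrained("lllyasviel/Annotators")(frame)

# Pose: skeleton keypoints for any visible human figures.
pose_map = OpenposeDetector.from_pretrained("lllyasviel/Annotators")(frame)
```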
Can I generate longer videos? Yes. Turn on the long-video toggle in the settings panel, and the model uses a hierarchical sampler to keep frames consistent across the full clip duration.
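Conceptually, a hierarchical sampler generates a sparse set of keyframes across the whole clip first, then fills each in-between segment conditioned on its flanking keyframes, so the global look is decided once and distant frames cannot drift apart. The toy sketch below illustrates only that scheduling; the two `sample_*` functions are hypothetical stand-ins for the model's actual denoising calls.

```python
def sample_keyframes(frame_ids):
    # Stand-in for the model jointly denoising all keyframes at once.
    return [f"key_{i}" for i in frame_ids]

def sample_between(left, right, index):
    # Stand-in for denoising one frame conditioned on its flanking keyframes.
    return f"tween({left},{right})@{index}"

def hierarchical_sample(num_frames: int, stride: int = 5) -> list[str]:
    """Toy illustration of hierarchical sampling order (no real model calls)."""
    key_ids = list(range(0, num_frames, stride))
    if key_ids[-1] != num_frames - 1:   # ensure the clip's last frame is a keyframe
        key_ids.append(num_frames - 1)
    frames = dict(zip(key_ids, sample_keyframes(key_ids)))
    for left, right in zip(key_ids, key_ids[1:]):
        for i in range(left + 1, right):   # fill the gap between two keyframes
            frames[i] = sample_between(frames[left], frames[right], i)
    return [frames[i] for i in sorted(frames)]

print(hierarchical_sample(30))
```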
How do I fix flickering or frame inconsistencies? Set the smoother steps field to the intermediate denoising timesteps at which the smoothing pass should run. That pass blends adjacent frames during generation, reducing visual drift and flicker across the clip.
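Continuing the hypothetical `run_controlvideo` sketch above, the smoother is just a list of denoising timesteps. The `[19, 20]` value below mirrors the default in the open-source ControlVideo release; how Picasso IA wires this up internally is an assumption.

```python
# Apply the frame smoother at denoising timesteps 19 and 20; adding more
# intermediate timesteps strengthens the de-flicker pass.
clip = run_controlvideo(
    video_path="walking.mp4",
    prompt="a person walking through a misty forest, oil painting style",
    condition="depth",
    smoother_steps=[19, 20],
)
```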
Where can I use the outputs? The exported video file has no watermark and can go directly into a social post, a presentation, a demo reel, or any other project.
Everything this model can do for you
Run the model on any source video without configuring or retraining additional weights.
Choose depth, Canny edge, or pose to control how structure is extracted from the source video.
Adjust how strongly the output follows the text prompt versus the original video structure.
Enable the hierarchical sampler to produce extended clips beyond the default 15 frames.
Reduce flicker and frame inconsistencies by setting smoother steps during generation.
Reuse the same seed to reproduce identical outputs for side-by-side comparison, as in the sketch after this list.
Set the clip duration to match your specific production or publishing requirements.
Randomize the seed to produce varied outputs across repeated runs.
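Putting the list above together, the sketch below reuses the hypothetical `run_controlvideo` stub from earlier to run the same clip twice: identical seeds make the runs reproducible, so changing only the guidance scale isolates its effect in a side-by-side comparison. Names and defaults are illustrative, not a documented API.

```python
# Two runs with the same seed; only guidance_scale differs between them.
baseline = run_controlvideo(
    video_path="jog.mp4",
    prompt="a jogger at sunrise, watercolor illustration",
    condition="canny",       # follow silhouettes and contours
    guidance_scale=10.0,
    video_length=15,         # default clip length in frames
    seed=1234,
)
stronger_prompt = run_controlvideo(
    video_path="jog.mp4",
    prompt="a jogger at sunrise, watercolor illustration",
    condition="canny",
    guidance_scale=15.0,     # lean harder on the prompt
    video_length=15,
    seed=1234,               # same seed, so the comparison is fair
)
```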
A white swan moving on the lake, cartoon style.
James Bond moonwalk on the beach, animation style.
A striking mallard floats effortlessly on the sparkling pond.