Seedance 2.0 Mini: Text to Video with Native Audio

Seedance 2.0 Mini is a text-to-video AI model built for high-volume production at a lower cost than its full-size counterpart. It takes a text prompt, a reference image, or a combination of both and outputs a short video clip with synchronized audio. For creators who need to produce dozens of clips per week without breaking their budget, it removes the biggest friction points: rendering time, audio sync, and per-clip cost. The model accepts up to nine reference images for character consistency across scenes, so a branded character or product looks the same from clip to clip. It also supports reference audio files for lip-sync, letting you match a pre-recorded voice to on-screen movement. Output resolution goes up to 720p, with aspect ratios from 9:16 vertical to 21:9 cinematic widescreen. It fits into any short-form content workflow: drop in a product photo, write a one-sentence scene description, and get a ready-to-publish clip in under a minute. Social media teams, indie game developers, and small production houses all use it to cut the time between idea and finished asset.

Official

Bytedance

26 runs

Seedance 2.0 Mini

2026-06-24

Commercial Use

Table of contents

  • Overview
  • How It Works
  • Frequently Asked Questions
  • Credit Cost
  • Features
  • Use Cases
  • Examples

Overview

Seedance 2.0 Mini is a text-to-video model built for high-volume production, turning text prompts, images, and audio references into short videos without any coding. It handles multimodal inputs natively, so you can anchor the opening frame with a photo, guide the style with reference images, and add synchronized audio all in a single run. On Picasso IA, the whole process takes a few clicks. A content team producing dozens of product clips per week, or a freelancer building social reels on a tight deadline, can go from idea to finished video in under a minute.

How It Works

  • Write a text prompt describing the scene, characters, motion, and mood (up to 4,000 characters; keeping it under 600 words tends to give sharper results)
  • Upload a reference image to set the opening frame, or provide both a first and a last frame image to define the start and end of the shot
  • Choose your resolution (480p or 720p), an aspect ratio such as 16:9, 9:16, or 1:1, and a duration in seconds, or set the duration to automatic so the model picks the length
  • Enable audio generation to add dialogue, sound effects, and background music synchronized with the video
  • Download the finished MP4, ready for social platforms, presentations, or your editing timeline
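
The steps above amount to a small set of settings with a few stated limits. The sketch below collects them in one place and checks them before a run. It is only an illustration: `GenerationSettings`, its field names, and the rule that a last frame needs a first frame are assumptions made for this example, not part of any Picasso IA or Seedance interface. The limits themselves (4,000 characters, 480p or 720p, the listed aspect ratios, and -1 for automatic duration) come from this page.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits quoted on this page.
MAX_PROMPT_CHARS = 4000
RESOLUTIONS = {"480p", "720p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "9:21", "adaptive"}
AUTO_DURATION = -1  # lets the model pick the length


@dataclass
class GenerationSettings:
    """Hypothetical container for one run's settings (names are illustrative)."""
    prompt: str
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    duration_seconds: int = 5
    generate_audio: bool = True
    first_frame: Optional[str] = None  # path to an opening-frame image
    last_frame: Optional[str] = None   # path to a closing-frame image

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the settings look fine."""
        issues = []
        if not self.prompt.strip():
            issues.append("prompt is empty")
        if len(self.prompt) > MAX_PROMPT_CHARS:
            issues.append(
                f"prompt is {len(self.prompt)} characters; the limit is {MAX_PROMPT_CHARS}"
            )
        if self.resolution not in RESOLUTIONS:
            issues.append(f"unsupported resolution: {self.resolution}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            issues.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.duration_seconds != AUTO_DURATION and self.duration_seconds <= 0:
            issues.append("duration must be positive, or -1 for automatic")
        # Assumption: the page only describes a last frame paired with a first frame.
        if self.last_frame and not self.first_frame:
            issues.append("last_frame given without first_frame")
        return issues


if __name__ == "__main__":
    settings = GenerationSettings(
        prompt="A matte-black sports car drifts through a wet hairpin at night.",
        aspect_ratio="21:9",
        duration_seconds=AUTO_DURATION,
    )
    print(settings.problems() or "settings look fine")
```

Returning a list of issues instead of raising on the first one makes it easier to fix several settings in a single pass.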

Frequently Asked Questions

Do I need programming skills or technical knowledge to use this? No. Just open Seedance 2.0 Mini on Picasso IA, adjust the settings you want, and hit generate.

Is it free to try? Yes. You get free credits on sign-up, enough to run several videos before adding more. No payment method is required to start.

How long does it take to get results? A 5-second clip at 720p is typically ready in under a minute. Shorter durations and 480p render faster, so if you are iterating quickly, lower settings speed up your feedback loop.

What aspect ratios are available? You can choose from 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, and 9:21. Setting it to adaptive lets the model pick the best ratio based on your image or prompt.
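
If you would rather pick the ratio yourself than rely on the adaptive setting, one simple approach is to choose the listed ratio closest to your source image. The sketch below does that. The function name `closest_ratio` is made up for this example, and this is not a description of how the model's adaptive mode decides internally.

```python
import math

# Fixed ratios listed in the answer above.
SUPPORTED_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "9:21"]


def closest_ratio(width: int, height: int) -> str:
    """Return the listed aspect ratio nearest to a width x height image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Compare on a log scale so wide and tall shapes are treated symmetrically.
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = label.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED_RATIOS, key=distance)


if __name__ == "__main__":
    print(closest_ratio(1920, 1080))  # a Full HD landscape frame
    print(closest_ratio(1080, 1920))  # the same frame rotated to portrait
```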

Can I control what audio gets generated? Yes. Place any spoken dialogue in double quotes inside your prompt and the model generates lip-synced speech for those lines. Background music and sound effects are added automatically to match the scene.
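
Because spoken lines are marked with double quotes, it can help to check which parts of a long prompt are actually quoted before you spend credits on a run. The helper below is a minimal sketch for that check. The name `quoted_dialogue` is invented for this example, it only looks for straight double quotes, and it does not claim to reproduce how the model parses prompts.

```python
import re
from typing import List

# Text between a pair of straight double quotes.
DIALOGUE_PATTERN = re.compile(r'"([^"]+)"')


def quoted_dialogue(prompt: str) -> List[str]:
    """Return the quoted segments of a prompt, in the order they appear."""
    return DIALOGUE_PATTERN.findall(prompt)


if __name__ == "__main__":
    prompt = (
        'A barista slides a cup across the counter and says "Good morning!" '
        'The customer laughs and replies "You remembered my order."'
    )
    for line in quoted_dialogue(prompt):
        print(line)
```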

What happens if I dislike the result? Rewrite the prompt, change the duration, or adjust the aspect ratio and run it again. To reproduce a previous result and build on it, reuse the same seed value from that run.

Where can I use the videos I generate? The output is a standard MP4 file. You can publish it to social media, drop it into a video editor, embed it on a website, or use it in a client presentation without any restrictions from Picasso IA.

Credit Cost

The credit cost for this model varies based on the settings you choose. Below are the costs per configuration:

Configuration            Credits
480p · video_in          1 per second
480p · non_video_in      0.8 per second
720p · video_in          2.2 per second
720p · non_video_in      1.8 per second
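
Since the rates are quoted per second, a rough estimate is the rate multiplied by the clip length. The sketch below works that out from the table. It rests on two assumptions that this page does not confirm: that `video_in` means the run includes a reference video as input, and that billing follows the clip length in whole seconds. The function name `estimate_credits` is illustrative.

```python
from decimal import Decimal

# Credits per second, keyed by (resolution, run includes a video input).
# Assumption: "video_in" means a reference video is part of the input.
CREDITS_PER_SECOND = {
    ("480p", True): Decimal("1"),
    ("480p", False): Decimal("0.8"),
    ("720p", True): Decimal("2.2"),
    ("720p", False): Decimal("1.8"),
}


def estimate_credits(resolution: str, seconds: int, has_video_input: bool = False) -> Decimal:
    """Estimate the credit cost of one clip as rate x duration."""
    if seconds <= 0:
        raise ValueError("seconds must be a positive number")
    try:
        rate = CREDITS_PER_SECOND[(resolution, has_video_input)]
    except KeyError:
        raise ValueError(f"no rate listed for resolution {resolution!r}") from None
    return rate * seconds


if __name__ == "__main__":
    print(estimate_credits("720p", 5))                         # 1.8 x 5
    print(estimate_credits("480p", 15, has_video_input=True))  # 1 x 15
```

Under these assumptions, a 5-second 720p clip without video input comes to 1.8 × 5 = 9 credits, and the same clip at 480p comes to 0.8 × 5 = 4 credits. `Decimal` is used so that rates like 0.8 and 2.2 multiply without floating-point rounding noise.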

Features

Everything this model can do for you

Native audio generation

Produces synchronized dialogue, sound effects, and background music in a single pass.

Multimodal input

Accepts text prompts, first-frame images, last-frame images, and reference videos together.

Character consistency

Supports up to nine reference images so the same person or object looks identical across clips.

Lip-sync support

Match pre-recorded audio to on-screen movement using up to three reference audio files.

Flexible aspect ratios

Output in eight formats from 9:16 vertical to 21:9 widescreen without post-processing.

Reproducible results

Reuse any seed value to regenerate the exact same clip on demand.

Intelligent duration

Set duration to -1 and the model picks the optimal video length based on your content.

Use Cases

Generate a 5-second product video from a single photo and a short text description

Produce a vertical social media clip in 9:16 format directly from a text prompt

Animate a character consistently across multiple scenes using up to nine reference images

Add lip-synced dialogue to a video by uploading a reference audio file and citing it in your prompt

Create a cinematic 21:9 clip from a wide landscape photo for use as a video header

Set a fixed seed to reproduce the exact same video output across multiple runs for batch consistency

Generate ambient background music and sound effects automatically alongside your video content

Use a first-frame and last-frame image pair to control exactly how a scene opens and closes

Examples

720p · 16:9 · 5s · 2m 51s · Generate Audio: Yes

Hyper-realistic cinematic street racing shot. Audio: High-pitched engine revving, aggressive tire screech, and rain hitting metal. Camera starts low to the ground on a wet asphalt hairpin curve at night. A matte-black vintage sports car drifts aggressively into frame. The camera executes a fast whip-pan to the right, perfectly tracking the car's speed. The car slides out of frame, kicking up a massive rooster tail of neon-lit water droplets. The camera abruptly stops panning and immediately rack-focuses to a wet, crushed soda can resting on the asphalt in the extreme foreground. Perfect water physics, 1080p, 24fps.

720p · 16:9 · 15s · 3m 5s · Generate Audio: Yes

Photorealistic cinematic, one single continuous unbroken shot from start to finish — absolutely no cuts, no edits, no transitions, one fluid uninterrupted camera move, 16:9. Bright daylight in a lush green forest, sunlight filtering through the canopy, leaves and tree trunks softly blurred. The shot begins directly behind a vivid colorful butterfly fluttering fast and dynamically through the forest, the camera chasing close behind its wings as it weaves between trees, shafts of light and foliage — erratic, lively and kinetic. Without any cut, in the same fluid motion, the camera keeps racing with the darting butterfly deeper through the trees. Then, at the midpoint, a parrot suddenly bursts in from the side and snatches the butterfly out of the air, biting down and clamping onto the edge of one of its wings in its beak — and the camera sweeps with the strike in one continuous move. Still unbroken, the camera drives in onto the moment of capture and explodes into a dramatic bullet-time effect: time nearly freezes as the parrot's beak bites and clamps onto the butterfly's wing in an extreme macro close-up, the wing bending and creasing in the beak's grip, and the camera sweeps slowly around the frozen instant — shimmering powder and tiny iridescent scales scattering off the pinched wing and hanging suspended motionless in mid-air, the delicate wing membranes and veins razor-sharp, the parrot's beak texture and eye in crisp detail, the butterfly caught mid-flutter — hyper-detailed. One seamless continuous camera move — chase from behind, racing through the forest, into the parrot's strike, ending in a bullet-time orbit around the catch. Flowing and dynamic, collapsing into near-frozen bullet time only at the macro catch. 
Shallow depth of field, strong motion blur on the chase resolving into crisp frozen detail, bright natural daylight, dappled forest light, high dynamic range, ultra-detailed photorealistic textures — wing scales, powder, feathers, foliage — 4K, high-end wildlife documentary look. Pacing over 10 seconds: about 4–5 seconds of dynamic butterfly flight, the parrot striking around the midpoint, then the rest in bullet-time macro of the parrot biting the wing. 10 seconds, single continuous take.

720p · 16:9 · 15s · 3m 3s · Generate Audio: Yes

single continuous shot, one take no cuts, cinematic FPV oner, 4K ultra-detailed, photorealistic macro detail, anamorphic film look, epic cinematic scale, cinematic lighting, professional color grading, sharp focus, hyper-detailed texture, film grain, depth of field mastery, fluid drone flight A colossal storm-giant — its body churning cloud wrapped in branching veins of electric-cyan lightning — rises from the thunderheads with a deep boom, a massive arm sweeping through a squadron of riders mounted on winged lions, their gleaming etched armor flashing, feathered wings beating, lightning-lances crackling and banners snapping in the wind. Around them float fortress-islands of weathered white-stone bastions among colossal billowing cumulus clouds in a brilliant blue sky. The whole battle blazes under hard high-key midday sun in saturated white-cloud, azure and electric-cyan, no trace of golden hour. The camera is an FPV presence flying with the storm-giant — opening in extreme 4K macro against its crackling cloud-flesh, repeatedly diving deep into the billowing cumulus to catch its wispy curling texture, with two brief slow-motion macro beats: one on a rider at the three-second mark and one on a lightning strike at the seven-second mark, each snapping back to full speed, never stopping, never pulling up or back. The single unbroken take builds its arc through pure flight and a string of macro brushes across cloud, armor and lightning.
