Grok Imagine R2V is a text-to-video model that uses reference images to shape the visual style, composition, and content of generated clips. Instead of relying on a single text prompt to define everything, you upload between one and seven images that act as a visual brief, giving the model concrete direction on what the output should look like. The model accepts prompts alongside your reference images to control motion and narrative, then produces clips from 1 to 10 seconds in 480p or 720p. You can choose from seven aspect ratios, including vertical 9:16 for social formats and widescreen 16:9 for cinematic looks. Every run stays inside one interface, with no file conversion or external tools required. Paste in a product photo or a character concept alongside a short description, set the duration, pick a resolution, and the video is ready within minutes. It fits naturally into social content production, early-stage creative pitches, and any project where you need a moving visual but only have still images to start with.
Grok Imagine R2V turns a text prompt and a set of reference images into a short video, giving you direct control over the visual direction before generation starts. The reference images aren't used as opening frames; they guide the style, color palette, and subject matter of the entire clip. This is useful when you already have a clear visual in mind and just need it to move. On Picasso IA, the whole process runs in a browser with no code or setup required. Upload your references, describe the action, and the model builds the video from both inputs combined.
Do I need programming skills or technical knowledge to use this? No. Just open Grok Imagine R2V on Picasso IA, adjust the settings you want, and hit generate.
Is it free to try? Yes, you can start running Grok Imagine R2V without a paid subscription. Check the current plan details for information on generation limits and credits.
How long does it take to get results? Most clips finish in under two minutes, depending on the duration and resolution you selected. Shorter videos at 480p tend to process the fastest.
What output formats are supported? The model returns standard video files you can download directly from the results page. These work across social media platforms, video editors, and presentation tools.
Can I use multiple reference images at once? Yes, you can upload up to 7 reference images per generation. More images give the model a richer visual context, which often improves style consistency across the whole clip.
What aspect ratios are available? Seven options are available, including 16:9, 4:3, 1:1, 9:16, 3:4, and 3:2. These cover widescreen, square, and vertical formats, so you can match the output to wherever it will be published.
What happens if I'm not happy with the result? Try adjusting your prompt, swapping in different reference images, or changing the duration and resolution settings. Small changes to the prompt often produce noticeably different outputs.
The credit cost for this model varies based on the settings you choose. Below are the costs per configuration:
Everything this model can do for you
Upload up to 7 images that shape the visual style, composition, and content of the generated video.
Choose from 7 ratios including 9:16 for vertical social content and 16:9 for widescreen formats.
Set clip length anywhere from 1 to 10 seconds to match the format you're producing.
Generate in 480p for fast previews or 720p for sharper, share-ready outputs.
Describe the motion, scene, and atmosphere in plain language to direct the video content.
Run the model directly in the browser with no software to install or accounts to configure.
Download the finished video as a standard file ready for any editor, social platform, or presentation.
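If you like to plan a batch of generations before opening the browser, the limits above can be written down as a small settings check. This is only a sketch in Python: the class name, field names, and default values are hypothetical and do not correspond to any published Picasso IA or Grok API. The aspect ratio set contains only the six ratios this page lists by name.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits taken from the feature list above. All names here are hypothetical.
MAX_REFERENCE_IMAGES = 7
MIN_DURATION_S, MAX_DURATION_S = 1, 10
RESOLUTIONS = {"480p", "720p"}
# Only the ratios this page lists by name; the full set may be larger.
LISTED_ASPECT_RATIOS = {"16:9", "4:3", "1:1", "9:16", "3:4", "3:2"}


@dataclass
class R2VSettings:
    """One planned generation: a prompt, reference images, and output settings."""

    prompt: str
    reference_images: list[str]
    duration_s: int = 5           # assumed default, not documented on this page
    resolution: str = "720p"      # assumed default
    aspect_ratio: str = "16:9"    # assumed default

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the settings fit the stated limits."""
        issues = []
        if not self.prompt.strip():
            issues.append("prompt is empty")
        if not 1 <= len(self.reference_images) <= MAX_REFERENCE_IMAGES:
            issues.append(f"need 1 to {MAX_REFERENCE_IMAGES} reference images")
        if not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            issues.append(f"duration must be {MIN_DURATION_S} to {MAX_DURATION_S} seconds")
        if self.resolution not in RESOLUTIONS:
            issues.append("resolution must be 480p or 720p")
        if self.aspect_ratio not in LISTED_ASPECT_RATIOS:
            issues.append("aspect ratio is not one of the ratios listed on this page")
        return issues
```

Calling `problems()` on a planned run, for example a 6-second vertical 480p preview with one product photo, returns an empty list when everything is within the stated limits, and a short list of messages otherwise.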
Four friends sitting together at a sun-drenched outdoor restaurant table, laughing and waving at the camera. Warm golden hour light, Mediterranean terrace setting with climbing vines and the sea in the background. Slow cinematic camera push-in, joyful and candid atmosphere
A grand museum gallery comes to life at night: the portrait of Kepler gazes at a rotating globe of Earth, while a butterfly specimen escapes its glass case and flutters past ancient temple artifacts. Warm museum lighting, slow tracking shot down the gallery corridor, Night at the Museum style, magical and cinematic
A dramatic time-lapse of clouds rushing over the snow-capped Himalayan peaks, sunlight breaking through gaps to create god rays across the valleys, sweeping drone shot, epic nature documentary style
The Earth slowly rotates in the vast emptiness of space, clouds swirling over continents, city lights twinkling on the night side, gentle camera drift, IMAX documentary style, awe-inspiring
A breathtaking cinematic aerial shot sweeping over the pyramids at golden hour, with a monarch butterfly gliding through the warm desert air in the foreground, dust particles catching the light, epic scale