
Grok Imagine R2V: Turn Photos into AI Video

Grok Imagine R2V is a text-to-video model that uses reference images to shape the visual style, composition, and content of generated clips. Instead of relying on a single text prompt to define everything, you upload between one and seven images that act as a visual brief, giving the model concrete direction on what the output should look like. A text prompt alongside the references controls motion and narrative, and the model produces clips of 1 to 10 seconds at 480p or 720p. You can choose from seven aspect ratios, including vertical 9:16 for social formats and widescreen 16:9 for cinematic looks. Every run happens in one interface, with no file conversion or external tools required: upload a product photo or a character concept, add a short description, set the duration, pick a resolution, and the video is ready within minutes. It fits naturally into social content production, early-stage creative pitches, and any project where you need a moving visual but only have still images to start with.

Official

xAI

6.3k runs

Grok Imagine R2v

2026-03-23

Commercial Use

Table of contents

  • Overview
  • How It Works
  • Frequently Asked Questions
  • Credit Cost
  • Features
  • Use Cases
  • Examples

Overview

Grok Imagine R2V turns a text prompt and a set of reference images into a short video, giving you direct control over the visual direction before generation starts. The reference images aren't used as opening frames; they guide the style, color palette, and subject matter of the entire clip. This is useful when you already have a clear visual in mind and just need it to move. On Picasso IA, the whole process runs in a browser with no code or setup required. Upload your references, describe the action, and the model builds the video from both inputs combined.

How It Works

  • Upload between 1 and 7 reference images that capture the visual style, subject, or mood you want in the video
  • Write a text prompt describing what should happen: the action, scene, or atmosphere you have in mind
  • Set your video duration from 1 to 10 seconds, pick your resolution (480p or 720p), and choose an aspect ratio from options like 16:9, 1:1, or 9:16
  • The model reads your prompt and your images together, then generates a clip that reflects both inputs
  • Download the finished video file directly from the results page when processing is complete
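For readers who prepare settings ahead of time, the limits in the steps above can be summarised as a simple settings check. This is a minimal sketch under stated assumptions: the names `R2VRequest`, `validate`, and `credit_cost` are hypothetical and do not call Picasso IA or xAI; durations are assumed to be whole seconds; and the aspect-ratio set contains only the six ratios this page names explicitly.

```python
from dataclasses import dataclass

# Limits as described on this page. All names below are hypothetical
# illustrations, not part of any Picasso IA or xAI API.
MIN_IMAGES, MAX_IMAGES = 1, 7
MIN_SECONDS, MAX_SECONDS = 1, 10
RESOLUTIONS = {"480p", "720p"}
# Only the ratios the page names explicitly; the page mentions seven in total.
ASPECT_RATIOS = {"16:9", "4:3", "1:1", "9:16", "3:4", "3:2"}
CREDITS_PER_GENERATION = 10


@dataclass
class R2VRequest:
    prompt: str
    reference_images: list  # file paths or URLs of the reference images
    duration_s: int         # assumed to be whole seconds
    resolution: str
    aspect_ratio: str


def validate(req: R2VRequest) -> list:
    """Return a list of problems; an empty list means the settings fit the stated limits."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    n = len(req.reference_images)
    if not MIN_IMAGES <= n <= MAX_IMAGES:
        problems.append(f"need {MIN_IMAGES}-{MAX_IMAGES} reference images, got {n}")
    if not MIN_SECONDS <= req.duration_s <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} s, got {req.duration_s}")
    if req.resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution {req.resolution!r}")
    if req.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {req.aspect_ratio!r}")
    return problems


def credit_cost(generations: int) -> int:
    """Total credits at 10 credits per generation, per the Credit Cost section."""
    return generations * CREDITS_PER_GENERATION


if __name__ == "__main__":
    req = R2VRequest("Slow push-in on the product", ["product.jpg"], 5, "720p", "9:16")
    print(validate(req))   # no problems for these settings
    print(credit_cost(5))  # five generations
```

Returning a list of problems instead of raising on the first one lets all out-of-range settings be reported together before a run is started.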

Frequently Asked Questions

Do I need programming skills or technical knowledge to use this? No. Open Grok Imagine R2V on Picasso IA, adjust the settings you want, and click generate.

Is it free to try? Yes, you can start running Grok Imagine R2V without a paid subscription. Check the current plan details for generation limits and credit allowances.

How long does it take to get results? Most clips finish in under two minutes, depending on the duration and resolution you selected. Shorter videos at 480p tend to process the fastest.

What output formats are supported? The model returns standard video files you can download directly from the results page. These work across social media platforms, video editors, and presentation tools.

Can I use multiple reference images at once? Yes, you can upload up to 7 reference images per generation. More images give the model a richer visual context, which often improves style consistency across the whole clip.

What aspect ratios are available? Options include 16:9, 4:3, 1:1, 9:16, 3:4, and 3:2, covering widescreen, standard, square, and vertical formats, so you can match the output to wherever it will be published.

What happens if I'm not happy with the result? Try adjusting your prompt, swapping in different reference images, or changing the duration and resolution settings. Small changes to the prompt often produce noticeably different outputs.

Credit Cost

Each generation consumes 10 credits, or 50 credits for 5 generations.

Features

Everything this model can do for you

Reference image input

Upload up to 7 images that shape the visual style, composition, and content of the generated video.

Flexible aspect ratios

Choose from 7 ratios including 9:16 for vertical social content and 16:9 for widescreen formats.

Adjustable duration

Set clip length anywhere from 1 to 10 seconds to match the format you're producing.

Two resolution options

Generate in 480p for fast previews or 720p for sharper, share-ready outputs.

Text prompt control

Describe the motion, scene, and atmosphere in plain language to direct the video content.

No setup required

Run the model directly in the browser with no software to install or accounts to configure.

Clean file output

Download the finished video as a standard file ready for any editor, social platform, or presentation.

Use Cases

Turn a set of product photos into a short promotional video by uploading the images and describing the motion you want

Generate a stylized video clip from character concept art by uploading the illustrations and writing a scene description

Create a vertical 9:16 social video from a single portrait photo by describing the background movement or animation you want

Produce a cinematic 16:9 clip from landscape reference photos and a short description of the camera movement

Build a quick storyboard preview by uploading rough sketches and turning them into a 5-second animated clip

Generate a mood reel for a brand pitch by uploading inspiration images and writing a one-line description of the atmosphere

Create a short animated intro from a logo image and a text prompt describing how it should appear on screen

Examples

Example 1 (four reference images, generated in 1m 51s)
Four friends sitting together at a sun-drenched outdoor restaurant table, laughing and waving at the camera. Warm golden hour light, Mediterranean terrace setting with climbing vines and the sea in the background. Slow cinematic camera push-in, joyful and candid atmosphere
Example 2 (three reference images, generated in 1m 54s)
A grand museum gallery comes to life at night: the portrait of Kepler gazes at a rotating globe of Earth, while a butterfly specimen escapes its glass case and flutters past ancient temple artifacts. Warm museum lighting, slow tracking shot down the gallery corridor, Night at the Museum style, magical and cinematic
Example 3 (one reference image, generated in 1m 38s)
A dramatic time-lapse of clouds rushing over the snow-capped Himalayan peaks, sunlight breaking through gaps to create god rays across the valleys, sweeping drone shot, epic nature documentary style
Example 4 (one reference image, generated in 49.9s)
The Earth slowly rotates in the vast emptiness of space, clouds swirling over continents, city lights twinkling on the night side, gentle camera drift, IMAX documentary style, awe-inspiring
Example 5 (two reference images, generated in 1m 43s)
A breathtaking cinematic aerial shot sweeping over the pyramids at golden hour, with a monarch butterfly gliding through the warm desert air in the foreground, dust particles catching the light, epic scale
