SAM 2 Video is a video object segmentation tool that tracks any object you click on, frame by frame, through an entire video clip. The core problem it solves is the manual, time-consuming work of isolating moving objects: instead of masking each frame by hand, you mark a starting point on the first frame and the model handles the rest. This makes it practical for video editors, VFX artists, and data labelers who work with large amounts of footage.

The model offers three mask types to fit different workflows: a clean binary mask for cutout work, a colored highlight overlay for visual review, and a greenscreen-compatible output ready for compositing without extra post-processing. You can also request bounding box annotations, either alone or alongside pixel-level masks, and export results as a video file or an image sequence in WebP, JPG, or PNG. Multi-object tracking lets you click multiple subjects in a single clip and have each tracked independently.

Whether you are building a training dataset, pulling subjects out of interview recordings, or preparing footage for a visual effects pass, SAM 2 Video fits directly into existing pipelines with no special software required. On Picasso IA, there are no per-generation credits or usage quotas, so you can process entire batch jobs without watching a limit counter.
SAM 2 Video is a video object segmentation model that tracks anything you click on across every frame of a clip. On Picasso IA, you upload a video, click the objects you want isolated, and the model returns precise per-frame masks or bounding boxes. Imagine you filmed a product demo and need to pull the item out of the background: instead of masking each frame manually in an editor, you click once and the model does the tracking for you. It handles multiple objects per clip and delivers output as a video file or a frame-by-frame image sequence.
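To make the three mask types more concrete, here is a minimal sketch of how each could be rendered from one frame and one per-pixel mask. It is an illustration under assumptions, not the model's actual implementation: the function names, the red highlight color, the 50% blend, and the choice to paint the background (rather than the object) green are all hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Frame = List[List[Pixel]]      # rows of pixels
Mask = List[List[bool]]        # True where the tracked object is

GREEN: Pixel = (0, 255, 0)


def render_binary(mask: Mask) -> Frame:
    """Binary mask: white on the object, black everywhere else."""
    return [[(255, 255, 255) if m else (0, 0, 0) for m in row] for row in mask]


def render_highlight(frame: Frame, mask: Mask,
                     color: Pixel = (255, 0, 0), alpha: float = 0.5) -> Frame:
    """Highlight overlay: blend a color into object pixels (assumed 50% blend)."""
    def blend(p: Pixel) -> Pixel:
        r, g, b = (round((1 - alpha) * c + alpha * k) for c, k in zip(p, color))
        return (r, g, b)

    return [[blend(p) if m else p for p, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, mask)]


def render_greenscreen(frame: Frame, mask: Mask) -> Frame:
    """Greenscreen: keep object pixels, fill the background with pure green
    (assumption: the background, not the object, is replaced)."""
    return [[p if m else GREEN for p, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, mask)]


# Tiny 1x2 example frame: the first pixel belongs to the object.
frame: Frame = [[(101, 100, 100), (10, 20, 30)]]
mask: Mask = [[True, False]]

binary = render_binary(mask)
highlight = render_highlight(frame, mask)
greenscreen = render_greenscreen(frame, mask)
```

The binary output discards the original colors entirely, which suits cutout work, while the other two keep the source pixels and are easier to review by eye or drop into a compositor.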
Do I need programming skills or technical knowledge to use this? No. Just open SAM 2 Video on Picasso IA, adjust the settings you want, and hit generate.
Is it free to try? Yes, SAM 2 Video is available at no cost and there are no credit limits on how many clips you can process.
How long does it take to get results? Processing time depends on the length and resolution of your video. Short clips typically finish in under a minute.
What output formats are supported? You can export a video file or an image sequence in WebP, JPG, or PNG format. For sequences, you can also control compression quality with a 0-100 slider.
Can I track more than one object at a time? Yes. You can click multiple objects and assign each a separate label; the model tracks all of them through the clip in a single run.
How many times can I run the model? There are no usage caps. You can run SAM 2 Video as many times as you need without hitting any quota or paywall.
Where can I use the outputs? The masks and annotated frames work in any video editor, compositing application, or machine learning dataset pipeline that accepts standard image or video formats.
Each generation consumes 2 credits, or 10 credits for 5 generations.
With Elite or Infinite plans, enjoy unlimited generations with this model at no additional cost.
Everything this model can do for you
Point at any object in a video frame and the model tracks it through every subsequent frame.
Choose binary, colored highlight, or greenscreen output to match your editing workflow.
Add rectangular annotations around tracked objects, alone or combined with a pixel-level mask.
Export as a video file or an image sequence in WebP, JPG, or PNG at adjustable quality levels.
Assign separate labels to multiple clicked objects and track them all in a single pass.
Skip every Nth frame on export to reduce file size without rerunning the segmentation.
Process as many video clips as you want on Picasso IA with no credit caps or usage quotas.
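The frame-skipping option in the list above can be pictured as a simple stride over frames that have already been segmented, which is why it needs no second segmentation run. The sketch below is an assumption about the behavior (it keeps every Nth frame, starting with the first), and `keep_every_nth` and `interval` are hypothetical names, not settings exposed by the tool.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_every_nth(frames: Sequence[T], interval: int) -> List[T]:
    """Return frames 0, interval, 2*interval, ... from an already-processed clip.

    Assumption: an interval of 1 exports every frame, 2 exports every
    second frame, and so on.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return list(frames[::interval])


# Stand-in for ten segmented frames, identified by index.
segmented_frames = list(range(10))

every_frame = keep_every_nth(segmented_frames, 1)
every_third = keep_every_nth(segmented_frames, 3)
```

With an interval of 3, a ten-frame clip exports four frames (indices 0, 3, 6, and 9), so the output shrinks roughly in proportion to the interval.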
Ideal for editing, analysis, and creative tasks