Google dropped a pair of new generative media models today that together cover the pipeline from image generation to video editing.
Nano Banana 2 Lite is Google’s fastest and cheapest image model yet. It delivers text-to-image in 4 seconds at $0.034 per 1K resolution image. Gemini Omni Flash brings high-quality video generation and conversational editing to developers for the first time, priced at $0.10 per second of output. Both are available today in Google AI Studio, the Gemini API, and the Gemini Enterprise Agent Platform.
Google is positioning them as complementary tools you can chain together: generate an image with Nano Banana 2 Lite, then animate it into a video with Omni Flash.
Nano Banana 2 Lite: speed and cost first
Nano Banana 2 Lite (model name gemini-3.1-flash-lite-image) is built for rapid ideation and high-volume pipelines where latency and budget are the primary constraints. Google says it’s the recommended replacement for anyone still using the first-gen Nano Banana (gemini-2.5-flash-image).
Key specs:
- Text-to-image in 4 seconds
- $0.034 per 1K resolution image
- Strong prompt adherence with good character consistency and in-image text rendering
It sits at the bottom of the Nano Banana family, with Nano Banana 2 (the generalist workhorse) and Nano Banana Pro (for complex professional use) above it.
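For budgeting a high-volume pipeline, the two published numbers are enough for a back-of-envelope estimate. Here is a minimal sketch, assuming the 4-second figure holds per image and that requests run in fixed-size parallel waves. The function name and the concurrency parameter are illustrative and not part of any Google API.

```python
import math
from decimal import Decimal

PRICE_PER_IMAGE = Decimal("0.034")  # USD per 1K image, from the announcement
SECONDS_PER_IMAGE = 4               # quoted text-to-image latency

def estimate_image_batch(num_images: int, concurrency: int = 1) -> tuple[Decimal, int]:
    """Return (total cost in USD, rough wall-clock seconds) for a batch."""
    if num_images < 0 or concurrency < 1:
        raise ValueError("num_images must be >= 0 and concurrency >= 1")
    cost = PRICE_PER_IMAGE * num_images
    # Assumption: each wave of `concurrency` requests takes one 4-second slot.
    waves = math.ceil(num_images / concurrency)
    return cost, waves * SECONDS_PER_IMAGE

if __name__ == "__main__":
    cost, seconds = estimate_image_batch(1000, concurrency=10)
    print(f"${cost} over roughly {seconds} s")
```

With 1,000 images at 10 parallel requests, that works out to $34 and about 400 seconds, before rate limits and retries.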
Beyond developer platforms, Nano Banana 2 Lite is rolling into Google consumer surfaces too, including AI Mode in Search, the Gemini app, NotebookLM, Google Photos, Stitch, Google Flow, and Google Ads.
Gemini Omni Flash: video generation for developers
Gemini Omni Flash (gemini-omni-flash-preview) is the model Google previewed at I/O that brings Gemini’s multimodal reasoning into video generation and editing.
It supports:
- Video generation from text, image, and video inputs
- Conversational editing, so you can refine videos with natural language
- Multimodal referencing (combine images, text, and video for scene consistency)
- $0.10 per second of output (same pricing as Veo 3.1 Fast)
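Since video is billed per second of output, the cost of an image-to-video job is easy to work out from the two list prices. The sketch below is arithmetic only: the helper name is made up for illustration, and it assumes no other charges (input tokens, failed generations) apply.

```python
from decimal import Decimal

IMAGE_PRICE = Decimal("0.034")         # USD per 1K image (Nano Banana 2 Lite)
VIDEO_PRICE_PER_SEC = Decimal("0.10")  # USD per second of output (Omni Flash)

def pipeline_cost(images: int, video_seconds: int) -> Decimal:
    """Cost in USD of generating `images` stills plus `video_seconds` of video."""
    if images < 0 or video_seconds < 0:
        raise ValueError("counts must be non-negative")
    return IMAGE_PRICE * images + VIDEO_PRICE_PER_SEC * video_seconds

if __name__ == "__main__":
    # One still animated into a 10-second clip.
    print(pipeline_cost(images=1, video_seconds=10))
```

One still plus a 10-second clip comes to $1.034, so the video step dominates the bill.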
There are some important limitations to note since this is a preview release:
- Maximum 10-second video generations (longer clips are coming)
- Audio reference uploading and scene extension are not supported yet
- Video references over 3 seconds are not correctly processed yet
- Character consistency across scene changes has some limitations
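If you are wiring this into an app, it can help to catch requests that run into the preview limits before they hit the API. The following is a rough client-side sketch: the `VideoRequest` shape and its field names are hypothetical and do not mirror the real request schema. Only the 10-second and 3-second thresholds come from the announcement.

```python
from dataclasses import dataclass, field

MAX_OUTPUT_SECONDS = 10           # preview cap on generated clip length
MAX_VIDEO_REFERENCE_SECONDS = 3   # longer references are not processed correctly yet

@dataclass
class VideoRequest:
    """Hypothetical local description of a job, not the API's request type."""
    prompt: str
    output_seconds: float
    video_reference_seconds: list[float] = field(default_factory=list)
    has_audio_reference: bool = False
    wants_scene_extension: bool = False

def preview_limit_problems(req: VideoRequest) -> list[str]:
    """Return human-readable reasons the request conflicts with preview limits."""
    problems = []
    if req.output_seconds > MAX_OUTPUT_SECONDS:
        problems.append(f"output is {req.output_seconds}s, max is {MAX_OUTPUT_SECONDS}s")
    if req.has_audio_reference:
        problems.append("audio references are not supported yet")
    if req.wants_scene_extension:
        problems.append("scene extension is not supported yet")
    for i, secs in enumerate(req.video_reference_seconds):
        if secs > MAX_VIDEO_REFERENCE_SECONDS:
            problems.append(f"video reference {i} is {secs}s, keep it at or under 3s")
    return problems
```

An empty list means the request stays inside the documented limits. It says nothing about character consistency, which you can only judge from the output.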
Chaining them together
The real differentiator is how these models work together. Google released three demo apps to show the pipeline:
- Anywhere: upload a selfie, Nano Banana 2 Lite places you at a landmark, and Omni Flash animates the result
- Space Lift: interior design app that generates room concepts and animates them into walkthrough videos
- Omni Product Studio: converts static product images into cinematic e-commerce videos
Using the Interactions API, developers can chain up to three sequential edits while maintaining session history and context.
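To make the three-edit cap concrete, here is a small bookkeeping sketch that tracks a session's history and refuses a fourth edit. It does not call the Interactions API. The `EditSession` class and its methods are invented for illustration, and how the real API reports hitting the limit is not covered in the announcement.

```python
from dataclasses import dataclass, field

MAX_SEQUENTIAL_EDITS = 3  # chain length quoted in the announcement

@dataclass
class EditSession:
    """Local stand-in for a generation followed by conversational edits."""
    initial_prompt: str
    edits: list[str] = field(default_factory=list)

    def add_edit(self, instruction: str) -> None:
        # Refuse locally rather than sending a request that exceeds the cap.
        if len(self.edits) >= MAX_SEQUENTIAL_EDITS:
            raise RuntimeError("edit limit reached; start a new session")
        self.edits.append(instruction)

    def history(self) -> list[str]:
        """Prompt plus edits in order, i.e. the context carried forward."""
        return [self.initial_prompt, *self.edits]

if __name__ == "__main__":
    session = EditSession("a lighthouse at dusk, slow push-in")
    for step in ["add fog", "warm the lighting", "slow the camera"]:
        session.add_edit(step)
    print(session.history())
```

Tracking the count on your side lets the UI prompt the user to start a new session instead of surfacing a failed request.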
This is a follow-up to the Gemini Omni guide I wrote a few weeks ago when Google first showed the concept at I/O. Now it’s actually shippable.



