DeepSeek V4 Adds Vision: Free Multimodal AI That Challenges GPT-4o

Back in May 2025, when I reviewed DeepSeek AI, I called out the biggest missing piece: “No image generation or visual recognition support.” It was the one thing keeping DeepSeek from being a true GPT-4o competitor. Today, that gap is gone.

What DeepSeek Vision Can Do

DeepSeek has quietly rolled out native image understanding in its chat interface at chat.deepseek.com. Upload an image and the model will analyze and describe it: not just extract the text, but interpret what is actually in the picture.

Early user reports on Hacker News (the thread reached 252 points in 7 hours) suggest the feature works well. Users describe it as “really good and fast” on screenshots, photos, and oddball test images. The model can identify objects, read scenes, describe compositions, and answer questions about image content, which are the core abilities that make a multimodal model genuinely useful.

What It Can’t Do (Yet)

Let’s be clear: this is image understanding, not image generation. DeepSeek Vision can’t create images from prompts — that is still a separate capability. If you need DALL-E-style generation, you’re still looking at GPT-4o or specialized tools. But for analyzing and understanding images, DeepSeek is now in the game.

The V4 Context

Vision appears to ship as part of DeepSeek V4, which the company calls a “preview version” on its homepage banner. V4 brings more than just vision:

1 million token context — matching the biggest context windows in the industry
Thinking mode — chain-of-thought reasoning for complex problems
Improved agent capabilities — better tool use and multi-step task handling
Massive output capacity — up to 384K tokens per response

DeepSeek also announced that legacy model names deepseek-chat and deepseek-reasoner will be deprecated on July 24, 2026, in favor of the V4 model names.

Pricing: Free vs. API

The chat interface is completely free — no subscription required. For developers, API pricing is aggressive:

deepseek-v4-flash: $0.14 per million input tokens, $0.28 per million output
deepseek-v4-pro: $0.435 per million input tokens, $0.87 per million output

Compare that to GPT-4o at roughly $2.50 per million input tokens: on input alone, the Flash tier is about 18x cheaper and the Pro tier about 6x cheaper.

There is one catch, though: vision is NOT yet available through the API. The DeepSeek API docs list the V4 models but do not include any image content type or vision endpoint. For now, this is a chat-only feature. If you need programmatic vision access, you'll have to wait or use a competitor.
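To make the pricing concrete, here is a small Python sketch that estimates request costs from the per-million-token rates listed above. The workload (2,000 input and 500 output tokens per request) is a hypothetical example, and GPT-4o is compared on input price only, since that is the only GPT-4o rate quoted in this piece.

```python
# Per-million-token API prices (USD) as quoted in this article.
PRICES = {
    "deepseek-v4-flash": {"input": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"input": 0.435, "output": 0.87},
}
# Approximate GPT-4o input price cited above; its output price is not used here.
GPT4O_INPUT = 2.50


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def input_price_ratio(model: str) -> float:
    """How many times cheaper this model's input tokens are than GPT-4o's."""
    return GPT4O_INPUT / PRICES[model]["input"]


if __name__ == "__main__":
    # Hypothetical workload: 1,000 requests of 2,000 input + 500 output tokens each.
    for model in PRICES:
        total = 1_000 * request_cost(model, 2_000, 500)
        print(f"{model}: ${total:.2f} per 1,000 requests, "
              f"{input_price_ratio(model):.1f}x cheaper input than GPT-4o")
```

Under that hypothetical workload, Flash comes to about $0.42 per thousand requests and Pro to about $1.31. Actual bills will depend on real token counts and on any pricing changes DeepSeek makes.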

How This Changes DeepSeek’s Competitive Position

This is a big deal. DeepSeek was the strongest free alternative to ChatGPT for text tasks — great at coding, logic, and creative writing. But the missing vision support was a dealbreaker for anyone needing multimodal analysis. With V4, DeepSeek now ticks that box.

For users who don't want to pay $20/month for ChatGPT Plus or Claude Pro, DeepSeek V4 with Vision is now arguably the most capable free alternative. It handles screenshots, photos, diagrams, and other image-based queries at zero cost. That changes the value equation significantly.

I Called This

In my DeepSeek AI Review, I explicitly listed “No image generation or visual recognition support” as a Con. DeepSeek just fixed that weakness. I’ve updated that review to reflect the change — the Cons list no longer includes that bullet, and there’s a note at the top pointing to this coverage.

The review's overall score (4/5) now reads differently: without the vision gap, DeepSeek is a stronger recommendation for anyone exploring AI tools.

A Quiet but Significant Launch

Notably, DeepSeek has made no formal announcement about this feature. There’s no blog post, no tweet from @deepseek_ai (their account went silent in January 2025), and no press release. The rollout happened via the chat interface, and the Hacker News community effectively served as the announcement.

This kind of silent launch is becoming typical for DeepSeek. The company ships to the web interface first, gathers community reaction, and documents things retroactively. It’s a different approach from the splashy launches we see from OpenAI and Google, but the feature quality appears to speak for itself.

Bottom Line

DeepSeek V4 with Vision closes the biggest gap in the platform. It’s now a genuine multimodal competitor available for free, with aggressive API pricing for those who need it. The API gap (no programmatic vision access yet) is a real limitation, but for everyday image analysis in the chat interface, DeepSeek V4 is worth a look — especially if you’ve been priced out of GPT-4o or Claude.

I’ll update this piece as DeepSeek rolls out API-level vision support or makes further announcements.
