Back in May 2025, when I reviewed DeepSeek AI, I called out the biggest missing piece: “No image generation or visual recognition support.” It was the one thing keeping DeepSeek from being a true GPT-4o competitor. Today, that gap is gone.
What DeepSeek Vision Can Do
DeepSeek has silently rolled out native image understanding to its chat interface at chat.deepseek.com. Upload an image and the AI will analyze and describe it: it does not just extract text, it interprets what’s actually in the picture.
Early user reports on Hacker News (252 points in 7 hours) confirm the feature works well. Users describe it as “really good and fast” for screenshots, photos, and oddball test images. The AI can identify objects, read scenes, describe compositions, and answer questions about image content — all the things that make a multimodal model genuinely useful.
What It Can’t Do (Yet)
Let’s be clear: this is image understanding, not image generation. DeepSeek Vision can’t create images from prompts — that is still a separate capability. If you need DALL-E-style generation, you’re still looking at GPT-4o or specialized tools. But for analyzing and understanding images, DeepSeek is now in the game.
The V4 Context
Vision appears to ship as part of DeepSeek V4, which the company calls a “preview version” on its homepage banner. V4 brings more than just vision:
- 1 million token context — matching the biggest context windows in the industry
- Thinking mode — chain-of-thought reasoning for complex problems
- Improved agent capabilities — better tool use and multi-step task handling
- Massive output capacity — up to 384K tokens per response
DeepSeek also announced that legacy model names deepseek-chat and deepseek-reasoner will be deprecated on July 24, 2026, in favor of the V4 model names.
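If you maintain code that still hard-codes the legacy names, a small audit helper along these lines may be useful ahead of the cutoff. Treat it as a sketch: the function name `check_model_name` and its three status labels are hypothetical, and it deliberately does not guess which V4 model replaces which legacy name, because the announcement as reported here does not say.

```python
from datetime import date

# Legacy names and cutoff date as reported in this article.
# Everything else here (function name, status labels) is illustrative.
LEGACY_MODELS = {"deepseek-chat", "deepseek-reasoner"}
DEPRECATION_DATE = date(2026, 7, 24)


def check_model_name(model: str, today: date) -> str:
    """Classify a configured model name relative to the deprecation date."""
    if model not in LEGACY_MODELS:
        return "ok"
    if today >= DEPRECATION_DATE:
        return "deprecated"
    return "deprecating"


if __name__ == "__main__":
    for name in ("deepseek-chat", "deepseek-reasoner", "deepseek-v4-flash"):
        print(name, check_model_name(name, date.today()))
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the check easy to test against dates on either side of the cutoff.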
Pricing: Free vs. API
The chat interface is completely free — no subscription required. For developers, API pricing is aggressive:
- deepseek-v4-flash: $0.14 per million input tokens, $0.28 per million output
- deepseek-v4-pro: $0.435 per million input tokens, $0.87 per million output
Compare that to GPT-4o at roughly $2.50 per million input tokens. On input pricing, DeepSeek undercuts that by roughly 18x on the Flash tier and about 6x on the Pro tier. There is one catch, though: vision is not yet available through the API. The DeepSeek API docs list the V4 models but do not include any image content type or vision endpoint. For now, this is a chat-only feature. If you need programmatic vision access, you’ll have to wait or use a competitor.
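To make the pricing comparison concrete, here is a rough back-of-the-envelope calculator. It uses only the per-million-token prices quoted above. The GPT-4o figure is the approximate input price cited in this article, and the function names are my own, not part of any SDK.

```python
# Prices in USD per million tokens, as quoted in this article.
DEEPSEEK_PRICES = {
    "deepseek-v4-flash": {"input": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"input": 0.435, "output": 0.87},
}
GPT4O_INPUT_PRICE = 2.50  # approximate figure cited above


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one workload at the quoted list prices."""
    price = DEEPSEEK_PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def input_price_ratio(model: str) -> float:
    """How many times cheaper the model's input tokens are than GPT-4o's."""
    return GPT4O_INPUT_PRICE / DEEPSEEK_PRICES[model]["input"]


if __name__ == "__main__":
    for model in DEEPSEEK_PRICES:
        print(f"{model}: {input_price_ratio(model):.1f}x cheaper on input")
```

At these prices the input-side ratio works out to about 17.9x for Flash and about 5.7x for Pro. The comparison covers input tokens only, since that is the only GPT-4o price cited here.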
How This Changes DeepSeek’s Competitive Position
This is a big deal. DeepSeek was the strongest free alternative to ChatGPT for text tasks — great at coding, logic, and creative writing. But the missing vision support was a dealbreaker for anyone needing multimodal analysis. With V4, DeepSeek now ticks that box.
For users who don’t want a $20/month ChatGPT Plus or Claude Pro subscription, DeepSeek V4 with Vision is now the most capable free alternative. It handles screenshots, photos, diagrams, and other image-based queries at zero cost. That changes the value equation significantly.
I Called This
In my DeepSeek AI Review, I explicitly listed “No image generation or visual recognition support” as a Con. DeepSeek just fixed that weakness. I’ve updated that review to reflect the change — the Cons list no longer includes that bullet, and there’s a note at the top pointing to this coverage.
The review’s overall score (4/5) now reads differently. Without the vision gap, DeepSeek is a stronger recommendation for anyone exploring AI tools.
A Quiet but Significant Launch
Notably, DeepSeek has made no formal announcement about this feature. There’s no blog post, no tweet from @deepseek_ai (their account went silent in January 2025), and no press release. The rollout happened via the chat interface, and the Hacker News community effectively served as the announcement.
This kind of silent launch is becoming typical for DeepSeek. The company ships to the web interface first, gathers community reaction, and documents things retroactively. It’s a different approach from the splashy launches we see from OpenAI and Google, but the feature quality appears to speak for itself.
Bottom Line
DeepSeek V4 with Vision closes the biggest gap in the platform. It’s now a genuine multimodal competitor available for free, with aggressive API pricing for those who need it. The API gap (no programmatic vision access yet) is a real limitation, but for everyday image analysis in the chat interface, DeepSeek V4 is worth a look — especially if you’ve been priced out of GPT-4o or Claude.
I’ll update this piece as DeepSeek rolls out API-level vision support or makes further announcements.


