You can already run Hermes Agent with Claude, GPT, or DeepSeek. Now you can run all three in the same session, and the agent decides how each one contributes.
Nous Research just shipped Mixture of Agents 2.0 in Hermes Agent, and it’s the most practical multi-model feature I have seen in an open-source agent framework. You build presets that pair reference models with an aggregator, and the whole thing looks like a normal model to the agent. One /moa command, and your session runs across providers.
Here’s what shipped, how it works, and why it matters for anyone running AI agents on their own stack.
What Mixture of Agents Actually Does
MoA lets you configure a preset with one or more reference models and an aggregator model. Each time the agent needs to respond, it runs the references first to get their perspective on the conversation, then sends that analysis to the aggregator, which writes the actual response and makes tool calls.
The default preset ships with:
- Reference models: GPT-5.5 (OpenAI) and DeepSeek V4 Pro (DeepSeek)
- Aggregator: Claude Opus 4.8 (Anthropic)
You pick the preset through /model review --provider moa or just /moa, and it shows up as a normal model on every Hermes surface: CLI, TUI, gateway (Telegram, Discord, etc.), and Desktop. No custom routing code. No middleware. No wrapper scripts.
The Numbers
On HermesBench (their upcoming internal benchmark), the default MoA preset scores 0.8202. For comparison:
- Claude Opus 4.8 alone: 0.7607
- GPT-5.5 alone: 0.7412
That’s about 8% higher than Opus alone and roughly 11% higher than GPT-5.5 alone. The configuration beats its strongest component by a solid margin, which confirms the multi-model perspective lift is real — not just averaging two models together.
How It Works Under the Hood
Hermes runs the MoA process fresh on every agent iteration. The reference models get a stripped-down version of the conversation: system prompt and tool transcript removed, just user/assistant text. That keeps reference calls cheaper and avoids provider-level rejections from strict services.
The reference outputs are appended as private context for the aggregator. Then the aggregator runs with Hermes’s full tool schema, makes tool calls, and Hermes executes those tools normally. On the next iteration, the whole process repeats over the updated conversation.
A few design decisions I like:
- Prompt caching preserved. The reference outputs are appended to the end of the latest user turn, not spliced into the middle of history. That means the stable prefix stays cached, and only the freshly appended tail is new context. No extra cache cost beyond the reference model calls themselves.
- Failures are graceful. If one reference model goes down or hits an auth error, Hermes includes the failure message in the reference context and keeps going with whatever models returned. You don’t lose a whole session because one provider had a hiccup.
- Recursive MoA blocked. An aggregator cannot be another MoA preset. The system explicitly prevents nested mixture trees, which makes sense — the combinatorial explosion would be expensive and hard to debug.
Nous Research’s official post says MoA presets give users “capabilities beyond the publicly available frontier” — 8% higher than Opus 4.8 and 11% higher than GPT-5.5.
Configuration and Setup
You configure MoA presets through the dashboard, the Desktop app settings, or directly in config.yaml. The config format is straightforward:
moa:
default_preset: default
presets:
default:
reference_models:
- provider: openai-codex
model: gpt-5.5
- provider: openrouter
model: deepseek/deepseek-v4-pro
aggregator:
provider: openrouter
model: anthropic/claude-opus-4.8
reference_temperature: 0.6
aggregator_temperature: 0.4
max_tokens: 4096
enabled: trueYou can mix providers freely. The reference models don’t need to be from the same provider as the aggregator or each other. You can run Claude as the aggregator over GPT and DeepSeek references, swap in a local model as one reference, or point references at different OpenRouter endpoints.
The /moa slash command lets you switch presets mid-session: /moa review to pick a named preset, or /moa with a prompt for a one-shot MoA call. It’s one of the quality-of-life touches that makes this feel like a first-class feature rather than an add-on.
Why This Matters for Agent Builders
The open-source agent space has been converging on the idea that one model is not enough for hard tasks. The approaches so far have been jury-rigged: pipeline scripts, multi-step chains, synchronous calls with manual routing. What Hermes Agent just shipped is a clean abstraction that lives inside the normal agent loop.
If you’re already running Hermes Agent, this is a config change and a /moa away. If you’re not, the v0.17.0 release from last week already made Hermes dramatically faster, and MoA 2.0 is the kind of feature that justifies revisiting the stack.
A few things to keep in mind:
- MoA increases model-call count. Every iteration is N reference calls plus the aggregator. That means higher token costs per session compared to running a single model.
- The benchmark numbers are HermesBench, not a published third-party benchmark. Take the exact percentages with reasonable skepticism, but the directional improvement across two different reference models is credible.
- The docs went live today, June 26. This is fresh, and the feature may evolve as real-world usage reveals edge cases.
Still, this is the best implementation of multi-model agent orchestration I have seen in an open-source project. It’s not bolted on. It’s a provider in the model system, which means every surface, every tool, every feature of Hermes works with it automatically.
You can read the official MoA documentation for the full spec, or check out the Hermes Agent GitHub repo to get started.
If you have been running Hermes and want to see how the /learn command turns source material into reusable skills on top of this, that’s a solid next read. And yes, you can use MoA presets with skills, pets, subagents, and everything else — that’s the point.
Bottom Line
Mixture of Agents 2.0 in Hermes Agent is a production-grade multi-model agent feature that actually works like a normal model. No custom infrastructure, no chain-of-thought engineering, no glue code. Configure, /moa, and your agent runs across providers.



