Hermes Agent Mixture of Agents 2.0: Combine GPT, Claude, DeepSeek

You can already run Hermes Agent with Claude, GPT, or DeepSeek. Now you can run all three in the same session, and the agent decides how each one contributes.

Nous Research just shipped Mixture of Agents 2.0 in Hermes Agent, and it’s the most practical multi-model feature I have seen in an open-source agent framework. You build presets that pair reference models with an aggregator, and the whole thing looks like a normal model to the agent. One /moa command, and your session runs across providers.

Introducing Mixture of Agents 2.0 in Hermes Agent.

Combine any provider's models into a mixture of your own. Access your presets as if it were a normal model in Hermes.

Big improvement in our soon-to-release HermesBench against opus and gpt-5.5 with MoA using Opus & GPT… https://t.co/nQnBAkEm0M
— Teknium 🪽 (@Teknium) June 26, 2026

Here’s what shipped, how it works, and why it matters for anyone running AI agents on their own stack.

What Mixture of Agents Actually Does

MoA lets you configure a preset with one or more reference models and an aggregator model. Each time the agent needs to respond, it runs the references first to get their perspective on the conversation, then sends that analysis to the aggregator, which writes the actual response and makes tool calls.

The default preset ships with:

Reference models: GPT-5.5 (OpenAI) and DeepSeek V4 Pro (DeepSeek)
Aggregator: Claude Opus 4.8 (Anthropic)

You pick the preset through /model review --provider moa or just /moa, and it shows up as a normal model on every Hermes surface: CLI, TUI, gateway (Telegram, Discord, etc.), and Desktop. No custom routing code. No middleware. No wrapper scripts.

The Numbers

On HermesBench (their upcoming internal benchmark), the default MoA preset scores 0.8202. For comparison:

Claude Opus 4.8 alone: 0.7607
GPT-5.5 alone: 0.7412

That’s about 8% higher than Opus alone and roughly 11% higher than GPT-5.5 alone. The configuration beats its strongest component by a solid margin, which confirms the multi-model perspective lift is real — not just averaging two models together.

How It Works Under the Hood

Hermes runs the MoA process fresh on every agent iteration. The reference models get a stripped-down version of the conversation: system prompt and tool transcript removed, just user/assistant text. That keeps reference calls cheaper and avoids provider-level rejections from strict services.

The reference outputs are appended as private context for the aggregator. Then the aggregator runs with Hermes’s full tool schema, makes tool calls, and Hermes executes those tools normally. On the next iteration, the whole process repeats over the updated conversation.

A few design decisions I like:

Prompt caching preserved. The reference outputs are appended to the end of the latest user turn, not spliced into the middle of history. That means the stable prefix stays cached, and only the freshly appended tail is new context. No extra cache cost beyond the reference model calls themselves.

Failures are graceful. If one reference model goes down or hits an auth error, Hermes includes the failure message in the reference context and keeps going with whatever models returned. You don’t lose a whole session because one provider had a hiccup.

Recursive MoA blocked. An aggregator cannot be another MoA preset. The system explicitly prevents nested mixture trees, which makes sense — the combinatorial explosion would be expensive and hard to debug.

Nous Research’s official post says MoA presets give users “capabilities beyond the publicly available frontier” — 8% higher than Opus 4.8 and 11% higher than GPT-5.5.

Configuration and Setup

You configure MoA presets through the dashboard, the Desktop app settings, or directly in config.yaml. The config format is straightforward:

moa:
  default_preset: default
  presets:
    default:
      reference_models:
        - provider: openai-codex
          model: gpt-5.5
        - provider: openrouter
          model: deepseek/deepseek-v4-pro
      aggregator:
        provider: openrouter
        model: anthropic/claude-opus-4.8
      reference_temperature: 0.6
      aggregator_temperature: 0.4
      max_tokens: 4096
      enabled: true

You can mix providers freely. The reference models don’t need to be from the same provider as the aggregator or each other. You can run Claude as the aggregator over GPT and DeepSeek references, swap in a local model as one reference, or point references at different OpenRouter endpoints.

The /moa slash command lets you switch presets mid-session: /moa review to pick a named preset, or /moa with a prompt for a one-shot MoA call. It’s one of the quality-of-life touches that makes this feel like a first-class feature rather than an add-on.

Why This Matters for Agent Builders

The open-source agent space has been converging on the idea that one model is not enough for hard tasks. The approaches so far have been jury-rigged: pipeline scripts, multi-step chains, synchronous calls with manual routing. What Hermes Agent just shipped is a clean abstraction that lives inside the normal agent loop.

If you’re already running Hermes Agent, this is a config change and a /moa away. If you’re not, the v0.17.0 release from last week already made Hermes dramatically faster, and MoA 2.0 is the kind of feature that justifies revisiting the stack.

A few things to keep in mind:

MoA increases model-call count. Every iteration is N reference calls plus the aggregator. That means higher token costs per session compared to running a single model.

The benchmark numbers are HermesBench, not a published third-party benchmark. Take the exact percentages with reasonable skepticism, but the directional improvement across two different reference models is credible.

The docs went live today, June 26. This is fresh, and the feature may evolve as real-world usage reveals edge cases.

Still, this is the best implementation of multi-model agent orchestration I have seen in an open-source project. It’s not bolted on. It’s a provider in the model system, which means every surface, every tool, every feature of Hermes works with it automatically.

You can read the official MoA documentation for the full spec, or check out the Hermes Agent GitHub repo to get started.

If you have been running Hermes and want to see how the /learn command turns source material into reusable skills on top of this, that’s a solid next read. And yes, you can use MoA presets with skills, pets, subagents, and everything else — that’s the point.

Bottom Line

Mixture of Agents 2.0 in Hermes Agent is a production-grade multi-model agent feature that actually works like a normal model. No custom infrastructure, no chain-of-thought engineering, no glue code. Configure, /moa, and your agent runs across providers.

Hermes Agent Just Got Mixture of Agents 2.0, and Now You Can Combine GPT, Claude, and DeepSeek in One Session

What Mixture of Agents Actually Does

The Numbers

How It Works Under the Hood

Configuration and Setup

Why This Matters for Agent Builders

Bottom Line

Submit a Take Cancel reply

What Mixture of Agents Actually Does

The Numbers

How It Works Under the Hood

Configuration and Setup

Why This Matters for Agent Builders

Bottom Line

Submit a Take Cancel reply

Related signals

Anthropic Can Deploy Mythos 5 Again — But Only to US Critical Infrastructure

Google AI Studio’s New ‘Design Variations’ Button Generates UI Layouts Instantly

Google Antigravity 2.2.1 Adds a Built-In Guide Skill, Audio Rendering, and Better File Search