Smart Compression

Long sessions accumulate context quickly. Left unchecked, the conversation grows until it hits the model's context window limit and further progress becomes impossible. muxd handles this with a three-tier compression strategy that removes the least important content first, preserving what matters for continued work.

The Problem with a Single Threshold

Earlier versions of muxd triggered a single LLM-based compaction when the context reached a fixed size. This had two failure modes:

  • Too aggressive — compaction at a low threshold wasted money and discarded information that was still useful.
  • Too lossy — a single summary pass often dropped important details about prior decisions, partially completed work, and known constraints.

Smart Compression replaces this with three graduated tiers.
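The cascade can be pictured as a simple sketch. The thresholds below are the ones documented in this section; the message shape, the 4-characters-per-token estimate, and the three helper callbacks are illustrative assumptions, not muxd's actual internals.

```python
# Minimal sketch of the three-tier cascade. Each tier runs only if the
# context is still over its threshold after the cheaper tiers have run.
TIERS = [60_000, 75_000, 90_000]  # trim, archive, summarize

def estimate_tokens(messages):
    # Rough heuristic: roughly 4 characters per token (assumption).
    return sum(len(m["content"]) for m in messages) // 4

def compress(messages, trim, archive, summarize):
    if estimate_tokens(messages) > TIERS[0]:
        messages = trim(messages)       # Tier 1: cheap, no LLM call
    if estimate_tokens(messages) > TIERS[1]:
        messages = archive(messages)    # Tier 2: cheap, no LLM call
    if estimate_tokens(messages) > TIERS[2]:
        messages = summarize(messages)  # Tier 3: one call to a cheap model
    return messages
```

The key property is that each tier re-checks the token count, so an LLM call happens only when the two cheap tiers were not enough.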

The Three Tiers

Tier 1 — Trim Tool Results (60k tokens)

When the context exceeds 60 000 tokens, muxd compresses old tool results. Long file_read outputs, command outputs, and search results are replaced with a short placeholder:

[tool result trimmed — 4 832 tokens]

The tool call itself and the agent's response to it are kept intact. Only the raw output is trimmed. This is cheap and lossless for reasoning purposes — the agent remembers what it did even if the raw output is gone.
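A sketch of what Tier 1 does to the message list. The placeholder format comes from this section; the message shape and the "keep the most recent results intact" rule are hypothetical details for illustration.

```python
def estimate_tokens(text):
    # Rough heuristic: roughly 4 characters per token (assumption).
    return len(text) // 4

def trim_tool_results(messages, keep_recent=3):
    # Replace the raw output of old tool results with a short placeholder.
    # The tool call and the agent's response are untouched.
    tool_results = [m for m in messages if m["role"] == "tool"]
    recent = {id(m) for m in tool_results[-keep_recent:]}
    for m in tool_results:
        if id(m) not in recent:
            tokens = estimate_tokens(m["content"])
            m["content"] = f"[tool result trimmed — {tokens} tokens]"
    return messages
```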

Tier 2 — Archive Old Turns (75k tokens)

If the context still exceeds 75 000 tokens after Tier 1, the oldest conversation turns (user messages and assistant responses) are removed from the active window and archived. The most recent turns are always preserved. No LLM call is needed for this tier.
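Tier 2 reduces to a list slice plus an archive, which is why no LLM call is needed. The `keep_recent` knob below is a hypothetical parameter, not a documented muxd setting.

```python
def archive_old_turns(messages, archive, keep_recent=20):
    # Move the oldest turns into the archive; the most recent turns
    # always stay in the active window. The archived turns are kept
    # so that Tier 3 can summarize them later if needed.
    if len(messages) <= keep_recent:
        return messages
    cut = len(messages) - keep_recent
    archive.extend(messages[:cut])
    return messages[cut:]
```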

Tier 3 — LLM Summary (90k tokens)

If the context still exceeds 90 000 tokens, muxd calls a cheap model to produce a structured summary of the archived turns. The summary is injected at the top of the conversation as a system message. It is focused on:

  • Decisions made — what approaches were chosen and why
  • Files touched — which files were created or modified
  • Current plan — what the agent was working toward
  • Known constraints — things that were ruled out or must be respected
  • Errors encountered — failures and how they were resolved

Because the prompt asks for these specific sections, the summary captures actionable information rather than a generic recap.
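One way such a prompt could be assembled is sketched below. The five section headings mirror the focus areas listed above; the exact wording and the transcript format are assumptions, not muxd's real prompt.

```python
SUMMARY_PROMPT = """Summarize the archived conversation below for an agent \
resuming work. Use exactly these sections:

## Decisions made
## Files touched
## Current plan
## Known constraints
## Errors encountered

Be concrete: name file paths, commands, and error messages, not generalities.

{archived_turns}"""

def build_summary_request(archived_turns):
    # Flatten the archived turns into a plain transcript and slot it
    # into the structured prompt.
    transcript = "\n\n".join(
        f"{m['role']}: {m['content']}" for m in archived_turns
    )
    return SUMMARY_PROMPT.format(archived_turns=transcript)
```

Asking for fixed sections is what keeps the output actionable: a free-form "summarize this" prompt tends to drop exactly the constraints and partial work the agent needs.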

Compact Model

Tier 3 uses a separate, cheaper model to keep costs low. The default is the cheapest model available from your provider (Haiku for Anthropic, gpt-4o-mini for OpenAI, and so on).

To use a different model:

/config set model.compact claude-haiku-3-5

The primary agent model is never used for compaction.

Comparison to the Old Approach

Aspect              Old (single threshold)   Smart Compression
Tiers               1                        3
First action        LLM summary              Trim tool results (cheap)
LLM call            Always at threshold      Only at 90k, after cheaper tiers
Information lost    High — single pass       Low — graduated removal
Cost                Higher                   Lower for most sessions