Send Images to Your AI Coding Agent: Screenshots, Diagrams, and More
Text is great for describing what you want. But sometimes you just want to show the agent what you are seeing. A bug in the UI. A design mock from Figma. An architecture diagram from a whiteboard session.
muxd now supports image attachments. Paste or upload an image directly into your conversation, and the agent can see it, analyze it, and write code based on what it sees.
When Images Beat Text
Some problems are just easier to show:
- UI bugs: "The button is overlapping the navbar" takes three sentences to describe. A screenshot takes one paste.
- Design implementations: Share a mock and ask the agent to build it. No need to describe hex codes and padding values.
- Error messages: A screenshot of your terminal with the full stack trace is faster than copying each line.
- Architecture reviews: Drop in a diagram and ask for feedback. The agent can read Mermaid, draw.io exports, or even whiteboard photos.
- Documentation screenshots: Show the agent what a third-party tool looks like so it can guide you through it.
How It Works
In the terminal, muxd accepts image paths. Paste one directly into the prompt:

```
/path/to/screenshot.png
```
On mobile, it is even simpler. The muxd mobile app lets you:
- Tap the attachment button
- Choose from photo library or take a new photo
- Add your prompt alongside the image
- Send
The image gets encoded and sent to your configured model. If you are using a vision-capable model like Claude Sonnet 4.6, GPT-4o, or GLM-4.6, the agent responds with full visual understanding.
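To make "encoded and sent" concrete, here is a minimal sketch of the shape this usually takes: the image is read from disk, base64-encoded, and placed in a content block alongside the text prompt. This mirrors the common vision-API payload structure, not muxd's actual internals; the exact schema varies by provider.

```python
import base64
import mimetypes

def image_to_message(path: str, prompt: str) -> dict:
    """Bundle a local image and a text prompt into one user message.

    Illustrative only: the content-block schema below follows the
    common base64 pattern used by vision APIs; field names differ
    between providers.
    """
    # Infer the MIME type from the file extension, defaulting to PNG.
    media_type = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": data,
                },
            },
            {"type": "text", "text": prompt},
        ],
    }
```

Text-only models simply have no handler for the image block, which is why muxd checks vision capability before sending.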
Supported Models
Image support depends on your provider. These work out of the box:
| Provider | Model | Vision Support |
|---|---|---|
| Anthropic | Claude Sonnet 4.6, Claude Opus 4.6 | Yes |
| OpenAI | GPT-4o, GPT-4.1 | Yes |
| Google | Gemini 2.0 Pro | Yes |
| xAI | Grok 2 Vision | Yes |
| Z.AI | GLM-4.6, GLM-5 | Yes |
| Fireworks | Llama 3.2 Vision | Yes |
| Ollama | LLaVA, BakLLaVA | Yes |
If you attach an image to a non-vision model, muxd will let you know. Switch models mid-session with `/model` and try again.
Real Example: Fixing a CSS Bug
You are building a dashboard. The sidebar looks fine on desktop but breaks on mobile. Instead of typing out a description:
- Open the site on your phone
- Take a screenshot of the broken layout
- Open muxd mobile
- Attach the screenshot
- Say: "Fix this responsive layout issue"
The agent sees the problem, identifies the CSS causing the overflow, and writes the fix. You review, apply, and deploy. Total time: two minutes.
Real Example: Building from a Mock
You have a Figma design for a new settings page. Export it as PNG, drop it into muxd:
```
Here's the design for the settings page. Build it using the existing component library.
```
The agent reads the mock, identifies the components, matches your existing patterns, and generates the code. You get a first draft that actually looks like the design instead of a text-based approximation.
Privacy Note
Images are sent to your configured model provider, just like text messages. If privacy is a concern:
- Use a local model via Ollama with vision support
- Redact sensitive information before attaching
- Remember that images are stored in your local muxd database for conversation history
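For the local route, Ollama's generate endpoint accepts base64-encoded images alongside the prompt. A minimal sketch of building that request body, assuming a vision model such as `llava` has already been pulled locally:

```python
import base64
import json

def ollama_vision_request(image_path: str, prompt: str,
                          model: str = "llava") -> str:
    """Build the JSON body for Ollama's /api/generate endpoint.

    POST the returned string to http://localhost:11434/api/generate
    to run vision inference entirely on your own machine. Sketch
    only; see Ollama's API docs for the full schema.
    """
    with open(image_path, "rb") as f:
        img = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "model": model,        # any locally pulled vision model
        "prompt": prompt,
        "images": [img],       # list of base64-encoded images
    })
```

Because the request never leaves localhost, neither the screenshot nor the prompt reaches a third-party provider.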
Works Everywhere
Image support is not just a mobile feature. In the terminal, you can paste paths to local files. In the TUI, images display inline where supported. On mobile, the full camera roll is available.
Same agent. Same tools. Just add vision.
The Bottom Line
You already communicate with teammates using screenshots and diagrams. Now you can do the same with your AI agent. Show instead of tell. Get faster, more accurate responses. Build things that actually match what you imagined.