Send Images to Your AI Coding Agent: Screenshots, Diagrams, and More
Text is great for describing what you want. But sometimes you just want to show the agent what you are seeing. A bug in the UI. A design mock from Figma. An architecture diagram from a whiteboard session.
muxd now supports image attachments. Paste or upload an image directly into your conversation, and the agent can see it, analyze it, and write code based on what it sees.
When Images Beat Text
Some problems are just easier to show:
- UI bugs: "The button is overlapping the navbar" takes three sentences to describe. A screenshot takes one paste.
- Design implementations: Share a mock and ask the agent to build it. No need to describe hex codes and padding values.
- Error messages: A screenshot of your terminal with the full stack trace is faster than copying each line.
- Architecture reviews: Drop in a diagram and ask for feedback. The agent can read Mermaid, draw.io exports, or even whiteboard photos.
- Documentation screenshots: Show the agent what a third-party tool looks like so it can guide you through it.
How It Works
In the terminal, muxd accepts image paths. Paste one directly into the prompt:

```
/path/to/screenshot.png
```
On mobile, it is even simpler. The muxd mobile app lets you:
- Tap the attachment button
- Choose from photo library or take a new photo
- Add your prompt alongside the image
- Send
The image gets encoded and sent to your configured model. If you are using a vision-capable model like Claude Sonnet 4.6, GPT-4o, or GLM-4.6, the agent responds with full visual understanding.
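To make "encoded and sent" concrete, here is a minimal sketch of the shape this usually takes: the image is read from disk, base64-encoded, and placed in a content block alongside the text prompt. This mirrors the common vision-API payload structure, not muxd's actual internals; the exact schema varies by provider.

```python
import base64
import mimetypes

def image_to_message(path: str, prompt: str) -> dict:
    """Bundle a local image and a text prompt into one user message.

    Illustrative only: the content-block schema below follows the
    common base64 pattern used by vision APIs; field names differ
    between providers.
    """
    # Infer the MIME type from the file extension, defaulting to PNG.
    media_type = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": data,
                },
            },
            {"type": "text", "text": prompt},
        ],
    }
```

Text-only models simply have no handler for the image block, which is why muxd checks vision capability before sending.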
Supported Models
Image support depends on your provider. These work out of the box:
| Provider | Model | Vision Support |
|---|---|---|
| Anthropic | Claude Sonnet 4.6, Claude Opus 4.6 | Yes |
| OpenAI | GPT-4o, GPT-4.1 | Yes |
| Google | Gemini 2.0 Pro | Yes |
| xAI | Grok 2 Vision | Yes |
| Z.AI | GLM-4.6, GLM-5 | Yes |
| Fireworks | Llama 3.2 Vision | Yes |
| Ollama | LLaVA, BakLLaVA | Yes |
If you attach an image to a non-vision model, muxd will let you know. Switch models mid-session with `/model` and try again.
Real Example: Fixing a CSS Bug
You are building a dashboard. The sidebar looks fine on desktop but breaks on mobile. Instead of typing out a description:
- Open the site on your phone
- Take a screenshot of the broken layout
- Open muxd mobile
- Attach the screenshot
- Say: "Fix this responsive layout issue"
The agent sees the problem, identifies the CSS causing the overflow, and writes the fix. You review, apply, and deploy. Total time: two minutes.
Real Example: Building from a Mock
You have a Figma design for a new settings page. Export it as PNG, drop it into muxd:
```
Here's the design for the settings page. Build it using the existing component library.
```
The agent reads the mock, identifies the components, matches your existing patterns, and generates the code. You get a first draft that actually looks like the design instead of a text-based approximation.
Privacy Note
Images are sent to your configured model provider, just like text messages. If privacy is a concern:
- Use a local model via Ollama with vision support
- Redact sensitive information before attaching
- Remember that images are stored in your local muxd database for conversation history
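For the local route, Ollama's generate endpoint accepts base64-encoded images alongside the prompt. A minimal sketch of building that request body, assuming a vision model such as `llava` has already been pulled locally:

```python
import base64
import json

def ollama_vision_request(image_path: str, prompt: str,
                          model: str = "llava") -> str:
    """Build the JSON body for Ollama's /api/generate endpoint.

    POST the returned string to http://localhost:11434/api/generate
    to run vision inference entirely on your own machine. Sketch
    only; see Ollama's API docs for the full schema.
    """
    with open(image_path, "rb") as f:
        img = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "model": model,        # any locally pulled vision model
        "prompt": prompt,
        "images": [img],       # list of base64-encoded images
    })
```

Because the request never leaves localhost, neither the screenshot nor the prompt reaches a third-party provider.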
Works Everywhere
Image support is not just a mobile feature. In the terminal, you can paste paths to local files. In the TUI, images display inline where supported. On mobile, the full camera roll is available.
Same agent. Same tools. Just add vision.
The Bottom Line
You already communicate with teammates using screenshots and diagrams. Now you can do the same with your AI agent. Show instead of tell. Get faster, more accurate responses. Build things that actually match what you imagined.