Architecture
NAE runs as a Python process inside the MIRA application, communicating with the renderer via a typed IPC bridge. On each query, NAE:

- Checks the token budget and triggers compaction if needed
- Builds the context window (system prompt + history + documents + tool results)
- Decides whether to use a single agent or spawn sub-agents
- Calls the LLM API directly
- Streams the response back to the UI
- Updates episodic memory and the token ledger
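The steps above can be sketched as a single pipeline. This is an illustrative sketch only: the class, method names, and token estimates are assumptions, not the engine's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class QueryPipeline:
    # Hypothetical sketch of the per-query flow; all names are illustrative.
    budget: int = 8_000                 # tokens allowed in the context window
    used: int = 0                       # tokens currently committed
    history: list = field(default_factory=list)

    def handle(self, query: str) -> str:
        if self.used >= self.budget:            # 1. budget check + compaction
            self.compact()
        context = self.build_context(query)     # 2. assemble context window
        response = self.call_llm(context)       # 3-5. agent selection + LLM call
        self.history += [f"user: {query}", f"assistant: {response}"]
        self.used += len(context) // 4          # 6. rough chars-per-token ledger update
        return response

    def compact(self) -> None:
        # Placeholder eviction of oldest turns (see "Automatic compaction")
        self.history = self.history[-4:]
        self.used = sum(len(t) for t in self.history) // 4

    def build_context(self, query: str) -> str:
        # System prompt + history + current query (documents/tools omitted)
        return "\n".join(["SYSTEM: you are NAE", *self.history, f"user: {query}"])

    def call_llm(self, context: str) -> str:
        return "stubbed response"               # the real engine streams from a provider
```

The stubbed `call_llm` stands in for the streaming provider call; everything else is orchestration around it.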
Context management
Managing a long session’s context is one of the hardest problems in agentic AI. NAE handles it automatically across four layers:

Token budget
NAE maintains a real-time token ledger accounting for every token committed to the context window:

- System prompt
- Conversation history
- Injected documents
- Tool call results
- Reserved output zone
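A minimal sketch of such a ledger, assuming one counter per category listed above (the field names and default reserve size are hypothetical, not the engine's real schema):

```python
from dataclasses import dataclass

@dataclass
class TokenLedger:
    # Hypothetical field layout mirroring the categories above.
    system_prompt: int = 0
    history: int = 0
    documents: int = 0
    tool_results: int = 0
    reserved_output: int = 1_024    # output zone withheld from the input budget

    def committed(self) -> int:
        # Every token committed to the context window, plus the reserved zone
        return (self.system_prompt + self.history + self.documents
                + self.tool_results + self.reserved_output)

    def needs_compaction(self, budget: int) -> bool:
        return self.committed() > budget
```

Keeping per-category counts, rather than one total, is what lets the compaction layer below target specific categories (e.g. verbose tool results) first.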
Automatic compaction
Two compaction strategies run in sequence as needed:

- Sliding window — evicts the oldest turns from the context while preserving the most recent K turns
- Selective pruning — identifies and removes low-value content (verbose tool outputs, intermediate reasoning chains that have been superseded) without truncating the timeline
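A minimal sketch of the two strategies, assuming each turn is a dict carrying optional `superseded` / `verbose_tool_output` flags (the flag names are hypothetical):

```python
def sliding_window(turns: list, keep_last: int) -> list:
    """Evict the oldest turns, preserving the most recent K."""
    return turns[-keep_last:]

def selective_prune(turns: list) -> list:
    """Drop low-value entries without truncating the timeline."""
    return [t for t in turns
            if not (t.get("superseded") or t.get("verbose_tool_output"))]
```

Note the difference in shape: the sliding window cuts the timeline at a point, while selective pruning keeps the timeline intact and removes individual entries from anywhere within it.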
Conversation summarisation
When history grows too long, NAE compresses the oldest turns into a compact narrative summary via a dedicated lightweight LLM call. The summary replaces the raw turns in context. The original turns are preserved in the session database for potential retrieval. The summary is clearly labelled in the context so the model understands its provenance.

Episodic memory
Key facts, decisions, and entities from earlier in the session are extracted and re-injected as a compact “memory” block at the start of each new turn. The agent remembers what matters, even after compaction has removed the original turns.

Multi-agent orchestration
NAE supports three agent modes, configurable in Settings → Engine:

| Mode | Behaviour |
|---|---|
| single | All reasoning performed by one agent. Simpler, lower latency, lower token cost. |
| multi | Complex tasks are decomposed into sub-tasks; each sub-task is handled by an isolated sub-agent running concurrently. An Orchestrator synthesises results. |
| auto | NAE decides based on query complexity. Simple questions use single-agent; multi-domain or parallel-workable tasks spawn sub-agents. |
Sub-agent lifecycle (multi/auto mode)
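The flow described for multi mode above — decompose the task, run isolated sub-agents concurrently, then synthesise — can be sketched with asyncio. The function names and the stubbed sub-agent are assumptions for illustration, not the engine's real interfaces:

```python
import asyncio

async def run_subagent(subtask: str) -> str:
    # Stand-in for an isolated sub-agent with its own context and LLM round trip
    await asyncio.sleep(0)
    return f"result for {subtask!r}"

async def orchestrate(task, decompose, synthesise) -> str:
    subtasks = decompose(task)                  # 1. decompose into sub-tasks
    results = await asyncio.gather(             # 2. sub-agents run concurrently
        *(run_subagent(s) for s in subtasks)
    )
    return synthesise(results)                  # 3. Orchestrator synthesises results
```

For example, `asyncio.run(orchestrate("task", lambda t: [f"{t}:a", f"{t}:b"], " | ".join))` fans the task out to two sub-agents and joins their results.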
MCP tool integration
NAE integrates with any connected MCP server. Tool calls are made directly from the engine process. Each call is subject to a configurable MCP tool timeout (default: 10 seconds). Failed tool calls surface an error in the response and do not crash the session.

Provider support
| Provider | Models (examples) |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo |
| Anthropic | claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus |
| AWS Bedrock | eu/us.anthropic.claude-sonnet-4, claude-3-5-sonnet-v2 |
| Ollama | llama3.2, llama3.1, mistral, qwen2.5-coder, deepseek-r1, and any local model |