The Native Agent Engine (NAE) is MIRA’s default reasoning engine. It talks directly to your configured LLM provider and handles everything a complex, long-running session demands — without you having to manage it.

Architecture

NAE runs as a Python process inside the MIRA application, communicating with the renderer via a typed IPC bridge. On each query, NAE:
  1. Checks the token budget and triggers compaction if needed
  2. Builds the context window (system prompt + history + documents + tool results)
  3. Decides whether to use a single agent or spawn sub-agents
  4. Calls the LLM API directly
  5. Streams the response back to the UI
  6. Updates episodic memory and the token ledger
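The six steps above can be sketched as a single query loop. This is a minimal, hypothetical illustration, not NAE's actual code: `Session`, `run_query`, and the ~4-characters-per-token estimate are all assumptions, and the LLM call is a stand-in callable.

```python
from dataclasses import dataclass, field

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption, not NAE's tokenizer).
    return max(1, len(text) // 4)

@dataclass
class Session:
    history: list = field(default_factory=list)  # (query, answer) pairs
    ledger: int = 0           # tokens committed to the context so far
    budget: int = 120_000     # context budget default
    warn: float = 0.75        # compaction warn threshold

def run_query(session: Session, query: str, llm) -> str:
    """One NAE turn: budget check, context build, LLM call, ledger update."""
    # 1. Proactive compaction when utilisation exceeds the warn threshold
    #    (here a crude sliding window stands in for the real strategies).
    if session.ledger > session.budget * session.warn:
        session.history = session.history[-4:]
        session.ledger = sum(estimate_tokens(q + a) for q, a in session.history)
    # 2. Build the context window (history + new query; documents and tool
    #    results omitted for brevity).
    context = "\n".join(f"user: {q}\nassistant: {a}" for q, a in session.history)
    context += f"\nuser: {query}"
    # 4-5. Call the provider (a stand-in callable) and collect the stream.
    answer = "".join(llm(context))
    # 6. Record the turn and update the token ledger.
    session.history.append((query, answer))
    session.ledger += estimate_tokens(query) + estimate_tokens(answer)
    return answer
```

Step 3 (agent-mode selection) is elided here; it is covered under multi-agent orchestration below.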

Context management

Managing context over a long session is one of the hardest problems in agentic AI. NAE handles it automatically across four layers:

Token budget

NAE maintains a real-time token ledger accounting for every token committed to the context window:
  • System prompt
  • Conversation history
  • Injected documents
  • Tool call results
  • Reserved output zone
The context budget (default: 120,000 tokens) is the maximum the engine may consume. A separate max output tokens cap (default: 8,000 tokens) is sent with every API call. When utilisation exceeds the compaction warn threshold (default: 75%), NAE triggers proactive compaction before the next call.
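The threshold check can be expressed as a sum over the ledger's five components. A minimal sketch using the documented defaults; the dictionary layout and example figures are hypothetical.

```python
DEFAULT_BUDGET = 120_000   # context budget default
WARN_THRESHOLD = 0.75      # compaction warn threshold default

def should_compact(ledger: dict,
                   budget: int = DEFAULT_BUDGET,
                   warn: float = WARN_THRESHOLD) -> bool:
    """True when committed tokens exceed the compaction warn threshold."""
    committed = sum(ledger.values())
    return committed > budget * warn

# Example figures (illustrative only).
ledger = {
    "system_prompt": 2_000,
    "history": 70_000,
    "documents": 15_000,
    "tool_results": 8_000,
    "reserved_output": 8_000,  # max output tokens cap
}
# 103,000 / 120,000 ≈ 86% utilisation, above the 75% threshold.
```

With these figures, `should_compact(ledger)` returns `True` and NAE would compact before the next call.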

Automatic compaction

Two compaction strategies run in sequence as needed:
  1. Sliding window — evicts the oldest turns from the context while preserving the most recent K turns
  2. Selective pruning — identifies and removes low-value content (verbose tool outputs, intermediate reasoning chains that have been superseded) without truncating the timeline
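The two strategies can be sketched as pure functions over a list of turns. This is an illustrative simplification: the real engine's notion of "low-value content" is semantic, whereas this sketch uses a length threshold as a crude proxy.

```python
def sliding_window(turns: list, keep: int = 4) -> list:
    """Evict the oldest turns, preserving the most recent K."""
    return turns[-keep:]

def selective_prune(turns: list, max_len: int = 200) -> list:
    """Drop verbose entries above a length threshold (a stand-in for
    'low-value content') without reordering the remaining timeline."""
    return [t for t in turns if len(t) <= max_len]
```

Run in sequence as needed: the window bounds total size, then pruning reclaims space inside it.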

Conversation summarisation

When history grows too long, NAE compresses the oldest turns into a compact narrative summary via a dedicated lightweight LLM call. The summary replaces the raw turns in context. The original turns are preserved in the session database for potential retrieval. The summary is clearly labelled in the context so the model understands its provenance.
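In sketch form, summarisation splits history into old and recent turns, replaces the old ones with a labelled summary, and keeps the recent ones verbatim. The `summarise` callable stands in for the dedicated lightweight LLM call; the label format is hypothetical.

```python
def summarise_oldest(turns: list, keep_recent: int, summarise) -> list:
    """Compress all but the most recent turns into one labelled summary.

    The raw turns would be archived to the session database, not
    discarded; only the in-context copy is replaced.
    """
    if len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarise(old)
    # The label tells the model this is a summary, not verbatim history.
    return [f"[SUMMARY of {len(old)} earlier turns] {summary}"] + recent
```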

Episodic memory

Key facts, decisions, and entities from earlier in the session are extracted and re-injected as a compact “memory” block at the start of each new turn. The agent remembers what matters, even after compaction has removed the original turns.
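A sketch of what the injected block might look like; the `[MEMORY]` label and key-value layout are assumptions, not NAE's actual format.

```python
def build_memory_block(facts: dict) -> str:
    """Render extracted facts as the compact memory block injected at
    the start of each new turn."""
    lines = [f"- {key}: {value}" for key, value in facts.items()]
    return "[MEMORY]\n" + "\n".join(lines)
```

Because the block is rebuilt every turn from the session database, it survives compaction of the turns it was extracted from.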

Multi-agent orchestration

NAE supports three agent modes, configurable in Settings → Engine:
  Mode     Behaviour
  ------   ---------
  single   All reasoning performed by one agent. Simpler, lower latency, lower token cost.
  multi    Complex tasks are decomposed into sub-tasks; each sub-task is handled by an isolated sub-agent running concurrently. An Orchestrator synthesises results.
  auto     NAE decides based on query complexity. Simple questions use single-agent; multi-domain or parallel-workable tasks spawn sub-agents.
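One plausible shape for the auto-mode decision, purely for illustration: NAE's actual complexity heuristic is not documented here, so the signals below (domain count, query length) are invented.

```python
def choose_mode(query: str, domains: int) -> str:
    """Illustrative stand-in for auto mode: multi-domain or long,
    parallel-workable queries get sub-agents; everything else stays
    single-agent."""
    if domains > 1 or len(query.split()) > 50:
        return "multi"
    return "single"
```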

Sub-agent lifecycle (multi/auto mode)

  Orchestrator receives query
             │
             ▼
  Decompose into N sub-tasks
         ┌───┼───┐
         ▼   ▼   ▼
        A₁  A₂  A₃      ← sub-agents run concurrently (up to parallelism limit)
         │   │   │
         └───┼───┘
             │
             ▼
  Orchestrator synthesises results
             │
             ▼
  Final unified answer → streamed to UI
Configurable limits: max sub-agents (default: 4) and sub-agent parallelism (default: 2).
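The fan-out under these limits can be sketched with a semaphore: up to 4 sub-tasks are accepted, but only 2 run at a time. `agent` is a stand-in awaitable for a sub-agent's work; the function names are hypothetical.

```python
import asyncio

MAX_SUB_AGENTS = 4   # default: max sub-agents
PARALLELISM = 2      # default: sub-agent parallelism

async def run_sub_agents(sub_tasks: list, agent):
    """Run sub-tasks concurrently under the parallelism limit and
    return their results for the Orchestrator to synthesise."""
    sem = asyncio.Semaphore(PARALLELISM)

    async def bounded(task):
        # At most PARALLELISM sub-agents hold the semaphore at once.
        async with sem:
            return await agent(task)

    accepted = sub_tasks[:MAX_SUB_AGENTS]
    return await asyncio.gather(*(bounded(t) for t in accepted))
```

`asyncio.gather` preserves input order, so synthesis can match results back to their sub-tasks positionally.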

MCP tool integration

NAE integrates with any connected MCP server. Tool calls are made directly from the engine process. Each call is subject to a configurable MCP tool timeout (default: 10 seconds). Failing tool calls surface an error in the response and do not crash the session.
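The failure-isolation behaviour can be sketched as a timeout wrapper around the call. `invoke` stands in for the actual MCP client call, and the error-payload shape is an assumption; the point is that a timeout or exception becomes data in the response rather than a crash.

```python
import asyncio

async def call_tool(invoke, timeout: float = 10.0) -> dict:
    """Wrap an MCP tool call with the configurable timeout (default 10 s).
    Failures surface as an error payload; they never propagate."""
    try:
        result = await asyncio.wait_for(invoke(), timeout)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        return {"ok": False, "error": f"tool call timed out after {timeout}s"}
    except Exception as exc:  # tool raised: report, don't crash the session
        return {"ok": False, "error": str(exc)}
```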

Provider support

  Provider      Models (examples)
  --------      -----------------
  OpenAI        gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
  Anthropic     claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus
  AWS Bedrock   eu/us.anthropic.claude-sonnet-4, claude-3-5-sonnet-v2
  Ollama        llama3.2, llama3.1, mistral, qwen2.5-coder, deepseek-r1, and any local model

Configuration reference

See NAE Settings Reference for all configurable parameters with their defaults and allowed ranges.