How it works
Why code execution matters
When a language model answers a quantitative question without executing code, it generates a statistically likely answer, which may or may not be numerically correct. RLM removes that ambiguity: the Python interpreter computes the answer, not the model's internal statistics. What this means in practice:
- Calculations are computed, not inferred
- Cross-referenced data is matched by code, not by pattern matching
- Statistical results come from actual code calls, not approximations
- If the code raises an exception, the engine sees the actual error and corrects its approach
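To make the distinction concrete, here is the kind of calculation the engine would execute rather than infer. The numbers are arbitrary sample data invented for illustration, not taken from this document:

```python
# Illustrative only: a statistic the interpreter computes exactly,
# instead of the model producing a "statistically likely" number.
import statistics

revenues = [1204.50, 1187.25, 1432.80, 1390.10, 1511.45]

mean = statistics.mean(revenues)    # exact arithmetic, not a guess
stdev = statistics.stdev(revenues)  # sample standard deviation

print(f"mean={mean:.2f} stdev={stdev:.2f}")
```

Run or not, the mean of those five values is 1345.22; executed code returns that figure every time, while an unexecuted model may only approximate it.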
Iteration limit
RLM iterates until either:
- The answer is verified (the code runs successfully and produces a result the model judges correct)
- The max iterations limit is reached (default: 30)
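The loop described above can be sketched as follows. `generate_code`, `execute`, and `looks_correct` are hypothetical stand-ins for the model call, the interpreter, and the model's own verification judgement; only the iteration limit (default 30) comes from the document:

```python
MAX_ITERATIONS = 30  # the documented default

def run_rlm(question, generate_code, execute, looks_correct):
    """Minimal sketch of the verify-or-retry loop (not MIRA's actual code)."""
    feedback = None
    for attempt in range(1, MAX_ITERATIONS + 1):
        code = generate_code(question, feedback)
        try:
            result = execute(code)
        except Exception as exc:
            # The engine sees the actual error and adjusts its approach.
            feedback = f"error: {exc}"
            continue
        if looks_correct(question, result):
            return result  # verified: code ran and the result was judged correct
        feedback = f"unverified result: {result!r}"
    raise RuntimeError(f"no verified answer within {MAX_ITERATIONS} iterations")
```

The key design point is that a raised exception is treated as feedback for the next attempt, not as a terminal failure.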
Execution environment
Python code executes inside the bundled Python virtualenv (mira-venv/) on your local machine. There is no sandbox — the code runs with the same OS user permissions as the MIRA process:
- It can read and write any file your OS user can access
- It can make outbound network calls
- It can import any library installed in the virtualenv
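A few quick, runnable checks that make the points above concrete; the file written here is a throwaway temp file, and the interpreter-path comment describes what you would see inside an RLM session:

```python
import os
import sys
import tempfile

print(sys.executable)  # inside an RLM session, this points into mira-venv/
print(getattr(os, "getuid", lambda: "n/a")())  # same OS user as the MIRA process

# File access: anything your OS user can touch, with no sandbox in between.
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
    f.write("written by RLM-executed code\n")
    path = f.name

with open(path) as fh:
    print(fh.read(), end="")

os.remove(path)
```

Treat generated code accordingly: review it before pointing RLM at sensitive directories or credentials.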
Provider support
| Provider | Models (examples) |
|---|---|
| AWS Bedrock | eu/us.anthropic.claude-sonnet-4, claude-3-5-sonnet-v2, claude-3-5-haiku |
| Anthropic | claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo |
| Ollama | llama3.2, mistral, qwen2.5-coder, deepseek-r1, and any local model |
Bedrock model IDs carry a region prefix and version suffix, for example: eu.anthropic.claude-sonnet-4-20250514-v1:0
Concurrency
RLM supports up to 3 concurrent sessions by default (configurable). If you send a query while 3 sessions are already in flight, MIRA shows a concurrency-limit error and queues the request.
Configuration reference
See RLM Settings Reference for all configurable parameters with defaults and allowed ranges.
Best used for
- Complex data analysis — “analyse this CSV and find the top 3 factors driving churn”
- Multi-step calculations — financial modelling, statistical tests, simulations
- Cross-referencing — “find every row in this dataset that contradicts this policy document”
- Anything where being provably correct matters more than being fast
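As a toy version of the cross-referencing use case, the snippet below flags dataset rows that violate a policy rule. The data and the 20% discount rule are invented for illustration; in a real session RLM would generate equivalent code against your actual files:

```python
import csv
import io

# Hypothetical policy rule: discounts must not exceed 20%.
policy_max_discount = 0.20

# Inline stand-in for a CSV file on disk.
raw = io.StringIO(
    "order_id,discount\n"
    "1001,0.10\n"
    "1002,0.35\n"
    "1003,0.18\n"
    "1004,0.50\n"
)

# Violations are matched by code, not by the model's pattern matching.
violations = [
    row["order_id"]
    for row in csv.DictReader(raw)
    if float(row["discount"]) > policy_max_discount
]
print(violations)  # → ['1002', '1004']
```

Because the comparison runs in the interpreter, every flagged row is provably a violation rather than a plausible-looking guess.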