Knox Memory System — Full Feature Deep Dive
The Knox Memory System (knox-ms) has reached full production readiness. Every subsystem — from autonomous task orchestration to self-healing infrastructure — is now live and fully operational. This post provides a comprehensive tour of every feature, capability, and architectural decision that makes knox-ms the first AI model with a truly unlimited, brain-like memory.
What Is Knox-MS?
Knox-MS is a custom AI model (knox/knox-ms) that provides truly unlimited context window length through an intelligent memory management system modeled after the human brain. Instead of being constrained by a fixed context window like every other LLM, knox-ms orchestrates multiple underlying models through a sophisticated Plan → Task → Memory architecture, dynamically managing what to remember, what to summarize, and what to forget.
Memory System (Brain) = Core Intelligence
↓ manages and updates
Context Cache (LLM) = Working Memory
↓ enhanced by
Vector Embeddings & Rerank (Tool) = Information Retrieval
The result: conversations and projects of any scale — from quick questions to multi-month development efforts — without ever losing context or hitting token limits.
1. Autonomous Orchestration Engine
At the heart of knox-ms is a fully autonomous execution loop driven by LLM reasoning at every stage.
Goal Refinement
When you send a request, the system doesn't just respond — it thinks. The autonomous core decomposes your input through LLM-driven goal analysis, breaking complex requests into a structured goal hierarchy. If a goal is ambiguous, the engine refines it by asking clarifying sub-questions internally before proceeding.
Multi-Task Planning
Once the goal is understood, knox-ms generates a structured execution plan with 1 to 8 parallel or sequential tasks. Each task is classified by:
- Type — coding, analysis, research, general
- Difficulty — easy, medium, or hard
- Dependencies — which tasks must complete before others can start
If planning fails or produces a single monolithic task, the system gracefully falls back to a simpler plan rather than failing outright.
Smart Task System
Tasks aren't static to-do items — they're adaptive execution units with:
- Priority calculation — dynamically computed based on dependencies, difficulty, and goal relevance
- Dependency graphs — tasks are ordered and parallelized based on their relationships
- Adaptive retry with model escalation — if a task fails on a simpler model, it automatically retries with a more capable one
- Difficulty upgrades — failed or low-quality tasks are re-queued at a higher difficulty tier
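As a rough sketch, the priority calculation above can be modeled as a weighted score over goal relevance, difficulty, and dependency readiness. The weights, field names, and formula below are illustrative assumptions, not knox-ms internals:

```python
from dataclasses import dataclass, field

# Hypothetical priority model: the weights and fields are illustrative.
DIFFICULTY_WEIGHT = {"easy": 0.2, "medium": 0.5, "hard": 1.0}

@dataclass
class Task:
    task_id: str
    difficulty: str            # "easy" | "medium" | "hard"
    goal_relevance: float      # 0.0 to 1.0, assigned by the planner
    depends_on: list = field(default_factory=list)

def priority(task: Task, completed: set) -> float:
    """Blocked tasks sink to the bottom; otherwise priority rises
    with goal relevance and difficulty."""
    if any(dep not in completed for dep in task.depends_on):
        return 0.0  # dependencies unmet: not ready to run
    return 0.6 * task.goal_relevance + 0.4 * DIFFICULTY_WEIGHT[task.difficulty]

ready = Task("t1", "hard", 0.9)
blocked = Task("t2", "easy", 1.0, depends_on=["t1"])
assert priority(ready, completed=set()) > priority(blocked, completed=set())
```

Once "t1" completes, "t2" becomes ready and its score is recomputed, which is how the dependency graph drives execution order.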
State Machine Loop Execution
The autonomous engine runs as a state machine with defined checkpoints. At each iteration:
- Execute the next batch of ready tasks
- Self-evaluate quality using LLM-based assessment with confidence scoring
- Apply corrections if quality is below threshold
- Check goal completion through LLM-based achievement assessment
- Refine the plan if the goal evaluation identifies additional needed tasks
- Create a checkpoint for recovery
- Repeat until the goal is fully achieved or iteration limits are reached
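The steps above can be sketched as a small, self-contained loop. The toy task model and function shape are assumptions for illustration; the real engine drives each step with LLM calls:

```python
# Toy state-machine loop: execute ready tasks, self-evaluate, retain
# low-quality tasks for correction, checkpoint, and repeat until done.
def autonomous_loop(tasks, quality_threshold=0.7, max_iterations=10):
    checkpoints = []
    for iteration in range(1, max_iterations + 1):
        for task in tasks:                                  # execute batch
            if task["done"]:
                continue
            task["quality"] = task["attempt_quality"].pop(0)
            if task["quality"] >= quality_threshold:        # self-evaluate
                task["done"] = True
            # otherwise the task stays queued for a corrected retry
        checkpoints.append(iteration)                       # checkpoint
        if all(t["done"] for t in tasks):                   # goal check
            return "completed", checkpoints
    return "iteration_limit_reached", checkpoints

tasks = [
    {"done": False, "attempt_quality": [0.9]},
    {"done": False, "attempt_quality": [0.4, 0.8]},  # fails once, then passes
]
status, checkpoints = autonomous_loop(tasks)
assert status == "completed" and checkpoints == [1, 2]
```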
This architecture means knox-ms can handle complex, multi-step objectives autonomously — planning, executing, evaluating, correcting, and completing without human intervention at each step.
2. The Memory System — A Brain, Not a Buffer
The Memory System is what makes knox-ms fundamentally different from every other AI model. It provides persistent, organized, self-optimizing memory modeled on how the human brain manages information.
Human-Like Memory Management
Just as your brain doesn't store every detail of every conversation, knox-ms intelligently manages information through continuous CRUD operations:
- CREATE — New sessions, plans, task records, and discovered patterns are stored automatically
- READ — Only relevant memory is loaded into the working context for each request
- UPDATE — Task results, conversation history, and context summaries are continuously refined
- DELETE — Temporary files, redundant information, and stale cache entries are cleaned up automatically
Hierarchical Memory Organization
Memory is organized in a tree-structured filesystem with multiple levels of detail:
| Level | Detail Retained | Contents |
|---|---|---|
| Recent (last few turns) | Full detail | Complete conversation with all context |
| Medium-term (current session) | Key points | Decisions, outcomes, important exchanges |
| Long-term (historical) | Semantic summaries | Patterns, learned concepts, compressed knowledge |
Auto Memory Manager
The automatic memory manager handles the lifecycle of every memory entry:
- Scoring — each memory entry receives a relevance score that decays over time following Ebbinghaus forgetting curves
- Retention policies — configurable retention periods with automatic cleanup of entries below the score threshold
- Deduplication — similar memories are detected and merged to prevent bloat
- Compression — older memories are compressed while preserving essential information
- Configurable limits — maximum entries, memory size caps, and cleanup intervals are all tunable
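A minimal sketch of the decay-based scoring, assuming a simple exponential form of the Ebbinghaus curve, R = e^(−t/S); the stability constant and eviction threshold below are illustrative defaults:

```python
import math

def retention(hours_since_access: float, stability_hours: float = 24.0) -> float:
    """Exponential forgetting curve: relevance decays with time since access."""
    return math.exp(-hours_since_access / stability_hours)

def should_evict(score: float, hours_since_access: float, threshold: float = 0.2) -> bool:
    """Evict once the decayed score drops below the retention threshold."""
    return score * retention(hours_since_access) < threshold

assert not should_evict(score=1.0, hours_since_access=1)   # fresh memory kept
assert should_evict(score=1.0, hours_since_access=120)     # stale memory dropped
```

Strengthening then amounts to resetting `hours_since_access` (or raising the stability constant) whenever a memory is accessed.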
Summarization Engine
Knox-ms includes both extractive and structured summarization capabilities:
- Extractive summaries pull out the most important sentences and facts from conversation history
- Structured summaries use LLM calls to produce organized, hierarchical summaries with key decisions, outcomes, and learned patterns
- Summaries are saved alongside raw history, providing both quick-access overviews and full detail when needed
3. Context Management — Unlimited by Design
The context manager is the bridge between the Memory System and the LLM's working context window.
Multi-Level Context
Rather than dumping everything into a single prompt, knox-ms maintains multiple context levels:
- Active context — the immediate working set for the current task
- Session context — broader session history and goals
- Cross-session context — knowledge and patterns from previous sessions
- Global knowledge — learned patterns and permanent knowledge base entries
Intelligent Compression
When context grows beyond thresholds, the system applies progressive compression:
- Older turns are summarized while recent turns remain in full detail
- The compression ratio is configurable (from minimal to aggressive)
- Force-compression is available as a self-healing action when memory pressure is high
Context Cache Optimization
Knox-ms structures memory to maximize LLM prompt cache hit rates:
- Stable prefix — frequently-used context is placed in a consistent position to enable caching
- Dynamic suffix — task-specific context that changes per request is appended after the stable portion
- Smart invalidation — the cache is only invalidated when memory updates significantly, not on every change
This means you get the cost and speed benefits of prompt caching while maintaining truly unlimited context depth.
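A sketch of the assembly step: provider prompt caches generally match on exact byte prefixes, so keeping the memory-derived prefix stable across requests is what produces hits. Function and variable names here are illustrative:

```python
import hashlib

def build_prompt(stable_memory, task_context):
    """Stable prefix first (cache-friendly), dynamic suffix last."""
    prefix = "\n".join(stable_memory)            # identical across requests
    prompt = prefix + "\n---\n" + task_context   # per-request portion appended
    prefix_key = hashlib.sha256(prefix.encode()).hexdigest()
    return prompt, prefix_key

memory = ["[system] You are knox-ms.", "[summary] Session covers a web project."]
p1, key1 = build_prompt(memory, "Task: draft the API docs")
p2, key2 = build_prompt(memory, "Task: write unit tests")
assert key1 == key2  # same stable prefix, so both requests can hit the cache
```

Smart invalidation then means recomputing the prefix (and accepting one cache miss) only when the underlying memory changes significantly.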
4. Knowledge Graph
Knox-ms builds and maintains a persistent knowledge graph that captures relationships between concepts, code entities, decisions, and patterns.
Graph Structure
- Knowledge nodes — individual facts, concepts, code patterns, or decisions
- Relations — typed connections between nodes (depends-on, related-to, supersedes, etc.)
- Relevance scoring — each node and relation carries a score that is updated based on access patterns and recency
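An illustrative data model for this structure (the field and class names are assumptions, not the actual knox-ms schema):

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeNode:
    node_id: str
    title: str
    tags: list = field(default_factory=list)
    relevance: float = 0.5       # updated from access patterns and recency

@dataclass
class Relation:
    source: str
    target: str
    kind: str                    # e.g. "depends-on", "related-to", "supersedes"

class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}
        self.relations = []

    def add(self, node):
        self.nodes[node.node_id] = node

    def relate(self, source, target, kind):
        self.relations.append(Relation(source, target, kind))

    def neighbors(self, node_id, kind=None):
        """Follow outgoing relations, optionally filtered by relation type."""
        return [r.target for r in self.relations
                if r.source == node_id and (kind is None or r.kind == kind)]

g = KnowledgeGraph()
g.add(KnowledgeNode("n1", "Retry policy", tags=["resilience"]))
g.add(KnowledgeNode("n2", "Circuit breaker", tags=["resilience"]))
g.relate("n1", "n2", "related-to")
assert g.neighbors("n1") == ["n2"]
```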
LLM-Powered Knowledge Extraction
When tasks complete, knox-ms uses LLM calls to perform structured knowledge extraction:
- Each piece of content is analyzed to produce knowledge entries with title, content, tags, and relevance score (0.0–1.0)
- Extracted knowledge is stored in the graph and made available for future context loading
- The system learns over time which types of knowledge are most useful
5. Vector Embeddings & Semantic Search
Knox-ms includes a full vector embedding pipeline for semantic search over your project content and conversation history.
Embedding Pipeline
- Indexing — project files are chunked (2048 tokens with overlap) and embedded using VoyageAI models (`voyage-3.5` for general content, `voyage-code-3` for code)
- Storage — embeddings are persisted to Knox storage with per-user scoping and lazy-loaded on first access
- Search — queries are embedded and matched against stored vectors using cosine similarity
- Reranking — top candidates are reranked using VoyageAI's `rerank-2.5` model for fine-grained relevance
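The similarity step can be sketched as follows. The real pipeline embeds with VoyageAI; the tiny three-dimensional vectors here are stand-ins for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, store, k=2):
    """Rank stored chunks by similarity and keep the top k for reranking."""
    ranked = sorted(store, key=lambda cid: cosine(query_vec, store[cid]), reverse=True)
    return ranked[:k]

store = {
    "auth_module": [0.9, 0.1, 0.0],
    "readme":      [0.1, 0.9, 0.0],
    "login_ui":    [0.8, 0.2, 0.1],
}
assert top_k([1.0, 0.0, 0.0], store) == ["auth_module", "login_ui"]
```

The top-k candidates would then go to the reranker, which scores each (query, chunk) text pair directly for finer-grained relevance.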
Resilient Embedding Service
The embedding pipeline is built for production reliability:
- Retry with exponential backoff — failed API calls retry up to a configurable maximum with exponential delays (500ms × 2^attempt) plus jitter to avoid thundering herd
- Smart retry filtering — 4xx errors (except 429 rate limits) skip retries since they won't succeed on retry
- Circuit breaker — after 5 consecutive failures, the system opens a circuit breaker that fast-fails for 60 seconds, preventing cascading failures. A half-open state allows periodic probing, and the circuit closes after 3 consecutive successes
- Mock fallback — in development environments or when VoyageAI is persistently unavailable, deterministic mock embeddings ensure the system keeps functioning
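The policy above can be sketched end to end. The 5-failure threshold, 60-second open window, and 3-success close rule come from the description; the rest (state names, the injectable fake clock) is illustrative:

```python
import random

def backoff_delay(attempt, base_ms=500, jitter=0.1):
    """500ms * 2^attempt, with jitter to avoid a thundering herd."""
    return base_ms * (2 ** attempt) * (1 + random.uniform(-jitter, jitter))

def should_retry(status_code):
    """4xx errors won't succeed on retry, except 429 rate limits."""
    return status_code == 429 or not (400 <= status_code < 500)

class CircuitBreaker:
    def __init__(self, failure_threshold=5, open_secs=60, close_after=3, clock=None):
        self.failure_threshold, self.open_secs, self.close_after = (
            failure_threshold, open_secs, close_after)
        self.clock = clock or (lambda: 0.0)   # injectable for deterministic tests
        self.state, self.failures, self.successes, self.opened_at = "closed", 0, 0, 0.0

    def allow(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.open_secs:
                self.state = "half-open"      # allow a probe request
                return True
            return False                      # fast-fail while open
        return True

    def record(self, ok):
        if ok:
            self.failures = 0
            if self.state == "half-open":
                self.successes += 1
                if self.successes >= self.close_after:
                    self.state, self.successes = "closed", 0
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_threshold or self.state == "half-open":
                self.state, self.opened_at = "open", self.clock()

now = [0.0]
cb = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    cb.record(False)
assert cb.state == "open" and not cb.allow()   # fast-fail window
now[0] = 61.0
assert cb.allow() and cb.state == "half-open"  # probe allowed after 60s
for _ in range(3):
    cb.record(True)
assert cb.state == "closed"                    # recovered
```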
TTL & Capacity Management
Vector stores are actively maintained:
- TTL-based eviction — vectors older than the configured maximum age are automatically removed
- Capacity enforcement — per-user vector limits are enforced using LRU eviction (oldest-first)
- Cache cleanup — expired embedding cache entries are pruned during maintenance cycles
- Storage persistence — after any mutation, the cleaned state is persisted to Knox storage asynchronously
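One maintenance pass can be sketched as TTL filtering followed by LRU truncation; the entry layout below is an assumption:

```python
def maintain(store, now, max_age, capacity):
    """Drop expired vectors, then enforce the per-user cap via LRU eviction."""
    # TTL-based eviction: anything older than max_age is removed
    live = {k: v for k, v in store.items() if now - v["created"] <= max_age}
    # Capacity enforcement: keep only the most recently accessed entries
    if len(live) > capacity:
        by_recency = sorted(live, key=lambda k: live[k]["last_access"], reverse=True)
        live = {k: live[k] for k in by_recency[:capacity]}
    return live  # the cleaned state is then persisted asynchronously

store = {
    "v1": {"created": 0,  "last_access": 10},
    "v2": {"created": 50, "last_access": 90},
    "v3": {"created": 60, "last_access": 80},
}
kept = maintain(store, now=100, max_age=80, capacity=1)
assert set(kept) == {"v2"}  # v1 expired by TTL; v3 lost the LRU contest
```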
6. Self-Healing Infrastructure
Knox-ms doesn't just recover from errors — it diagnoses and repairs itself through 12 distinct healing action types.
Healing Actions
| Action | What It Does |
|---|---|
| Switch Model | Routes to a different underlying model when the current one is failing or underperforming |
| Fallback to Simpler | Downgrades to a simpler, more reliable model for difficult tasks |
| Clear Cache | Clears all cached context entries across all sessions to resolve stale data issues |
| Reduce Context Size | Force-compresses cached context by a configurable percentage to stay within limits |
| Optimize Memory | Triggers memory cleanup, removing old history and stale memory states |
| Adjust Batch Size | Modifies runtime batch processing parameters |
| Throttle Requests | Applies request rate limiting during high-load scenarios |
| Prioritize Cache | Adjusts caching priority for better hit rates |
| + 4 more | Additional runtime configuration overrides for edge cases |
How Self-Healing Works
When the autonomous loop encounters an error:
- The system analyzes the error reason and selects the most appropriate healing action type
- The selected action is delegated to the self-manager, which executes real system changes (not simulations)
- The loop retries the failed operation with the healing applied
- If healing fails, the system can escalate to more aggressive actions
All healing actions are executed through actual subsystem calls — model switching updates the relay's default model, cache clearing operates on the real context manager, and memory cleanup runs against the actual memory store.
Optimization Engine
Beyond error recovery, the self-manager also applies 8 optimization types proactively:
- Performance optimizations based on observed execution patterns
- Resource optimizations to reduce memory and token usage
- Quality optimizations to improve output accuracy
- Latency optimizations for faster response times
7. Learning & Pattern Recognition
Knox-ms learns from every execution and gets smarter over time.
What It Learns
- Goal classification — recognizes what type of goal is being requested and suggests proven approaches
- Model performance tracking — records which models perform best for which task types and difficulties
- Task type success rates — tracks success/failure rates per task category to improve future planning
- Approach suggestions — before generating a new plan, the system consults learned patterns and applies model preferences and approach hints from past successes
How Learning Integrates
During the autonomous execution loop:
- Before planning — the system calls the learning service to get approach suggestions based on the current goal, potentially influencing model selection and task decomposition
- After execution — success or failure is recorded with full execution data (all tasks, models used, token counts, latencies)
- Over time — the system builds a database of execution patterns that continuously improve planning quality
Memory Consolidation
A background consolidation task runs periodically to strengthen long-term memory:
- Ebbinghaus decay — memory entries naturally decay in relevance over time, mimicking human forgetting curves
- Strengthening — frequently accessed or highly relevant memories are reinforced
- Deep summarization — old detailed memories are consolidated into compact summaries
- Knowledge graph updates — new relationships and patterns are integrated into the graph
- Vector store maintenance — TTL eviction, capacity enforcement, and cache cleanup run as part of each consolidation cycle
The consolidation system auto-detects whether knox-ms is available and runs only when the service is initialized — no manual configuration required.
8. Session Management
Every knox-ms interaction is managed through a robust session system backed by Redis.
Session Features
- Session state persistence — full session state is stored in Redis with automatic expiry
- Distributed locks — atomic lock acquisition (`SET key value NX EX ttl`) prevents race conditions across multiple processes
- Safe lock release — a Lua script verifies ownership atomically before releasing, preventing one process from accidentally releasing another's lock
- Lock extension — long-running operations can extend their lock TTL with ownership verification
- Atomic metrics — session metrics (iteration counts, task completions, etc.) use atomic Redis operations to prevent lost updates under concurrency
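The ownership-verified release can be sketched as compare-and-delete. Against real Redis this comparison runs server-side as an atomic Lua script; here a plain dict stands in for Redis so the logic is visible:

```python
import uuid

def acquire(locks, key):
    """SET key token NX: succeeds only if the key is currently absent."""
    if key in locks:
        return None
    token = uuid.uuid4().hex     # unique ownership token
    locks[key] = token
    return token

def release(locks, key, token):
    """Delete only if we still own the lock (token matches)."""
    if locks.get(key) == token:
        del locks[key]
        return True
    return False                 # someone else holds it (or it expired)

locks = {}
token = acquire(locks, "session-lock")
assert acquire(locks, "session-lock") is None       # second acquire blocked
assert not release(locks, "session-lock", "wrong")  # non-owner can't release
assert release(locks, "session-lock", token)        # owner releases cleanly
```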
Checkpointing
The autonomous engine creates checkpoints at configurable intervals during execution:
- Checkpoints are persisted to Knox storage via the storage integration
- They capture the full loop state, completed tasks, and current plan
- On recovery, execution can resume from the last checkpoint rather than starting over
- Checkpoint endpoints accept session-scoped queries with configurable limits
9. Real-Time Event Streaming
Knox-ms provides full visibility into autonomous execution through Server-Sent Events (SSE).
21 Event Types
The system streams typed events covering every phase of execution:
- Goal refinement events
- Planning and task creation events
- Task start, progress, and completion events
- Self-evaluation and correction events
- Healing action events
- Checkpoint creation events
- Execution completion events
- And more
Client Integration
The frontend event service provides:
- Typed event handling — strongly typed handlers for all 21 event types
- Automatic reconnection — exponential backoff with jitter on connection loss
- `Last-Event-ID` support — seamless reconnection with event replay so no events are missed
- Auto-close — the connection automatically closes when `execution_completed` is received
10. Execution Analytics & History
Every autonomous execution is persisted for long-term analytics and review.
What's Recorded
Each execution stores:
- Execution record — execution ID, session ID, goal, refined goal, goal breakdown, loop state, iterations, config snapshot, final response, timestamps, and cancellation info
- Task records — individual task ID, plan ID, type, difficulty, status, result/error, token usage, and execution time
- Aggregate metrics — 14 metrics across 6 categories: performance, tokens, memory, quality, resilience, and latency
Analytics API
Two dedicated endpoints provide historical insights:
- Execution history — paginated query with status filtering, returning executions with computed duration
- Aggregated analytics — total/successful executions, success rate, total tasks, per-metric-type averages, and per-task-type success rates
11. Relay Integration & Model Routing
Knox-ms doesn't use a single model — it dynamically routes requests through the Knox relay infrastructure.
How Routing Works
- Channel selection — the system selects the best available channel based on model requirements and availability
- Adaptor pipeline — requests flow through `channel selection → adaptor → convert_request → do_request → do_response`
- Full billing integration — every relay call is metered and billed correctly
- Dynamic model switching — the default model can be changed at runtime (e.g., by self-healing actions that switch to a more reliable model)
- Graceful degradation — if the primary relay path fails, fallback mechanisms ensure the request still completes
This means knox-ms can use any model you prefer as its underlying engine while wrapping it with the full memory, planning, and self-healing infrastructure.
12. Storage Architecture
Knox-ms uses a dual-storage architecture for durability and performance.
Knox Storage (Persistent Storage)
- User and session-scoped memory files
- Plan and task storage
- Summaries and indexes
- Vector embeddings (per-user stores at `knox-ms/vectors/user_{id}/store.json`)
- Execution checkpoints
- Task results and execution summaries
Redis (Fast State)
- Session state and distributed locks
- Set-based data structures using native Redis sets (`SADD`, `SMEMBERS`, `SREM`)
- Atomic operations via Lua scripts for race-free state management
- Metric counters with atomic increment
- Local cache fallback when Redis is unavailable
13. Configuration & Admin Controls
Every aspect of knox-ms is configurable through validated API endpoints.
Autonomous Engine Config
Control the execution loop behavior:
- `max_iterations` (1–1,000) — maximum autonomous loop iterations
- `max_execution_time_secs` (10–86,400) — execution time limit
- `goal_confidence_threshold` (0.0–1.0) — minimum confidence to consider a goal achieved
- `max_healing_attempts` (0–20) — how many self-healing attempts before giving up
- `max_parallel_tasks` (1–50) — concurrency limit for parallel task execution
- `context_window_size` (1K–10M) — working context window size
- `checkpoint_interval` (1–100) — how often to create recovery checkpoints
Context Config
Fine-tune context management:
- `active_context_window` (1K–10M tokens)
- `compression_ratio` (0.01–1.0)
- `hierarchy_levels` (1–10)
- `retrieval_top_k` (1–100)
- `relevance_threshold` (0.0–1.0)
- `cross_session_max_age_days` (1–3,650)
- `max_graph_entities` (100–1M)
Memory Config
Tune the auto memory manager:
- `max_context_tokens` (1K–10M)
- `summarize_trigger_tokens` (100–1M)
- `knowledge_retention_threshold` (0.0–1.0)
- `cleanup_threshold_days` (1–3,650)
- `dedup_similarity_threshold` (0.0–1.0)
User Preferences
Individual users can customize their autonomous execution experience:
- Maximum iterations and time limits
- Confidence thresholds
- Checkpoint intervals
- All validated with the same bounds as admin configs
All configuration is persisted to the database and loaded at startup, so your settings survive restarts.
14. Frontend Experience
The knox-ms frontend provides a rich, interactive interface for interacting with and managing the memory system.
Visual Architecture
- Brain Memory Architecture — a visual representation of the memory system's hierarchical structure
- Knox-MS Panel — the main interaction panel available in both Chat and Code views
- Vector Search UI — search and explore your project's semantic embeddings
- Session Manager — view, manage, and switch between active sessions
- Memory Explorer — browse the memory tree, inspect entries, view scores and retention
- Task System UI — monitor active tasks, view plans, and track execution progress
Autonomous Settings
Users can configure their autonomous execution preferences directly from the UI:
- Settings are loaded on component mount from the backend
- Changes are saved to the backend with real-time validation
- Toast notifications confirm successful saves or report errors
Localization
Full internationalization support with i18n strings for all knox-ms UI elements.
15. Type Safety Across the Stack
Knox-ms maintains end-to-end type safety from backend to frontend.
Automated Type Synchronization
A synchronization script automatically generates TypeScript interfaces from the Rust backend:
- 19 interfaces and 2 union types are generated from 5 backend source files
- Type mappings handle all Rust → TypeScript conversions: `String` → `string`, `Option<T>` → `T | null`, `Vec<T>` → `T[]`, `HashMap<K, V>` → `Record<K, V>`, and more
- Rust doc comments (`///`) are preserved as TypeScript JSDoc (`/** */`)
- The script is configurable — adding new types is as simple as adding entries to a source list
This ensures the frontend and backend never drift out of sync on data structures.
API Quick Reference
Model Identity
```json
{
  "id": "knox/knox-ms",
  "object": "model",
  "owned_by": "KnoxChat",
  "context_length": -1
}
```
A `context_length` of `-1` means unlimited — there is no upper bound.
Special Parameters
| Parameter | Type | Description |
|---|---|---|
| `session_id` | string | Unique session identifier for memory persistence |
| `project_id` | string | Project identifier for vector embeddings retrieval |
| `enable_vector_search` | boolean | Enable semantic search over project content (default: `true`) |
| `vector_top_k` | integer | Number of vector search candidates to retrieve (default: `30`) |
| `rerank_threshold` | float | Minimum rerank score threshold, 0.0–1.0 (default: `0.5`) |
| `memory_mode` | string | Memory strategy: `full`, `summarized`, `selective` |
| `include_reasoning` | boolean | Include task planning reasoning in response |
| `verbosity` | string | Output detail level: `minimal`, `normal`, `verbose` |
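Putting the parameters together, a request payload might look like this. The payload shape assumes an OpenAI-compatible chat completions body, and the session/project IDs are placeholders:

```python
import json

payload = {
    "model": "knox/knox-ms",
    "messages": [{"role": "user", "content": "Summarize this session so far."}],
    # knox-ms special parameters (all optional)
    "session_id": "sess-example",
    "project_id": "proj-example",
    "enable_vector_search": True,
    "vector_top_k": 30,
    "rerank_threshold": 0.5,
    "memory_mode": "summarized",
    "include_reasoning": False,
    "verbosity": "normal",
}
body = json.dumps(payload)  # ready to POST to the chat completions endpoint
```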
35+ REST Endpoints
Knox-ms exposes a comprehensive REST API covering:
- Autonomous execution management (start, cancel, status, history, analytics)
- Memory operations (explore, search, cleanup)
- Knowledge graph queries
- Vector search and indexing
- Session management
- Checkpoint operations (list, restore, delete with proper session scoping)
- Configuration management (engine, context, memory, user preferences)
- Real-time event streaming (SSE)
Summary
Knox-MS is not an incremental improvement — it's a fundamentally new approach to AI interaction. By combining autonomous orchestration, brain-like memory, self-healing infrastructure, continuous learning, and production-grade reliability, knox-ms delivers something no other model can: truly unlimited context with intelligence that grows over time.
Every subsystem is fully operational, battle-tested, and ready for production workloads. Whether you're using knox-ms for a quick question or a months-long development project, the system remembers, learns, adapts, and improves with every interaction.
Start using Knox-MS today — select knox/knox-ms as your model and experience unlimited context for yourself.