
Knox Memory System — Full Feature Deep Dive

· 16 min read
Knox Anderson
Knox Dev Team

The Knox Memory System (knox-ms) has reached full production readiness. Every subsystem — from autonomous task orchestration to self-healing infrastructure — is now live and fully operational. This post provides a comprehensive tour of every feature, capability, and architectural decision that makes knox-ms the first AI model with a truly unlimited, brain-like memory.

What Is Knox-MS?

Knox-MS is a custom AI model (knox/knox-ms) that provides truly unlimited context window length through an intelligent memory management system modeled after the human brain. Instead of being constrained by a fixed context window like every other LLM, knox-ms orchestrates multiple underlying models through a sophisticated Plan → Task → Memory architecture, dynamically managing what to remember, what to summarize, and what to forget.

Memory System (Brain) = Core Intelligence
↓ manages and updates
Context Cache (LLM) = Working Memory
↓ enhanced by
Vector Embeddings & Rerank (Tool) = Information Retrieval

The result: conversations and projects of any scale — from quick questions to multi-month development efforts — without ever losing context or hitting token limits.

1. Autonomous Orchestration Engine

At the heart of knox-ms is a fully autonomous execution loop driven by LLM reasoning at every stage.

Goal Refinement

When you send a request, the system doesn't just respond — it thinks. The autonomous core decomposes your input through LLM-driven goal analysis, breaking complex requests into a structured goal hierarchy. If a goal is ambiguous, the engine refines it by asking clarifying sub-questions internally before proceeding.

Multi-Task Planning

Once the goal is understood, knox-ms generates a structured execution plan with 1 to 8 parallel or sequential tasks. Each task is classified by:

  • Type — coding, analysis, research, general
  • Difficulty — easy, medium, or hard
  • Dependencies — which tasks must complete before others can start

If planning fails or produces a single monolithic task, the system gracefully falls back to a simpler plan rather than failing outright.
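The plan and fallback behavior described above can be sketched roughly as follows. All names here (`PlannedTask`, `planOrFallback`, and so on) are illustrative, not the actual knox-ms API:

```typescript
// Hypothetical sketch of knox-ms plan shapes; names are illustrative.
type TaskType = "coding" | "analysis" | "research" | "general";
type Difficulty = "easy" | "medium" | "hard";

interface PlannedTask {
  id: string;
  type: TaskType;
  difficulty: Difficulty;
  dependsOn: string[]; // task ids that must complete first
}

interface ExecutionPlan {
  goal: string;
  tasks: PlannedTask[]; // 1 to 8 parallel or sequential tasks
}

// Fallback described in the text: if planning yields nothing usable,
// wrap the whole goal in a single general task instead of failing outright.
function planOrFallback(goal: string, planned: PlannedTask[]): ExecutionPlan {
  if (planned.length === 0) {
    return {
      goal,
      tasks: [{ id: "t1", type: "general", difficulty: "medium", dependsOn: [] }],
    };
  }
  return { goal, tasks: planned };
}
```

The key property is that planning can never hard-fail: an empty or degenerate plan degrades to a single-task plan covering the whole goal.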

Smart Task System

Tasks aren't static to-do items — they're adaptive execution units with:

  • Priority calculation — dynamically computed based on dependencies, difficulty, and goal relevance
  • Dependency graphs — tasks are ordered and parallelized based on their relationships
  • Adaptive retry with model escalation — if a task fails on a simpler model, it automatically retries with a more capable one
  • Difficulty upgrades — failed or low-quality tasks are re-queued at a higher difficulty tier
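To make the dependency and priority mechanics concrete, here is a minimal sketch of how a ready-set might be computed. The priority formula is an assumption for illustration; the real scoring weights are internal to knox-ms:

```typescript
interface Task {
  id: string;
  difficulty: number;  // 1 = easy, 2 = medium, 3 = hard
  relevance: number;   // 0.0 to 1.0 goal relevance
  dependsOn: string[]; // ids of prerequisite tasks
  done: boolean;
}

// Illustrative priority: relevant, easy, unblocked tasks run first.
// (The actual knox-ms weighting is not published; this is a sketch.)
function priority(t: Task, all: Map<string, Task>): number {
  const blockers = t.dependsOn.filter((id) => !all.get(id)?.done).length;
  return t.relevance * 10 - t.difficulty - blockers * 100;
}

// Tasks whose dependencies are all complete are eligible to run in parallel.
function readyTasks(all: Map<string, Task>): Task[] {
  return [...all.values()]
    .filter((t) => !t.done && t.dependsOn.every((id) => all.get(id)?.done))
    .sort((a, b) => priority(b, all) - priority(a, all));
}
```

With a graph like a → b → c where only a is done, only b is ready; c stays blocked until b completes.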

State Machine Loop Execution

The autonomous engine runs as a state machine with defined checkpoints. At each iteration:

  1. Execute the next batch of ready tasks
  2. Self-evaluate quality using LLM-based assessment with confidence scoring
  3. Apply corrections if quality is below threshold
  4. Check goal completion through LLM-based achievement assessment
  5. Refine the plan if the goal evaluation identifies additional needed tasks
  6. Create a checkpoint for recovery
  7. Repeat until the goal is fully achieved or iteration limits are reached
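The seven steps above can be condensed into a loop skeleton. This is a synchronous sketch with assumed names (the real loop awaits LLM calls at each stage and is far richer), but it shows the control flow:

```typescript
// Minimal sketch of the autonomous loop; all names are illustrative.
interface LoopDeps {
  executeReady(): void;    // 1. run the next batch of ready tasks
  quality(): number;       // 2. LLM self-evaluation, 0.0 to 1.0
  correct(): void;         // 3. apply corrections
  goalAchieved(): boolean; // 4. LLM achievement assessment
  refinePlan(): void;      // 5. add tasks the evaluation found missing
  checkpoint(): void;      // 6. persist a recovery checkpoint
}

function autonomousLoop(deps: LoopDeps, maxIterations = 50, qualityThreshold = 0.7): number {
  for (let i = 1; i <= maxIterations; i++) {
    deps.executeReady();
    if (deps.quality() < qualityThreshold) {
      deps.correct();
    }
    deps.checkpoint();
    if (deps.goalAchieved()) return i; // goal fully achieved
    deps.refinePlan();
  }
  return maxIterations; // 7. iteration limit reached
}
```

Note that the checkpoint is written every iteration here for simplicity; in knox-ms the checkpoint interval is configurable.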

This architecture means knox-ms can handle complex, multi-step objectives autonomously — planning, executing, evaluating, correcting, and completing without human intervention at each step.

2. The Memory System — A Brain, Not a Buffer

The Memory System is what makes knox-ms fundamentally different from every other AI model. It provides persistent, organized, self-optimizing memory whose lifecycle is modeled on how human memory works.

Human-Like Memory Management

Just as your brain doesn't store every detail of every conversation, knox-ms intelligently manages information through continuous CRUD operations:

  • CREATE — New sessions, plans, task records, and discovered patterns are stored automatically
  • READ — Only relevant memory is loaded into the working context for each request
  • UPDATE — Task results, conversation history, and context summaries are continuously refined
  • DELETE — Temporary files, redundant information, and stale cache entries are cleaned up automatically

Hierarchical Memory Organization

Memory is organized in a tree-structured filesystem with multiple levels of detail:

| Level | Retention | Detail |
|---|---|---|
| Recent (last few turns) | Full detail | Complete conversation with all context |
| Medium-term (current session) | Key points | Decisions, outcomes, important exchanges |
| Long-term (historical) | Semantic summaries | Patterns, learned concepts, compressed knowledge |

Auto Memory Manager

The automatic memory manager handles the lifecycle of every memory entry:

  • Scoring — each memory entry receives a relevance score that decays over time following Ebbinghaus forgetting curves
  • Retention policies — configurable retention periods with automatic cleanup of entries below the score threshold
  • Deduplication — similar memories are detected and merged to prevent bloat
  • Compression — older memories are compressed while preserving essential information
  • Configurable limits — maximum entries, memory size caps, and cleanup intervals are all tunable
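The Ebbinghaus-style scoring can be sketched with the classic retention formula R = e^(-t/S), where t is time since last access and S is a stability term. The parameter names and threshold here are assumptions for illustration:

```typescript
// Sketch of relevance decay following an Ebbinghaus-style forgetting curve.
// `stability` controls how slowly a memory fades; frequently accessed
// memories would get a larger stability value (reinforcement).
function decayedScore(initial: number, hoursSinceAccess: number, stability: number): number {
  return initial * Math.exp(-hoursSinceAccess / stability);
}

// Retention policy: entries falling below the threshold become cleanup
// candidates. The default threshold here is illustrative.
function shouldEvict(score: number, threshold = 0.2): boolean {
  return score < threshold;
}
```

After one "stability period" (t = S) a memory retains about 37% of its original score; reinforcing a memory stretches S so it decays more slowly.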

Summarization Engine

Knox-ms includes both extractive and structured summarization capabilities:

  • Extractive summaries pull out the most important sentences and facts from conversation history
  • Structured summaries use LLM calls to produce organized, hierarchical summaries with key decisions, outcomes, and learned patterns
  • Summaries are saved alongside raw history, providing both quick-access overviews and full detail when needed

3. Context Management — Unlimited by Design

The context manager is the bridge between the Memory System and the LLM's working context window.

Multi-Level Context

Rather than dumping everything into a single prompt, knox-ms maintains multiple context levels:

  • Active context — the immediate working set for the current task
  • Session context — broader session history and goals
  • Cross-session context — knowledge and patterns from previous sessions
  • Global knowledge — learned patterns and permanent knowledge base entries

Intelligent Compression

When context grows beyond thresholds, the system applies progressive compression:

  • Older turns are summarized while recent turns remain in full detail
  • The compression ratio is configurable (from minimal to aggressive)
  • Force-compression is available as a self-healing action when memory pressure is high
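The progressive-compression idea above can be sketched in a few lines. The `summarize` callback stands in for the LLM summarization call; the function and parameter names are illustrative:

```typescript
// Sketch of progressive compression: older turns collapse into a single
// summary entry, while the most recent `keepRecent` turns stay verbatim.
function compressHistory(
  turns: string[],
  keepRecent: number,
  summarize: (older: string[]) => string // stands in for an LLM call
): string[] {
  if (turns.length <= keepRecent) return turns; // nothing to compress
  const older = turns.slice(0, turns.length - keepRecent);
  const recent = turns.slice(turns.length - keepRecent);
  return [summarize(older), ...recent];
}
```

Tuning `keepRecent` (and how aggressive `summarize` is) corresponds to the configurable compression ratio described above.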

Context Cache Optimization

Knox-ms structures memory to maximize LLM prompt cache hit rates:

  1. Stable prefix — frequently-used context is placed in a consistent position to enable caching
  2. Dynamic suffix — task-specific context that changes per request is appended after the stable portion
  3. Smart invalidation — the cache is only invalidated when memory updates significantly, not on every change

This means you get the cost and speed benefits of prompt caching while maintaining truly unlimited context depth.
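A rough sketch of the prefix/suffix split: the stable portion must be byte-identical across requests for the provider's prompt cache to hit. The size-change heuristic in `prefixStale` is an assumption, not the actual invalidation rule:

```typescript
// Sketch of cache-friendly prompt assembly (names are illustrative).
interface PromptParts {
  stablePrefix: string;  // system rules + long-term memory summary (rarely changes)
  dynamicSuffix: string; // task-specific retrieval results + latest turns
}

function buildPrompt(p: PromptParts): string {
  // Keeping the prefix byte-identical across requests is what enables caching.
  return p.stablePrefix + "\n---\n" + p.dynamicSuffix;
}

// Smart invalidation sketch: only rebuild the stable prefix when memory has
// changed "significantly" (here: relative size change above 10%).
function prefixStale(oldLen: number, newLen: number, ratio = 0.1): boolean {
  return Math.abs(newLen - oldLen) / Math.max(oldLen, 1) > ratio;
}
```

Small memory updates leave the prefix untouched (cache hit); only substantial updates pay the cost of a cold prompt.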

4. Knowledge Graph

Knox-ms builds and maintains a persistent knowledge graph that captures relationships between concepts, code entities, decisions, and patterns.

Graph Structure

  • Knowledge nodes — individual facts, concepts, code patterns, or decisions
  • Relations — typed connections between nodes (depends-on, related-to, supersedes, etc.)
  • Relevance scoring — each node and relation carries a score that is updated based on access patterns and recency

LLM-Powered Knowledge Extraction

When tasks complete, knox-ms uses LLM calls to perform structured knowledge extraction:

  • Each piece of content is analyzed to produce knowledge entries with title, content, tags, and relevance score (0.0–1.0)
  • Extracted knowledge is stored in the graph and made available for future context loading
  • The system learns over time which types of knowledge are most useful

5. Vector Embeddings & Semantic Search

Knox-ms includes a full vector embedding pipeline for semantic search over your project content and conversation history.

Embedding Pipeline

  1. Indexing — project files are chunked (2048 tokens with overlap) and embedded using VoyageAI models (voyage-3.5 for general content, voyage-code-3 for code)
  2. Storage — embeddings are persisted to Knox storage with per-user scoping and lazy-loaded on first access
  3. Search — queries are embedded and matched against stored vectors using cosine similarity
  4. Reranking — top candidates are reranked using VoyageAI's rerank-2.5 model for fine-grained relevance
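The search step (step 3) is standard cosine similarity over stored vectors. A minimal sketch, independent of the actual store layout:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard zero vectors
}

// Top-k nearest entries by cosine similarity; these candidates would then
// be passed to the reranker for fine-grained ordering.
function topK(query: number[], store: { id: string; vec: number[] }[], k: number) {
  return store
    .map((e) => ({ id: e.id, score: cosine(query, e.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

In practice the `vector_top_k` candidates from this stage are reranked (step 4) before being loaded into context.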

Resilient Embedding Service

The embedding pipeline is built for production reliability:

  • Retry with exponential backoff — failed API calls retry up to a configurable maximum with exponential delays (500ms × 2^attempt) plus jitter to avoid thundering herd
  • Smart retry filtering — 4xx errors (except 429 rate limits) skip retries since they won't succeed on retry
  • Circuit breaker — after 5 consecutive failures, the system opens a circuit breaker that fast-fails for 60 seconds, preventing cascading failures. A half-open state allows periodic probing, and the circuit closes after 3 consecutive successes
  • Mock fallback — in development environments or when VoyageAI is persistently unavailable, deterministic mock embeddings ensure the system keeps functioning
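The backoff and retry-filter rules above translate directly into code. This is a sketch of the stated policy (500ms × 2^attempt plus jitter; 4xx except 429 never retried), not the actual implementation:

```typescript
// Backoff delay: 500ms * 2^attempt plus jitter to avoid thundering herd.
// `jitter` is injectable so the schedule is testable.
function backoffMs(attempt: number, baseMs = 500, jitter = () => Math.random()): number {
  return baseMs * 2 ** attempt + jitter() * baseMs;
}

// Smart retry filtering: 4xx errors (except 429 rate limits) won't succeed
// on retry, so they fail immediately; 5xx and network errors are retried.
function isRetryable(status: number): boolean {
  if (status === 429) return true;
  if (status >= 400 && status < 500) return false;
  return true;
}

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      const status = (err as { status?: number }).status ?? 500;
      if (!isRetryable(status) || attempt === maxAttempts - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffMs(attempt)));
    }
  }
  throw lastErr;
}
```

The circuit breaker would wrap `withRetry` one level up, short-circuiting calls entirely while open.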

TTL & Capacity Management

Vector stores are actively maintained:

  • TTL-based eviction — vectors older than the configured maximum age are automatically removed
  • Capacity enforcement — per-user vector limits are enforced using LRU eviction (oldest-first)
  • Cache cleanup — expired embedding cache entries are pruned during maintenance cycles
  • Storage persistence — after any mutation, the cleaned state is persisted to Knox storage asynchronously
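TTL eviction followed by LRU capacity trimming can be sketched as a single maintenance pass. Field and function names are assumptions for illustration:

```typescript
interface StoredVector {
  id: string;
  lastAccess: number; // epoch ms of the most recent read
}

// One maintenance pass: first drop vectors older than maxAgeMs (TTL),
// then, if still over capacity, keep only the most recently used (LRU).
function maintain(
  vectors: StoredVector[],
  now: number,
  maxAgeMs: number,
  capacity: number
): StoredVector[] {
  const fresh = vectors.filter((v) => now - v.lastAccess <= maxAgeMs);
  if (fresh.length <= capacity) return fresh;
  return [...fresh]
    .sort((a, b) => b.lastAccess - a.lastAccess) // newest first
    .slice(0, capacity);
}
```

After a pass like this, the cleaned state would be persisted back to Knox storage asynchronously, as described above.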

6. Self-Healing Infrastructure

Knox-ms doesn't just recover from errors — it diagnoses and repairs itself through 12 distinct healing action types.

Healing Actions

| Action | What It Does |
|---|---|
| Switch Model | Routes to a different underlying model when the current one is failing or underperforming |
| Fallback to Simpler | Downgrades to a simpler, more reliable model for difficult tasks |
| Clear Cache | Clears all cached context entries across all sessions to resolve stale data issues |
| Reduce Context Size | Force-compresses cached context by a configurable percentage to stay within limits |
| Optimize Memory | Triggers memory cleanup, removing old history and stale memory states |
| Adjust Batch Size | Modifies runtime batch processing parameters |
| Throttle Requests | Applies request rate limiting during high-load scenarios |
| Prioritize Cache | Adjusts caching priority for better hit rates |
| + 4 more | Additional runtime configuration overrides for edge cases |

How Self-Healing Works

When the autonomous loop encounters an error:

  1. The system analyzes the error reason and selects the most appropriate healing action type
  2. The selected action is delegated to the self-manager, which executes real system changes (not simulations)
  3. The loop retries the failed operation with the healing applied
  4. If healing fails, the system can escalate to more aggressive actions

All healing actions are executed through actual subsystem calls — model switching updates the relay's default model, cache clearing operates on the real context manager, and memory cleanup runs against the actual memory store.

Optimization Engine

Beyond error recovery, the self-manager also applies 8 optimization types proactively:

  • Performance optimizations based on observed execution patterns
  • Resource optimizations to reduce memory and token usage
  • Quality optimizations to improve output accuracy
  • Latency optimizations for faster response times

7. Learning & Pattern Recognition

Knox-ms learns from every execution and gets smarter over time.

What It Learns

  • Goal classification — recognizes what type of goal is being requested and suggests proven approaches
  • Model performance tracking — records which models perform best for which task types and difficulties
  • Task type success rates — tracks success/failure rates per task category to improve future planning
  • Approach suggestions — before generating a new plan, the system consults learned patterns and applies model preferences and approach hints from past successes

How Learning Integrates

During the autonomous execution loop:

  • Before planning — the system calls the learning service to get approach suggestions based on the current goal, potentially influencing model selection and task decomposition
  • After execution — success or failure is recorded with full execution data (all tasks, models used, token counts, latencies)
  • Over time — the system builds a database of execution patterns that continuously improve planning quality

Memory Consolidation

A background consolidation task runs periodically to strengthen long-term memory:

  • Ebbinghaus decay — memory entries naturally decay in relevance over time, mimicking human forgetting curves
  • Strengthening — frequently accessed or highly relevant memories are reinforced
  • Deep summarization — old detailed memories are consolidated into compact summaries
  • Knowledge graph updates — new relationships and patterns are integrated into the graph
  • Vector store maintenance — TTL eviction, capacity enforcement, and cache cleanup run as part of each consolidation cycle

The consolidation system auto-detects whether knox-ms is available and runs only when the service is initialized — no manual configuration required.

8. Session Management

Every knox-ms interaction is managed through a robust session system backed by Redis.

Session Features

  • Session state persistence — full session state is stored in Redis with automatic expiry
  • Distributed locks — atomic lock acquisition using Lua scripts (SET key value NX EX ttl) prevents race conditions across multiple processes
  • Safe lock release — ownership is verified atomically before releasing, preventing one process from accidentally releasing another's lock
  • Lock extension — long-running operations can extend their lock TTL with ownership verification
  • Atomic metrics — session metrics (iteration counts, task completions, etc.) use atomic Redis operations to prevent lost updates under concurrency
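The lock pattern (SET NX EX to acquire, ownership-checked DEL to release) can be sketched against an in-memory stand-in for Redis. In production both operations run atomically on the Redis server, the release via a Lua script equivalent to: `if redis.call("GET", KEYS[1]) == ARGV[1] then return redis.call("DEL", KEYS[1]) else return 0 end`. The class and function names here are illustrative:

```typescript
// In-memory stand-in for the Redis operations used by the lock.
class FakeRedis {
  private data = new Map<string, string>();
  setNx(key: string, val: string): boolean { // models SET key val NX EX ttl
    if (this.data.has(key)) return false;
    this.data.set(key, val);
    return true;
  }
  get(key: string): string | undefined { return this.data.get(key); }
  del(key: string): boolean { return this.data.delete(key); }
}

// Acquire: only one process can win; the token identifies the owner.
function acquireLock(r: FakeRedis, key: string, token: string): boolean {
  return r.setNx(key, token);
}

// Release: verify ownership before deleting, so a process whose lock
// expired can never release a lock another process now holds.
function releaseLock(r: FakeRedis, key: string, token: string): boolean {
  if (r.get(key) !== token) return false;
  return r.del(key);
}
```

The EX ttl on acquisition (omitted from the stand-in) guarantees the lock self-expires if its holder crashes, which is why release must be ownership-checked.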

Checkpointing

The autonomous engine creates checkpoints at configurable intervals during execution:

  • Checkpoints are persisted to Knox storage via the storage integration
  • They capture the full loop state, completed tasks, and current plan
  • On recovery, execution can resume from the last checkpoint rather than starting over
  • Checkpoint endpoints accept session-scoped queries with configurable limits

9. Real-Time Event Streaming

Knox-ms provides full visibility into autonomous execution through Server-Sent Events (SSE).

21 Event Types

The system streams typed events covering every phase of execution:

  • Goal refinement events
  • Planning and task creation events
  • Task start, progress, and completion events
  • Self-evaluation and correction events
  • Healing action events
  • Checkpoint creation events
  • Execution completion events
  • And more

Client Integration

The frontend event service provides:

  • Typed event handling — strongly typed handlers for all 21 event types
  • Automatic reconnection — exponential backoff with jitter on connection loss
  • Last-Event-ID support — seamless reconnection with event replay so no events are missed
  • Auto-close — the connection automatically closes when execution_completed is received

10. Execution Analytics & History

Every autonomous execution is persisted for long-term analytics and review.

What's Recorded

Each execution stores:

  • Execution record — execution ID, session ID, goal, refined goal, goal breakdown, loop state, iterations, config snapshot, final response, timestamps, and cancellation info
  • Task records — individual task ID, plan ID, type, difficulty, status, result/error, token usage, and execution time
  • Aggregate metrics — 14 metrics across 6 categories: performance, tokens, memory, quality, resilience, and latency

Analytics API

Two dedicated endpoints provide historical insights:

  • Execution history — paginated query with status filtering, returning executions with computed duration
  • Aggregated analytics — total/successful executions, success rate, total tasks, per-metric-type averages, and per-task-type success rates

11. Relay Integration & Model Routing

Knox-ms doesn't use a single model — it dynamically routes requests through the Knox relay infrastructure.

How Routing Works

  1. Channel selection — the system selects the best available channel based on model requirements and availability
  2. Adaptor pipeline — requests flow through channel selection → adaptor → convert_request → do_request → do_response
  3. Full billing integration — every relay call is metered and billed correctly
  4. Dynamic model switching — the default model can be changed at runtime (e.g., by self-healing actions that switch to a more reliable model)
  5. Graceful degradation — if the primary relay path fails, fallback mechanisms ensure the request still completes

This means knox-ms can use any model you prefer as its underlying engine while wrapping it with the full memory, planning, and self-healing infrastructure.

12. Storage Architecture

Knox-ms uses a dual-storage architecture for durability and performance.

Knox Storage (Persistent Storage)

  • User and session-scoped memory files
  • Plan and task storage
  • Summaries and indexes
  • Vector embeddings (per-user stores at knox-ms/vectors/user_{id}/store.json)
  • Execution checkpoints
  • Task results and execution summaries

Redis (Fast State)

  • Session state and distributed locks
  • Set-based data structures using native Redis sets (SADD, SMEMBERS, SREM)
  • Atomic operations via Lua scripts for race-free state management
  • Metric counters with atomic increment
  • Local cache fallback when Redis is unavailable

13. Configuration & Admin Controls

Every aspect of knox-ms is configurable through validated API endpoints.

Autonomous Engine Config

Control the execution loop behavior:

  • max_iterations (1–1,000) — maximum autonomous loop iterations
  • max_execution_time_secs (10–86,400) — execution time limit
  • goal_confidence_threshold (0.0–1.0) — minimum confidence to consider a goal achieved
  • max_healing_attempts (0–20) — how many self-healing attempts before giving up
  • max_parallel_tasks (1–50) — concurrency limit for parallel task execution
  • context_window_size (1K–10M) — working context window size
  • checkpoint_interval (1–100) — how often to create recovery checkpoints

Context Config

Fine-tune context management:

  • active_context_window (1K–10M tokens)
  • compression_ratio (0.01–1.0)
  • hierarchy_levels (1–10)
  • retrieval_top_k (1–100)
  • relevance_threshold (0.0–1.0)
  • cross_session_max_age_days (1–3,650)
  • max_graph_entities (100–1M)

Memory Config

Tune the auto memory manager:

  • max_context_tokens (1K–10M)
  • summarize_trigger_tokens (100–1M)
  • knowledge_retention_threshold (0.0–1.0)
  • cleanup_threshold_days (1–3,650)
  • dedup_similarity_threshold (0.0–1.0)

User Preferences

Individual users can customize their autonomous execution experience:

  • Maximum iterations and time limits
  • Confidence thresholds
  • Checkpoint intervals
  • All validated with the same bounds as admin configs

All configuration is persisted to the database and loaded at startup, so your settings survive restarts.

14. Frontend Experience

The knox-ms frontend provides a rich, interactive interface for interacting with and managing the memory system.

Visual Architecture

  • Brain Memory Architecture — a visual representation of the memory system's hierarchical structure
  • Knox-MS Panel — the main interaction panel available in both Chat and Code views
  • Vector Search UI — search and explore your project's semantic embeddings
  • Session Manager — view, manage, and switch between active sessions
  • Memory Explorer — browse the memory tree, inspect entries, view scores and retention
  • Task System UI — monitor active tasks, view plans, and track execution progress

Autonomous Settings

Users can configure their autonomous execution preferences directly from the UI:

  • Settings are loaded on component mount from the backend
  • Changes are saved to the backend with real-time validation
  • Toast notifications confirm successful saves or report errors

Localization

Full internationalization support with i18n strings for all knox-ms UI elements.

15. Type Safety Across the Stack

Knox-ms maintains end-to-end type safety from backend to frontend.

Automated Type Synchronization

A synchronization script automatically generates TypeScript interfaces from the Rust backend:

  • 19 interfaces and 2 union types are generated from 5 backend source files
  • Type mappings handle all Rust → TypeScript conversions: String → string, Option<T> → T | null, Vec<T> → T[], HashMap<K,V> → Record<K,V>, and more
  • Rust doc comments (///) are preserved as TypeScript JSDoc (/** */)
  • The script is configurable — adding new types is as simple as adding entries to a source list

This ensures the frontend and backend never drift out of sync on data structures.
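The mapping step can be sketched as a small recursive translator. This covers only a subset of the conversions (the real generator also handles doc comments, nested generics, and more), and the primitive table is illustrative:

```typescript
// Subset of the Rust -> TypeScript primitive mappings (illustrative).
const PRIMITIVES: Record<string, string> = {
  String: "string",
  bool: "boolean",
  u32: "number",
  f64: "number",
};

// Recursively translate a Rust type expression into its TypeScript form.
function rustToTs(ty: string): string {
  let m: RegExpMatchArray | null;
  if ((m = ty.match(/^Option<(.+)>$/))) return `${rustToTs(m[1])} | null`;
  if ((m = ty.match(/^Vec<(.+)>$/))) return `${rustToTs(m[1])}[]`;
  if ((m = ty.match(/^HashMap<(.+),\s*(.+)>$/)))
    return `Record<${rustToTs(m[1])}, ${rustToTs(m[2])}>`;
  return PRIMITIVES[ty] ?? ty; // struct/enum names pass through unchanged
}
```

Running the translator over every field of the backend structs yields the generated interfaces, which is why the two sides cannot drift.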

API Quick Reference

Model Identity

{
  "id": "knox/knox-ms",
  "object": "model",
  "owned_by": "KnoxChat",
  "context_length": -1
}

A context_length of -1 means unlimited — there is no upper bound.

Special Parameters

| Parameter | Type | Description |
|---|---|---|
| session_id | string | Unique session identifier for memory persistence |
| project_id | string | Project identifier for vector embeddings retrieval |
| enable_vector_search | boolean | Enable semantic search over project content (default: true) |
| vector_top_k | integer | Number of vector search candidates to retrieve (default: 30) |
| rerank_threshold | float | Minimum rerank score threshold, 0.0–1.0 (default: 0.5) |
| memory_mode | string | Memory strategy: full, summarized, selective |
| include_reasoning | boolean | Include task planning reasoning in response |
| verbosity | string | Output detail level: minimal, normal, verbose |

35+ REST Endpoints

Knox-ms exposes a comprehensive REST API covering:

  • Autonomous execution management (start, cancel, status, history, analytics)
  • Memory operations (explore, search, cleanup)
  • Knowledge graph queries
  • Vector search and indexing
  • Session management
  • Checkpoint operations (list, restore, delete with proper session scoping)
  • Configuration management (engine, context, memory, user preferences)
  • Real-time event streaming (SSE)

Summary

Knox-MS is not an incremental improvement — it's a fundamentally new approach to AI interaction. By combining autonomous orchestration, brain-like memory, self-healing infrastructure, continuous learning, and production-grade reliability, knox-ms delivers something no other model can: truly unlimited context with intelligence that grows over time.

Every subsystem is fully operational, battle-tested, and ready for production workloads. Whether you're using knox-ms for a quick question or a months-long development project, the system remembers, learns, adapts, and improves with every interaction.

Start using Knox-MS today — select knox/knox-ms as your model and experience unlimited context for yourself.
