Knox-MS 1.0 Developer Documentation

Base URL: https://api.knox.chat
API Version: v1

Knox-MS is an AI orchestration engine built on a human-brain-inspired memory architecture. Hierarchical memory levels, autonomous execution, and intelligent context management combine to provide effectively unlimited context windows and persistent memory across sessions.

Quick Start

# 1. Get your API key from the Knox dashboard → Settings → API Keys

# 2. Make your first request
curl https://api.knox.chat/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "knox/knox-ms",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

That's it. Knox-MS is fully OpenAI-compatible — any OpenAI SDK, library, or tool works out of the box.

Authentication

All API requests require a Bearer token. Create API keys from the Knox dashboard under Settings → API Keys.

Authorization: Bearer sk-xxxxxxxxxxxxxxxxxxxx

Pass the key in the Authorization header for every request.

Chat Completions

Basic Request

POST /v1/chat/completions

Send a list of messages and receive a model-generated response.

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID. Use knox/knox-ms for the Knox-MS engine, or any other available model (e.g., anthropic/claude-sonnet-4.6, openai/gpt-4o). |
| messages | array | Yes | List of message objects with role and content. |
| stream | boolean | No | If true, partial message deltas are sent as SSE events. Default false. |
| max_tokens | integer | No | Maximum tokens to generate. |
| temperature | number | No | Sampling temperature (0–2). Default varies by model. |
| top_p | number | No | Nucleus sampling parameter (0–1). |
| frequency_penalty | number | No | Penalizes repeated tokens (−2 to 2). |
| presence_penalty | number | No | Penalizes tokens already present (−2 to 2). |
| stop | string/array | No | Stop sequence(s). |
| tools | array | No | List of tool/function definitions the model may call. |
| tool_choice | string/object | No | Controls tool use: "auto", "none", "required", or a specific function. |
| response_format | object | No | Force response format (e.g., {"type": "json_object"}). |
| seed | integer | No | Seed for deterministic output. |

Example:

{
  "model": "knox/knox-ms",
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate fibonacci numbers"}
  ],
  "temperature": 0.7,
  "max_tokens": 2048
}

Response:

{
  "id": "chatcmpl-abc123def456",
  "object": "chat.completion",
  "created": 1740422400,
  "model": "knox/knox-ms",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's an efficient fibonacci function..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 150,
    "total_tokens": 178
  }
}

Streaming

Set "stream": true to receive the response as Server-Sent Events. Each event contains a data: line with a JSON chunk.

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1740422400,"model":"knox/knox-ms","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1740422400,"model":"knox/knox-ms","choices":[{"index":0,"delta":{"content":"Here's"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1740422400,"model":"knox/knox-ms","choices":[{"index":0,"delta":{"content":" an"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1740422400,"model":"knox/knox-ms","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
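Outside an SDK, the stream can be consumed with plain HTTP. The sketch below is an illustration, not an official client: the helper names are my own, and the live request uses the third-party requests library.

```python
import json

def delta_from_sse_line(line: str):
    """Return the text delta carried by one SSE line, or None.

    None covers non-data lines, the [DONE] sentinel, and chunks without
    a content delta (e.g. the initial role-only chunk).
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

def stream_chat(api_key: str, user_message: str) -> None:
    """Live call: POST with stream=true and print deltas as they arrive."""
    import requests
    with requests.post(
        "https://api.knox.chat/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "knox/knox-ms",
            "messages": [{"role": "user", "content": user_message}],
            "stream": True,
        },
        stream=True,
    ) as resp:
        for raw in resp.iter_lines(decode_unicode=True):
            text = delta_from_sse_line(raw or "")
            if text:
                print(text, end="", flush=True)
```

For example, `stream_chat("sk-your-api-key", "Hello!")` prints the reply incrementally as chunks arrive.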

Knox-MS Model

The knox/knox-ms model is a meta-model that automatically:

  1. Plans — Decomposes your request into a set of tasks
  2. Routes — Sends each task to the best model based on difficulty (fast models for simple tasks, powerful models for complex ones)
  3. Remembers — Persists conversation context to memory, so you can have conversations that span far beyond any single model's context window
  4. Learns — Records successful patterns and improves over time

It is accessed through the same /v1/chat/completions endpoint as any other model. There is nothing special you need to do — just set "model": "knox/knox-ms".

Knox-MS Parameters

When using knox/knox-ms, you can pass additional parameters in a knox_ms object at the top level of your request body:

{
  "model": "knox/knox-ms",
  "messages": [...],
  "knox_ms": {
    "session_id": "my-project-session",
    "memory_mode": "summarized",
    "verbosity": "verbose",
    "include_reasoning": true,
    "use_vector_search": true,
    "extract_knowledge": true
  }
}

| Parameter | Type | Default | Description |
|---|---|---|---|
| session_id | string | auto-generated | Persistent session ID. Use the same ID across requests to maintain conversation context. |
| memory_mode | string | "summarized" | How context is managed: "full" (keep everything), "summarized" (compress older context), "selective" (only relevant context). |
| verbosity | string | "normal" | Response detail level: "minimal", "normal", "verbose". |
| include_reasoning | boolean | false | Include planning/reasoning steps in the response. |
| use_vector_search | boolean | false | Enable semantic vector search over past sessions for relevant context. |
| vector_top_k | integer | 30 | Number of vector search candidates to retrieve. |
| rerank_threshold | number | 0.5 | Minimum relevance score for vector search results (0.0–1.0). |
| max_context_tokens | integer | | Override the context window size for this request. |
| force_model | string | | Override auto-routing and use a specific model for all tasks. |
| task_difficulty | string | | Override auto-detection: "easy", "medium", "hard". |
| extract_knowledge | boolean | false | Extract key facts and concepts into the knowledge base. |
| final_only | boolean | false | Return only the final result, not intermediate task outputs. |
| project_id | string | | Project ID for scoped vector embeddings retrieval. |
| temperature | number | | Passed through to underlying models. |
| top_p | number | | Passed through to underlying models. |
| tools | array | | Tool/function definitions passed through to underlying models. |
| tool_choice | string/object | | Tool choice strategy passed through to underlying models. |

Knox-MS Metadata in Responses

When using knox/knox-ms, responses include an additional knox_ms_meta field:

{
  "id": "chatcmpl-abc123",
  "model": "knox/knox-ms",
  "choices": [...],
  "usage": {...},
  "knox_ms_meta": {
    "session_id": "my-project-session",
    "plan_id": "plan-xyz",
    "plan_description": "Implement fibonacci function with memoization",
    "current_task": "write_function",
    "task_status": "completed",
    "tasks_completed": 1,
    "tasks_total": 1,
    "tasks_failed": 0,
    "total_model_calls": 2,
    "models_used": {"anthropic/claude-sonnet-4.6": 1, "anthropic/claude-haiku-4.5": 1},
    "memory_mode": "summarized",
    "memory_tokens_saved": 5000,
    "context_tokens_used": 1200,
    "execution_time_ms": 3200,
    "vector_search_results": 0,
    "summary_updated": false
  }
}

| Field | Description |
|---|---|
| session_id | The session used for this request |
| plan_id | ID of the generated execution plan |
| tasks_completed / tasks_total / tasks_failed | Task execution summary |
| models_used | Map of model → number of calls |
| memory_tokens_saved | Tokens saved via summarization/compression |
| context_tokens_used | Tokens used for context in this request |
| execution_time_ms | Total processing time |
| vector_search_results | Number of relevant past context chunks retrieved |
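The metadata is handy for per-request cost and routing logs. A minimal sketch (the routing_summary helper is my own, not part of any SDK):

```python
def routing_summary(meta: dict) -> str:
    """Condense a knox_ms_meta block into one log-friendly line."""
    models = ", ".join(
        f"{name} x{count}"
        for name, count in sorted(meta.get("models_used", {}).items())
    )
    return (
        f"{meta.get('tasks_completed', 0)}/{meta.get('tasks_total', 0)} tasks | "
        f"{meta.get('total_model_calls', 0)} calls ({models}) | "
        f"{meta.get('context_tokens_used', 0)} ctx tokens | "
        f"{meta.get('execution_time_ms', 0)} ms"
    )
```

Applied to the sample response above, this yields a line like "1/1 tasks | 2 calls (...) | 1200 ctx tokens | 3200 ms".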

Models

List Models

GET /v1/models

Returns a list of all models currently available to your account.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "knox/knox-ms",
      "object": "model",
      "created": 1740422400,
      "owned_by": "knox"
    },
    {
      "id": "anthropic/claude-sonnet-4.6",
      "object": "model",
      "created": 1740422400,
      "owned_by": "anthropic"
    }
  ]
}
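A listing in this shape is easy to post-process client-side. As a sketch (the helper name is my own, illustrative only), filtering model IDs by provider:

```python
def models_by_owner(listing: dict, owner: str):
    """Return the model IDs from a /v1/models listing owned by one provider."""
    return [m["id"] for m in listing.get("data", []) if m.get("owned_by") == owner]
```

For example, on the parsed JSON of a GET /v1/models response, `models_by_owner(listing, "anthropic")` returns only the Anthropic-hosted model IDs.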

Retrieve Model

GET /v1/models/{model_id}

Returns details about a specific model.

Embeddings

POST /v1/embeddings

Generate vector embeddings for text input.

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model ID (e.g., voyage-4-lite, text-embedding-3-small). |
| input | string/array | Yes | Text(s) to embed. |

Example:

{
  "model": "voyage-4-lite",
  "input": "Knox-MS is an AI orchestration engine."
}

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, ...]
    }
  ],
  "model": "voyage-4-lite",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
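Embedding vectors are typically compared with cosine similarity. A dependency-free sketch (the helpers are illustrative; the live fetch assumes the third-party requests library and a real key):

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def embed(api_key: str, texts):
    """Live call: fetch one embedding per input text."""
    import requests
    resp = requests.post(
        "https://api.knox.chat/v1/embeddings",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "voyage-4-lite", "input": texts},
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]
```

Identical texts score near 1.0; unrelated texts score near 0.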


Image Generation

POST /v1/images/generations

Generate images from a text prompt.

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | No | Image model ID. |
| prompt | string | Yes | Text description of the image. |
| n | integer | No | Number of images to generate. |
| size | string | No | Image size (e.g., 1024x1024). |

Audio

Transcription

POST /v1/audio/transcriptions

Transcribe audio to text. Accepts multipart/form-data.

Translation

POST /v1/audio/translations

Translate audio to English text.

Text-to-Speech

POST /v1/audio/speech

Generate audio from text input.

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model ID. |
| input | string | Yes | Text to convert to speech. |
| voice | string | Yes | Voice ID. |

Moderations

POST /v1/moderations

Classify text for content policy violations.

Reranking

POST /v1/rerank

Rerank a list of documents by relevance to a query (VoyageAI-compatible).

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Rerank model ID (e.g., rerank-2.5). |
| query | string | Yes | The query to rank documents against. |
| documents | array | Yes | List of document strings to rerank. |
| top_n | integer | No | Number of top results to return. |
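Rerank responses can be mapped back onto the original documents client-side. A sketch that assumes a VoyageAI-style result shape, where each result entry carries the original document's index and a relevance_score (the helper name is my own):

```python
def order_by_relevance(documents, results):
    """Pair each document with its score, best first.

    Assumes VoyageAI-style result entries with an 'index' into the
    original documents list and a 'relevance_score' float.
    """
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]
```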

Anthropic Messages API

Knox provides full compatibility with the Anthropic Messages API, so tools like Claude Code and the Anthropic Python/JS SDK work natively.

POST /v1/messages

Setup for Claude Code:

export ANTHROPIC_BASE_URL="https://api.knox.chat"
export ANTHROPIC_API_KEY="sk-your-knox-api-key"

Setup for the Anthropic Python SDK:

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.knox.chat",
    api_key="sk-your-knox-api-key",
)

message = client.messages.create(
    model="anthropic/claude-sonnet-4.6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude!"}
    ],
)
print(message.content[0].text)

The endpoint accepts the full Anthropic request format (separate system field, max_tokens required, content blocks, tool use, extended thinking) and returns Anthropic-formatted responses.

Autonomous Execution

For complex, multi-step tasks, Knox-MS can run an autonomous execution loop that iteratively plans, executes, evaluates, and refines until your goal is achieved — with real-time progress streaming.

All autonomous endpoints require authentication and are under /api/knox-ms/autonomous.

Start Execution

POST /api/knox-ms/autonomous/execute

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | The goal or task to accomplish. |
| session_id | string | No | Session ID for context persistence. Auto-generated if omitted. |
| stream_events | boolean | No | Enable real-time SSE event streaming. Default true. |
| config.max_iterations | integer | No | Maximum execution iterations (safety limit). |
| config.max_time_secs | integer | No | Maximum execution time in seconds. |
| config.confidence_threshold | number | No | Confidence level required to consider the goal complete (0–1). |
| config.enable_checkpointing | boolean | No | Enable periodic state checkpoints for recovery. |
| config.checkpoint_interval | integer | No | Checkpoint every N iterations. |
| config.enable_smart_tasks | boolean | No | Enable smart task decomposition. |
| config.enable_adaptive_retry | boolean | No | Enable automatic retry with adapted strategy on failure. |

Example:

{
  "message": "Analyze this codebase and produce a comprehensive architecture document",
  "session_id": "arch-review-session",
  "config": {
    "max_iterations": 50,
    "max_time_secs": 1800,
    "confidence_threshold": 0.85,
    "enable_checkpointing": true
  }
}

Response:

{
  "execution_id": "exec-a1b2c3d4e5f6",
  "session_id": "arch-review-session",
  "status": "running",
  "result": null,
  "error": null,
  "events_url": "/api/knox-ms/autonomous/arch-review-session/events"
}

Stream Events (SSE)

GET /api/knox-ms/autonomous/{session_id}/events

Returns a Server-Sent Events stream with real-time progress updates. Connect to this URL to receive events as the execution progresses.

Event types:

| Event | Description |
|---|---|
| execution_started | Execution has begun, includes plan overview. |
| plan_updated | A new or revised plan was generated. |
| task_started | A task began execution, includes model and difficulty. |
| task_progress | Progress update for a long-running task. |
| task_content_chunk | Streaming content from a task (incremental output). |
| task_completed | A task finished, includes token usage and timing. |
| task_failed | A task failed, includes error and retry info. |
| task_evaluated | Quality evaluation of a completed task. |
| memory_operation | A memory operation was performed (summarize, archive, etc.). |
| context_updated | Session context was updated or compressed. |
| knowledge_extracted | Knowledge entries were extracted from results. |
| checkpoint_created | Execution state was saved. |
| progress_update | Overall progress summary. |
| execution_completed | Execution finished, includes final results. |
| execution_paused | Execution was paused (can be resumed). |
| error | An error occurred, includes recovery info. |

Example event:

event: task_completed
data: {"event_type":"task_completed","task_id":"task-1","status":"completed","tokens_used":1250,"execution_time_ms":2800,"attempt":1,"result_preview":"The architecture follows a layered pattern..."}

event: progress_update
data: {"event_type":"progress_update","session_id":"arch-review-session","iteration":3,"tasks_completed":4,"tasks_total":6,"tasks_failed":0,"tokens_used":8500,"elapsed_secs":45}
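A consumer only needs to split the stream on blank lines and read the event:/data: pair of each block. A minimal sketch (the helper names and the requests-based loop are my own, not an official client):

```python
import json

def parse_sse_event(block: str):
    """Parse one SSE event block into (event_name, payload_dict).

    Either element is None when the corresponding line is absent.
    """
    event, payload = None, None
    for line in block.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
    return event, payload

def watch_execution(api_key: str, session_id: str) -> None:
    """Live call: follow the event stream and print each event."""
    import requests
    url = f"https://api.knox.chat/api/knox-ms/autonomous/{session_id}/events"
    with requests.get(
        url, headers={"Authorization": f"Bearer {api_key}"}, stream=True
    ) as resp:
        buffer = []
        for raw in resp.iter_lines(decode_unicode=True):
            if raw:  # accumulate lines until the blank separator
                buffer.append(raw)
                continue
            event, payload = parse_sse_event("\n".join(buffer))
            buffer = []
            if event:
                print(event, payload)
            if event in ("execution_completed", "error"):
                break
```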

Get Status

GET /api/knox-ms/autonomous/{session_id}/status

Response:

{
  "session_id": "arch-review-session",
  "status": "running",
  "current_iteration": 3,
  "tasks_completed": 4,
  "tasks_failed": 0,
  "total_tokens_used": 8500,
  "elapsed_time_secs": 45,
  "current_task": "document_patterns",
  "goal_confidence": 0.72,
  "checkpoints_created": 1
}

Cancel Execution

POST /api/knox-ms/autonomous/cancel

Request body:

{
  "session_id": "arch-review-session",
  "reason": "No longer needed"
}

Resume from Checkpoint

POST /api/knox-ms/autonomous/resume

Resume a cancelled or failed execution from the last checkpoint.

Request body:

{
  "checkpoint_id": "cp-abc123",
  "session_id": "arch-review-session",
  "config_overrides": {
    "max_iterations": 100
  }
}


Session & Memory

Knox-MS persists conversation context in sessions. Use sessions to maintain memory across multiple API calls.

All session endpoints require authentication and are under /api/knox-ms.

List Sessions

GET /api/knox-ms/sessions

Response:

{
  "success": true,
  "data": {
    "sessions": [
      {
        "session_id": "my-project",
        "created_at": 1740422400,
        "last_accessed": 1740508800,
        "total_messages": 42,
        "total_tokens": 125000,
        "has_active_plan": false
      }
    ],
    "total": 1
  }
}

Create a Session

POST /api/knox-ms/sessions

Request body:

{
  "session_id": "my-project",
  "tags": ["coding", "rust"]
}

Both fields are optional. If session_id is omitted, one will be auto-generated.

Get a Session

GET /api/knox-ms/sessions/{session_id}

Returns session metadata including message count, token usage, and active plan status.

Delete a Session

DELETE /api/knox-ms/sessions/{session_id}

Permanently deletes the session and all associated memory.

Get Session History

GET /api/knox-ms/sessions/{session_id}/history

Returns the full conversation history stored in this session.
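A common client-side pattern is resuming the most recently used session. A sketch over the list-sessions response shape shown above (the helper is illustrative, not part of any SDK):

```python
def most_recent_session(listing: dict):
    """Return the session_id with the latest last_accessed timestamp,
    or None when the account has no sessions."""
    sessions = listing.get("data", {}).get("sessions", [])
    if not sessions:
        return None
    newest = max(sessions, key=lambda s: s.get("last_accessed", 0))
    return newest["session_id"]
```

The returned ID can then be passed as knox_ms.session_id on the next chat completion to continue that conversation.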

Knowledge Base

Knox-MS automatically extracts facts, concepts, and patterns from conversations into a knowledge base that can be searched and reused across sessions.

Search Knowledge

GET /api/knox-ms/knowledge

Query parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | "" | Search query. |
| category | string | | Filter by category. |
| limit | integer | 20 | Maximum results to return. |

Response:

{
  "success": true,
  "data": [
    {
      "id": "kn-abc123",
      "category": "programming",
      "title": "Rust Ownership Rules",
      "content": "In Rust, each value has exactly one owner...",
      "source_session": "my-project",
      "created_at": 1740422400,
      "keywords": ["rust", "ownership", "borrowing"]
    }
  ]
}

Add Knowledge

POST /api/knox-ms/knowledge

Manually add an entry to your knowledge base.

Request body:

{
  "category": "architecture",
  "title": "Service Communication Pattern",
  "content": "Services communicate via async message queues...",
  "keywords": ["architecture", "async", "messaging"]
}


User Preferences

Customize how Knox-MS behaves for your account.

Get Preferences

GET /api/knox-ms/user/preferences

Update Preferences

PUT /api/knox-ms/user/preferences

Only include the fields you want to change.

Request body:

{
  "use_custom_models": true,
  "easy_model": "openai/gpt-4o-mini",
  "medium_model": "anthropic/claude-sonnet-4.6",
  "hard_model": "anthropic/claude-opus-4.6",
  "max_context_tokens": 200000,
  "auto_summarize": true,
  "default_verbosity": "verbose",
  "max_output_tokens": -1
}

| Field | Type | Description |
|---|---|---|
| use_custom_models | boolean | Enable custom model routing (overrides system defaults). |
| plan_model | string | Model for task planning. |
| easy_model | string | Model for simple tasks. |
| medium_model | string | Model for medium-complexity tasks. |
| hard_model | string | Model for complex tasks. |
| embedding_model_general | string | Embedding model for general text. |
| embedding_model_code | string | Embedding model for code. |
| rerank_model | string | Model for search result reranking. |
| enable_rerank | boolean | Enable reranking for vector search. |
| rerank_top_n | integer | Number of top results to rerank. |
| max_context_tokens | integer | Maximum context window size. |
| auto_summarize | boolean | Automatically compress old context. |
| enable_knowledge_extraction | boolean | Auto-extract knowledge entries from conversations. |
| summarize_threshold | integer | Message count that triggers summarization. |
| max_tasks_per_plan | integer | Maximum tasks per execution plan. |
| enable_parallel_tasks | boolean | Execute independent tasks in parallel. |
| default_verbosity | string | "minimal", "normal", or "verbose". |
| max_output_tokens | integer | Maximum output tokens per response (-1 = unlimited). |

Rate Limits

| Scope | Limit |
|---|---|
| API endpoints (/v1/*) | 240 requests / 60 seconds |
| Management endpoints (/api/*) | 120 requests / 60 seconds |

When you hit a rate limit, the API returns HTTP 429 Too Many Requests. Implement exponential backoff in your integration.
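A common backoff policy is exponential with full jitter. A sketch (the helpers are my own, not part of any SDK; the retry loop assumes the third-party requests library):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

def post_with_retry(api_key: str, payload: dict, max_attempts: int = 5):
    """Live call: retry a chat completion on HTTP 429 with jittered backoff."""
    import time
    import requests
    resp = None
    for attempt in range(max_attempts):
        resp = requests.post(
            "https://api.knox.chat/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
        )
        if resp.status_code != 429:
            return resp
        time.sleep(backoff_delay(attempt))
    return resp
```

Full jitter spreads retries out so that many clients rate-limited at once do not all retry at the same instant.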

Errors

All responses follow a consistent format.

Success:

{
  "success": true,
  "data": { ... }
}

Error:

{
  "success": false,
  "message": "Descriptive error message"
}

OpenAI-compatible error (on relay endpoints):

{
  "error": {
    "message": "Descriptive error message",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

HTTP Status Codes

| Code | Meaning |
|---|---|
| 200 | Success |
| 201 | Created |
| 400 | Bad request — invalid parameters or missing required fields |
| 401 | Unauthorized — missing or invalid API key |
| 403 | Forbidden — insufficient balance or permissions |
| 404 | Not found — session, checkpoint, or resource doesn't exist |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Service unavailable — the requested service is not initialized |
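Since two error shapes exist (management-style and OpenAI-compatible), a client often normalizes them before logging. A sketch (the helper name is my own):

```python
def error_message(body: dict):
    """Return the human-readable error message from either error shape,
    or None for a success body.

    Handles both {"success": false, "message": ...} and the
    OpenAI-style {"error": {"message": ...}}.
    """
    if body.get("success") is False:
        return body.get("message")
    error = body.get("error")
    if isinstance(error, dict):
        return error.get("message")
    return None
```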

SDK Examples

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.knox.chat/v1",
    api_key="sk-your-knox-api-key",
)

# Simple chat completion
response = client.chat.completions.create(
    model="knox/knox-ms",
    messages=[
        {"role": "user", "content": "Explain how async/await works in Rust"}
    ],
)

print(response.choices[0].message.content)

With streaming:

stream = client.chat.completions.create(
    model="knox/knox-ms",
    messages=[
        {"role": "user", "content": "Write a web scraper in Python"}
    ],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

With Knox-MS session persistence:

import requests

response = requests.post(
    "https://api.knox.chat/v1/chat/completions",
    headers={"Authorization": "Bearer sk-your-knox-api-key"},
    json={
        "model": "knox/knox-ms",
        "messages": [
            {"role": "user", "content": "Let's design a REST API for a blog"}
        ],
        "knox_ms": {
            "session_id": "blog-api-design",
            "memory_mode": "summarized",
            "extract_knowledge": True,
        },
    },
)

print(response.json()["choices"][0]["message"]["content"])

# Later, continue the same conversation — Knox-MS remembers everything:
response = requests.post(
    "https://api.knox.chat/v1/chat/completions",
    headers={"Authorization": "Bearer sk-your-knox-api-key"},
    json={
        "model": "knox/knox-ms",
        "messages": [
            {"role": "user", "content": "Now add pagination to the list endpoints we discussed"}
        ],
        "knox_ms": {
            "session_id": "blog-api-design",
        },
    },
)

Node.js (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.knox.chat/v1",
  apiKey: "sk-your-knox-api-key",
});

const response = await client.chat.completions.create({
  model: "knox/knox-ms",
  messages: [
    { role: "user", content: "Build a React component for a data table" },
  ],
});

console.log(response.choices[0].message.content);

With streaming:

const stream = await client.chat.completions.create({
  model: "knox/knox-ms",
  messages: [
    { role: "user", content: "Build a React component for a data table" },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

cURL

# Chat completion
curl https://api.knox.chat/v1/chat/completions \
  -H "Authorization: Bearer sk-your-knox-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "knox/knox-ms",
    "messages": [{"role": "user", "content": "Hello, Knox!"}],
    "stream": false
  }'

# List available models
curl https://api.knox.chat/v1/models \
  -H "Authorization: Bearer sk-your-knox-api-key"

# Start autonomous execution
curl -X POST https://api.knox.chat/api/knox-ms/autonomous/execute \
  -H "Authorization: Bearer sk-your-knox-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Analyze and refactor this code for better performance",
    "config": {"max_iterations": 30}
  }'

Claude Code / Anthropic SDK

Knox works as a drop-in replacement for the Anthropic API:

# For Claude Code
export ANTHROPIC_BASE_URL="https://api.knox.chat"
export ANTHROPIC_API_KEY="sk-your-knox-api-key"

# That's it — Claude Code will now use Knox as its backend

# For the Anthropic Python SDK
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.knox.chat",
    api_key="sk-your-knox-api-key",
)

message = client.messages.create(
    model="anthropic/claude-sonnet-4.6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Explain monads in simple terms"}
    ],
)

print(message.content[0].text)

Need help? Visit the Knox.chat dashboard to manage your API keys, check your usage, and top up your balance.