Knox-MS 1.0 Developer Documentation
Base URL:
https://api.knox.chat
API Version: v1
Knox-MS is an AI orchestration engine built on a human-brain-inspired memory architecture: hierarchical memory levels, autonomous execution, and intelligent context management combine to enable effectively unlimited context windows and persistent memory across sessions.
Table of Contents
- Quick Start
- Authentication
- Chat Completions
- Models
- Embeddings
- Image Generation
- Audio
- Moderations
- Reranking
- Anthropic Messages API (Claude Code)
- Autonomous Execution
- Session & Memory
- Knowledge Base
- User Preferences
- Rate Limits
- Errors
- SDK Examples
Quick Start
# 1. Get your API key from the Knox dashboard → Settings → API Keys
# 2. Make your first request
curl https://api.knox.chat/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "knox/knox-ms",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
That's it. Knox-MS is fully OpenAI-compatible — any OpenAI SDK, library, or tool works out of the box.
Authentication
All API requests require a Bearer token. Create API keys from the Knox dashboard under Settings → API Keys.
Authorization: Bearer sk-xxxxxxxxxxxxxxxxxxxx
Pass the key in the Authorization header for every request.
Chat Completions
Basic Request
POST /v1/chat/completions
Send a list of messages and receive a model-generated response.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID. Use knox/knox-ms for the Knox-MS engine, or any other available model (e.g., anthropic/claude-sonnet-4.6, openai/gpt-4o). |
| messages | array | Yes | List of message objects with role and content. |
| stream | boolean | No | If true, partial message deltas are sent as SSE events. Default false. |
| max_tokens | integer | No | Maximum tokens to generate. |
| temperature | number | No | Sampling temperature (0–2). Default varies by model. |
| top_p | number | No | Nucleus sampling parameter (0–1). |
| frequency_penalty | number | No | Penalizes repeated tokens (−2 to 2). |
| presence_penalty | number | No | Penalizes tokens already present (−2 to 2). |
| stop | string/array | No | Stop sequence(s). |
| tools | array | No | List of tool/function definitions the model may call. |
| tool_choice | string/object | No | Controls tool use: "auto", "none", "required", or a specific function. |
| response_format | object | No | Force response format (e.g., {"type": "json_object"}). |
| seed | integer | No | Seed for deterministic output. |
Example:
{
  "model": "knox/knox-ms",
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate fibonacci numbers"}
  ],
  "temperature": 0.7,
  "max_tokens": 2048
}
Response:
{
  "id": "chatcmpl-abc123def456",
  "object": "chat.completion",
  "created": 1740422400,
  "model": "knox/knox-ms",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's an efficient fibonacci function..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 150,
    "total_tokens": 178
  }
}
Streaming
Set "stream": true to receive the response as Server-Sent Events. Each event contains a data: line with a JSON chunk.
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1740422400,"model":"knox/knox-ms","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1740422400,"model":"knox/knox-ms","choices":[{"index":0,"delta":{"content":"Here's"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1740422400,"model":"knox/knox-ms","choices":[{"index":0,"delta":{"content":" an"},"finish_reason":null}]}
...
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1740422400,"model":"knox/knox-ms","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
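Each `data:` payload is a standard chat.completion.chunk. A minimal client-side parsing sketch in Python (illustrative only — no Knox-specific helpers assumed):

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line from a streaming response.

    Returns the decoded chunk dict, the string "DONE" for the terminal
    [DONE] sentinel, or None for non-data lines (blank lines, comments).
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return "DONE"
    return json.loads(payload)

def delta_text(chunk: dict) -> str:
    """Extract the incremental content from a chat.completion.chunk."""
    return chunk["choices"][0]["delta"].get("content", "")
```

Concatenating `delta_text` over every chunk until the `[DONE]` sentinel reconstructs the full assistant message.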
Knox-MS Model
The knox/knox-ms model is a meta-model that automatically:
- Plans — Decomposes your request into a set of tasks
- Routes — Sends each task to the best model based on difficulty (fast models for simple tasks, powerful models for complex ones)
- Remembers — Persists conversation context to memory, so you can have conversations that span far beyond any single model's context window
- Learns — Records successful patterns and improves over time
It is accessed through the same /v1/chat/completions endpoint as any other model. There is nothing special you need to do — just set "model": "knox/knox-ms".
Knox-MS Parameters
When using knox/knox-ms, you can pass additional parameters in a knox_ms object at the top level of your request body:
{
  "model": "knox/knox-ms",
  "messages": [...],
  "knox_ms": {
    "session_id": "my-project-session",
    "memory_mode": "summarized",
    "verbosity": "verbose",
    "include_reasoning": true,
    "use_vector_search": true,
    "extract_knowledge": true
  }
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| session_id | string | auto-generated | Persistent session ID. Use the same ID across requests to maintain conversation context. |
| memory_mode | string | "summarized" | How context is managed: "full" (keep everything), "summarized" (compress older context), "selective" (only relevant context). |
| verbosity | string | "normal" | Response detail level: "minimal", "normal", "verbose". |
| include_reasoning | boolean | false | Include planning/reasoning steps in the response. |
| use_vector_search | boolean | false | Enable semantic vector search over past sessions for relevant context. |
| vector_top_k | integer | 30 | Number of vector search candidates to retrieve. |
| rerank_threshold | number | 0.5 | Minimum relevance score for vector search results (0.0–1.0). |
| max_context_tokens | integer | — | Override the context window size for this request. |
| force_model | string | — | Override auto-routing and use a specific model for all tasks. |
| task_difficulty | string | — | Override auto-detection: "easy", "medium", "hard". |
| extract_knowledge | boolean | false | Extract key facts and concepts into the knowledge base. |
| final_only | boolean | false | Return only the final result, not intermediate task outputs. |
| project_id | string | — | Project ID for scoped vector embeddings retrieval. |
| temperature | number | — | Passed through to underlying models. |
| top_p | number | — | Passed through to underlying models. |
| tools | array | — | Tool/function definitions passed through to underlying models. |
| tool_choice | string/object | — | Tool choice strategy passed through to underlying models. |
Knox-MS Metadata in Responses
When using knox/knox-ms, responses include an additional knox_ms_meta field:
{
  "id": "chatcmpl-abc123",
  "model": "knox/knox-ms",
  "choices": [...],
  "usage": {...},
  "knox_ms_meta": {
    "session_id": "my-project-session",
    "plan_id": "plan-xyz",
    "plan_description": "Implement fibonacci function with memoization",
    "current_task": "write_function",
    "task_status": "completed",
    "tasks_completed": 1,
    "tasks_total": 1,
    "tasks_failed": 0,
    "total_model_calls": 2,
    "models_used": {"anthropic/claude-sonnet-4.6": 1, "anthropic/claude-haiku-4.5": 1},
    "memory_mode": "summarized",
    "memory_tokens_saved": 5000,
    "context_tokens_used": 1200,
    "execution_time_ms": 3200,
    "vector_search_results": 0,
    "summary_updated": false
  }
}
| Field | Description |
|---|---|
| session_id | The session used for this request |
| plan_id | ID of the generated execution plan |
| tasks_completed / tasks_total / tasks_failed | Task execution summary |
| models_used | Map of model → number of calls |
| memory_tokens_saved | Tokens saved via summarization/compression |
| context_tokens_used | Tokens used for context in this request |
| execution_time_ms | Total processing time |
| vector_search_results | Number of relevant past context chunks retrieved |
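On the consumer side, the metadata is handy for logging. A small illustrative helper (not part of any SDK — field names are taken from the example response above):

```python
def summarize_meta(meta: dict) -> str:
    """Render a one-line execution summary from a knox_ms_meta object."""
    # models_used maps model ID -> call count; sum gives total calls
    calls = sum(meta.get("models_used", {}).values())
    return (
        f"{meta['tasks_completed']}/{meta['tasks_total']} tasks, "
        f"{calls} model calls, "
        f"{meta['execution_time_ms']} ms"
    )
```

For the example response above this yields "1/1 tasks, 2 model calls, 3200 ms".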
Models
List Models
GET /v1/models
Returns a list of all models currently available to your account.
Response:
{
  "object": "list",
  "data": [
    {
      "id": "knox/knox-ms",
      "object": "model",
      "created": 1740422400,
      "owned_by": "knox"
    },
    {
      "id": "anthropic/claude-sonnet-4.6",
      "object": "model",
      "created": 1740422400,
      "owned_by": "anthropic"
    }
  ]
}
Retrieve Model
GET /v1/models/{model_id}
Returns details about a specific model.
Embeddings
POST /v1/embeddings
Generate vector embeddings for text input.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model ID (e.g., voyage-4-lite, text-embedding-3-small). |
| input | string/array | Yes | Text(s) to embed. |
Example:
{
  "model": "voyage-4-lite",
  "input": "Knox-MS is an AI orchestration engine."
}
Response:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, ...]
    }
  ],
  "model": "voyage-4-lite",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
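Embedding vectors are typically compared with cosine similarity. A dependency-free sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Scores close to 1.0 mean the embedded texts are semantically similar; scores near 0 mean they are unrelated.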
Image Generation
POST /v1/images/generations
Generate images from a text prompt.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | No | Image model ID. |
| prompt | string | Yes | Text description of the image. |
| n | integer | No | Number of images to generate. |
| size | string | No | Image size (e.g., 1024x1024). |
Audio
Transcription
POST /v1/audio/transcriptions
Transcribe audio to text. Accepts multipart/form-data.
Translation
POST /v1/audio/translations
Translate audio to English text.
Text-to-Speech
POST /v1/audio/speech
Generate audio from text input.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model ID. |
| input | string | Yes | Text to convert to speech. |
| voice | string | Yes | Voice ID. |
Moderations
POST /v1/moderations
Classify text for content policy violations.
Reranking
POST /v1/rerank
Rerank a list of documents by relevance to a query (VoyageAI-compatible).
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Rerank model ID (e.g., rerank-2.5). |
| query | string | Yes | The query to rank documents against. |
| documents | array | Yes | List of document strings to rerank. |
| top_n | integer | No | Number of top results to return. |
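A sketch of building the rerank request body and applying the results client-side. It assumes the VoyageAI-style response shape, where each entry in `results` carries an `index` into the original document list and a `relevance_score` — verify against your actual responses:

```python
def build_rerank_request(query, documents, model="rerank-2.5", top_n=None):
    """Build the JSON body for POST /v1/rerank."""
    body = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        body["top_n"] = top_n
    return body

def apply_rerank(documents, results):
    """Reorder documents by descending relevance using rerank results.

    Assumes each result has an `index` into the original list and a
    `relevance_score` (VoyageAI-style response shape).
    """
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ranked]
```

Send the body from `build_rerank_request` with any HTTP client, then pass the response's `results` array to `apply_rerank` to get the documents in relevance order.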
Anthropic Messages API
Knox provides full compatibility with the Anthropic Messages API, so tools like Claude Code and the Anthropic Python/JS SDK work natively.
POST /v1/messages
Setup for Claude Code:
export ANTHROPIC_BASE_URL="https://api.knox.chat"
export ANTHROPIC_API_KEY="sk-your-knox-api-key"
Setup for the Anthropic Python SDK:
import anthropic
client = anthropic.Anthropic(
    base_url="https://api.knox.chat/v1",
    api_key="sk-your-knox-api-key",
)

message = client.messages.create(
    model="anthropic/claude-sonnet-4.6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude!"}
    ],
)
print(message.content[0].text)
The endpoint accepts the full Anthropic request format (separate system field, max_tokens required, content blocks, tool use, extended thinking) and returns Anthropic-formatted responses.
Autonomous Execution
For complex, multi-step tasks, Knox-MS can run an autonomous execution loop that iteratively plans, executes, evaluates, and refines until your goal is achieved — with real-time progress streaming.
All autonomous endpoints require authentication and are under /api/knox-ms/autonomous.
Start Execution
POST /api/knox-ms/autonomous/execute
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | The goal or task to accomplish. |
| session_id | string | No | Session ID for context persistence. Auto-generated if omitted. |
| stream_events | boolean | No | Enable real-time SSE event streaming. Default true. |
| config.max_iterations | integer | No | Maximum execution iterations (safety limit). |
| config.max_time_secs | integer | No | Maximum execution time in seconds. |
| config.confidence_threshold | number | No | Confidence level required to consider the goal complete (0–1). |
| config.enable_checkpointing | boolean | No | Enable periodic state checkpoints for recovery. |
| config.checkpoint_interval | integer | No | Checkpoint every N iterations. |
| config.enable_smart_tasks | boolean | No | Enable smart task decomposition. |
| config.enable_adaptive_retry | boolean | No | Enable automatic retry with adapted strategy on failure. |
Example:
{
  "message": "Analyze this codebase and produce a comprehensive architecture document",
  "session_id": "arch-review-session",
  "config": {
    "max_iterations": 50,
    "max_time_secs": 1800,
    "confidence_threshold": 0.85,
    "enable_checkpointing": true
  }
}
Response:
{
  "execution_id": "exec-a1b2c3d4e5f6",
  "session_id": "arch-review-session",
  "status": "running",
  "result": null,
  "error": null,
  "events_url": "/api/knox-ms/autonomous/arch-review-session/events"
}
Stream Events (SSE)
GET /api/knox-ms/autonomous/{session_id}/events
Returns a Server-Sent Events stream with real-time progress updates. Connect to this URL to receive events as the execution progresses.
Event types:
| Event | Description |
|---|---|
| execution_started | Execution has begun, includes plan overview. |
| plan_updated | A new or revised plan was generated. |
| task_started | A task began execution, includes model and difficulty. |
| task_progress | Progress update for a long-running task. |
| task_content_chunk | Streaming content from a task (incremental output). |
| task_completed | A task finished, includes token usage and timing. |
| task_failed | A task failed, includes error and retry info. |
| task_evaluated | Quality evaluation of a completed task. |
| memory_operation | A memory operation was performed (summarize, archive, etc.). |
| context_updated | Session context was updated or compressed. |
| knowledge_extracted | Knowledge entries were extracted from results. |
| checkpoint_created | Execution state was saved. |
| progress_update | Overall progress summary. |
| execution_completed | Execution finished, includes final results. |
| execution_paused | Execution was paused (can be resumed). |
| error | An error occurred, includes recovery info. |
Example event:
event: task_completed
data: {"event_type":"task_completed","task_id":"task-1","status":"completed","tokens_used":1250,"execution_time_ms":2800,"attempt":1,"result_preview":"The architecture follows a layered pattern..."}
event: progress_update
data: {"event_type":"progress_update","session_id":"arch-review-session","iteration":3,"tasks_completed":4,"tasks_total":6,"tasks_failed":0,"tokens_used":8500,"elapsed_secs":45}
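A client might route these events by `event_type` for logging. A minimal formatter sketch (the rendered strings are illustrative; only the event fields come from the examples above):

```python
def handle_event(event: dict) -> str:
    """Format an autonomous-execution event for a log line."""
    etype = event.get("event_type")
    if etype == "progress_update":
        return (f"iteration {event['iteration']}: "
                f"{event['tasks_completed']}/{event['tasks_total']} tasks, "
                f"{event['tokens_used']} tokens")
    if etype == "task_completed":
        return (f"task {event['task_id']} done in "
                f"{event['execution_time_ms']} ms")
    if etype == "execution_completed":
        return "execution finished"
    # Fall back to the raw event type for everything else
    return f"event: {etype}"
```

Feed each decoded SSE `data:` payload to `handle_event` as the stream arrives.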
Get Status
GET /api/knox-ms/autonomous/{session_id}/status
Response:
{
  "session_id": "arch-review-session",
  "status": "running",
  "current_iteration": 3,
  "tasks_completed": 4,
  "tasks_failed": 0,
  "total_tokens_used": 8500,
  "elapsed_time_secs": 45,
  "current_task": "document_patterns",
  "goal_confidence": 0.72,
  "checkpoints_created": 1
}
Cancel Execution
POST /api/knox-ms/autonomous/cancel
Request body:
{
  "session_id": "arch-review-session",
  "reason": "No longer needed"
}
Resume from Checkpoint
POST /api/knox-ms/autonomous/resume
Resume a cancelled or failed execution from the last checkpoint.
Request body:
{
  "checkpoint_id": "cp-abc123",
  "session_id": "arch-review-session",
  "config_overrides": {
    "max_iterations": 100
  }
}
Session & Memory
Knox-MS persists conversation context in sessions. Use sessions to maintain memory across multiple API calls.
All session endpoints require authentication and are under /api/knox-ms.
List Sessions
GET /api/knox-ms/sessions
Response:
{
  "success": true,
  "data": {
    "sessions": [
      {
        "session_id": "my-project",
        "created_at": 1740422400,
        "last_accessed": 1740508800,
        "total_messages": 42,
        "total_tokens": 125000,
        "has_active_plan": false
      }
    ],
    "total": 1
  }
}
Create a Session
POST /api/knox-ms/sessions
Request body:
{
  "session_id": "my-project",
  "tags": ["coding", "rust"]
}
Both fields are optional. If session_id is omitted, one will be auto-generated.
Get a Session
GET /api/knox-ms/sessions/{session_id}
Returns session metadata including message count, token usage, and active plan status.
Delete a Session
DELETE /api/knox-ms/sessions/{session_id}
Permanently deletes the session and all associated memory.
Get Session History
GET /api/knox-ms/sessions/{session_id}/history
Returns the full conversation history stored in this session.
Knowledge Base
Knox-MS automatically extracts facts, concepts, and patterns from conversations into a knowledge base that can be searched and reused across sessions.
Search Knowledge
GET /api/knox-ms/knowledge
Query parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | "" | Search query. |
| category | string | — | Filter by category. |
| limit | integer | 20 | Maximum results to return. |
Response:
{
  "success": true,
  "data": [
    {
      "id": "kn-abc123",
      "category": "programming",
      "title": "Rust Ownership Rules",
      "content": "In Rust, each value has exactly one owner...",
      "source_session": "my-project",
      "created_at": 1740422400,
      "keywords": ["rust", "ownership", "borrowing"]
    }
  ]
}
Add Knowledge
POST /api/knox-ms/knowledge
Manually add an entry to your knowledge base.
Request body:
{
  "category": "architecture",
  "title": "Service Communication Pattern",
  "content": "Services communicate via async message queues...",
  "keywords": ["architecture", "async", "messaging"]
}
User Preferences
Customize how Knox-MS behaves for your account.
Get Preferences
GET /api/knox-ms/user/preferences
Update Preferences
PUT /api/knox-ms/user/preferences
Only include the fields you want to change.
Request body:
{
  "use_custom_models": true,
  "easy_model": "openai/gpt-4o-mini",
  "medium_model": "anthropic/claude-sonnet-4.6",
  "hard_model": "anthropic/claude-opus-4.6",
  "max_context_tokens": 200000,
  "auto_summarize": true,
  "default_verbosity": "verbose",
  "max_output_tokens": -1
}
| Field | Type | Description |
|---|---|---|
| use_custom_models | boolean | Enable custom model routing (overrides system defaults). |
| plan_model | string | Model for task planning. |
| easy_model | string | Model for simple tasks. |
| medium_model | string | Model for medium-complexity tasks. |
| hard_model | string | Model for complex tasks. |
| embedding_model_general | string | Embedding model for general text. |
| embedding_model_code | string | Embedding model for code. |
| rerank_model | string | Model for search result reranking. |
| enable_rerank | boolean | Enable reranking for vector search. |
| rerank_top_n | integer | Number of top results to rerank. |
| max_context_tokens | integer | Maximum context window size. |
| auto_summarize | boolean | Automatically compress old context. |
| enable_knowledge_extraction | boolean | Auto-extract knowledge entries from conversations. |
| summarize_threshold | integer | Message count that triggers summarization. |
| max_tasks_per_plan | integer | Maximum tasks per execution plan. |
| enable_parallel_tasks | boolean | Execute independent tasks in parallel. |
| default_verbosity | string | "minimal", "normal", or "verbose". |
| max_output_tokens | integer | Maximum output tokens per response (-1 = unlimited). |
Rate Limits
| Scope | Limit |
|---|---|
| API endpoints (/v1/*) | 240 requests / 60 seconds |
| Management endpoints (/api/*) | 120 requests / 60 seconds |
When you hit a rate limit, the API returns HTTP 429 Too Many Requests. Implement exponential backoff in your integration.
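One way to implement that backoff, sketched as a delay schedule plus a retry wrapper (the wrapper only assumes the response object exposes a `status_code` attribute, as `requests.Response` does):

```python
import time

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield exponentially increasing delays: base * 2**attempt, capped."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))

def call_with_backoff(do_request, max_retries=5):
    """Retry do_request() while it keeps returning HTTP 429.

    do_request is a zero-argument callable returning a response object
    with a .status_code attribute.
    """
    for delay in backoff_delays(max_retries):
        resp = do_request()
        if resp.status_code != 429:
            return resp
        time.sleep(delay)
    return do_request()  # final attempt, returned as-is
```

Adding random jitter to each delay further reduces the chance of many clients retrying in lockstep.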
Errors
All responses follow a consistent format.
Success:
{
  "success": true,
  "data": { ... }
}
Error:
{
  "success": false,
  "message": "Descriptive error message"
}
OpenAI-compatible error (on relay endpoints):
{
  "error": {
    "message": "Descriptive error message",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
HTTP Status Codes
| Code | Meaning |
|---|---|
| 200 | Success |
| 201 | Created |
| 400 | Bad request — invalid parameters or missing required fields |
| 401 | Unauthorized — missing or invalid API key |
| 403 | Forbidden — insufficient balance or permissions |
| 404 | Not found — session, checkpoint, or resource doesn't exist |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Service unavailable — the requested service is not initialized |
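Since management endpoints use the success/message envelope while relay endpoints use the OpenAI-style error object, a client may want to normalize both shapes. A sketch over the two formats shown above:

```python
def error_message(body):
    """Extract a human-readable error message from either error format.

    Handles both the management-endpoint envelope ({"success": false,
    "message": ...}) and the OpenAI-compatible shape ({"error": {...}}).
    Returns None when the body does not represent an error.
    """
    if body.get("success") is False:
        return body.get("message", "unknown error")
    if "error" in body:
        return body["error"].get("message", "unknown error")
    return None
```

Call `error_message(response.json())` after any non-2xx response to get a single message string regardless of which endpoint family produced it.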
SDK Examples
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
    base_url="https://api.knox.chat/v1",
    api_key="sk-your-knox-api-key",
)

# Simple chat completion
response = client.chat.completions.create(
    model="knox/knox-ms",
    messages=[
        {"role": "user", "content": "Explain how async/await works in Rust"}
    ],
)
print(response.choices[0].message.content)
With streaming:
stream = client.chat.completions.create(
    model="knox/knox-ms",
    messages=[
        {"role": "user", "content": "Write a web scraper in Python"}
    ],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
With Knox-MS session persistence:
import requests
response = requests.post(
    "https://api.knox.chat/v1/chat/completions",
    headers={"Authorization": "Bearer sk-your-knox-api-key"},
    json={
        "model": "knox/knox-ms",
        "messages": [
            {"role": "user", "content": "Let's design a REST API for a blog"}
        ],
        "knox_ms": {
            "session_id": "blog-api-design",
            "memory_mode": "summarized",
            "extract_knowledge": True,
        },
    },
)
print(response.json()["choices"][0]["message"]["content"])

# Later, continue the same conversation — Knox-MS remembers everything:
response = requests.post(
    "https://api.knox.chat/v1/chat/completions",
    headers={"Authorization": "Bearer sk-your-knox-api-key"},
    json={
        "model": "knox/knox-ms",
        "messages": [
            {"role": "user", "content": "Now add pagination to the list endpoints we discussed"}
        ],
        "knox_ms": {
            "session_id": "blog-api-design",
        },
    },
)
Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.knox.chat/v1",
  apiKey: "sk-your-knox-api-key",
});

const response = await client.chat.completions.create({
  model: "knox/knox-ms",
  messages: [
    { role: "user", content: "Build a React component for a data table" },
  ],
});
console.log(response.choices[0].message.content);
With streaming:
const stream = await client.chat.completions.create({
  model: "knox/knox-ms",
  messages: [
    { role: "user", content: "Build a React component for a data table" },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
cURL
# Chat completion
curl https://api.knox.chat/v1/chat/completions \
  -H "Authorization: Bearer sk-your-knox-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "knox/knox-ms",
    "messages": [{"role": "user", "content": "Hello, Knox!"}],
    "stream": false
  }'
# List available models
curl https://api.knox.chat/v1/models \
  -H "Authorization: Bearer sk-your-knox-api-key"
# Start autonomous execution
curl -X POST https://api.knox.chat/api/knox-ms/autonomous/execute \
  -H "Authorization: Bearer sk-your-knox-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Analyze and refactor this code for better performance",
    "config": {"max_iterations": 30}
  }'
Claude Code / Anthropic SDK
Knox works as a drop-in replacement for the Anthropic API:
# For Claude Code
export ANTHROPIC_BASE_URL="https://api.knox.chat"
export ANTHROPIC_API_KEY="sk-your-knox-api-key"
# That's it — Claude Code will now use Knox as its backend
# For the Anthropic Python SDK
import anthropic
client = anthropic.Anthropic(
    base_url="https://api.knox.chat/v1",
    api_key="sk-your-knox-api-key",
)

message = client.messages.create(
    model="anthropic/claude-sonnet-4.6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Explain monads in simple terms"}
    ],
)
print(message.content[0].text)
Need help? Visit the Knox.chat dashboard to manage your API keys, check your usage, and top up your balance.