Messages
POST https://api.knox.chat/v1/messages
Create a message using the Anthropic Messages API format. This endpoint provides full compatibility with Claude Code and Anthropic SDK clients.
Request
This endpoint accepts the following headers and JSON request body parameters:
Headers
| Name | Type | Required | Description |
|---|---|---|---|
| Authorization | String | Yes | Bearer authentication in the form Bearer <token>, where <token> is your Knox API key. Alternatively, use the x-api-key header. |
| x-api-key | String | No | Alternative authentication using Anthropic-style API key header. |
| anthropic-version | String | No | Anthropic API version (e.g., 2023-06-01). Optional but recommended. |
Request Body
| Name | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | The model to use. Examples: anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.6, anthropic/claude-haiku-4.5, or aliases like sonnet, opus, haiku. |
| messages | Array | Yes | Array of message objects representing the conversation. |
| max_tokens | Integer | Yes | Maximum number of tokens to generate before stopping. |
| system | String or Array | No | System prompt. Can be a string or array of content blocks with cache control. |
| metadata | Object | No | Request metadata containing optional user_id. |
| stop_sequences | Array of Strings | No | Custom stop sequences that will cause the model to stop generating. |
| stream | Boolean | No | Enable streaming responses using SSE. Defaults to false. |
| temperature | Double | No | Sampling temperature (range: [0.0, 1.0]). |
| top_k | Integer | No | Top-k sampling value. |
| top_p | Double | No | Top-p (nucleus) sampling value (range: (0, 1]). |
| tools | Array | No | Array of tool definitions for function calling. |
| tool_choice | Object | No | How the model should use tools: {"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "..."}. |
| thinking | Object | No | Extended thinking configuration, for models that support it. Use {"type": "enabled", "budget_tokens": 1024}. |
Message Object
| Name | Type | Required | Description |
|---|---|---|---|
| role | String | Yes | The role of the message author: user or assistant. |
| content | String or Array | Yes | The content of the message. Can be a string or array of content blocks. |
Content Block Types
Text Block
{
"type": "text",
"text": "Your text content here",
"cache_control": {"type": "ephemeral"}
}
Image Block
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": "base64-encoded-image-data"
}
}
Tool Use Block (in assistant messages)
{
"type": "tool_use",
"id": "tool_call_id",
"name": "function_name",
"input": {"param": "value"}
}
Tool Result Block (in user messages)
{
"type": "tool_result",
"tool_use_id": "tool_call_id",
"content": "Result of the tool call",
"is_error": false
}
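The two block types pair up: the assistant emits a tool_use block, your code runs the tool, and the next user message answers with a tool_result whose tool_use_id echoes the original id. A minimal helper for building that reply might look like this (the function name is ours, not part of the API):

```python
import json

def tool_result_message(tool_use_block, result, is_error=False):
    """Build the user-turn message that answers a tool_use block.

    `tool_use_block` is a dict shaped like the Tool Use Block above;
    `result` is the string your tool returned.
    """
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_block["id"],  # must echo the tool_use id
                "content": result,
                "is_error": is_error,
            }
        ],
    }

# Example: answer the get_weather call from the Tool Use Block above
tool_use = {
    "type": "tool_use",
    "id": "toolu_01A09q90qw90lq917835lhl",
    "name": "get_weather",
    "input": {"location": "Tokyo, Japan"},
}
reply = tool_result_message(tool_use, json.dumps({"temp_c": 18, "sky": "cloudy"}))
```

Append this message to the conversation (after the assistant message containing the tool_use block) and send the whole history back to the endpoint to get the model's final answer.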
Tool Definition
| Name | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | The name of the tool/function. |
| description | String | No | Description of what the tool does. |
| input_schema | Object | Yes | JSON Schema object defining the tool's parameters. |
Cache Control (Prompt Caching)
Knox supports Anthropic's prompt caching feature to reduce costs on repeated prompts. Cache control can be applied to both text and image content blocks.
| Name | Type | Required | Description |
|---|---|---|---|
| type | String | Yes | Cache type. Use "ephemeral". |
| ttl | String | No | Time-to-live. Default is 5 minutes. Use "1h" for 1-hour TTL. |
Supported Cache Locations
Cache control can be added to:
- System prompt (string or content blocks)
- User message text blocks
- User message image blocks
Cache Breakpoints
Add cache_control to mark cache breakpoints in your prompt. Content before the breakpoint will be cached and reused in subsequent requests.
{
"system": [
{
"type": "text",
"text": "Long reference documentation that should be cached...",
"cache_control": {"type": "ephemeral"}
}
],
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Large context to cache...",
"cache_control": {"type": "ephemeral"}
},
{
"type": "text",
"text": "Your actual question (not cached)"
}
]
}
]
}
Image Caching
Images can also be cached, which is useful when asking multiple questions about the same image:
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": "base64-encoded-image-data"
},
"cache_control": {"type": "ephemeral"}
},
{
"type": "text",
"text": "What's in this image?"
}
]
}
Cache Pricing
| Token Type | Cost Multiplier |
|---|---|
| cache_creation_input_tokens | 1.25x (25% more than regular input) |
| cache_read_input_tokens | 0.1x (90% discount from regular input) |
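The multipliers combine with the usage fields from the response to give an effective input cost. A sketch of the arithmetic, using a placeholder per-million-token price (the function and price are illustrative, not part of the API):

```python
def input_cost(usage, price_per_mtok):
    """Effective input cost from a response usage object.

    Applies the multipliers above: cache writes cost 1.25x regular
    input, cache reads cost 0.1x. `price_per_mtok` is the model's
    regular input price per million tokens (example value only).
    """
    regular = usage.get("input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    effective = regular + 1.25 * written + 0.1 * read
    return effective * price_per_mtok / 1_000_000

# First request writes 100k tokens to cache; only 200 tokens are regular input
usage = {"input_tokens": 200, "cache_creation_input_tokens": 100_000,
         "cache_read_input_tokens": 0}
cost = input_cost(usage, price_per_mtok=3.0)  # $3/MTok is a placeholder price
```

On follow-up requests within the TTL, the same 100k tokens show up as cache_read_input_tokens instead, at a tenth of the regular price.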
Cache TTL Options
| TTL | Duration | Use Case |
|---|---|---|
| {"type": "ephemeral"} | 5 minutes | Short conversations, quick follow-ups |
| {"type": "ephemeral", "ttl": "1h"} | 1 hour | Longer sessions, document analysis |
Best Practices
- Place cache breakpoints strategically: Cache large, static content like documentation, code files, or reference materials.
- Order content by stability: Put the most stable content first (system prompt), then cached user content, then dynamic queries.
- Minimum token threshold: Caching is most effective for prompts with at least 1,024 tokens of cacheable content.
- Reuse within TTL: Make follow-up requests within the TTL window to benefit from cache reads.
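Since a breakpoint covers everything up to and including the block it is attached to, a common pattern is to mark the last block of your stable prefix. A small helper for that (the function name is ours; only the cache_control shape comes from the API):

```python
def with_cache_breakpoint(blocks, ttl=None):
    """Return content blocks with cache_control set on the last block.

    Marks everything up to and including the final block as cacheable,
    per the breakpoint semantics described above. `blocks` is a list of
    content-block dicts; the input list is not mutated.
    """
    if not blocks:
        return blocks
    marked = [dict(b) for b in blocks]  # shallow copies
    control = {"type": "ephemeral"}
    if ttl is not None:
        control["ttl"] = ttl            # e.g. "1h"
    marked[-1]["cache_control"] = control
    return marked

system = with_cache_breakpoint(
    [{"type": "text", "text": "Long reference documentation..."}], ttl="1h")
```

Keep the dynamic part of the prompt (the user's actual question) in blocks after the breakpoint so it never invalidates the cache.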
cURL Example
Basic Request
curl -X POST https://api.knox.chat/v1/messages \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "anthropic/claude-sonnet-4.6",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello, Claude!"}
]
}'
With System Prompt
curl -X POST https://api.knox.chat/v1/messages \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4.6",
"max_tokens": 1024,
"system": "You are a helpful coding assistant.",
"messages": [
{"role": "user", "content": "Write a Python hello world program"}
]
}'
With Prompt Caching
curl -X POST https://api.knox.chat/v1/messages \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4.6",
"max_tokens": 1024,
"system": [
{
"type": "text",
"text": "Very long system prompt or documentation to cache...",
"cache_control": {"type": "ephemeral"}
}
],
"messages": [
{"role": "user", "content": "Question about the cached content"}
]
}'
With Tool Use
curl -X POST https://api.knox.chat/v1/messages \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4.6",
"max_tokens": 1024,
"tools": [
{
"name": "get_weather",
"description": "Get the current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g. San Francisco, CA"
}
},
"required": ["location"]
}
}
],
"messages": [
{"role": "user", "content": "What is the weather in Tokyo?"}
]
}'
Streaming Request
curl -X POST https://api.knox.chat/v1/messages \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4.6",
"max_tokens": 1024,
"stream": true,
"messages": [
{"role": "user", "content": "Tell me a short story"}
]
}'
Response
Success Response (200)
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! I'm Claude, an AI assistant. How can I help you today?"
}
],
"model": "anthropic/claude-sonnet-4.6",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 12,
"output_tokens": 25,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0
}
}
Response with Tool Use
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "toolu_01A09q90qw90lq917835lhl",
"name": "get_weather",
"input": {"location": "Tokyo, Japan"}
}
],
"model": "anthropic/claude-sonnet-4.6",
"stop_reason": "tool_use",
"stop_sequence": null,
"usage": {
"input_tokens": 50,
"output_tokens": 35
}
}
Streaming Response
When stream: true, the response is sent as Server-Sent Events (SSE):
event: message_start
data: {"type":"message_start","message":{"id":"msg_01...","type":"message","role":"assistant","content":[],"model":"anthropic/claude-sonnet-4.6","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":12,"output_tokens":0}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":25}}
event: message_stop
data: {"type":"message_stop"}
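A client reconstructs the assistant's text by concatenating the text_delta payloads, ignoring the other event types. A minimal sketch over already-decoded SSE lines (real clients should use an SSE library or the official SDK's streaming helpers):

```python
import json

def collect_text(sse_lines):
    """Accumulate assistant text from Messages-API SSE lines.

    Only content_block_delta events carrying a text_delta contribute
    to the returned string; all other events are skipped.
    """
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip "event: ..." lines and blanks
        event = json.loads(line[len("data: "):])
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)

# The example stream above reduces to "Hello!"
stream = [
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}',
    'event: message_stop',
    'data: {"type":"message_stop"}',
]
text = collect_text(stream)
```

The final message_delta event carries the stop_reason and output token count, so clients that need usage data should also watch for that event type.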
Response Schema
| Name | Type | Description |
|---|---|---|
| id | String | Unique identifier for the message. |
| type | String | Always "message". |
| role | String | Always "assistant". |
| content | Array | Array of content blocks (text, tool_use, or thinking). |
| model | String | The model that generated the response. |
| stop_reason | String | Reason for stopping: "end_turn", "max_tokens", "stop_sequence", or "tool_use". |
| stop_sequence | String or null | The stop sequence that caused the model to stop, if applicable. |
| usage | Object | Token usage information. |
Usage Object
| Name | Type | Description |
|---|---|---|
| input_tokens | Integer | Number of input tokens processed. |
| output_tokens | Integer | Number of output tokens generated. |
| cache_creation_input_tokens | Integer | Tokens written to cache (1.25x cost). Only present when using prompt caching. |
| cache_read_input_tokens | Integer | Tokens read from cache (0.1x cost). Only present when using prompt caching. |
Error Response
{
"type": "error",
"error": {
"type": "invalid_request_error",
"message": "Invalid request format: missing required field 'messages'"
}
}
Error Types
| Type | Description |
|---|---|
| invalid_request_error | The request was malformed or missing required fields. |
| authentication_error | Invalid or missing API key. |
| permission_error | The API key doesn't have access to the requested model. |
| rate_limit_error | Too many requests. Please slow down. |
| api_error | An internal server error occurred. |
Model Aliases
Knox automatically resolves model aliases for convenience:
| Alias | Resolved Model |
|---|---|
| haiku | anthropic/claude-haiku-4.5 |
| sonnet | anthropic/claude-sonnet-4.6 |
| opus | anthropic/claude-opus-4.6 |
| claude-3-5-sonnet-* | anthropic/claude-sonnet-4.6 |
| claude-3-5-haiku-* | anthropic/claude-haiku-4.5 |
| claude-3-5-opus-* | anthropic/claude-opus-4.6 |
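The table amounts to exact-name lookups plus wildcard prefix matches. As an illustration of the mapping (this is not Knox's actual resolver, just the table expressed as code):

```python
# Exact aliases from the table above
ALIASES = {
    "haiku": "anthropic/claude-haiku-4.5",
    "sonnet": "anthropic/claude-sonnet-4.6",
    "opus": "anthropic/claude-opus-4.6",
}

# Wildcard rows: claude-3-5-<family>-* matches on prefix
PREFIXES = {
    "claude-3-5-sonnet": "anthropic/claude-sonnet-4.6",
    "claude-3-5-haiku": "anthropic/claude-haiku-4.5",
    "claude-3-5-opus": "anthropic/claude-opus-4.6",
}

def resolve_model(name):
    """Resolve a model alias per the table; full ids pass through."""
    if name in ALIASES:
        return ALIASES[name]
    for prefix, model in PREFIXES.items():
        if name.startswith(prefix):
            return model
    return name  # already a full model id
```

Full model ids like anthropic/claude-sonnet-4.6 are passed through unchanged.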
SDK Usage
Python (Anthropic SDK)
import anthropic
client = anthropic.Anthropic(
base_url="https://api.knox.chat",
api_key="sk-your-knox-api-key",
)
message = client.messages.create(
model="anthropic/claude-sonnet-4.6",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello, Claude!"}
]
)
print(message.content[0].text)
JavaScript (Anthropic SDK)
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
baseURL: 'https://api.knox.chat',
apiKey: 'sk-your-knox-api-key',
});
const message = await client.messages.create({
model: 'anthropic/claude-sonnet-4.6',
max_tokens: 1024,
messages: [
{ role: 'user', content: 'Hello, Claude!' }
],
});
console.log(message.content[0].text);
Claude Code Configuration
export ANTHROPIC_BASE_URL="https://api.knox.chat"
export ANTHROPIC_AUTH_TOKEN="sk-your-knox-api-key"
export ANTHROPIC_API_KEY="" # Must be explicitly empty
Then run claude in your terminal to start Claude Code with Knox.