
Quickstart

Knox Chat is a unified, OpenAI-compatible API platform, with support for the Anthropic Messages API format, that gives you access to hundreds of models through one endpoint, with intelligent routing, fallbacks, and transparent pricing. It pairs that with Knox‑MS, a memory‑centric system for long‑running agents and apps that need persistent context.

What you get

  • Smart model routing: performance/cost/balanced strategies with automatic fallback and provider metrics.
  • Multimodal I/O: text, images, PDFs, and audio inputs, plus image generation outputs.
  • Tools and MCP: OpenAI‑compatible tool calling and MCP server bridging.
  • Web search: :online model variant or the web plugin for live citations.
  • Structured & reasoning outputs: JSON Schema enforcement and standardized reasoning tokens.
  • Efficiency features: prompt caching and middle‑out message transforms for long contexts.
  • Reliability: zero‑completion insurance so failed/empty responses aren’t billed.
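
As a concrete sketch of one feature above, web search is enabled per request by appending the :online suffix to the model slug. The payload below is illustrative (the model name and prompt are examples, not official documentation):

```python
import json

# Sketch: a chat-completions payload that opts into web search via the
# ":online" model variant mentioned above. Model slug is illustrative.
payload = {
    "model": "openai/gpt-5:online",  # ":online" requests live web results
    "messages": [
        {"role": "user", "content": "Summarize today's top AI news."}
    ],
}

body = json.dumps(payload)  # ready to send as the request body
```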

Knox‑MS Memory System

Knox‑MS adds multi‑level memory, summaries, vector search, and a knowledge graph to power persistent agents and workflows. It also supports autonomous planning, task orchestration, self‑healing, and realtime progress events. Learn more in the Knox Memory System — Full Feature Deep Dive.

Using the OpenAI SDK

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://api.knox.chat/v1',
  apiKey: '<KNOXCHAT_API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'openai/gpt-5',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });

  console.log(completion.choices[0].message);
}

main();

Using the Knox.Chat API directly

import requests
import json

response = requests.post(
    url="https://api.knox.chat/v1/chat/completions",
    headers={
        "Authorization": "Bearer <KNOXCHAT_API_KEY>",
        "Content-Type": "application/json",
    },
    data=json.dumps({
        "model": "anthropic/claude-sonnet-4.6",  # Optional
        "messages": [
            {
                "role": "user",
                "content": "What is the meaning of life?"
            }
        ]
    })
)

The API also supports streaming.
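
Streamed responses typically arrive as server-sent events. The sketch below assumes the OpenAI-style SSE framing (`data: {...}` chunks terminated by `data: [DONE]`) and shows how content deltas could be pulled out of each line:

```python
import json

def parse_sse_line(line: str):
    """Return the content delta carried by one SSE data line, or None."""
    if not line.startswith("data: "):
        return None  # comments, keep-alives, and blank lines carry no delta
    data = line[len("data: "):]
    if data == "[DONE]":  # end-of-stream sentinel
        return None
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

# With requests, add "stream": True to the JSON body above, then iterate:
#   for raw in response.iter_lines(decode_unicode=True):
#       delta = parse_sse_line(raw)
#       if delta is not None:
#           print(delta, end="")
```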

Using third-party SDKs

For information about using third-party SDKs and frameworks with Knox Chat, please see our frameworks documentation.

Principles

Knox Chat helps teams build reliable, cost‑efficient AI systems across providers. We believe the future is multimodal, multi‑provider, and memory‑centric.

Why Knox Chat?

Price and Performance. Knox Chat scouts for the best prices, lowest latencies, and highest throughput across providers, and lets you choose how to prioritize them.

Standardized API. No code changes when switching between models or providers. Use OpenAI‑compatible SDKs, tools, and structured outputs out of the box.

Memory‑First Apps. Knox‑MS enables long‑running agents with persistent memory, knowledge extraction, and autonomous planning.

Multimodal by Default. Images, PDFs, and audio inputs plus image generation make multimodal workflows straightforward.

Consolidated Billing. Simple and transparent billing, regardless of how many providers you use, with zero‑completion insurance for failed/empty generations.

Higher Availability. Automatic fallback and smart routing keep requests working even when providers go down.

Models

One API for hundreds of models

Explore and browse 300+ models and providers on our website, or with our API.
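
As a sketch, the model list can also be fetched programmatically; the `/v1/models` path is an assumption based on the OpenAI convention the rest of this API follows:

```python
def extract_model_ids(models_response: dict) -> list[str]:
    """Pull every model id out of a /v1/models response body."""
    return [model["id"] for model in models_response["data"]]

def list_model_ids(api_key: str) -> list[str]:
    import requests  # imported lazily so the pure helper above stays dependency-free
    response = requests.get(
        "https://api.knox.chat/v1/models",  # path assumed to mirror OpenAI's API
        headers={"Authorization": f"Bearer {api_key}"},
    )
    response.raise_for_status()
    return extract_model_ids(response.json())
```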

Models API Standard

Our Models API makes the most important information about all LLMs freely available as soon as we confirm it.

API Response Schema

The Models API returns a standardized JSON response format that provides comprehensive metadata for each available model. This schema is cached at the edge and designed for reliable integration into production applications.

Root Response Object

{
  "data": [
    /* Array of Model objects */
  ]
}

Model Object Schema

Each model in the data array contains the following standardized fields:

  • id (string): Unique model identifier used in API requests (e.g., "google/gemini-2.5-pro")
  • object (string): Object type identifier (always "model")
  • created (number): Unix timestamp of when the model was created
  • owned_by (string): Organization that owns the model
  • permission (ModelPermission[]): Array of permission objects defining access controls
  • root (string): Root model identifier
  • parent (string | null): Parent model identifier if this is a fine-tuned version
  • context_length (number): Maximum context window size in tokens
  • architecture (Architecture): Object describing the model's technical capabilities
  • pricing (Pricing): Price structure for using this model
  • top_provider (TopProvider): Configuration details for the primary provider
  • supported_parameters (string[]): Array of supported API parameters for this model
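
These fields make it straightforward to filter the catalog client-side. A minimal sketch, using made-up sample data with the `id`, `context_length`, and `supported_parameters` fields described above:

```python
# Sketch: pick models that support a required parameter and minimum context
# window, based on the Model object fields above (sample data is illustrative).
def pick_models(models: list[dict], needed_param: str, min_context: int) -> list[str]:
    return [
        m["id"]
        for m in models
        if needed_param in m.get("supported_parameters", [])
        and m.get("context_length", 0) >= min_context
    ]

sample = [
    {"id": "google/gemini-2.5-pro", "context_length": 1000000,
     "supported_parameters": ["tools", "structured_outputs"]},
    {"id": "example/small-model", "context_length": 8192,
     "supported_parameters": ["temperature"]},
]
```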

Architecture Object

{
  "modality": string,           // High-level description of input/output flow (e.g., "text+image->text")
  "input_modalities": string[], // Supported input types: ["file", "image", "text", "audio"]
  "output_modalities": string[], // Supported output types: ["text"]
  "tokenizer": string           // Tokenization method used (e.g., "Gemini")
}

Pricing Object

All pricing values are in USD per token/request/unit. A value of "0" indicates the feature is free.

{
  "prompt": string,             // Cost per input token
  "completion": string,         // Cost per output token
  "request": string,            // Fixed cost per API request
  "image": string,              // Cost per image input
  "audio": string,              // Cost per audio input
  "web_search": string,         // Cost per web search operation
  "internal_reasoning": string, // Cost for internal reasoning tokens
  "input_cache_read": string,   // Cost per cached input token read
  "input_cache_write": string   // Cost per cached input token write
}

Top Provider Object

{
  "context_length": number,        // Provider-specific context limit
  "max_completion_tokens": number  // Maximum tokens in response
}

Supported Parameters

The supported_parameters array indicates which OpenAI-compatible parameters work with each model:

  • include_reasoning - Include reasoning in response
  • max_tokens - Response length limiting
  • reasoning - Internal reasoning mode
  • response_format - Output format specification
  • seed - Deterministic outputs
  • stop - Custom stop sequences
  • structured_outputs - JSON schema enforcement
  • temperature - Randomness control
  • tool_choice - Tool selection control
  • tools - Function calling capabilities
  • top_p - Nucleus sampling

Token Counting Differences

Different models use different tokenization methods (as indicated by the tokenizer field in the model schema). Some models break up text into chunks of multiple characters (GPT, Claude, Llama, etc), while others tokenize differently (like Gemini). This means that token counts (and therefore costs) will vary between models, even when inputs and outputs are the same. Costs are displayed and billed according to the tokenizer for the model in use. You can use the usage field in API responses to get the actual token counts for your input and output.
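
Reading those counts is a matter of inspecting the usage field of the response body. A minimal sketch, assuming the OpenAI-style field names (`prompt_tokens`, `completion_tokens`) and using sample data only:

```python
# Sketch: read actual token counts from the usage field of a completion
# response. Field names follow the OpenAI convention; sample data only.
def token_counts(response_body: dict) -> tuple[int, int]:
    usage = response_body["usage"]
    return usage["prompt_tokens"], usage["completion_tokens"]

sample = {"usage": {"prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46}}
```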

Frequently Asked Questions

Privacy and Data Logging

Please see our Terms of Service and Privacy Policy.