Quickstart
Knox Chat provides a unified API that gives you access to hundreds of AI models through a single endpoint, while automatically handling fallbacks and selecting the most cost-effective options. Our goal is not merely to offer a single API for accessing multiple models, but to focus on multimodality and make it convenient to use today's popular open-source AI and agent applications and tools with just one key.
Using the OpenAI SDK
TypeScript

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://knox.chat/v1',
  apiKey: '<KNOXCHAT_API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'openai/gpt-5',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });

  console.log(completion.choices[0].message);
}

main();
```
Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://knox.chat/v1",
    api_key="<KNOXCHAT_API_KEY>",
)

completion = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
```
Using the Knox Chat API directly
Python

```python
import requests
import json

response = requests.post(
    url="https://knox.chat/v1/chat/completions",
    headers={
        "Authorization": "Bearer <KNOXCHAT_API_KEY>",
        "Content-Type": "application/json",
    },
    data=json.dumps({
        "model": "anthropic/claude-sonnet-4",  # Optional
        "messages": [
            {
                "role": "user",
                "content": "What is the meaning of life?"
            }
        ]
    })
)
```
TypeScript

```typescript
fetch('https://knox.chat/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <KNOXCHAT_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-sonnet-4',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  }),
});
```
Shell

```shell
curl https://knox.chat/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KNOXCHAT_API_KEY" \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  }'
```
The API also supports streaming.
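With the OpenAI SDK, this just means setting the stream flag and iterating over the chunks as they arrive. A minimal sketch in Python, assuming Knox Chat forwards standard OpenAI-style delta chunks:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://knox.chat/v1",
    api_key="<KNOXCHAT_API_KEY>",
)

# Request a streamed completion; tokens arrive incrementally as chunks.
stream = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta; content can be None on bookkeeping chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```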
Using third-party SDKs
For information about using third-party SDKs and frameworks with Knox Chat, please see our frameworks documentation.
Principles
Knox Chat helps developers source and optimize AI usage. We believe the future is multimodal and multi-provider.
Why Knox Chat?
Price and Performance. Knox Chat scouts for the best prices, the lowest latencies, and the highest throughput across dozens of providers, and lets you choose how to prioritize them.
Standardized API. No need to change code when switching between models or providers. You can even let your users choose and pay for their own models.
Real-World Insights. Be the first to take advantage of new models. See real-world data on how often models are used for different purposes. Keep up to date in our Discord channel.
Consolidated Billing. Simple and transparent billing, regardless of how many providers you use.
Higher Availability. Fallback providers and automatic, smart routing mean your requests still work even when providers go down.
Higher Rate Limits. Knox Chat works directly with providers to secure better rate limits and more throughput.
Models
One API for hundreds of models
Explore and browse 300+ models and providers on our website, or with our API.
Models API Standard
Our Models API makes the most important information about all LLMs freely available as soon as we confirm it.
API Response Schema
The Models API returns a standardized JSON response format that provides comprehensive metadata for each available model. This schema is cached at the edge and designed for reliable integration into production applications.
Root Response Object
```json
{
  "data": [
    /* Array of Model objects */
  ]
}
```
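To retrieve this list programmatically, here is a minimal sketch, assuming the Models API lives at the OpenAI-compatible /v1/models path and accepts the same bearer token as the chat endpoint:

```python
import requests

# Assumption: the Models API follows the OpenAI-compatible /v1/models path.
response = requests.get(
    "https://knox.chat/v1/models",
    headers={"Authorization": "Bearer <KNOXCHAT_API_KEY>"},
)

# Each entry is a Model object as described in the schema below.
for model in response.json()["data"][:5]:
    print(model["id"], model["context_length"])
```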
Model Object Schema
Each model in the data array contains the following standardized fields:
| Field | Type | Description |
|---|---|---|
| id | string | Unique model identifier used in API requests (e.g., "google/gemini-2.5-pro") |
| object | string | Object type identifier (always "model") |
| created | number | Unix timestamp of when the model was created |
| owned_by | string | Organization that owns the model |
| permission | ModelPermission[] | Array of permission objects defining access controls |
| root | string | Root model identifier |
| parent | string \| null | Parent model identifier if this is a fine-tuned version |
| context_length | number | Maximum context window size in tokens |
| architecture | Architecture | Object describing the model's technical capabilities |
| pricing | Pricing | Price structure for using this model |
| top_provider | TopProvider | Configuration details for the primary provider |
| supported_parameters | string[] | Array of supported API parameters for this model |
Architecture Object
```json
{
  "modality": string,           // High-level description of input/output flow (e.g., "text+image->text")
  "input_modalities": string[], // Supported input types: ["file", "image", "text", "audio"]
  "output_modalities": string[],// Supported output types: ["text"]
  "tokenizer": string           // Tokenization method used (e.g., "Gemini")
}
```
Pricing Object
All pricing values are in USD per token/request/unit. A value of "0" indicates the feature is free.
```json
{
  "prompt": string,             // Cost per input token
  "completion": string,         // Cost per output token
  "request": string,            // Fixed cost per API request
  "image": string,              // Cost per image input
  "audio": string,              // Cost per audio input
  "web_search": string,         // Cost per web search operation
  "internal_reasoning": string, // Cost for internal reasoning tokens
  "input_cache_read": string,   // Cost per cached input token read
  "input_cache_write": string   // Cost per cached input token write
}
```
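Since prices are returned as per-token strings, estimating the cost of a request is a small calculation. A minimal sketch with hypothetical pricing values and token counts (in practice, take both from the API):

```python
# Hypothetical Pricing object for some model (USD per token, as strings).
pricing = {"prompt": "0.000003", "completion": "0.000015"}

# Hypothetical token counts; in practice, read them from a response's usage field.
prompt_tokens = 1200
completion_tokens = 350

cost = (prompt_tokens * float(pricing["prompt"])
        + completion_tokens * float(pricing["completion"]))
print(f"Estimated cost: ${cost:.6f}")  # -> Estimated cost: $0.008850
```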
Top Provider Object
```json
{
  "context_length": number,        // Provider-specific context limit
  "max_completion_tokens": number  // Maximum tokens in response
}
```
Supported Parameters
The supported_parameters array indicates which OpenAI-compatible parameters work with each model:

- include_reasoning - Include reasoning in response
- max_tokens - Response length limiting
- reasoning - Internal reasoning mode
- response_format - Output format specification
- seed - Deterministic outputs
- stop - Custom stop sequences
- structured_outputs - JSON schema enforcement
- temperature - Randomness control
- tool_choice - Tool selection control
- tools - Function calling capabilities
- top_p - Nucleus sampling
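You can use this array to discover capable models, for example to find every model that supports function calling. A minimal sketch, again assuming the OpenAI-compatible /v1/models endpoint:

```python
import requests

response = requests.get(
    "https://knox.chat/v1/models",  # assumption: OpenAI-compatible models endpoint
    headers={"Authorization": "Bearer <KNOXCHAT_API_KEY>"},
)

# Keep models that advertise the "tools" (function calling) parameter.
tool_models = [
    model["id"]
    for model in response.json()["data"]
    if "tools" in model.get("supported_parameters", [])
]
print(tool_models)
```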
Different models use different tokenization methods (as indicated by the tokenizer field in the model schema). Some models break up text into chunks of multiple characters (GPT, Claude, Llama, etc.), while others tokenize differently (like Gemini). This means that token counts (and therefore costs) will vary between models, even when inputs and outputs are the same. Costs are displayed and billed according to the tokenizer for the model in use. You can use the usage field in API responses to get the actual token counts for your input and output.
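A minimal sketch of reading those counts with the OpenAI Python SDK, reusing the client from the quickstart above and assuming the standard OpenAI usage shape:

```python
completion = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
)

# Token counts are reported under the tokenizer of the model that served the request.
usage = completion.usage
print(f"prompt: {usage.prompt_tokens}, "
      f"completion: {usage.completion_tokens}, "
      f"total: {usage.total_tokens}")
```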
Frequently Asked Questions
- Getting started
- Models and Providers
- API Technical Specifications
- Privacy and Data Logging: please see our Terms of Service and Privacy Policy.