Images and PDFs
Knox Chat supports sending images and PDF files via the API. This guide shows how to work with both file types.
Images and PDF files can also be used in chats.
Image Input
For multimodal models, send images through the /v1/chat/completions API using a multi-part messages parameter: the content field of a message is an array of typed parts rather than a plain string. The image_url can be a URL or a base64-encoded image. To send multiple images, add multiple entries to the content array. The number of images allowed in a single request varies by provider and model. Because of how content is parsed, we recommend sending the text prompt first, followed by the images. If images must come first, we recommend placing them in the system prompt.
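As a minimal sketch of the ordering advice above, the helper below builds one user message with the text part first and one image_url part per image. The function name build_image_message and the example.com URLs are illustrative assumptions, not part of the Knox Chat API.

```python
from typing import Any


def build_image_message(prompt: str, image_urls: list[str]) -> dict[str, Any]:
    """Build a user message: the text part first, then one image_url part per image."""
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    for image_url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"role": "user", "content": content}


# Hypothetical image URLs, used only to show the resulting structure
message = build_image_message(
    "What do these images have in common?",
    ["https://example.com/first.jpg", "https://example.com/second.png"],
)
print([part["type"] for part in message["content"]])
```

The returned dictionary can be placed directly in the messages list of the request payload.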
Using Image URLs
Here’s how to send an image using a URL:
Python:
import requests

API_KEY_REF = "<KNOXCHAT_API_KEY>"  # replace with your Knox Chat API key

url = "https://knox.chat/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Send the text prompt first, followed by the image
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            }
        ]
    }
]

payload = {
    "model": "google/gemini-2.5-flash",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
TypeScript:
const response = await fetch('https://knox.chat/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer <KNOXCHAT_API_KEY>`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-2.5-flash',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: "What's in this image?",
          },
          {
            type: 'image_url',
            image_url: {
              url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg',
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
Using Base64-Encoded Images
For locally stored images, you can send the image as a Base64-encoded data URL:
Python:
import base64

import requests

API_KEY_REF = "<KNOXCHAT_API_KEY>"  # replace with your Knox Chat API key


def encode_image_to_base64(image_path):
    """Read a local image file and return its contents as a Base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


url = "https://knox.chat/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Read and encode the image
image_path = "path/to/your/image.jpg"
base64_image = encode_image_to_base64(image_path)
# The MIME type in the data URL must match the image format (JPEG here)
data_url = f"data:image/jpeg;base64,{base64_image}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": data_url
                }
            }
        ]
    }
]

payload = {
    "model": "google/gemini-2.5-flash",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
TypeScript:
import fs from 'fs';

const API_KEY_REF = '<KNOXCHAT_API_KEY>'; // replace with your Knox Chat API key

// Read a local image file and return it as a Base64 data URL
async function encodeImageToBase64(imagePath: string): Promise<string> {
  const imageBuffer = await fs.promises.readFile(imagePath);
  const base64Image = imageBuffer.toString('base64');
  return `data:image/jpeg;base64,${base64Image}`;
}

// Read and encode the image
const imagePath = 'path/to/your/image.jpg';
const base64Image = await encodeImageToBase64(imagePath);

const response = await fetch('https://knox.chat/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY_REF}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-2.5-flash',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: "What's in this image?",
          },
          {
            type: 'image_url',
            image_url: {
              url: base64Image,
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
Supported image content types include:
image/png
image/jpeg
image/webp
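The Base64 examples above hard-code image/jpeg in the data URL. As a hedged sketch, the helper below picks the MIME type from the file extension and rejects anything outside the three supported types. The names MIME_BY_EXTENSION, mime_type_for, and image_to_data_url are illustrative assumptions, and the check looks only at the extension, not at the file contents.

```python
import base64
from pathlib import Path

# Extension-to-MIME mapping for the supported types listed above
MIME_BY_EXTENSION = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def mime_type_for(image_path: str) -> str:
    """Return the MIME type for a supported image, judged by file extension only."""
    suffix = Path(image_path).suffix.lower()
    if suffix not in MIME_BY_EXTENSION:
        raise ValueError(f"Unsupported image type: {suffix or 'no extension'}")
    return MIME_BY_EXTENSION[suffix]


def image_to_data_url(image_path: str) -> str:
    """Read a local image and return it as a Base64 data URL with a matching MIME type."""
    mime_type = mime_type_for(image_path)
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


print(mime_type_for("photo.JPG"))  # image/jpeg
```

The string returned by image_to_data_url can be used as the url value of an image_url content part.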
PDF Support
Knox Chat provides PDF processing through the /v1/chat/completions API. PDF files can be sent as base64-encoded data URLs in the messages array, using the file content type. This feature is available for any model on Knox Chat.
If the model natively supports file input, the PDF will be passed directly to the model. If the model does not natively support file input, Knox Chat will parse the file and pass the parsed results to the requested model.
To send multiple PDFs, add each one as a separate entry in the content array. The number of PDFs allowed in a single request depends on the provider and model. Because of how content is parsed, we recommend sending the text prompt first, followed by the PDFs. If PDFs must come first, we recommend placing them in the system prompt.
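As an illustration of sending several PDFs in one message, the sketch below builds a content array with the text part first and one file part per document. The helper names build_pdf_part and build_pdf_message are assumptions, and the byte strings are placeholders standing in for real file contents.

```python
import base64
from typing import Any


def build_pdf_part(filename: str, pdf_bytes: bytes) -> dict[str, Any]:
    """Wrap raw PDF bytes in a file content part carrying a Base64 data URL."""
    encoded = base64.b64encode(pdf_bytes).decode("utf-8")
    return {
        "type": "file",
        "file": {
            "filename": filename,
            "file_data": f"data:application/pdf;base64,{encoded}",
        },
    }


def build_pdf_message(prompt: str, pdfs: dict[str, bytes]) -> dict[str, Any]:
    """Build a user message: the text part first, then one file part per PDF."""
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    content.extend(build_pdf_part(name, data) for name, data in pdfs.items())
    return {"role": "user", "content": content}


# Placeholder bytes; in practice, read each file with open(path, "rb").read()
message = build_pdf_message(
    "Compare these two reports.",
    {"q1.pdf": b"%PDF-1.4 placeholder", "q2.pdf": b"%PDF-1.4 placeholder"},
)
print([part["type"] for part in message["content"]])
```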
Plugin Configuration
To configure PDF processing, use the plugins parameter in the request. Knox Chat offers multiple PDF processing engines with varying features and pricing:
{
  plugins: [
    {
      id: 'file-parser',
      pdf: {
        engine: 'pdf-text', // or 'mistral-ocr' or 'native'
      },
    },
  ],
}
Pricing
The available PDF processing engines are priced as follows:
- "mistral-ocr": Suitable for scanned documents or PDFs containing images (costs $2 per 1,000 pages).
- "pdf-text": Suitable for well-structured, clearly text-based PDFs (free).
- "native": Only applicable to models that natively support file inputs (billed per input token).
If no engine is specified, Knox Chat first tries the model's native file handling capability. If that is unavailable, it falls back to the "mistral-ocr" engine.
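To make the pricing concrete, here is a rough sketch that estimates the parsing cost for a document and builds the matching plugins value. It assumes the "mistral-ocr" price is applied pro rata per page, which the pricing list does not state explicitly. The function names are illustrative, not part of the API.

```python
from typing import Any, Optional

# From the pricing list above: $2 per 1,000 pages for "mistral-ocr"
MISTRAL_OCR_USD_PER_1000_PAGES = 2.0


def estimate_parsing_cost_usd(engine: str, page_count: int) -> Optional[float]:
    """Estimate the PDF parsing cost in USD, or None when it depends on token usage."""
    if engine == "pdf-text":
        return 0.0  # free
    if engine == "mistral-ocr":
        # Assumption: the per-1,000-page price is charged proportionally per page
        return page_count * MISTRAL_OCR_USD_PER_1000_PAGES / 1000
    if engine == "native":
        return None  # billed per input token, so it varies by model
    raise ValueError(f"Unknown PDF engine: {engine}")


def file_parser_plugins(engine: str) -> list[dict[str, Any]]:
    """Build the plugins value that selects a PDF processing engine."""
    return [{"id": "file-parser", "pdf": {"engine": engine}}]


print(estimate_parsing_cost_usd("mistral-ocr", 250))  # 0.5
print(file_parser_plugins("pdf-text"))
```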
Processing PDFs
Here’s how to send and process a PDF:
Python:
import base64

import requests

API_KEY_REF = "<KNOXCHAT_API_KEY>"  # replace with your Knox Chat API key


def encode_pdf_to_base64(pdf_path):
    """Read a local PDF file and return its contents as a Base64 string."""
    with open(pdf_path, "rb") as pdf_file:
        return base64.b64encode(pdf_file.read()).decode('utf-8')


url = "https://knox.chat/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Read and encode the PDF
pdf_path = "path/to/your/document.pdf"
base64_pdf = encode_pdf_to_base64(pdf_path)
data_url = f"data:application/pdf;base64,{base64_pdf}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What are the main points in this document?"
            },
            {
                "type": "file",
                "file": {
                    "filename": "document.pdf",
                    "file_data": data_url
                }
            },
        ]
    }
]

# Optional: Configure PDF processing engine
# PDF parsing will still work even if the plugin is not explicitly set
plugins = [
    {
        "id": "file-parser",
        "pdf": {
            "engine": "pdf-text"  # defaults to "mistral-ocr". See Pricing above
        }
    }
]

payload = {
    "model": "google/gemma-3-27b-it",
    "messages": messages,
    "plugins": plugins
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
TypeScript:
import fs from 'fs';

const API_KEY_REF = '<KNOXCHAT_API_KEY>'; // replace with your Knox Chat API key

// Read a local PDF file and return it as a Base64 data URL
async function encodePDFToBase64(pdfPath: string): Promise<string> {
  const pdfBuffer = await fs.promises.readFile(pdfPath);
  const base64PDF = pdfBuffer.toString('base64');
  return `data:application/pdf;base64,${base64PDF}`;
}

// Read and encode the PDF
const pdfPath = 'path/to/your/document.pdf';
const base64PDF = await encodePDFToBase64(pdfPath);

const response = await fetch('https://knox.chat/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY_REF}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemma-3-27b-it',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'What are the main points in this document?',
          },
          {
            type: 'file',
            file: {
              filename: 'document.pdf',
              file_data: base64PDF,
            },
          },
        ],
      },
    ],
    // Optional: Configure PDF processing engine
    // PDF parsing will still work even if the plugin is not explicitly set
    plugins: [
      {
        id: 'file-parser',
        pdf: {
          engine: 'pdf-text', // defaults to "mistral-ocr". See Pricing above
        },
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
Skip Parsing Costs
When you send a PDF file to the API, the response may include file annotations in the assistant's messages. These annotations record the structured information of the parsed PDF document. If you resend these annotations in subsequent requests, you can avoid parsing the same PDF document multiple times, saving processing time and costs.
Here’s how to reuse file annotations:
Python:
import base64

import requests

API_KEY_REF = "<KNOXCHAT_API_KEY>"  # replace with your Knox Chat API key


# First, encode and send the PDF
def encode_pdf_to_base64(pdf_path):
    """Read a local PDF file and return its contents as a Base64 string."""
    with open(pdf_path, "rb") as pdf_file:
        return base64.b64encode(pdf_file.read()).decode('utf-8')


url = "https://knox.chat/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Read and encode the PDF
pdf_path = "path/to/your/document.pdf"
base64_pdf = encode_pdf_to_base64(pdf_path)
data_url = f"data:application/pdf;base64,{base64_pdf}"

# Initial request with the PDF
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What are the main points in this document?"
            },
            {
                "type": "file",
                "file": {
                    "filename": "document.pdf",
                    "file_data": data_url
                }
            },
        ]
    }
]

payload = {
    "model": "google/gemma-3-27b-it",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
response_data = response.json()

# Store the annotations from the response
file_annotations = None
if response_data.get("choices") and len(response_data["choices"]) > 0:
    if "annotations" in response_data["choices"][0]["message"]:
        file_annotations = response_data["choices"][0]["message"]["annotations"]

# Follow-up request that resends the annotations so the PDF is not parsed again
if file_annotations:
    follow_up_messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are the main points in this document?"
                },
                {
                    "type": "file",
                    "file": {
                        "filename": "document.pdf",
                        "file_data": data_url
                    }
                }
            ]
        },
        {
            "role": "assistant",
            "content": "The document contains information about...",
            "annotations": file_annotations
        },
        {
            "role": "user",
            "content": "Can you elaborate on the second point?"
        }
    ]

    follow_up_payload = {
        "model": "google/gemma-3-27b-it",
        "messages": follow_up_messages
    }

    follow_up_response = requests.post(url, headers=headers, json=follow_up_payload)
    print(follow_up_response.json())
TypeScript:
import fs from 'fs/promises';
import fetch from 'node-fetch'; // optional on Node 18+, which provides a global fetch

const API_KEY_REF = '<KNOXCHAT_API_KEY>'; // replace with your Knox Chat API key

// Read a local PDF file and return it as a Base64 data URL
async function encodePDFToBase64(pdfPath: string): Promise<string> {
  const pdfBuffer = await fs.readFile(pdfPath);
  const base64PDF = pdfBuffer.toString('base64');
  return `data:application/pdf;base64,${base64PDF}`;
}

// Initial request with the PDF
async function processDocument() {
  // Read and encode the PDF
  const pdfPath = 'path/to/your/document.pdf';
  const base64PDF = await encodePDFToBase64(pdfPath);

  const initialResponse = await fetch(
    'https://knox.chat/v1/chat/completions',
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${API_KEY_REF}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'google/gemma-3-27b-it',
        messages: [
          {
            role: 'user',
            content: [
              {
                type: 'text',
                text: 'What are the main points in this document?',
              },
              {
                type: 'file',
                file: {
                  filename: 'document.pdf',
                  file_data: base64PDF,
                },
              },
            ],
          },
        ],
      }),
    },
  );

  const initialData = await initialResponse.json();

  // Store the annotations from the response
  let fileAnnotations = null;
  if (initialData.choices && initialData.choices.length > 0) {
    if (initialData.choices[0].message.annotations) {
      fileAnnotations = initialData.choices[0].message.annotations;
    }
  }

  // Follow-up request that resends the annotations so the PDF is not parsed again
  if (fileAnnotations) {
    const followUpResponse = await fetch(
      'https://knox.chat/v1/chat/completions',
      {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${API_KEY_REF}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemma-3-27b-it',
          messages: [
            {
              role: 'user',
              content: [
                {
                  type: 'text',
                  text: 'What are the main points in this document?',
                },
                {
                  type: 'file',
                  file: {
                    filename: 'document.pdf',
                    file_data: base64PDF,
                  },
                },
              ],
            },
            {
              role: 'assistant',
              content: 'The document contains information about...',
              annotations: fileAnnotations,
            },
            {
              role: 'user',
              content: 'Can you elaborate on the second point?',
            },
          ],
        }),
      },
    );

    const followUpData = await followUpResponse.json();
    console.log(followUpData);
  }
}

processDocument();
When you include file annotations from previous responses in subsequent requests, Knox Chat uses this pre-parsed information directly instead of re-parsing the PDF file, which saves processing time and cost. This mechanism is particularly beneficial for large documents or when using the mistral-ocr engine, which incurs additional costs.
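The reuse pattern can be factored into two small helpers, sketched below without any network calls. The function names are assumptions, and the annotation entry in the sample response is an opaque placeholder: the real structure is whatever the API returns, and it should be passed back unchanged.

```python
from typing import Any, Optional


def extract_annotations(response_data: dict[str, Any]) -> Optional[list[Any]]:
    """Return the annotations from the first choice's message, or None if absent."""
    choices = response_data.get("choices") or []
    if not choices:
        return None
    return choices[0].get("message", {}).get("annotations")


def build_follow_up_messages(
    first_user_message: dict[str, Any],
    response_data: dict[str, Any],
    question: str,
) -> list[dict[str, Any]]:
    """Rebuild the conversation, attaching any annotations to the assistant turn."""
    assistant_message = response_data["choices"][0]["message"]
    reply: dict[str, Any] = {
        "role": "assistant",
        "content": assistant_message.get("content", ""),
    }
    annotations = extract_annotations(response_data)
    if annotations:
        reply["annotations"] = annotations  # passed back unchanged
    return [first_user_message, reply, {"role": "user", "content": question}]


# Sample data for illustration only; the annotation entry is a placeholder
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The document discusses...",
                "annotations": [{"placeholder": "opaque annotation data"}],
            }
        }
    ]
}
first_message = {"role": "user", "content": "What are the main points in this document?"}
follow_up = build_follow_up_messages(
    first_message, sample_response, "Can you elaborate on the second point?"
)
print([m["role"] for m in follow_up])
```

In a real request, first_user_message would be the original message containing the file part, as in the examples above.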
Response Format
The API will return a response in the following format:
{
  "id": "gen-1234567890",
  "provider": "DeepInfra",
  "model": "google/gemma-3-27b-it",
  "object": "chat.completion",
  "created": 1234567890,
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The document discusses..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 1000,
    "completion_tokens": 100,
    "total_tokens": 1100
  }
}
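As a brief sketch of reading this response in Python, the helper below pulls out the assistant's reply and the token usage. The function name summarize_response is an assumption, and the sample dictionary mirrors the example response above rather than real API output.

```python
from typing import Any


def summarize_response(response_data: dict[str, Any]) -> dict[str, Any]:
    """Extract the model name, assistant reply, and total token count."""
    message = response_data["choices"][0]["message"]
    usage = response_data.get("usage", {})
    return {
        "model": response_data.get("model"),
        "content": message["content"],
        "total_tokens": usage.get("total_tokens"),
    }


# Mirrors the example response shown above
sample_response = {
    "id": "gen-1234567890",
    "provider": "DeepInfra",
    "model": "google/gemma-3-27b-it",
    "object": "chat.completion",
    "created": 1234567890,
    "choices": [
        {"message": {"role": "assistant", "content": "The document discusses..."}}
    ],
    "usage": {"prompt_tokens": 1000, "completion_tokens": 100, "total_tokens": 1100},
}

summary = summarize_response(sample_response)
print(summary["content"])
print(summary["total_tokens"])  # 1100
```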