Images and PDFs

Knox Chat supports sending images and PDF files via the API. This article will show you how to use our API to handle these two file types.

Images and PDF files can also be attached directly in chat conversations.

Image Input

For multimodal models, requests with images use the /v1/chat/completions API with a multi-part content array in the messages parameter. The image_url can be a regular URL or a base64-encoded data URL. Multiple images can be sent by adding multiple entries to the content array; how many are allowed in a single request depends on the provider and model. Because of how content is parsed, we recommend placing the text prompt first, followed by the images. If the images must come first, consider including them in the system prompt instead.

Using Image URLs

Here’s how to send an image using a URL:

import requests

url = "https://knox.chat/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",  # API_KEY_REF is your Knox Chat API key
    "Content-Type": "application/json"
}

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            }
        ]
    }
]

payload = {
    "model": "google/gemini-2.5-flash",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
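
Multiple images can be included by adding more entries to the same content array, keeping the text prompt first as recommended above. A minimal sketch (the example.com image URLs are placeholders):

import requests

url = "https://knox.chat/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",  # API_KEY_REF is your Knox Chat API key
    "Content-Type": "application/json"
}

messages = [
    {
        "role": "user",
        "content": [
            # Text prompt first, as recommended above
            {"type": "text", "text": "Compare these two images."},
            # Each additional image is another entry in the same content array
            {"type": "image_url", "image_url": {"url": "https://example.com/first.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/second.jpg"}}
        ]
    }
]

payload = {"model": "google/gemini-2.5-flash", "messages": messages}
response = requests.post(url, headers=headers, json=payload)
print(response.json())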

Using Base64-Encoded Images

For locally stored images, you can transmit them as Base64-encoded data URLs:

import requests
import base64

def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

url = "https://knox.chat/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Read and encode the image
image_path = "path/to/your/image.jpg"
base64_image = encode_image_to_base64(image_path)
data_url = f"data:image/jpeg;base64,{base64_image}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": data_url
                }
            }
        ]
    }
]

payload = {
    "model": "google/gemini-2.5-flash",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())

The supported image content types are:

  • image/png
  • image/jpeg
  • image/webp
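
If you build the data URL from a local file, the MIME type should match the actual image format rather than always using image/jpeg. A minimal sketch of picking the type from the file extension (this helper is illustrative and covers only the formats listed above):

from pathlib import Path

# Map a file extension to one of the supported MIME types
IMAGE_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}

def guess_image_mime_type(image_path):
    suffix = Path(image_path).suffix.lower()
    return IMAGE_MIME_TYPES.get(suffix, "image/jpeg")  # fall back to JPEG

# Usage with the encoding helper from the example above:
# mime_type = guess_image_mime_type(image_path)
# data_url = f"data:{mime_type};base64,{base64_image}"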

PDF Support

Knox Chat provides PDF processing through the /v1/chat/completions API. PDF files can be sent as base64-encoded data URLs in the messages array using the file content type. This feature works with any model on Knox Chat.

info

If the model natively supports file input, the PDF will be passed directly to the model. If the model does not natively support file input, Knox Chat will parse the file and pass the parsed results to the requested model.

Note that multiple PDFs can be sent as separate entries in the content array; how many are allowed in a single request depends on the provider and model. Because of differences in how content is parsed, we recommend placing the text prompt before the PDFs. If PDFs must come first, consider including them in the system prompt instead.
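
As with images, multiple PDFs are sent as separate file entries in the same content array, with the text prompt first. A minimal sketch of the messages shape (the file paths and filenames are placeholders):

import base64

def pdf_data_url(pdf_path):
    # Encode a local PDF as a base64 data URL (same approach as the full example below)
    with open(pdf_path, "rb") as f:
        return "data:application/pdf;base64," + base64.b64encode(f.read()).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            # Text prompt first, as recommended above
            {"type": "text", "text": "Summarize both documents."},
            # Each PDF is a separate file entry
            {"type": "file", "file": {"filename": "report.pdf", "file_data": pdf_data_url("path/to/report.pdf")}},
            {"type": "file", "file": {"filename": "appendix.pdf", "file_data": pdf_data_url("path/to/appendix.pdf")}}
        ]
    }
]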

Plugin Configuration

To configure PDF processing functionality, use the plugins parameter in the request. Knox Chat offers multiple PDF processing engines with varying features and pricing:

{
  plugins: [
    {
      id: 'file-parser',
      pdf: {
        engine: 'pdf-text', // or 'mistral-ocr' or 'native'
      },
    },
  ],
}

Pricing

Knox Chat offers multiple PDF processing engines:

  1. "mistral-ocr": Suitable for scanned documents or PDFs containing images (costs $2 per 1000 pages).
  2. "pdf-text": Suitable for well-structured, clearly text-based PDFs (free).
  3. "native": Only applicable to models that natively support file inputs (billed per input token).

If no engine is explicitly specified, Knox Chat will prioritize the model's native file handling capability. If unavailable, it defaults to the "mistral-ocr" engine.
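
As a rough illustration of the mistral-ocr pricing above (the page count below is a hypothetical example, not from the docs):

# Back-of-the-envelope cost sketch for the mistral-ocr engine: $2 per 1000 pages
MISTRAL_OCR_COST_PER_PAGE = 2 / 1000  # USD

pages = 250  # placeholder page count
estimated_cost = pages * MISTRAL_OCR_COST_PER_PAGE
print(f"Estimated OCR cost for {pages} pages: ${estimated_cost:.2f}")  # $0.50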

Processing PDFs

Here’s how to send and process a PDF:

import requests
import base64

def encode_pdf_to_base64(pdf_path):
    with open(pdf_path, "rb") as pdf_file:
        return base64.b64encode(pdf_file.read()).decode('utf-8')

url = "https://knox.chat/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Read and encode the PDF
pdf_path = "path/to/your/document.pdf"
base64_pdf = encode_pdf_to_base64(pdf_path)
data_url = f"data:application/pdf;base64,{base64_pdf}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What are the main points in this document?"
            },
            {
                "type": "file",
                "file": {
                    "filename": "document.pdf",
                    "file_data": data_url
                }
            }
        ]
    }
]

# Optional: Configure the PDF processing engine
# PDF parsing still works even if the plugin is not explicitly set
plugins = [
    {
        "id": "file-parser",
        "pdf": {
            "engine": "pdf-text"  # see Pricing above; if omitted, native handling is used when available, else "mistral-ocr"
        }
    }
]

payload = {
    "model": "google/gemma-3-27b-it",
    "messages": messages,
    "plugins": plugins
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())

Skip Parsing Costs

When you send a PDF file to the API, the response may include file annotations in the assistant's message. These annotations contain the structured results of parsing the PDF; resending them in subsequent requests lets you avoid parsing the same document multiple times, saving processing time and cost.

Here’s how to reuse file annotations:

import requests
import base64

# First, encode and send the PDF
def encode_pdf_to_base64(pdf_path):
    with open(pdf_path, "rb") as pdf_file:
        return base64.b64encode(pdf_file.read()).decode('utf-8')

url = "https://knox.chat/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Read and encode the PDF
pdf_path = "path/to/your/document.pdf"
base64_pdf = encode_pdf_to_base64(pdf_path)
data_url = f"data:application/pdf;base64,{base64_pdf}"

# Initial request with the PDF
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What are the main points in this document?"
            },
            {
                "type": "file",
                "file": {
                    "filename": "document.pdf",
                    "file_data": data_url
                }
            }
        ]
    }
]

payload = {
    "model": "google/gemma-3-27b-it",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
response_data = response.json()

# Store the annotations from the response
file_annotations = None
if response_data.get("choices") and len(response_data["choices"]) > 0:
    if "annotations" in response_data["choices"][0]["message"]:
        file_annotations = response_data["choices"][0]["message"]["annotations"]

# Follow-up request reusing the annotations
# (the PDF is included again, but the annotations let Knox Chat skip re-parsing it)
if file_annotations:
    follow_up_messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are the main points in this document?"
                },
                {
                    "type": "file",
                    "file": {
                        "filename": "document.pdf",
                        "file_data": data_url
                    }
                }
            ]
        },
        {
            "role": "assistant",
            "content": "The document contains information about...",
            "annotations": file_annotations
        },
        {
            "role": "user",
            "content": "Can you elaborate on the second point?"
        }
    ]

    follow_up_payload = {
        "model": "google/gemma-3-27b-it",
        "messages": follow_up_messages
    }

    follow_up_response = requests.post(url, headers=headers, json=follow_up_payload)
    print(follow_up_response.json())

info

When you include file annotations from previous responses in subsequent requests, Knox Chat uses this pre-parsed information directly instead of re-parsing the PDF, saving processing time and cost. This is particularly beneficial for large documents or when using the mistral-ocr engine, which incurs additional costs.

Response Format

The API will return a response in the following format:

{
  "id": "gen-1234567890",
  "provider": "DeepInfra",
  "model": "google/gemma-3-27b-it",
  "object": "chat.completion",
  "created": 1234567890,
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The document discusses..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 1000,
    "completion_tokens": 100,
    "total_tokens": 1100
  }
}
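
A minimal sketch of pulling the assistant's reply and token usage out of a response with this shape (assuming the request succeeded and returned at least one choice):

def summarize_response(response_data):
    # Pull the assistant's reply from the first choice
    reply = response_data["choices"][0]["message"]["content"]
    print(reply)

    # Token usage for the request
    usage = response_data.get("usage", {})
    print(f"Tokens - prompt: {usage.get('prompt_tokens')}, "
          f"completion: {usage.get('completion_tokens')}, "
          f"total: {usage.get('total_tokens')}")

# Usage with any of the examples above:
# summarize_response(response.json())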