Message Transforms

To help address situations where prompts exceed the model's maximum context length, Knox Chat supports a custom parameter called transforms:

{
  transforms: ["middle-out"], // Compress prompts that are > context size.
  messages: [...],
  model // Works with any model
}
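
For illustration, a complete request might look like the sketch below. The endpoint URL, model name, and environment variable are assumptions made for the example, not confirmed Knox Chat values:

// A minimal sketch of a chat completion request with middle-out enabled.
// The URL, model, and API key variable are placeholders.
const response = await fetch("https://api.knox.chat/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.KNOX_API_KEY}`,
  },
  body: JSON.stringify({
    model: "anthropic/claude-3.5-sonnet", // works with any model
    transforms: ["middle-out"],
    messages: [{ role: "user", content: "Summarize this document: ..." }],
  }),
});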

This is particularly useful for cases where perfect recall is not necessary. The transform works by removing or truncating messages from the middle of the prompt until it fits within the model's context window.
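
To make the idea concrete, here is a rough sketch of middle-out truncation (not Knox Chat's actual implementation). It only removes whole messages, whereas the real transform can also truncate a message's content; the estimateTokens helper is an assumed stand-in for a real tokenizer:

type Message = { role: string; content: string };

// Assumed stand-in for a real tokenizer; a crude characters-per-token heuristic.
const estimateTokens = (m: Message) => Math.ceil(m.content.length / 4);

function middleOut(messages: Message[], maxTokens: number): Message[] {
  const fits = (msgs: Message[]) =>
    msgs.reduce((sum, m) => sum + estimateTokens(m), 0) <= maxTokens;
  const result = [...messages];
  // Drop messages from the middle until the estimated total fits
  // (or only one message remains).
  while (result.length > 1 && !fits(result)) {
    result.splice(Math.floor(result.length / 2), 1);
  }
  return result;
}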

In some cases, the issue is not the token context length but the actual number of messages. The transform addresses this as well: Anthropic's Claude models, for example, enforce a maximum number of messages per request. When that limit is exceeded with middle-out enabled, the transform retains half of the messages from the beginning of the conversation and half from the end.
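
The message-count case reduces to keeping the two ends of the conversation. A minimal sketch, reusing the Message type above, with maxMessages standing in for the provider's limit:

// Keep the first half and last half of the conversation when the
// message count exceeds a provider limit (maxMessages is assumed).
function keepEnds(messages: Message[], maxMessages: number): Message[] {
  if (messages.length <= maxMessages) return messages;
  const head = Math.ceil(maxMessages / 2);
  const tail = maxMessages - head;
  return [...messages.slice(0, head), ...messages.slice(messages.length - tail)];
}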

When middle-out compression is enabled, Knox Chat first attempts to find a model whose context length is at least half of the total required tokens (prompt input plus generated output). For example, if your request requires 10,000 tokens in total, only models with a context length of at least 5,000 tokens are considered. If no model meets this condition, Knox Chat falls back to the model with the longest available context length.
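
A sketch of that selection rule, assuming a simple list of candidate models with known context lengths:

type Model = { id: string; contextLength: number };

// Prefer a model whose context window covers at least half of the
// required tokens; otherwise fall back to the largest context window.
function selectModel(models: Model[], requiredTokens: number): Model {
  const candidates = models.filter((m) => m.contextLength >= requiredTokens / 2);
  if (candidates.length > 0) return candidates[0];
  return models.reduce((best, m) =>
    m.contextLength > best.contextLength ? m : best
  );
}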

The compression then attempts to fit your content into the selected model's context window by removing or truncating content from the middle of the prompt. If middle-out compression is disabled and your total token count exceeds the model's context length, the request fails with an error message suggesting that you either shorten the prompt or enable middle-out compression.
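
Putting the two behaviors together, the request path can be pictured like this, reusing the middleOut sketch above (again an illustration, not the actual implementation):

// Compress an oversized request if middle-out is enabled; otherwise reject it.
function fitOrFail(
  messages: Message[],
  requiredTokens: number,
  contextLength: number,
  middleOutEnabled: boolean
): Message[] {
  if (requiredTokens <= contextLength) return messages;
  if (!middleOutEnabled) {
    throw new Error(
      "Prompt exceeds the model's context length. Shorten the prompt or enable middle-out compression."
    );
  }
  return middleOut(messages, contextLength);
}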

Note

All Knox Chat endpoints with a context length no greater than 8k (8,192 tokens) will use middle-out by default. To disable this feature, set transforms: [] in the request body.
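
For example, to opt out on such an endpoint:

{
  transforms: [], // Disable middle-out, even where it defaults to on.
  messages: [...],
  model
}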

The middle of the prompt is what gets compressed because LLMs tend to pay less attention to the middle of long sequences.