Compress
Compress a prompt and context to reduce token usage while preserving semantic meaning.
POST `/compress/raw/`
Overview
The `/compress/raw/` endpoint compresses your prompt and context, reducing token count while maintaining the semantic integrity needed for high-quality AI responses. Compression ratios of 50–70% are typical, with no meaningful degradation in downstream model output quality.
Request
`context`: Background information, instructions, or supporting text that provides context for the prompt. This content is compressed most aggressively: structure and meaning are preserved, but redundancy is removed.
`prompt`: The main query or question to send to your AI model. Kept intact where possible to preserve intent.
Compression configuration.
Response
`compressed_prompt`: The compressed output, ready to pass directly to your AI model in place of the original context and prompt.
Token count of the original input.
Token count of the compressed output.
Whether the compression completed successfully.
End-to-end request latency in milliseconds.
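To check how much a given call saved, the two token counts in the response can be turned into a savings fraction. The sketch below takes the counts as plain integers because this page does not show the response field names; the function name is illustrative.

```python
def compression_savings(original_tokens: int, compressed_tokens: int) -> float:
    """Return the fraction of tokens removed by compression (0.6 means 60% fewer tokens)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return (original_tokens - compressed_tokens) / original_tokens


# Example: 1,000 input tokens compressed down to 400.
savings = compression_savings(1000, 400)
print(f"Saved {savings:.0%} of the original tokens")
```

A result in the 0.5 to 0.7 range corresponds to the typical compression ratios quoted in the overview.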
Error responses
| Status | Meaning |
|---|---|
| 400 Bad Request | Malformed request body or missing required fields. |
| 401 Unauthorized | Missing or invalid `x-api-key`. |
| 429 Too Many Requests | Rate limit exceeded. Back off and retry. |
| 500 Internal Server Error | Compression service unavailable. |
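One way to handle the "back off and retry" guidance for 429 is exponential backoff. The sketch below is a minimal illustration, not an official client: `send` is a hypothetical callable that performs the request and returns `(status_code, body)`, and the delay values are arbitrary defaults. Only 429 is retried here; whether retrying a 500 is appropriate is not stated on this page.

```python
import time

RETRYABLE_STATUSES = {429}  # assumption: only rate-limit errors are retried


def backoff_delays(max_retries: int = 4, base: float = 0.5, cap: float = 8.0) -> list:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(max_retries)]


def call_with_retry(send, max_retries: int = 4, sleep=time.sleep):
    """Call `send()` until it returns HTTP 200, retrying on 429 with backoff.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        status, body = send()
        if status == 200:
            return body
        # Give up on non-retryable errors, or once the retry budget is spent.
        if status not in RETRYABLE_STATUSES or attempt == max_retries:
            raise RuntimeError(f"compress request failed with HTTP {status}")
        sleep(delays[attempt])
```

Passing `sleep` as a parameter keeps the helper easy to test; in production you might also add random jitter to the delays so that many clients do not retry in lockstep.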
Authentication
Include your API key in every request using the `x-api-key` header.
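As a rough sketch of what an authenticated request could look like, the snippet below builds (but does not send) a POST request with the `x-api-key` header using Python's standard library. The host `https://api.example.com` is a placeholder, since this page does not give the base URL, and `build_compress_request` is an illustrative helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host; substitute your real base URL


def build_compress_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request to /compress/raw/ carrying the API key header."""
    return urllib.request.Request(
        url=BASE_URL + "/compress/raw/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_compress_request("YOUR_API_KEY", {"context": "...", "prompt": "..."})
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(request)` and decode the JSON response body.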
Examples
Auto compression
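A minimal sketch of a request body that uses automatic compression. The field names `context`, `prompt`, and `rate` appear on this page, but placing `rate` at the top level of the body is an assumption; it may belong inside a nested configuration object in the real schema.

```python
import json


def auto_compression_payload(context: str, prompt: str) -> dict:
    """Build a request body that lets the service choose the compression rate."""
    return {
        "context": context,
        "prompt": prompt,
        "rate": "auto",  # assumption: `rate` sits at the top level of the body
    }


payload = auto_compression_payload(
    context="Long background document or retrieved passages go here.",
    prompt="Summarize the key risks described in the document.",
)
print(json.dumps(payload, indent=2))
```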
Fixed compression rate
Pass a number for `rate` when you need a guaranteed token budget.
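The sketch below shows one way to derive a fixed rate from a token budget. It assumes that `rate` is the fraction of input tokens to keep (so `0.5` means roughly half the original tokens), which this page implies but does not state outright, and that `rate` sits at the top level of the body. The helper names are illustrative.

```python
QUALITY_FLOOR = 0.3  # fixed rates below this may noticeably affect output quality


def rate_for_budget(original_tokens: int, token_budget: int) -> float:
    """Return the rate needed to fit `token_budget`, capped at 1.0.

    Assumption: `rate` is the fraction of input tokens retained.
    """
    if original_tokens <= 0 or token_budget <= 0:
        raise ValueError("token counts must be positive")
    return min(1.0, token_budget / original_tokens)


def fixed_rate_payload(context: str, prompt: str, rate: float) -> dict:
    """Build a request body with a numeric compression rate."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in the range (0, 1]")
    return {"context": context, "prompt": prompt, "rate": rate}


rate = rate_for_budget(original_tokens=2000, token_budget=1000)
payload = fixed_rate_payload("Long background text...", "What changed in v2?", rate)
if rate < QUALITY_FLOOR:
    print("Warning: rate is below 0.3; quality may suffer on dense technical content")
```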
Notes
"auto"rate is recommended for most use cases. Fixed rates below0.3may noticeably affect output quality on dense technical content.- The
compressed_promptfield is a single string — pass it as the full prompt to your downstream model, replacing bothcontextandprompt. - Token counts are estimated using the same tokenizer as the target model family. Exact counts may vary slightly depending on the model you use downstream.