Rate limits

API rate limits and how to handle throttled requests.

Limits

Authenticated API requests are rate-limited to 100 requests per minute per API key.

The limit applies to all authenticated endpoints:

  • POST /v1/responses
  • GET /v1/responses/{id}
  • DELETE /v1/responses/{id}
  • GET /v1/usage

GET /v1/engines is a public endpoint and is not rate-limited per key.
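One way to stay under the limit is to throttle on the client before requests are sent. The following is a minimal sketch, not part of the API: the class name SlidingWindowLimiter and its parameters are hypothetical, and it assumes the server counts requests over a rolling 60-second window, which this page does not state.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: at most `limit` calls per `window` seconds.

    Assumption: a rolling window. The server's actual accounting may differ.
    """

    def __init__(self, limit=100, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of calls still inside the window

    def acquire(self):
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have aged out of the window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.limit:
                self.sent.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.sent[0]))
```

Call acquire() immediately before each authenticated request made with a given key. Use one limiter instance per API key, since the limit is counted per key.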

Exceeded limit response

When you exceed the limit, the API returns:

HTTP/1.1 429 Too Many Requests
{
  "message": "Rate limit exceeded. Please try again later.",
  "data": {
    "retry_after": 42
  }
}

The data.retry_after field gives the number of seconds to wait until the limit resets.
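Reading the delay out of the response above can be sketched as follows. The function name retry_delay and the 1-second fallback for a missing or malformed body are assumptions made for illustration. Only the 429 status and the data.retry_after field come from this page.

```python
import json


def retry_delay(status, body, default=1.0):
    """Return the seconds to wait if the response is a 429, else None.

    `body` is the raw response text. `default` is an assumed fallback
    used when the body does not contain a usable data.retry_after.
    """
    if status != 429:
        return None
    try:
        payload = json.loads(body)
        return float(payload["data"]["retry_after"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing field, or unexpected shape.
        return default
```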

Best practices

  • Back off: when you receive a 429, wait retry_after seconds before retrying.
  • Reduce polling frequency: poll generation status every 2–5 seconds, not every second.
  • Use webhooks: replace polling with webhook delivery for completion notifications.
  • Separate keys: give each high-throughput service its own API key so that its traffic counts against a separate rate limit bucket.
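The backoff advice in the list above can be sketched as a small retry loop. This is an illustration under stated assumptions, not a reference client: send_with_retry and its send argument are hypothetical, with send standing in for whatever function performs one request and returns a (status, parsed JSON body) pair. The cap of 5 attempts and the 1-second fallback delay are also assumptions.

```python
import time


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it is not throttled or attempts run out.

    `send` is a hypothetical callable returning (status, parsed_body).
    Returns the last (status, parsed_body) pair received.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Prefer the server-provided delay; fall back to 1 second
        # (an assumed default) if data.retry_after is absent.
        data = body.get("data") if isinstance(body, dict) else None
        delay = data.get("retry_after", 1) if isinstance(data, dict) else 1
        sleep(delay)
```

Capping the number of attempts keeps a persistently throttled caller from retrying forever. The final 429 is returned to the caller, which can then decide whether to fail or queue the work.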