Cost and performance
Optimize token usage, polling strategy, and model selection for production video generation.
Model selection
Orchestration models are billed at different per-token rates. Compare pricing on the Engines page or via `GET /v1/engines`:
- Lower-cost models — suitable for simple animations and short prompts
- Higher-capability models — better for complex compositions but higher token cost
Choose the model that matches your quality requirements without over-provisioning.
Prompt efficiency
Token usage scales with prompt complexity:
- Keep text prompts focused — describe the desired output, not implementation details
- Use reference images instead of lengthy text descriptions when possible
- Avoid redundant `instructions` or system input items that repeat guidance already in the user prompt
Polling vs. webhooks
| Approach | Cost | Latency to notification |
|---|---|---|
| Polling every 2s | ~30 API calls per minute per generation | 2-second granularity |
| Webhooks | 0 poll calls | Near-instant on completion |
For production workloads, webhooks eliminate polling overhead and reduce rate limit consumption.
Rate limit budget
Each API key is limited to 100 requests per minute. At 2-second polling intervals, a single active generation uses ~30 requests per minute. Plan for concurrent generations:
- 3 concurrent generations at 2s polling ≈ 90 req/min (near limit)
- Webhooks remove this constraint entirely for completion detection
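The budget arithmetic above can be sketched as a small helper. This is a minimal illustration, assuming the 100 requests per minute limit stated above and that polling is the only traffic on the key unless you pass `reserved`. The function names are hypothetical and not part of the Pipevideo API.

```python
import math

RATE_LIMIT_PER_MIN = 100  # per-key limit stated in this section


def polls_per_minute(interval_s: float) -> float:
    """Requests per minute consumed by one generation polled every interval_s seconds."""
    return 60.0 / interval_s


def max_concurrent_generations(interval_s: float,
                               rate_limit: int = RATE_LIMIT_PER_MIN,
                               reserved: int = 0) -> int:
    """How many generations can be polled at once without exceeding the limit.

    `reserved` is the number of requests per minute set aside for other calls
    (submitting generations, checking usage, and so on).
    """
    budget = rate_limit - reserved
    if budget <= 0:
        return 0
    return math.floor(budget / polls_per_minute(interval_s))


def min_interval_for(concurrent: int,
                     rate_limit: int = RATE_LIMIT_PER_MIN,
                     reserved: int = 0) -> float:
    """Shortest polling interval (seconds) that keeps `concurrent` generations in budget."""
    budget = rate_limit - reserved
    return 60.0 * concurrent / budget


print(max_concurrent_generations(2))   # 3 generations fit at 2s polling
print(min_interval_for(10))            # 10 generations need at least 6.0s between polls
```

If polling is unavoidable, lengthening the interval is the main lever: the number of generations you can track scales linearly with it.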
Canvas dimensions
Larger canvases (width × height) may increase orchestration complexity and token usage. Use the smallest dimensions that meet your output requirements:
| Use case | Suggested dimensions |
|---|---|
| Social vertical | 1080 × 1920 (default) |
| Landscape | 1920 × 1080 |
| Square | 1080 × 1080 |
Monitoring spend
- Check `credit_balance` via `GET /v1/usage` or the dashboard
- Review `daily_usage_by_route` to identify high-cost model combinations
- Set internal alerts when balance drops below your team's threshold
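A balance alert can be as simple as a threshold check on the usage payload. The sketch below assumes the `GET /v1/usage` response has already been fetched and parsed into a dict, and that `credit_balance` is a top-level numeric field; the exact response shape is an assumption, so adjust the lookup to match the real payload. `balance_alert` is a hypothetical helper name.

```python
from typing import Optional


def balance_alert(usage: dict, threshold: float) -> Optional[str]:
    """Return an alert message when the balance is below threshold, else None.

    Assumes `usage` is the parsed JSON body of GET /v1/usage with a numeric
    top-level `credit_balance` field (an assumption about the response shape).
    """
    if "credit_balance" not in usage:
        raise KeyError("usage payload has no 'credit_balance' field")
    balance = usage["credit_balance"]
    if balance < threshold:
        return f"Credit balance {balance} is below threshold {threshold}"
    return None


# Example with a hand-written payload standing in for the API response.
message = balance_alert({"credit_balance": 40}, threshold=50)
if message:
    print(message)  # forward to your team's alerting channel instead
```

Running a check like this on a schedule keeps the alert independent of the generation code path.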
Caching and deduplication
Pipevideo does not cache identical generation requests. If your application may submit duplicate prompts:
- Implement client-side deduplication before calling the API
- Store completed generation results keyed by prompt hash
- Use webhooks with idempotent handlers (deduplicate by `generationId`)
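The three practices above might look like the following in-memory sketch. It assumes a request is fully identified by its prompt plus a dict of parameters, and that webhook events carry a `generationId` field; both are assumptions about your integration. `request_key`, `GenerationCache`, `WebhookHandler`, and the `submit` callable are hypothetical names, and production code would replace the in-memory stores with a shared database or cache.

```python
import hashlib
import json


def request_key(prompt: str, params: dict) -> str:
    """Stable hash of a request, used as the deduplication key."""
    canonical = json.dumps(
        {"prompt": prompt.strip(), "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class GenerationCache:
    """Stores completed results keyed by prompt hash so duplicates skip the API."""

    def __init__(self):
        self._results = {}

    def get_or_submit(self, prompt: str, params: dict, submit):
        # `submit` is any callable that performs the real API call and
        # returns the completed result.
        key = request_key(prompt, params)
        if key not in self._results:
            self._results[key] = submit(prompt, params)
        return self._results[key]


class WebhookHandler:
    """Idempotent handler: each generationId is processed at most once."""

    def __init__(self):
        self._seen = set()

    def handle(self, event: dict) -> bool:
        generation_id = event["generationId"]  # assumed payload field
        if generation_id in self._seen:
            return False  # duplicate delivery, ignore
        self._seen.add(generation_id)
        # Process the completed generation here.
        return True
```

Hashing a canonical JSON form means that key order in `params` and stray whitespace around the prompt do not produce different cache keys.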
Related
- Billing — credits and top-ups
- Usage — monitoring consumption
- Rate limits — API throttling