Cost and performance

Optimize token usage, polling strategy, and model selection for production video generation.

Model selection

Orchestration models have different per-token rates. Compare pricing on the Engines page or via GET /v1/engines:

  • Lower-cost models — suitable for simple animations and short prompts
  • Higher-capability models — better for complex compositions but higher token cost

Choose the model that matches your quality requirements without over-provisioning.

Prompt efficiency

Token usage scales with prompt complexity:

  • Keep text prompts focused — describe the desired output, not implementation details
  • Use reference images instead of lengthy text descriptions when possible
  • Avoid redundant instructions or system input items that repeat guidance already in the user prompt

Polling vs. webhooks

Approach            Cost                                       Latency to notification
Polling every 2s    ~30 API calls per minute per generation    2-second granularity
Webhooks            0 poll calls                               Near-instant on completion

For production workloads, webhooks eliminate polling overhead and reduce rate limit consumption.

Rate limit budget

Each API key is limited to 100 requests per minute. At 2-second polling intervals, a single active generation uses ~30 requests per minute. Plan for concurrent generations:

  • 3 concurrent generations at 2s polling ≈ 90 req/min (near limit, leaving about 10 req/min for all other calls on that key)
  • Webhooks remove this constraint entirely for completion detection
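The budget arithmetic above can be sketched as a small helper. This is an illustrative calculation only: the 100 requests per minute limit comes from this page, while the function names and the `reserved` parameter (headroom kept for non-polling calls such as submitting generations) are assumptions made for the example.

```python
import math

RATE_LIMIT_PER_MIN = 100  # per-key limit stated in this section


def polls_per_minute(interval_s: float) -> float:
    """Poll requests one active generation consumes per minute."""
    return 60.0 / interval_s


def max_concurrent(interval_s: float, reserved: int = 0,
                   limit: int = RATE_LIMIT_PER_MIN) -> int:
    """Largest number of generations that can be polled at interval_s.

    reserved is a hypothetical headroom (req/min) kept for other calls.
    """
    budget = limit - reserved
    if budget <= 0:
        return 0
    return math.floor(budget / polls_per_minute(interval_s))


def min_interval_for(concurrent: int, reserved: int = 0,
                     limit: int = RATE_LIMIT_PER_MIN) -> float:
    """Shortest polling interval (seconds) that keeps `concurrent`
    generations within the budget."""
    budget = limit - reserved
    if budget <= 0:
        raise ValueError("no request budget left for polling")
    return 60.0 * concurrent / budget


print(max_concurrent(2))      # 2s polling: 3 generations fit
print(min_interval_for(10))   # 10 generations need >= 6.0s between polls
```

If the workload needs more concurrency than this allows, lengthening the polling interval or switching completion detection to webhooks are the two levers this page describes.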

Canvas dimensions

Larger canvases (width × height) may increase orchestration complexity and token usage. Use the smallest dimensions that meet your output requirements:

Use case           Suggested dimensions
Social vertical    1080 × 1920 (default)
Landscape          1920 × 1080
Square             1080 × 1080

Monitoring spend

  • Check credit_balance via GET /v1/usage or the dashboard
  • Review daily_usage_by_route to identify high-cost model combinations
  • Set internal alerts when balance drops below your team's threshold
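A minimal sketch of the alerting step, operating on an already-parsed GET /v1/usage response. The field names credit_balance and daily_usage_by_route come from this page; the shape assumed for daily_usage_by_route (a list of objects with "route" and "credits" keys) is a guess for illustration and should be checked against the actual response before use.

```python
from collections import defaultdict


def low_balance(usage: dict, threshold: float) -> bool:
    """True when credit_balance has dropped below the team's threshold."""
    return float(usage["credit_balance"]) < threshold


def top_routes(usage: dict, n: int = 3) -> list:
    """Return the n routes with the highest total spend.

    ASSUMPTION: each entry in daily_usage_by_route looks like
    {"route": "<name>", "credits": <number>}. The real schema may differ.
    """
    totals = defaultdict(float)
    for entry in usage.get("daily_usage_by_route", []):
        totals[entry["route"]] += entry["credits"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Example with a hand-written payload (not real API output).
sample = {
    "credit_balance": 40,
    "daily_usage_by_route": [
        {"route": "route-a", "credits": 5},
        {"route": "route-b", "credits": 7},
        {"route": "route-a", "credits": 4},
    ],
}
if low_balance(sample, threshold=50):
    print("Balance below threshold; top routes:", top_routes(sample, n=2))
```

Running a check like this on a schedule and forwarding the message to the team's existing alert channel is one way to implement the internal alert.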

Caching and deduplication

Pipevideo does not cache identical generation requests. If your application may submit duplicate prompts:

  • Implement client-side deduplication before calling the API
  • Store completed generation results keyed by a hash of the prompt and any parameters that affect the output
  • Use webhooks with idempotent handlers (deduplicate by generationId)
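The three steps above might be combined as follows. This is a sketch under stated assumptions: the in-memory dictionaries stand in for whatever persistent store the application uses, the webhook payload is assumed to be a dict carrying a generationId field (the only field named on this page), and the class and function names are hypothetical.

```python
import hashlib
import json


def request_key(prompt: str, **params) -> str:
    """Stable hash of the prompt plus any parameters that affect output."""
    canonical = json.dumps(
        {"prompt": prompt.strip(), "params": params},
        sort_keys=True,              # same key regardless of argument order
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class GenerationCache:
    """In-memory stand-in for a persistent dedup store (assumption)."""

    def __init__(self):
        self._results = {}     # request key -> completed payload
        self._key_by_id = {}   # generationId -> request key
        self._seen_ids = set() # generationIds already processed

    def get(self, key: str):
        """Return a cached result, or None if the request is new."""
        return self._results.get(key)

    def track(self, generation_id: str, key: str) -> None:
        """Remember which request a submitted generation belongs to."""
        self._key_by_id[generation_id] = key

    def handle_webhook(self, payload: dict) -> bool:
        """Idempotent handler: returns False for a repeated delivery."""
        generation_id = payload["generationId"]
        if generation_id in self._seen_ids:
            return False
        self._seen_ids.add(generation_id)
        key = self._key_by_id.get(generation_id)
        if key is not None:
            self._results[key] = payload
        return True
```

Before submitting, the application would compute request_key(...) and call get(); only on a miss does it call the API and then track() the returned generation. Repeated webhook deliveries for the same generationId are ignored by handle_webhook.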