Cost and performance

Optimize token usage, polling strategy, and model selection for production video generation.

Model selection

Orchestration models have different per-token rates. Compare pricing on the Engines page or via GET /v1/engines:

Lower-cost models — suitable for simple animations and short prompts
Higher-capability models — better for complex compositions but higher token cost

Choose the model that matches your quality requirements without over-provisioning.
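The selection rule above can be sketched as a small filter over engine records. This is only an illustration: the field names `rate_per_1k_tokens` and `capability_tier`, and the engine IDs, are hypothetical and are not taken from the actual GET /v1/engines response.

```python
# Hypothetical engine records; the real GET /v1/engines schema may differ.
ENGINES = [
    {"id": "engine-lite", "rate_per_1k_tokens": 0.5, "capability_tier": 1},
    {"id": "engine-pro", "rate_per_1k_tokens": 2.0, "capability_tier": 3},
    {"id": "engine-mid", "rate_per_1k_tokens": 1.0, "capability_tier": 2},
]


def cheapest_adequate_engine(engines, min_tier):
    """Return the lowest-rate engine whose tier meets the requirement.

    Returns None when no engine is capable enough.
    """
    candidates = [e for e in engines if e["capability_tier"] >= min_tier]
    return min(candidates, key=lambda e: e["rate_per_1k_tokens"], default=None)


if __name__ == "__main__":
    choice = cheapest_adequate_engine(ENGINES, min_tier=2)
    print(choice["id"] if choice else "no engine meets the requirement")
```

Filtering on the quality requirement first and only then minimizing cost keeps the choice from drifting toward a more capable model than the job needs.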

Prompt efficiency

Token usage scales with prompt complexity:

Keep text prompts focused — describe the desired output, not implementation details
Use reference images instead of lengthy text descriptions when possible
Avoid redundant instructions or system input items that repeat guidance already in the user prompt

Polling vs. webhooks

Approach	API call cost	Latency to notification
Polling every 2s	~30 API calls per minute per generation	Up to 2 seconds after completion
Webhooks	0 polling calls	Near-instant on completion

For production workloads, webhooks eliminate polling overhead and reduce rate limit consumption.

Rate limit budget

Each API key is limited to 100 requests per minute. At 2-second polling intervals, a single active generation uses ~30 requests per minute. Plan for concurrent generations:

3 concurrent generations at 2s polling ≈ 90 req/min, which leaves only about 10 req/min for submissions and other calls
Webhooks remove completion polling from this budget entirely
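The arithmetic behind these figures can be checked with a short calculation. The sketch below uses the 100 requests per minute limit and 2-second interval stated above; the function names and the `reserved` parameter (headroom kept for non-polling calls) are illustrative assumptions.

```python
import math


def polling_requests_per_minute(interval_s):
    """Requests per minute spent polling one active generation."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return 60.0 / interval_s


def max_concurrent_generations(limit_per_min, interval_s, reserved=0):
    """How many generations can be polled at once within the rate limit.

    `reserved` is headroom (req/min) kept for submissions and other calls.
    """
    available = limit_per_min - reserved
    if available <= 0:
        return 0
    return math.floor(available / polling_requests_per_minute(interval_s))


if __name__ == "__main__":
    print(polling_requests_per_minute(2))        # polling cost per generation
    print(max_concurrent_generations(100, 2))    # concurrency ceiling at 2s
    print(max_concurrent_generations(100, 5))    # ceiling with slower polling
```

Lengthening the polling interval raises the concurrency ceiling at the cost of coarser completion latency.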

Canvas dimensions

Larger canvases (width × height) may increase orchestration complexity and token usage. Use the smallest dimensions that meet your output requirements:

Use case	Suggested dimensions
Social vertical	1080 × 1920 (default)
Landscape	1920 × 1080
Square	1080 × 1080

Monitoring spend

Check credit_balance via GET /v1/usage or the dashboard
Review daily_usage_by_route to identify high-cost model combinations
Set internal alerts when balance drops below your team's threshold
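A minimal sketch of the alerting check, operating on an already-parsed GET /v1/usage response. The `credit_balance` and `daily_usage_by_route` field names come from this page, but the shape assumed here (a numeric balance and a flat mapping of route name to credits used) is a guess, as are the route names in the sample data.

```python
def balance_below_threshold(usage, threshold):
    """True when the remaining credit balance is under the alert threshold."""
    return usage["credit_balance"] < threshold


def costliest_routes(usage, top_n=3):
    """Return the top_n (route, credits) pairs, highest spend first.

    Assumes daily_usage_by_route maps a route name to credits consumed.
    """
    by_route = usage.get("daily_usage_by_route", {})
    ranked = sorted(by_route.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical sample payload, not real API output.
    sample = {
        "credit_balance": 42.0,
        "daily_usage_by_route": {"route-a": 12.5, "route-b": 30.0, "route-c": 4.0},
    }
    if balance_below_threshold(sample, threshold=50.0):
        print("ALERT: credit balance is below the team threshold")
    for route, credits in costliest_routes(sample, top_n=2):
        print(route, credits)
```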

Caching and deduplication

Pipevideo does not cache identical generation requests. If your application may submit duplicate prompts:

Implement client-side deduplication before calling the API
Store completed generation results keyed by prompt hash
Use webhooks with idempotent handlers (deduplicate by data.object.id)
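The three steps above might be combined as follows. This is a sketch under assumptions: the in-memory dictionaries stand in for a persistent store, `submit` is a hypothetical callable wrapping your own API client, and the webhook payload is assumed to be a parsed JSON object exposing `data.object.id` as described above.

```python
import hashlib
import json


def prompt_key(prompt, params):
    """Stable hash of a prompt plus its generation parameters."""
    payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class GenerationCache:
    """In-memory deduplication; swap the dict and set for a shared store."""

    def __init__(self):
        self._results = {}       # prompt hash -> stored result
        self._seen_ids = set()   # webhook object IDs already processed

    def get_or_submit(self, prompt, params, submit):
        """Call submit(prompt, params) only for requests not seen before."""
        key = prompt_key(prompt, params)
        if key not in self._results:
            self._results[key] = submit(prompt, params)
        return self._results[key]

    def handle_webhook(self, event):
        """Return True if the event is new, False if it is a repeat delivery."""
        object_id = event["data"]["object"]["id"]
        if object_id in self._seen_ids:
            return False
        self._seen_ids.add(object_id)
        return True
```

Hashing the parameters together with the prompt avoids treating the same text at different canvas sizes or models as a duplicate.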

Billing — credits and top-ups
Usage — monitoring consumption
Rate limits — API throttling
