Limits
Workers AI is now Generally Available. We’ve updated our rate limits to reflect this.
Note that model inferences in local mode using Wrangler will also count towards these limits. Beta models may have lower rate limits while we work on performance and scale.
Rate limits are defined per task type by default, with some models having their own limits:

- Automatic Speech Recognition
  - 720 requests per minute
- Image Classification
  - 3000 requests per minute
- Image-to-Text
  - 720 requests per minute
- Object Detection
  - 3000 requests per minute
- Summarization
  - 1500 requests per minute
- Text Classification
  - 2000 requests per minute
- Text Embeddings
  - 3000 requests per minute
  - @cf/baai/bge-large-en-v1.5 is 1500 requests per minute
- Text Generation
  - 300 requests per minute
  - @hf/thebloke/mistral-7b-instruct-v0.1-awq is 400 requests per minute
  - @cf/microsoft/phi-2 is 720 requests per minute
  - @cf/qwen/qwen1.5-0.5b-chat is 1500 requests per minute
  - @cf/qwen/qwen1.5-1.8b-chat is 720 requests per minute
  - @cf/qwen/qwen1.5-14b-chat-awq is 150 requests per minute
  - @cf/tinyllama/tinyllama-1.1b-chat-v1.0 is 720 requests per minute
- Text-to-Image
  - 720 requests per minute
  - @cf/runwayml/stable-diffusion-v1-5-img2img is 1500 requests per minute
- Translation
  - 720 requests per minute
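If your application can burst past these limits, one common mitigation is client-side retry with exponential backoff. The sketch below is a minimal, hypothetical helper, not part of the Workers AI API: it assumes your call path surfaces a throttling failure as a distinguishable error (here a made-up `RateLimitError`; in practice you would check for an HTTP 429 response from the REST API) and retries with increasing, jittered delays.

```typescript
// Hypothetical retry helper with exponential backoff.
// `RateLimitError` stands in for whatever signal your client
// uses to detect a rate-limited request (e.g. an HTTP 429).
class RateLimitError extends Error {}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 250,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-throttling errors, or once retries are exhausted.
      if (!(err instanceof RateLimitError) || attempt >= maxRetries) throw err;
      // Exponential backoff with jitter: ~250ms, ~500ms, ~1s, ...
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random() / 2);
      await sleep(delay);
    }
  }
}

// Example: a fake inference call that is throttled twice, then succeeds.
let calls = 0;
async function fakeInference(): Promise<string> {
  calls++;
  if (calls <= 2) throw new RateLimitError("rate limited");
  return "ok";
}

withBackoff(fakeInference, 5, 10).then((result) => {
  console.log(result, calls);
});
```

The jitter factor spreads retries out so that many concurrent clients do not all retry at the same instant and re-trigger the limit together.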