Rate limiting
Rate limiting controls the traffic that reaches your application, which helps prevent unexpectedly high bills and suspicious activity.
You can define rate limits as the number of requests that get sent in a specific time frame. For example, you can limit your application to 100 requests per 60 seconds.
You can also choose between a fixed and a sliding rate limiting technique. Both allow a certain number of requests within a window of time. With a fixed window, the windows are aligned to the clock, so no more than x requests are allowed in each ten-minute window. With a sliding window, no more than x requests are allowed in the last ten minutes.
To illustrate this, suppose you had a limit of ten requests per ten minutes, starting at 12:00. The fixed windows are 12:00-12:10, 12:10-12:20, and so on. If you sent ten requests at 12:09 and ten requests at 12:11, all 20 requests would succeed under a fixed window strategy, because they fall into two different windows. Under a sliding window strategy, the ten requests sent at 12:11 would fail, since ten requests had already been made in the preceding ten minutes.
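The 12:09 and 12:11 scenario above can be replayed with two small in-memory limiters. This is only an illustrative sketch, not how AI Gateway implements rate limiting. The class names are hypothetical, and it assumes that rejected requests do not count toward the sliding window.

```python
from collections import deque


class FixedWindowLimiter:
    """Allow at most `limit` requests per clock-aligned window of `interval` seconds."""

    def __init__(self, limit, interval):
        self.limit = limit
        self.interval = interval
        self.window = None  # index of the window currently being counted
        self.count = 0

    def allow(self, now):
        window = now // self.interval
        if window != self.window:
            # A new window has started, so the counter resets.
            self.window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLimiter:
    """Allow at most `limit` requests in the trailing `interval` seconds."""

    def __init__(self, limit, interval):
        self.limit = limit
        self.interval = interval
        self.accepted = deque()  # timestamps of accepted requests (assumption: rejected ones are not stored)

    def allow(self, now):
        # Drop timestamps that have left the trailing window.
        while self.accepted and self.accepted[0] <= now - self.interval:
            self.accepted.popleft()
        if len(self.accepted) < self.limit:
            self.accepted.append(now)
            return True
        return False


def simulate(limiter):
    """Send ten requests at 12:09 and ten at 12:11; return how many succeed."""
    # Times are seconds since 12:00.
    times = [9 * 60] * 10 + [11 * 60] * 10
    return sum(limiter.allow(t) for t in times)


fixed_ok = simulate(FixedWindowLimiter(limit=10, interval=600))
sliding_ok = simulate(SlidingWindowLimiter(limit=10, interval=600))
print(fixed_ok, sliding_ok)  # 20 10
```

The fixed limiter accepts all 20 requests because its counter resets at 12:10, while the sliding limiter still sees the ten requests from 12:09 when the second batch arrives.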
When your requests exceed the allowed rate, you’ll encounter rate limiting. This means the server will respond with a 429 Too Many Requests status code and your request won’t be processed.
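One common way for a client to cope with a 429 response is to wait and retry. The following is a minimal sketch of that idea, not a documented AI Gateway client. The function call_with_retry and the send callable are hypothetical, and it is an assumption that a Retry-After header may be present; when it is missing or not a number of seconds, the sketch falls back to exponential backoff.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable that performs the request
    and returns a (status, headers, body) tuple.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; give the last 429 back to the caller
        # Assumption: Retry-After, if present, is a number of seconds.
        try:
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not numeric: back off 1s, 2s, 4s, ...
            delay = base_delay * 2 ** attempt
        sleep(delay)
    return status, body
```

Passing sleep as a parameter keeps the function easy to test, since a test can record the delays instead of actually waiting.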
To set the default rate limiting configuration in the dashboard:
- Log in to the Cloudflare dashboard and select your account.
- Go to AI > AI Gateway.
- Go to Settings.
- Enable Rate-limiting.
- Adjust the rate, time period, and rate limiting method as desired.
To set the default rate limiting configuration using the API:
- Create an API token with the following permissions:
  - AI Gateway - Read
  - AI Gateway - Edit
- Get your Account ID.
- Using that API token and Account ID, send a POST request to create a new Gateway and include a value for the rate_limiting_interval, rate_limiting_limit, and rate_limiting_technique.
This rate limiting behavior will be uniformly applied to all requests for that gateway.
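The API steps above might look roughly like the following sketch, which builds the POST request without sending it. Several details are assumptions rather than facts taken from this page: the endpoint path, the id field used to name the gateway, and the technique values fixed and sliding. The create endpoint may also require additional fields, so check the API reference before relying on this.

```python
import json
import urllib.request

# Assumption: base URL of the Cloudflare v4 API.
API_BASE = "https://api.cloudflare.com/client/v4"


def build_create_gateway_request(account_id, api_token, gateway_id,
                                 limit, interval, technique):
    """Build (but do not send) a request that creates a rate-limited gateway."""
    if technique not in ("fixed", "sliding"):  # assumed allowed values
        raise ValueError("technique must be 'fixed' or 'sliding'")
    payload = {
        "id": gateway_id,                      # assumed field name
        "rate_limiting_interval": interval,    # window length in seconds
        "rate_limiting_limit": limit,          # requests allowed per window
        "rate_limiting_technique": technique,
    }
    return urllib.request.Request(
        # Assumed endpoint path for creating a gateway.
        url=f"{API_BASE}/accounts/{account_id}/ai-gateway/gateways",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder credentials; 100 requests per 60 seconds with a sliding window.
request = build_create_gateway_request(
    "ACCOUNT_ID", "API_TOKEN", "my-gateway",
    limit=100, interval=60, technique="sliding",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it possible to inspect the URL and body before any call is made with a real token.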