Understanding Rate Limits
Rate limits cap the number of API requests that can be made within a specific time period. They serve to:
- Prevent API abuse and misuse
- Ensure fair resource allocation
- Maintain API performance and reliability
- Protect service stability
Default Rate Limits
Each account has default rate limits for model calls, measured per model in RPM (requests per minute) and TPM (tokens per minute). Rate limits vary by account quota tier; the table below shows how each tier is assigned.
| Quota Tier | Qualification (USD) |
|---|---|
| T1 | Highest single-month top-up amount in the last 3 calendar months < $50 |
| T2 | $50 ≤ Highest single-month top-up amount in the last 3 calendar months < $500 |
| T3 | $500 ≤ Highest single-month top-up amount in the last 3 calendar months < $3000 |
| T4 | $3000 ≤ Highest single-month top-up amount in the last 3 calendar months < $10000 |
| T5 | $10000 ≤ Highest single-month top-up amount in the last 3 calendar months |
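The tier boundaries in the table can be read as a simple threshold lookup. The sketch below is illustrative only: the function name `quota_tier` and the `TIER_BOUNDS` list are hypothetical, and it assumes the caller has already summed top-ups per calendar month for the last 3 calendar months.

```python
from typing import Sequence

# (exclusive upper bound in USD, tier) pairs copied from the table above.
TIER_BOUNDS = [(50, "T1"), (500, "T2"), (3000, "T3"), (10000, "T4")]


def quota_tier(monthly_top_ups_usd: Sequence[float]) -> str:
    """Return the quota tier for the per-month top-up totals of the last 3 calendar months."""
    # The tier depends on the single highest month, not on the sum of all months.
    highest = max(monthly_top_ups_usd, default=0.0)
    for upper_bound, tier in TIER_BOUNDS:
        if highest < upper_bound:
            return tier
    return "T5"  # $10000 or more in the highest month


print(quota_tier([20.0, 120.0, 0.0]))  # highest month is $120, so this prints T2
```

Because each lower bound is inclusive, a highest month of exactly $50 maps to T2 rather than T1.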
Avoiding Rate Limit Triggers
If your API requests exceed the rate limit, the API returns:
- HTTP status code: 429 (Too Many Requests).
- A response body containing rate-limit-exceeded information.
To avoid triggering rate limits:
- Implement request throttling in your application.
- Use exponential backoff when retrying.
- Monitor your API usage.
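Request throttling can be done on the client side before a request is ever sent. The following is a minimal sketch of one possible approach, a sliding 60-second window that tracks both RPM and TPM. The class name `MinuteWindowThrottle`, its parameters, and the limit values in the usage lines are assumptions for illustration, not values or interfaces provided by the API; the server may also count requests differently than this client-side estimate.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteWindowThrottle:
    """Client-side sliding-window limiter for requests and tokens per minute."""

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rpm < 1 or tpm < 1:
            raise ValueError("rpm and tpm must be positive")
        self.rpm = rpm
        self.tpm = tpm
        self._clock = clock
        self._sleep = sleep
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _evict_expired(self, now: float) -> None:
        # Drop requests that have left the 60-second window.
        while self._events and now - self._events[0][0] >= 60.0:
            self._events.popleft()

    def acquire(self, tokens: int) -> None:
        """Block until one request using `tokens` tokens fits in the window."""
        if tokens < 0 or tokens > self.tpm:
            raise ValueError("tokens must be between 0 and the per-minute token budget")
        while True:
            now = self._clock()
            self._evict_expired(now)
            used_tokens = sum(t for _, t in self._events)
            if len(self._events) < self.rpm and used_tokens + tokens <= self.tpm:
                self._events.append((now, tokens))
                return
            # Wait until the oldest recorded request leaves the window, then re-check.
            self._sleep(60.0 - (now - self._events[0][0]))


# Hypothetical limits; substitute the RPM and TPM that apply to your account tier.
throttle = MinuteWindowThrottle(rpm=60, tpm=100_000)
throttle.acquire(tokens=1_200)  # call this before each API request
```

The `clock` and `sleep` parameters exist only so the limiter can be exercised in tests without real waiting; in normal use the defaults are sufficient.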
Handling 429 Errors
If you receive a 429 error, try the following:
- Retry later: Wait for a period before retrying your request.
- Optimize requests: Reduce request frequency.
- Increase rate limits: If you need higher rate limits, please contact us.
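The "retry later" advice is commonly implemented as exponential backoff with random jitter. Below is a minimal sketch under stated assumptions: `call_with_backoff` and its parameters are hypothetical names, and `send` stands in for whatever function in your application performs the HTTP request and returns a `(status_code, body)` pair. The delay values are illustrative defaults, not values recommended by the API.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, str]],
                      max_retries: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send`, retrying with exponential backoff while it returns HTTP 429."""
    attempt = 0
    while True:
        status, body = send()
        # Return on any non-429 response, or when the retry budget is used up.
        if status != 429 or attempt >= max_retries:
            return status, body
        # The delay ceiling doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        # Full jitter: wait a random time up to the ceiling so that many
        # clients hitting the limit together do not all retry at the same moment.
        sleep(random.uniform(0, ceiling))
        attempt += 1
```

If the retry budget is exhausted, the last 429 response is returned to the caller rather than raised, so the application can decide whether to reduce request frequency or surface the error.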