
Understanding Rate Limits

Rate limits cap the number of API requests that can be made within a specific time period. They serve to:
  • Prevent API abuse and misuse
  • Ensure fair resource allocation
  • Maintain API performance and reliability
  • Protect service stability

Default Rate Limits

Each account has default rate limits when calling models, measured in RPM (requests per minute per model) and TPM (tokens per minute per model). Rate limits vary by account tier, as shown in the table below.
Quota Tier    Qualification (USD)
T1            Highest single-month top-up amount in the last 3 calendar months < $50
T2            $50 ≤ Highest single-month top-up amount in the last 3 calendar months < $500
T3            $500 ≤ Highest single-month top-up amount in the last 3 calendar months < $3000
T4            $3000 ≤ Highest single-month top-up amount in the last 3 calendar months < $10000
T5            $10000 ≤ Highest single-month top-up amount in the last 3 calendar months
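The tier boundaries above can be read as a simple lookup. The following is a minimal sketch, not an official calculation: the function name quota_tier is hypothetical, and it assumes each list entry is the total top-up amount (in USD) for one of the last 3 calendar months.

```python
def quota_tier(monthly_top_ups_usd):
    """Return the tier label for the top-up totals of the last 3 calendar months.

    Assumption: each entry is the total amount topped up in one calendar month.
    """
    # An account with no top-ups is treated as having a highest amount of 0.
    highest = max(monthly_top_ups_usd, default=0)
    # Lower bounds are inclusive and upper bounds exclusive, as in the table.
    if highest < 50:
        return "T1"
    if highest < 500:
        return "T2"
    if highest < 3000:
        return "T3"
    if highest < 10000:
        return "T4"
    return "T5"
```

For example, an account that topped up $20, $600, and $0 in the last three months would fall into T3 under this reading, because only the highest month counts.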
Default rate limits (RPM / TPM) for each tier:

Avoiding Rate Limit Triggers

If your API requests exceed the rate limit, the API returns:
  • HTTP status code 429 (Too Many Requests).
  • A response body containing information about the exceeded rate limit.
To avoid triggering rate limits, you can take the following measures:
  • Implement request throttling in your application.
  • Use exponential backoff when retrying.
  • Monitor your API usage.
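The first two measures can be sketched on the client side as follows. This is an illustrative sketch under stated assumptions, not part of the API: the names RpmThrottle and backoff_delay are hypothetical, the limit passed to the throttle should be the RPM value for your own tier, and the backoff uses a common "full jitter" variant with an assumed 1-second base and 60-second cap.

```python
import random
import time
from collections import deque


class RpmThrottle:
    """Client-side sliding-window limiter: at most max_requests per window."""

    def __init__(self, max_requests, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests still inside the window

    def acquire(self):
        """Block until one more request fits in the current window."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window_seconds:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Wait until the oldest recorded request leaves the window.
            self._sleep(self.window_seconds - (now - self._sent[0]))


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Delay in seconds before retry number `attempt` (0-based).

    Exponential backoff with full jitter: a random value between 0 and
    min(cap, base * 2**attempt).
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Calling throttle.acquire() before each request keeps the client under the configured requests-per-minute budget. This sketch only limits RPM; a TPM budget would need a similar counter weighted by token count.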

Handling 429 Errors

If you receive a 429 error, you can try the following:
  • Retry later: Wait for a period before retrying your request.
  • Optimize requests: Reduce request frequency.
  • Increase rate limits: If you need higher rate limits, please contact us.
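The "retry later" option can be sketched as a small retry loop. This is a hedged example, not the API's client library: call_with_retry and the send callable are hypothetical, with send assumed to return a (status_code, headers, body) tuple. The sketch also assumes the server may include the standard HTTP Retry-After header in seconds, which this page does not confirm; when it is absent, the loop falls back to exponential backoff.

```python
import time


def call_with_retry(send, max_attempts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable returning
    (status_code, headers_dict, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point in sleeping after the final attempt
        # Assumption: the server may send Retry-After as a whole number of seconds.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Otherwise double the wait on each attempt, up to the cap.
            delay = min(cap, base * 2 ** attempt)
        sleep(delay)
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

If the loop still ends in an error under normal load, that is usually a sign to reduce request frequency or to ask for a higher limit rather than to retry more aggressively.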