Anthropic
Anthropic models support explicit prompt caching. On this platform, whether you use the OpenAI chat/completions protocol or the Anthropic v1/messages protocol, you can set "cache_control": {"type": "ephemeral"} on a content block to mark the prompt prefix up to and including that block for caching. The minimum cacheable prompt length depends on the model:
- Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Sonnet 4, Claude Sonnet 3.7: 1024 tokens
- Claude Haiku 4.5, Claude Haiku 3.5, and Claude Haiku 3: 2048 tokens
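As a sketch, a v1/messages-style request body with a cache breakpoint might look like the following. The model id, endpoint, and prompt text are placeholders; substitute the ids your platform exposes.

```python
import json

# Stable bulk content to cache; it must exceed the model's minimum
# cacheable length (e.g. 1024 tokens for Claude Sonnet 4.5).
LONG_SYSTEM_PROMPT = "You are a support assistant for ExampleCo. " * 200

payload = {
    "model": "claude-sonnet-4-5",  # placeholder model id
    "max_tokens": 512,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Cache breakpoint: the prefix up to and including this
            # block is stored and reused by subsequent requests.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Summarize our refund policy."}
    ],
}

# POST this JSON body to the v1/messages endpoint with your API key;
# repeat requests that share the cached prefix read it from the cache.
body = json.dumps(payload)
```

Keeping the cached block byte-identical across requests is what makes later calls hit the cache; only the trailing user messages should vary.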
OpenAI and OpenAI-Compatible Models
These models typically support implicit caching, which requires no manual setup: when you repeatedly call the same model with the same prompt prefix, there is a chance of hitting the cache.
Gemini
Gemini models currently support only implicit caching, which requires no manual setup and no cache_control configuration. When you repeatedly call the same model with the same prompt prefix, there is a chance of hitting the cache. Notes:
- The average cache TTL (time-to-live) is 3-5 minutes, but it can vary and may be as short as a few seconds
- Gemini 2.5 Flash requires a minimum input of 1024 tokens, and Gemini 2.5 Pro requires a minimum of 4096 tokens
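Because implicit caches key on a shared prompt prefix, the practical optimization is to structure requests so the stable content comes first and the variable part last. A minimal sketch (the prompt text and helper name are illustrative):

```python
# Implicit caches match on a common prompt prefix, so place stable
# content (instructions, reference documents) first and per-request
# text last. Requests sent within the cache TTL can then reuse it.
SHARED_PREFIX = (
    "You are a code-review assistant.\n"
    "Project guidelines:\n" + "- Follow PEP 8.\n" * 100  # stable bulk
)

def build_prompt(user_question: str) -> str:
    """Stable prefix first, variable question last, so consecutive
    requests share a cacheable prefix."""
    return SHARED_PREFIX + "\nQuestion: " + user_question

p1 = build_prompt("Is this loop efficient?")
p2 = build_prompt("Any rename suggestions for foo()?")
```

Putting the question before the shared material would break the prefix match, so neither request could hit the cache.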