Semantic Cache
The semantic cache checks whether a similar query has already been answered. When a new query's similarity to a cached query clears the threshold, the client can reuse the cached answer instead of generating a fresh response.
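The lookup described above can be sketched in a few lines. This is a minimal illustration, not Agent Brain's actual implementation: it assumes queries are compared by cosine similarity over embedding vectors and that a score equal to the threshold counts as a hit. The names `CacheEntry`, `cosine_similarity`, and `lookup` are hypothetical, and the vectors are hand-written stand-ins for real embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """One cached answer, keyed by the embedding of the query that produced it."""
    embedding: list[float]
    answer: str


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup(query_embedding, entries, threshold=0.85):
    """Return (answer, score) for the closest entry, or (None, score) on a miss.

    Assumption: a score equal to the threshold is treated as a hit.
    """
    best_entry = None
    best_score = -1.0
    for entry in entries:
        score = cosine_similarity(query_embedding, entry.embedding)
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        return best_entry.answer, best_score
    return None, best_score


# Toy vectors standing in for real embeddings (illustrative only).
entries = [CacheEntry([1.0, 0.0, 0.0], "cached answer about auth")]

hit_answer, hit_score = lookup([0.9, 0.1, 0.0], entries)    # close in direction: hit
miss_answer, miss_score = lookup([0.0, 1.0, 0.0], entries)  # orthogonal: miss
print(hit_answer, round(hit_score, 3))
print(miss_answer, miss_score)
```

A real cache would typically use a vector index rather than a linear scan. The scan is used here only to keep the comparison step visible.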
Why it matters
AI coding workflows often repeat the same questions with different wording. A semantic cache catches that repetition without requiring exact string matches.
Examples:
- "How does auth work?"
- "What is our authentication strategy?"
- "Where do we store auth tokens?"
All three can resolve to the same cached context if their meanings are close enough to clear the similarity threshold.
Default behavior
The similarity threshold is set in the Agent Brain environment. The README documents CACHE_SIMILARITY_THRESHOLD=0.85 as the default.
Higher thresholds reduce false positives, at the cost of fewer cache hits. Lower thresholds increase the hit rate but raise the risk of reusing an answer for a query that only looks similar.
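The trade-off can be made concrete with a small sketch. It assumes the threshold is read from the CACHE_SIMILARITY_THRESHOLD environment variable, falls back to the documented 0.85, and lies between 0 and 1. How Agent Brain actually parses and validates the value is not shown here. The helpers `load_threshold` and `hit_rate` are hypothetical, and the similarity scores are made up for illustration.

```python
import os

DEFAULT_THRESHOLD = 0.85  # default documented in the README


def load_threshold(env=None) -> float:
    """Read CACHE_SIMILARITY_THRESHOLD, falling back to the documented default.

    Assumption: valid values lie in [0.0, 1.0]; anything else is rejected.
    """
    env = os.environ if env is None else env
    raw = env.get("CACHE_SIMILARITY_THRESHOLD")
    if raw is None or raw.strip() == "":
        return DEFAULT_THRESHOLD
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {value}")
    return value


def hit_rate(scores, threshold: float) -> float:
    """Fraction of queries whose best similarity score clears the threshold."""
    if not scores:
        return 0.0
    hits = sum(1 for score in scores if score >= threshold)
    return hits / len(scores)


# Made-up best-match scores for four incoming queries (illustrative only).
scores = [0.95, 0.88, 0.80, 0.60]

for threshold in (0.90, load_threshold({}), 0.75):
    print(f"threshold={threshold:.2f} hit_rate={hit_rate(scores, threshold):.2f}")
```

With these sample scores, moving the threshold from 0.90 to 0.75 raises the hit rate from 0.25 to 0.75. The extra hits are also the lowest-scoring matches, so they are the ones most likely to be false positives.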
Monitoring
Use the desktop Cache Monitor to watch hits, misses, similarity scores, and token-savings estimates during real usage.