Semantic Cache
The semantic cache checks whether a similar query has already been answered.
When a new query's similarity score against a cached query clears the similarity threshold, a client can:
- Reuse the cached answer
- Avoid generating a fresh response
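A minimal sketch of that lookup, assuming queries are compared as embedding vectors using cosine similarity. The source does not say which similarity measure or storage Agent Brain uses, so `CacheEntry`, `cosine_similarity`, and `lookup` are hypothetical names for illustration only, and the vectors in the usage note are toy values rather than real embeddings.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CacheEntry:
    """Hypothetical cache record: a past query, its embedding, and its answer."""
    query: str
    embedding: Sequence[float]
    answer: str


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup(
    query_embedding: Sequence[float],
    entries: List[CacheEntry],
    threshold: float = 0.85,  # default documented in the README
) -> Optional[CacheEntry]:
    """Return the most similar entry if its score meets the threshold, else None."""
    best: Optional[CacheEntry] = None
    best_score = -1.0
    for entry in entries:
        score = cosine_similarity(query_embedding, entry.embedding)
        if score > best_score:
            best, best_score = entry, score
    # Assumption: a score equal to the threshold counts as a hit.
    if best is not None and best_score >= threshold:
        return best
    return None
```

With one cached entry whose toy embedding is `[1.0, 0.0, 0.0]`, a query embedded as `[0.9, 0.1, 0.0]` is reused, while `[0.0, 1.0, 0.0]` falls through to a fresh response.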
Why it matters
AI coding workflows often repeat the same questions with different wording. A semantic cache helps by:
- Catching repeated intent
- Avoiding exact-string matching requirements
- Reducing unnecessary token spend
Examples:
- "How does auth work?"
- "What is our authentication strategy?"
- "Where do we store auth tokens?"
All three can point to the same cached context when their similarity to the cached query clears the threshold.
Default behavior
The similarity threshold is configured in the Agent Brain environment. The README documents CACHE_SIMILARITY_THRESHOLD=0.85 as the default.
Tradeoff:
- Higher thresholds reduce false positives
- Lower thresholds increase hit rate
- Lower thresholds require more trust that a high similarity score reflects the same intent
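The tradeoff can be illustrated with a short sketch. `CACHE_SIMILARITY_THRESHOLD` and the 0.85 default come from the README; how Agent Brain actually reads the variable is not described in the source, so `read_threshold` and `hit_count` are hypothetical helpers, and the scores are made-up values chosen only to show the effect.

```python
import os
from typing import Iterable, Mapping, Optional

DEFAULT_THRESHOLD = 0.85  # default documented in the README


def read_threshold(env: Optional[Mapping[str, str]] = None) -> float:
    """Read CACHE_SIMILARITY_THRESHOLD, falling back to the documented default."""
    env = os.environ if env is None else env
    raw = env.get("CACHE_SIMILARITY_THRESHOLD")
    return DEFAULT_THRESHOLD if raw is None else float(raw)


def hit_count(scores: Iterable[float], threshold: float) -> int:
    """Count scores that would be treated as cache hits at this threshold."""
    # Assumption: a score equal to the threshold counts as a hit.
    return sum(1 for score in scores if score >= threshold)


# Made-up similarity scores for four incoming queries.
scores = [0.95, 0.88, 0.80, 0.60]

for threshold in (0.75, 0.85, 0.90):
    print(f"threshold={threshold:.2f} hits={hit_count(scores, threshold)}")
```

Raising the threshold from 0.75 to 0.90 drops the hit count on these scores from three to one: fewer answers are reused, but each reused answer is a closer match.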
Monitoring
Use the desktop Cache Monitor to watch:
- Hits
- Misses
- Similarity scores
- Token-savings estimates
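To make these four metrics concrete, here is a small sketch of how they could be tallied on the client side. It is not the desktop Cache Monitor's implementation or API; `CacheStats`, `record`, and the per-query token estimate are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CacheStats:
    """Hypothetical running totals for the metrics listed above."""
    hits: int = 0
    misses: int = 0
    scores: List[float] = field(default_factory=list)
    tokens_saved: int = 0

    def record(self, score: float, threshold: float, est_tokens: int) -> bool:
        """Record one lookup; est_tokens is the estimated cost of a fresh response."""
        self.scores.append(score)
        hit = score >= threshold  # assumption: equality counts as a hit
        if hit:
            self.hits += 1
            self.tokens_saved += est_tokens
        else:
            self.misses += 1
        return hit

    @property
    def hit_rate(self) -> float:
        """Fraction of lookups that were hits; 0.0 before any lookups."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


stats = CacheStats()
stats.record(score=0.91, threshold=0.85, est_tokens=400)  # counted as a hit
stats.record(score=0.70, threshold=0.85, est_tokens=400)  # counted as a miss
print(stats.hits, stats.misses, stats.tokens_saved, stats.hit_rate)
```

Keeping the raw similarity scores alongside the counters makes it easier to see how many lookups sit just above or below the threshold when deciding whether to adjust it.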