2 Comments
Nov 22, 2023 · Liked by Sairam Sundaresan

This is a terrific explanation - thank you. When it comes to "non-parametric memory", is this typically prefetched, with the LLM seeing it as a local cache of recent information, or is it a real-time lookup based on the user's query?

author

Thanks! I'm glad you found this useful. I think the answer depends. In most scenarios, I'd expect this to be a real-time lookup, since it's not possible to predict the exact words in the user's query, right? But it seems like this is an active area of research. I just came across this post that describes a "semantic" cache to speed up LLMs: https://portkey.ai/blog/reducing-llm-costs-and-latency-semantic-cache/
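To make the semantic cache idea concrete, here's a minimal sketch of how one could work: embed each query, and on a new query, return a cached response when its embedding is similar enough to a previous one. The `embed` function below is a toy character-bigram stand-in for a real embedding model, and the similarity threshold is an assumed parameter, not something from the linked post:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: character-bigram counts. A real semantic cache
    # would use an actual embedding model (e.g., a sentence encoder).
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cutoff
        self.entries = []           # list of (embedding, response)

    def get(self, query):
        # Return a cached response if a stored query is similar enough.
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the LLM / retrieval call
        return None         # cache miss: fall through to real-time lookup

    def put(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache(threshold=0.8)
cache.put("What is non-parametric memory?",
          "It refers to knowledge retrieved from an external store.")
print(cache.get("What is non-parametric memory?"))  # hit: identical query
print(cache.get("How do transformers work?"))       # miss: returns None
```

The point is that a semantic cache matches on *meaning* rather than exact strings, so even queries that aren't word-for-word identical to a previous one can be served from the cache instead of triggering a fresh lookup.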
