Dev Jhawar © 2026

LLM Semantic Cache

A semantic caching layer for Large Language Models to reduce latency and API costs.

LLM Semantic Cache stores LLM responses and returns them when a new query is semantically similar to one already answered, even if the wording differs. Serving these hits from the cache avoids redundant API calls, which lowers cost and shortens response times.

Tech Stack

Python, Redis, Sentence Transformers, FastAPI

Features

Semantic similarity matching
Configurable caching strategies
Integration with popular LLM providers
Low-latency retrieval
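
To make semantic similarity matching concrete, here is a minimal sketch of a threshold-based lookup. It is not the project's actual code. The names SemanticCache, toy_embed, and threshold are hypothetical, and toy_embed is a bag-of-words stand-in so the example runs without dependencies; a real deployment would presumably swap in a sentence-embedding model and a vector store.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in embedder: lowercase bag-of-words counts."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a cached response when a query is close enough to a stored one."""

    def __init__(self, embed: Callable[[str], Vector] = toy_embed,
                 threshold: float = 0.8) -> None:
        self.embed = embed
        self.threshold = threshold  # assumed cut-off; tune per application
        self.entries: List[Tuple[Vector, str]] = []

    def put(self, query: str, response: str) -> None:
        self.entries.append((self.embed(query), response))

    def get(self, query: str) -> Optional[str]:
        q = self.embed(query)
        best_score, best_response = 0.0, None
        # Linear scan for clarity; a real cache would use an index.
        for vec, response in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
hit = cache.get("What is the capital of France?")   # same words, different casing
miss = cache.get("how do I bake bread")             # unrelated query
```

The threshold trades hit rate against the risk of returning an answer to a different question, so it is the main knob to tune.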

Challenges

Optimizing embedding generation for speed
Managing cache eviction policies
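
As an illustration of what an eviction policy involves, the sketch below combines a time-to-live with least-recently-used eviction in plain Python. It is an assumption about one reasonable policy, not the project's implementation. LruTtlStore and its parameters are hypothetical names, and with Redis the same effect would more likely come from key expiry and a maxmemory eviction policy than from application code.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class LruTtlStore:
    """Hypothetical in-process store: entries expire after a TTL and the
    least recently used entry is dropped when the store is full."""

    def __init__(self, max_entries: int = 1000, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self.clock() - stored_at > self.ttl_seconds:
            del self._data[key]          # expired: drop and report a miss
            return None
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used


now = [0.0]
store = LruTtlStore(max_entries=2, ttl_seconds=10.0, clock=lambda: now[0])
store.put("a", "A")
store.put("b", "B")
store.get("a")        # touch "a" so "b" becomes least recently used
store.put("c", "C")   # store is full, so "b" is evicted
```

A TTL bounds how stale a cached answer can get, while the size cap bounds memory; which matters more depends on how quickly the underlying answers change.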

Feedback

For feedback or suggestions, contact me at: dev.jhawar.cs@gmail.com

Links

devjhawar