"Our LLM API bill went from $400 to $3,200 in one month after we launched our chatbot. Looking at the logs, at least 60% of questions are variations of the same 20 things. There has to be a better way."
u/devops_tired · 847 comments
"We're paying $0.008 per call and our support bot answers 'what are your pricing plans?' about 4,000 times a day. That's $32/day for the exact same answer. Caching by string match doesn't work because everyone phrases it differently."
thrwaway_swe · 6 hours ago
"my llm api costs are insane. same question asked 50 different ways = 50 full price calls. 'what's the refund policy' 'can i get a refund' 'do you have refunds' — why is there no semantic caching layer for this lol"
@indie_hacker_io
"Just got my first OpenAI bill for a full month — $1,800. I nearly choked. I built a customer support bot and everyone keeps asking the same FAQ questions. I need to cache LLM responses but exact match caching barely helps because natural language is messy."
u/bootstrapped_dev · 234 comments
"Every time I look at our LLM spend analytics, it's depressing. Tons of near-duplicate queries hitting the API fresh every single time. We tried building our own semantic cache but maintaining embeddings infrastructure is a whole project in itself."
silentcoder_sf · 3 hours ago
"spending $2k/month on gpt-4 for a FAQ bot. the answer to 'how do i cancel' is identical whether someone types 'how do i cancel', 'cancellation process', or 'i want to cancel my subscription'. paying for all three separately is painful"
@yc_founder_23
"LLM API costs are the #1 thing killing our margins. We're a small team, our product is AI-first, and every user query costs money. I've tried prompt caching, exact-match Redis caching... but semantic deduplication? Nobody has a clean solution."
u/ramen_profitable · 119 comments
"Ask HN: How do you handle LLM cost optimization at scale? We're at $15k/month and climbing. Semantic caching seems promising but building it in-house feels like a distraction from our actual product."
building_in_public · 78 comments
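Every post above is asking for the same mechanism: embed each incoming query, compare it to previously answered queries by vector similarity, and return the stored answer when similarity clears a threshold, so "how do i cancel" and "cancellation process" cost one API call instead of two. A minimal self-contained sketch of that idea is below. The `embed` function here is a toy bag-of-words stand-in for a real sentence-embedding model, and `SemanticCache`, the 0.8 threshold, and `fake_llm` are illustrative assumptions, not any particular product's API:

```python
import math
import re
from collections import Counter

# Words too generic to signal intent; a real system would use a
# learned sentence-embedding model instead of this toy scheme.
STOPWORDS = {"a", "an", "the", "i", "my", "do", "how", "can",
             "is", "are", "what", "s", "to", "of", "your"}

def embed(text):
    """Toy embedding: a bag of lowercased, stopword-filtered tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Serve near-duplicate queries from cache; call the LLM only on misses."""

    def __init__(self, llm_call, threshold=0.8):
        self.llm_call = llm_call      # the expensive backing call
        self.threshold = threshold    # min similarity to count as a hit
        self.entries = []             # list of (embedding, answer) pairs

    def ask(self, query):
        q = embed(query)
        # Linear scan; production systems use a vector index instead.
        best_answer, best_sim = None, 0.0
        for emb, answer in self.entries:
            sim = cosine(q, emb)
            if sim > best_sim:
                best_answer, best_sim = answer, sim
        if best_answer is not None and best_sim >= self.threshold:
            return best_answer        # cache hit: no API spend
        answer = self.llm_call(query) # cache miss: pay for one call
        self.entries.append((q, answer))
        return answer
```

With this sketch, "How do I cancel my subscription?" and "cancel my subscription" reduce to the same token set, so the second query is served from cache without a second paid call; an unrelated query like "What are your pricing plans?" falls below the threshold and goes through to the LLM.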