Semantic caching for LLM APIs — early access open

Stop paying for
the same LLM call twice.

Pearlite sits between your app and your LLM API. One line of
code. Up to 40% cost reduction. Works with every major provider.

Get Early Access → See the Product ↓
pearlite — live demo
Send a query
Similarity threshold
0.92
Cache stream
Send a query to see results →
<12ms  cached response
40%  avg cost reduction
OpenAI · Anthropic · Gemini
Groq · Mistral · Any OAI-compatible
Integrate

Drop in. No rewrites.

A direct replacement for your LLM SDK across any language — no new paradigms, no config files. Just wrap and go.

import { Pearlite } from 'pearlite'

const client = new Pearlite({
  apiKey: 'pk_live_••••••••',
  similarity: 0.92, // tune aggressiveness
  provider: 'openai'
})

// Drop-in for OpenAI SDK
const { data, cached, saved, latency } = await client.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: userMessage }]
})

// cached  → true | false
// saved   → '$0.008'
// latency → '11ms'
Live Cache Stream
The Problem

Your AI bill is full of questions
you’ve already answered.

Every semantically identical query costs you full price. Pearlite ends that.

Without Pearlite
10:42:01  "What are your pricing plans?"  $0.008
10:42:03  "how much does it cost?"        $0.008
10:42:07  "pricing?"                      $0.008
10:42:09  "What are your pricing plans?"  $0.008
10:42:12  "do you have a free tier?"      $0.008
10:42:14  "how much does this cost?"      $0.008
With Pearlite
10:42:01  "What are your pricing plans?"       $0.008
10:42:03  "how much does it cost?"        HIT  $0.000
10:42:07  "pricing?"                      HIT  $0.000
10:42:09  "What are your pricing plans?"  HIT  $0.000
10:42:12  "do you have a free tier?"      HIT  $0.000
10:42:14  "how much does this cost?"      HIT  $0.000
How It Works

The full infrastructure, explained.

Pearlite is a transparent proxy that intercepts LLM calls, computes semantic similarity, and returns cached responses in milliseconds — without ever touching your application logic.

Request Flow Architecture
Your App
API call
Pearlite
Proxy
embed query
Vector Store
(per workspace)
sim ≥ threshold
Cache Hit
<12ms
sim < threshold
LLM Provider
600–1200ms
STEP 01
Query Interception
Your request hits the Pearlite proxy endpoint instead of the provider directly. We parse the message payload and extract the semantic content to embed. Zero changes to your existing code structure.
STEP 02
Vector Embedding
The query is converted to a high-dimensional embedding using a fast embedding model (<3ms). This vector captures semantic meaning — not just keywords — enabling fuzzy matching across rephrasings, typos, and synonyms.
STEP 03
Similarity Search
We perform ANN (approximate nearest-neighbor) search across your workspace's vector store. If cosine similarity ≥ your configured threshold, the cached response is returned immediately without touching the LLM API.
STEP 04
Miss Handling + Store
On a cache miss, the request is forwarded transparently to your LLM provider. The response streams back to your app while simultaneously being stored in your isolated vector store for future hits. TTL and eviction policies are fully configurable.
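The four steps above can be sketched in a few lines. This is an illustrative TypeScript sketch, not Pearlite's actual internals: embed(), callProvider(), and the linear scan stand in for the real embedding model, provider SDK, and ANN index.

```typescript
type CacheEntry = { vector: number[]; response: string };

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function handleQuery(
  query: string,
  embed: (q: string) => Promise<number[]>,      // Step 02: embedding model
  store: CacheEntry[],                           // per-workspace vector store
  callProvider: (q: string) => Promise<string>,  // Step 04: forward to LLM
  threshold = 0.92
): Promise<{ response: string; cached: boolean }> {
  const vector = await embed(query);
  // Step 03: nearest-neighbor scan (real systems use an ANN index).
  let best: CacheEntry | null = null;
  let bestSim = -1;
  for (const entry of store) {
    const sim = cosine(vector, entry.vector);
    if (sim > bestSim) { bestSim = sim; best = entry; }
  }
  if (best && bestSim >= threshold) {
    return { response: best.response, cached: true }; // cache hit
  }
  // Miss: forward to the provider and store the result for future hits.
  const response = await callProvider(query);
  store.push({ vector, response });
  return { response, cached: false };
}
```

The same miss-then-store path is what makes the cache warm itself: the first phrasing of a question pays full price, and every semantically close rephrasing after it is free.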
Analytics

Everything visible. Always.

A real-time dashboard that shows where every dollar goes and how much Pearlite is saving you — per day, per model, per use case.

Total Calls Today
24,847
+12% vs yesterday
Cache Hit Rate
41.2%
10,234 hits today
Saved This Month
$960
vs $2,400 without Pearlite
Daily API spend — before vs after Pearlite
Last 14 days
Cache hit rate by category
Avg across beta users
Response latency — hit vs miss
Distribution in ms
The Math

See what you’d save.

Move the slider. Numbers update in real time.

Monthly LLM API calls: 100,000
Current Spend
$800
at $0.008 per call
With Pearlite
$480
40% avg cache hit rate
You Save
$320
every month

Based on observed 40% avg cache hit rate. Customer support bots typically see 55–70%.
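The calculator reduces to one line of arithmetic, assuming a flat per-call price:

```typescript
// Sketch of the slider math above, assuming a flat per-call price.
function monthlySavings(calls: number, pricePerCall: number, hitRate: number) {
  const currentSpend = calls * pricePerCall;        // what you pay today
  const withCache = currentSpend * (1 - hitRate);   // only misses hit the API
  return { currentSpend, withCache, saved: currentSpend - withCache };
}

const { currentSpend, withCache, saved } = monthlySavings(100_000, 0.008, 0.4);
// ≈ $800 spend today; at a 40% hit rate, ≈ $480 spend and ≈ $320 saved
```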

Use Cases

Built for how AI is actually used.

Anywhere users ask questions, Pearlite finds patterns and saves you money.

Customer Support Bots
Users ask the same 50 questions in 500 different ways. Pearlite catches all of them — pricing, refunds, account issues, all cached.
avg 62% cache rate
RAG Applications
Document Q&A systems get hammered with repeated queries. Cache the answers, not just the chunks. Cut retrieval costs too.
avg 44% cache rate
Internal Copilots
Your team asks the same codebase questions every day. Stop paying for the same answer twice. Pearlite gets smarter with every query.
avg 38% cache rate
Public AI Features
Search bars, content generators, recommendation engines — semantic deduplication at scale. Ship cheaper, faster.
avg 41% cache rate
Vertical AI Apps
Legal, medical, finance — domain-specific apps where the same queries repeat constantly. High stakes, high volume, high savings.
avg 51% cache rate
Analytics Queries
Natural language to SQL loops hit identical queries constantly. Pearlite eliminates redundancy at the source before the LLM even sees it.
avg 48% cache rate
Under the Hood

Semantic, not string matching.

Exact-match caching is table stakes. Pearlite understands meaning — so "pricing?" and "how much does this cost?" are the same query.

Similarity Engine
Vector-based understanding
Every query converts to an embedding. Compared against your cache via cosine similarity. You control the threshold from conservative to aggressive.
"What's the price?"0.97HIT
"How much does it cost?"0.94HIT
"pricing?"0.91HIT
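The threshold knob is the whole tuning surface. A hypothetical sketch using the example scores above (illustrative numbers, not live output):

```typescript
// Illustrative similarity scores from the panel above.
const scores = [
  { query: "What's the price?", sim: 0.97 },
  { query: "How much does it cost?", sim: 0.94 },
  { query: "pricing?", sim: 0.91 },
];

// Lower threshold = more aggressive matching, higher hit rate.
const threshold = 0.90;
const hits = scores.filter(s => s.sim >= threshold);
// All three example queries clear a 0.90 threshold; raising it toward
// 0.95 trades hit rate for stricter matching.
```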
Data Privacy
Your data stays yours
Embeddings and responses stored in an isolated vector store per workspace. Never trains our models. Never shared. Fully deletable on request.
Isolated per workspace
Zero model training on your data
Full deletion on request
SOC 2 in progress
Latency
Speed your users will notice
Cache hits return under 12ms. Compared to 600–1200ms for a live LLM call. Speed and savings in the same package.
LLM call
840ms
Pearlite
11ms
Provider Support

Works with every major LLM provider.

If it takes a prompt, Pearlite can cache it.

OpenAI
Anthropic
Google Gemini
Groq
Mistral
Any OpenAI-compatible API
Documentation

Everything you need, documented.

From quickstart to advanced configuration, the docs cover everything. Most teams are in production within an hour.

Quickstart
Get from zero to your first cache hit in under 5 minutes. Works with any existing LLM integration.
→ START HERE
Configuration
Similarity thresholds, TTL settings, per-model policies, namespace isolation, and streaming support.
→ ADVANCED
API Reference
Full SDK reference for Node.js, Python, and the REST API. Response types, error codes, headers.
→ REFERENCE
Dashboard
Explore cache analytics, hit rate by model, cost savings over time, and workspace management.
→ DASHBOARD
Security & Privacy
Workspace isolation, data handling, retention policies, deletion APIs, and our SOC 2 roadmap.
→ SECURITY
Migration Guide
Already using Redis or exact-match caching? Migrate to semantic caching without losing your existing cache.
→ MIGRATE
Open Source

Built in public. Contributions welcome.

The Pearlite SDK is open source. The core proxy logic, embedding pipeline, and SDK clients are all on GitHub. Contributions welcome.

View on GitHub → npm install pearlite
Pricing

Simple, usage-based pricing.

Pay only for what you use. No seats, no contracts, no hidden fees. Scale up or down any time.

Hobby
$0
For side projects and exploration. No credit card required.
  • 10,000 API calls / month
  • 1 workspace
  • All LLM providers
  • Basic analytics dashboard
  • Community support
Get Started Free
Enterprise
Custom
For high-volume teams with compliance and SLA requirements.
  • Unlimited API calls
  • Unlimited workspaces
  • SOC 2 (in progress)
  • VPC deployment option
  • SLA + dedicated support
  • Custom data retention
Talk to Us

Developers are screaming about this.

Across Reddit, Hacker News, and Twitter — the same frustration keeps surfacing. We built Pearlite because we felt it too.

r/MachineLearning
↑ 1.4k
"Our LLM API bill went from $400 to $3,200 in one month after we launched our chatbot. Looking at the logs, at least 60% of questions are variations of the same 20 things. There has to be a better way."
u/devops_tired · 847 comments
Hacker News
↑ 312
"We're paying $0.008 per call and our support bot answers 'what are your pricing plans?' about 4,000 times a day. That's $32/day for the exact same answer. Caching by string match doesn't work because everyone phrases it differently."
thrwaway_swe · 6 hours ago
↗ 2.1k
"my llm api costs are insane. same question asked 50 different ways = 50 full price calls. 'what's the refund policy' 'can i get a refund' 'do you have refunds' — why is there no semantic caching layer for this lol"
@indie_hacker_io
r/SideProject
↑ 891
"Just got my first OpenAI bill for a full month — $1,800. I nearly choked. I built a customer support bot and everyone keeps asking the same FAQ questions. I need to cache LLM responses but exact match caching barely helps because natural language is messy."
u/bootstrapped_dev · 234 comments
Hacker News
↑ 178
"Every time I look at our LLM spend analytics, it's depressing. Tons of near-duplicate queries hitting the API fresh every single time. We tried building our own semantic cache but maintaining embeddings infrastructure is a whole project in itself."
silentcoder_sf · 3 hours ago
↗ 987
"spending $2k/month on gpt-4 for a FAQ bot. the answer to 'how do i cancel' is identical whether someone types 'how do i cancel', 'cancellation process', or 'i want to cancel my subscription'. paying for all three separately is painful"
@yc_founder_23
r/startups
↑ 654
"LLM API costs are the #1 thing killing our margins. We're a small team, our product is AI-first, and every user query costs money. I've tried prompt caching, exact-match Redis caching... but semantic deduplication? Nobody has a clean solution."
u/ramen_profitable · 119 comments
Hacker News
↑ 445
"Ask HN: How do you handle LLM cost optimization at scale? We're at $15k/month and climbing. Semantic caching seems promising but building it in-house feels like a distraction from our actual product."
building_in_public · 78 comments

LLM caching.
Available today.

We're personally onboarding the first 100 teams. If you're spending $500+/month on LLM APIs and want to cut that bill — get in line.

No credit card · Free tier available · Setup in 5 minutes
63 teams already on the waitlist