Everyone reaches for Pinecone or Weaviate when they need a vector database. For most production workloads under a million documents, you already have everything you need: Postgres + pgvector.
Here's the full RAG setup in Next.js with zero new infrastructure.
## What pgvector Gives You
pgvector adds vector storage and similarity search to Postgres. After a one-line extension install, you can:
- Store embeddings in `vector` columns
- Run nearest-neighbor search with the `<->` (L2 distance), `<#>` (negative inner product), and `<=>` (cosine distance) operators
- Use pgvector's IVFFlat and HNSW indexes for approximate search at scale
No new service, no new bill, no data leaving your existing infrastructure.
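To make the operators concrete, here's roughly what a raw nearest-neighbor query looks like against the `documents` table created in the next section. A sketch, with a shortened placeholder standing in for a real embedding literal:

```sql
-- Sketch: five nearest rows by cosine distance ('[...]' stands in for a full embedding)
select content, 1 - (embedding <=> '[0.12, -0.03, ...]'::vector) as similarity
from documents
order by embedding <=> '[0.12, -0.03, ...]'::vector
limit 5;
```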
## Setup
Enable pgvector on Supabase (or any Postgres):
```sql
create extension if not exists vector;
```
Create the documents table:
```sql
create table documents (
  id uuid primary key default gen_random_uuid(),
  content text not null,
  embedding vector(1024), -- must match the embedding model; voyage-3 returns 1024 dimensions
  metadata jsonb default '{}',
  created_at timestamptz default now()
);

-- HNSW index for fast approximate search
create index on documents using hnsw (embedding vector_cosine_ops)
  with (m = 16, ef_construction = 64);
```
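One query-time knob worth knowing: `hnsw.ef_search` sets the size of the HNSW candidate list (default 40). Raising it improves recall at a small latency cost; the value below is an assumption to tune against your own data:

```sql
-- Per session: higher ef_search = better recall, slightly slower queries
set hnsw.ef_search = 100;
```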
## Ingestion Pipeline
```ts
// lib/rag/ingest.ts
import { db } from '@/db'
import { documents } from '@/db/schema'

// Exported so retrieve.ts can embed queries with the same model
export async function embed(text: string): Promise<number[]> {
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ input: text, model: 'voyage-3' }),
  })
  if (!response.ok) {
    throw new Error(`Embedding request failed: ${response.status}`)
  }
  const data = await response.json()
  return data.data[0].embedding
}

export async function ingestDocument(content: string, metadata = {}) {
  const embedding = await embed(content)
  await db.insert(documents).values({ content, embedding, metadata })
}

export async function ingestBatch(docs: { content: string; metadata?: object }[]) {
  for (const doc of docs) {
    await ingestDocument(doc.content, doc.metadata)
    await new Promise(r => setTimeout(r, 100)) // simple buffer between embedding API calls
  }
}
```
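One thing `ingestBatch` glosses over: embedding an entire document in one shot dilutes retrieval quality for long content. In practice you'd split into overlapping chunks first. A minimal sketch; `chunkText` and its sizes are assumptions, and real pipelines usually split on sentence or paragraph boundaries:

```ts
// Hypothetical helper: naive fixed-size chunking with overlap
export function chunkText(text: string, chunkSize = 1200, overlap = 200): string[] {
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize))
  }
  return chunks
}

// Chunk, then reuse ingestDocument so each chunk gets its own embedding row
export async function ingestLongDocument(content: string, metadata = {}) {
  for (const chunk of chunkText(content)) {
    await ingestDocument(chunk, metadata)
  }
}
```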
## Retrieval
```ts
// lib/rag/retrieve.ts
import { sql } from 'drizzle-orm'
import { db } from '@/db'
import { embed } from '@/lib/rag/ingest'

export async function retrieveContext(
  query: string,
  limit = 5,
  threshold = 0.7
): Promise<string[]> {
  const queryEmbedding = await embed(query)
  const embeddingStr = `[${queryEmbedding.join(',')}]`

  // <=> is cosine distance, so 1 - distance gives cosine similarity
  const results = await db.execute(sql`
    select content, 1 - (embedding <=> ${embeddingStr}::vector) as similarity
    from documents
    where 1 - (embedding <=> ${embeddingStr}::vector) > ${threshold}
    order by embedding <=> ${embeddingStr}::vector
    limit ${limit}
  `)

  return results.rows.map((r: any) => r.content)
}
```
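Because `1 - distance` is cosine similarity, `threshold` acts as a similarity floor. A quick usage sketch (the query string is illustrative):

```ts
// Up to 5 passages with cosine similarity above 0.7
const passages = await retrieveContext('How do I rotate API keys?', 5, 0.7)
```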
## The RAG API Route
```ts
// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk'
import { retrieveContext } from '@/lib/rag/retrieve'

const client = new Anthropic()

export async function POST(req: Request) {
  const { messages } = await req.json()
  const lastMessage = messages[messages.length - 1]

  // Retrieve context for the user's latest message only
  const context = await retrieveContext(lastMessage.content)

  const systemPrompt = context.length > 0
    ? `You are a helpful assistant. Use the following context to answer questions accurately.\n\n<context>\n${context.join('\n\n---\n\n')}\n</context>\n\nIf the context doesn't contain the answer, say so clearly.`
    : 'You are a helpful assistant.'

  const stream = client.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: systemPrompt,
    messages,
  })

  // Re-emit the SDK stream as a plain-text HTTP response
  return new Response(
    new ReadableStream({
      async start(controller) {
        for await (const chunk of stream) {
          if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
            controller.enqueue(new TextEncoder().encode(chunk.delta.text))
          }
        }
        controller.close()
      },
    }),
    { headers: { 'Content-Type': 'text/plain; charset=utf-8' } }
  )
}
```
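On the client side, the route streams plain text, so consuming it is just `fetch` plus a stream reader. A minimal sketch, assuming the `messages` array is shaped the way the route expects:

```ts
// Minimal consumer for the streaming route above
async function streamChat(messages: { role: string; content: string }[]): Promise<string> {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
  })
  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  let text = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    text += decoder.decode(value, { stream: true })
    // render `text` incrementally here
  }
  return text
}
```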
## Performance at Scale
pgvector's HNSW index keeps retrieval in the low single-digit milliseconds at tens of thousands of vectors, and roughly 5-15ms at ~1M vectors on standard Postgres hardware.
| Metric | pgvector (HNSW) | Pinecone |
|---|---|---|
| Query latency (10K docs) | 2-5ms | 20-50ms |
| Query latency (1M docs) | 5-15ms | 15-40ms |
| Setup complexity | Low (extension) | Medium (new service) |
| Cost (10K docs/day) | ~$0 (existing DB) | ~$70/mo |
| Exact search available | Yes | No (ANN only) |
| SQL joins on results | Yes | No |
The SQL joins point is underrated. You can filter by metadata, join to users, restrict by permissions — all in one query. Pinecone requires a round-trip to Postgres after retrieval to do any of that.
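For example, permission-scoped retrieval stays in one round-trip. A sketch, where `user_documents` is a hypothetical join table and the `source` metadata key is an assumption:

```sql
-- Nearest neighbors, restricted to documents the requesting user can see
select d.content, 1 - (d.embedding <=> $1::vector) as similarity
from documents d
join user_documents ud on ud.document_id = d.id
where ud.user_id = $2
  and d.metadata->>'source' = 'handbook'
order by d.embedding <=> $1::vector
limit 5;
```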
## When to Move to a Dedicated Vector DB
Switch when:
- You're above 5M vectors and seeing latency >50ms
- You need multi-modal embeddings (image + text in same index)
- Your embedding dimensions exceed 2,000, the cap for pgvector's HNSW and IVFFlat indexes
- You need real-time index updates at >1K writes/second
For 95% of SaaS products, you won't hit any of these before you're well-funded enough to care.
## What I'm Building This Into
The AI SaaS Starter Kit includes a pre-built RAG module: pgvector schema, Drizzle queries, ingest pipeline, and the streaming chat route above.
If you want the full production-ready stack without starting from scratch:
- AI SaaS Starter Kit — $99 one-time — includes RAG module, Claude API integration, auth, Stripe billing, and deployment config
- Workflow Automator MCP — $15/mo — gives Claude access to your document store directly from the IDE