State-of-the-Art Embedding Retrieval: Gemini + pgvector for Production Chat Systems
How we achieved 2x faster vector search with identical recall using Gemini embeddings, task-optimized retrieval, and pgvector's half-precision quantization.
Blogging to document my path & share what I learn along the way 😊