AnalyzeEQ
Upload a 3,000-page PDF or a Google Sheet and ask it questions in plain English — hybrid vector + keyword retrieval over pgvector and FAISS, billed on Stripe.
Two parallel retrievals, one ranked answer
Vector similarity finds *meaning*; keyword search finds *words*. Reciprocal Rank Fusion merges the two ranked lists into one — the first version was vector-only and missed every literal-term query users actually asked.
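As a rough illustration of the fusion step, here is a minimal Reciprocal Rank Fusion sketch. The function name, the chunk IDs, and the constant `k = 60` (a commonly used default) are assumptions for the example, not AnalyzeEQ's actual code or tuning.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID scores 1 / (k + rank) per list it appears in (rank starts at 1),
    and the per-list scores are summed. k = 60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by the two parallel retrievals.
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c9", "c3", "c4"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# "c3" ranks first because it is the only chunk present in both lists.
```

Because RRF uses only rank positions, the vector distances and keyword scores never need to be put on a common scale.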
Five stages from upload to queryable index
Validate
Five sequential layers: a magic-number (file-signature) check, structural integrity, a byte-size cap, a page-count cap, and content sanity. The file never reaches the extractor unless all five pass.
- upload bytes
- file metadata
- validated file
- actionable rejection if any layer fails
- A document SaaS has to assume hostile uploads. Five layers mean a small slice of legitimate-but-weird files gets rejected; in exchange, the attack surface is sharply reduced.
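The layered check could look roughly like the sketch below for a PDF upload. Everything concrete here is an assumption made for illustration: the caps, the messages, the function names, and the crude regex page count (a real pipeline would likely use a proper PDF parser).

```python
import re
from dataclasses import dataclass
from typing import Optional

MAX_BYTES = 50 * 1024 * 1024  # hypothetical byte-size cap
MAX_PAGES = 3000              # hypothetical page-count cap


@dataclass
class ValidationResult:
    ok: bool
    layer: Optional[str] = None   # name of the layer that rejected the file
    reason: Optional[str] = None  # actionable message for the uploader


def count_pages(data: bytes) -> int:
    # Crude heuristic: count "/Type /Page" objects, excluding "/Type /Pages".
    return len(re.findall(rb"/Type\s*/Page\b", data))


def check_signature(data: bytes) -> Optional[str]:
    if not data.startswith(b"%PDF-"):
        return "File is not a PDF by signature; re-export it as PDF."
    return None


def check_structure(data: bytes) -> Optional[str]:
    if b"%%EOF" not in data[-1024:]:
        return "PDF looks truncated; upload the complete file."
    return None


def check_size(data: bytes) -> Optional[str]:
    if len(data) > MAX_BYTES:
        return f"File exceeds {MAX_BYTES} bytes; split it and retry."
    return None


def check_pages(data: bytes) -> Optional[str]:
    if count_pages(data) > MAX_PAGES:
        return f"File exceeds {MAX_PAGES} pages; split it and retry."
    return None


def check_content(data: bytes) -> Optional[str]:
    if count_pages(data) == 0:
        return "No pages found; check that the document is not empty."
    return None


# Order matches the five layers described above.
LAYERS = [
    ("signature", check_signature),
    ("structure", check_structure),
    ("size", check_size),
    ("pages", check_pages),
    ("content", check_content),
]


def validate(data: bytes) -> ValidationResult:
    """Run the layers in order and stop at the first failure."""
    for name, check in LAYERS:
        reason = check(data)
        if reason is not None:
            return ValidationResult(False, name, reason)
    return ValidationResult(True)
```

Stopping at the first failing layer keeps the cheap checks (signature, size) in front of the ones that have to scan the whole file, and the returned reason is what makes the rejection actionable.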
Summary
Started as a feature inside LeadLyft and outgrew it. AnalyzeEQ ingests PDFs, scanned forms, Excel workbooks, Word documents, CSVs, and Google files; runs a 5-stage pipeline (validate → extract → chunk → embed → store); and answers questions through a retrieval engine that fuses vector similarity and keyword search via Reciprocal Rank Fusion. 50+ FastAPI endpoints, a 5-model Claude fallback chain for availability, HMAC-verified idempotent Stripe webhooks, and a GoHighLevel CRM sync protected by a circuit breaker. In production with paying customers.
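To make "HMAC-verified idempotent" concrete, here is a standard-library sketch. It follows Stripe's documented signature scheme (a `Stripe-Signature` header carrying `t=` and `v1=` values, with HMAC-SHA256 computed over `timestamp.payload`). The function names, the 300-second tolerance, and the in-memory set are assumptions for the example; production code would more likely use Stripe's official library and a database table with a unique key on the event ID.

```python
import hashlib
import hmac
import json
import time

_seen_event_ids = set()  # stand-in for a unique-keyed table of processed events


def verify_signature(payload, header, secret, tolerance=300, now=None):
    """Return True if the header carries a valid, recent v1 signature."""
    timestamp = None
    candidates = []
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:
        return False  # too old or too far in the future: possible replay
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 signature in the header.
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )


def handle_webhook(payload, header, secret, now=None):
    """Verify first, then process each event ID at most once."""
    if not verify_signature(payload, header, secret, now=now):
        return "rejected"
    event = json.loads(payload)
    event_id = event["id"]
    if event_id in _seen_event_ids:
        return "duplicate"  # a retry of an event that was already handled
    _seen_event_ids.add(event_id)
    # Billing side effects for the event would run here.
    return "processed"
```

Verification runs on the raw request bytes before any JSON parsing, and the duplicate check means a redelivered event does not apply its billing side effects twice.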
Highlights
- Hybrid retrieval (vector + keyword + RRF) — better than vector-only on precise-term queries
- Hierarchical summarisation makes 3,000-page documents usable
- 5-model Claude fallback chain so a single model outage does not surface to users as an 'AI is down' error
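The fallback chain can be pictured as a loop over an ordered model list. In this sketch the model names are placeholders and `call_model` is a hypothetical callable that wraps whatever client the application uses; neither reflects the real configuration.

```python
from typing import Callable, Sequence, Tuple

# Placeholder names, ordered from most to least preferred.
MODELS = ["model-a", "model-b", "model-c", "model-d", "model-e"]


class AllModelsFailed(Exception):
    """Raised only when every model in the chain has failed."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order and return (model_name, answer) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


def flaky_call(model: str, prompt: str) -> str:
    # Simulated client: the first two models are unavailable.
    if model in ("model-a", "model-b"):
        raise RuntimeError("overloaded")
    return f"{model} answered: {prompt}"


used_model, answer = complete_with_fallback("Summarise section 4", MODELS, flaky_call)
```

The caller sees an error only when the whole chain is exhausted, which is the availability property the highlight describes.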