2025 – Present · Engineer → Tech Lead · Closed-source

AnalyzeEQ

Upload a 3,000-page PDF or a Google Sheet and ask it questions in plain English — hybrid vector + keyword retrieval over pgvector and FAISS, billed on Stripe.

Hybrid retrieval · vector + keyword + RRF

Two parallel retrievals, one ranked answer

Vector similarity finds *meaning*; keyword search finds *words*. Reciprocal Rank Fusion merges the two ranked lists into one — the first version was vector-only and missed every literal-term query users actually asked.

3,000 pp

Max document

File formats

50+

API endpoints

› What did clause 7.2 say about renewal terms?

Vector

semantic similarity

Keyword

exact match · BM25

RRF Fusion

Σ 1/(k + rank)
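
The fusion step above can be sketched in a few lines. This is an illustrative version, not the production code: the function name `rrf_fuse`, the chunk ids, and the constant `k = 60` (a commonly used default for Reciprocal Rank Fusion) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk ids with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k = 60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two lanes, best hit first.
vector_hits = ["c12", "c07", "c31"]
keyword_hits = ["c07", "c44", "c12"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

A chunk that appears in both lanes (here `c07` and `c12`) outranks a chunk that appears in only one, which is the behaviour that rescues literal-term queries a vector-only ranking would miss.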

Ingestion pipeline

Five stages from upload to queryable index

Stage 1 of 5 · Validate

magic-number · structure · ≤ 9 MB · ≤ 3,000 pp · content sanity

Five sequential layers. Magic-number / file-signature check, structural integrity, byte-size cap, page-count cap, content sanity. The file never reaches the extractor unless all five pass.

Input

upload bytes
file metadata

Output

validated file
actionable rejection if not

Why this stage

A document SaaS has to assume hostile uploads. Five layers means a small slice of legitimate-but-weird files get rejected — in exchange, a sharply reduced attack surface.
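
A minimal sketch of the five layers, for PDFs only and with the standard library alone. Everything here is an assumption made for illustration: the names `validate_pdf` and `Rejection`, reading "9 MB" as 9 × 1024 × 1024 bytes, the regex-based page count (a real pipeline would use a proper PDF parser), and treating "content sanity" as "at least one page".

```python
import re

MAX_BYTES = 9 * 1024 * 1024  # assumed reading of the 9 MB cap
MAX_PAGES = 3_000


class Rejection(Exception):
    """Carries the failing layer and a message the uploader can act on."""

    def __init__(self, layer, reason):
        super().__init__(f"{layer}: {reason}")
        self.layer = layer
        self.reason = reason


def count_pdf_pages(data: bytes) -> int:
    # Crude heuristic: count page objects ("/Type /Page", but not "/Pages").
    return len(re.findall(rb"/Type\s*/Page\b", data))


def validate_pdf(data: bytes) -> bytes:
    """Run the five layers in order; return the bytes only if all pass."""
    if not data.startswith(b"%PDF-"):
        raise Rejection("magic-number", "file does not start with a PDF signature")
    if b"%%EOF" not in data[-1024:]:
        raise Rejection("structure", "end-of-file marker missing; upload may be truncated")
    if len(data) > MAX_BYTES:
        raise Rejection("size", f"file is larger than {MAX_BYTES} bytes")
    pages = count_pdf_pages(data)
    if pages > MAX_PAGES:
        raise Rejection("page-count", f"{pages} pages exceeds the {MAX_PAGES}-page cap")
    if pages == 0:
        raise Rejection("content", "no pages found in the document")
    return data
```

Raising on the first failing layer keeps the rejection specific: the caller can tell the user which check failed instead of returning a generic "invalid file".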

Claude fallback chain

Sonnet 4.5 → Sonnet 4 → Sonnet 3.7 → Haiku 4.5 → Haiku 3.5

User-visible “AI is down” errors disappear during a model incident
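
One way such a chain could be wired, as a sketch rather than the production code. The strings in `FALLBACK_CHAIN` are the display labels from the chain above, not real API model identifiers, and `call_model` is a hypothetical callable standing in for whatever client actually sends the request.

```python
from typing import Callable, Sequence, Tuple

# Display labels in fallback order (not API model ids).
FALLBACK_CHAIN = ("Sonnet 4.5", "Sonnet 4", "Sonnet 3.7", "Haiku 4.5", "Haiku 3.5")


class AllModelsFailed(RuntimeError):
    """Raised only when every model in the chain has failed."""


def ask_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    chain: Sequence[str] = FALLBACK_CHAIN,
) -> Tuple[str, str]:
    """Try each model in order; return (model, answer) from the first success."""
    errors = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```

Returning the model name alongside the answer lets the caller log which tier served the request, which is useful for spotting a silent degradation during an incident.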

FastAPI · React 18 · PostgreSQL + pgvector · FAISS · Claude Sonnet 4.5 · OpenAI Embeddings · Stripe · GoHighLevel

2–3

Team size

Summary

Started as a feature inside LeadLyft and outgrew it. AnalyzeEQ ingests PDFs, scanned forms, Excel workbooks, Word documents, CSVs and Google files; runs a 5-stage pipeline (validate → extract → chunk → embed → store); answers questions through a retrieval engine that fuses vector similarity and keyword search via Reciprocal Rank Fusion. 50+ FastAPI endpoints, a 5-model Claude fallback chain for availability, HMAC-verified idempotent Stripe webhooks, and a circuit-breakered GoHighLevel CRM sync. In production with paying customers.
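
To make "HMAC-verified idempotent Stripe webhooks" concrete, here is a standard-library sketch. It follows Stripe's documented signature scheme (a `Stripe-Signature` header carrying `t=<timestamp>` and `v1=<hex digest>`, where the digest is HMAC-SHA256 over `<timestamp>.<raw body>`). The function names, the in-memory `_processed` set (a stand-in for a persistent table with a unique constraint), and the 300-second tolerance are assumptions; production code would more likely lean on Stripe's official library for verification.

```python
import hashlib
import hmac
import json
import time


def verify_signature(payload: bytes, header: str, secret: str,
                     tolerance: int = 300, now: float = None) -> bool:
    """Check the timestamped HMAC-SHA256 signature in a Stripe-Signature header."""
    parts = [item.strip().split("=", 1) for item in header.split(",") if "=" in item]
    timestamp = next((v for k, v in parts if k == "t"), None)
    candidates = [v for k, v in parts if k == "v1"]
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:
        return False  # stale or future-dated: reject to limit replay
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 signature in the header.
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)


_processed = set()  # stand-in for a database table keyed on event id


def handle_webhook(payload: bytes, header: str, secret: str,
                   apply_event, now: float = None) -> str:
    """Verify, then apply each event id at most once."""
    if not verify_signature(payload, header, secret, now=now):
        return "rejected"
    event = json.loads(payload)
    if event["id"] in _processed:
        return "duplicate"  # retried delivery: acknowledge without side effects
    apply_event(event)
    _processed.add(event["id"])
    return "processed"
```

Webhook senders retry deliveries, so the same event can arrive more than once; recording the event id is what keeps a retried payment event from being applied twice.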

Highlights

Hybrid retrieval (vector + keyword + RRF) — better than vector-only on precise-term queries
Hierarchical summarisation makes 3,000-page documents usable
5-model Claude fallback chain so user-visible 'AI is down' errors disappear
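
Hierarchical summarisation is commonly done as a repeated group-and-summarise pass. The sketch below shows that general pattern only; it is not a description of AnalyzeEQ's actual implementation. `summarize` is a hypothetical callable (in practice, a model call) and `fan_in` is an assumed grouping size.

```python
from typing import Callable, List, Sequence


def hierarchical_summary(
    chunks: Sequence[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 8,
) -> str:
    """Summarise groups of `fan_in` texts, then summarise the summaries,
    repeating until a single summary remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        raise ValueError("nothing to summarise")
    level = list(chunks)
    while True:
        # One pass: each group of up to fan_in texts becomes one summary.
        level = [
            summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]
```

Each pass shrinks the number of texts by roughly a factor of `fan_in`, so no single call has to fit the whole document in its context window.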

This project is closed-source (built for a Kcube AI client). I'm happy to walk through the architecture, trade-offs, and code on request.