September 12, 2025

RAG-Chat

RAG-Chat is a document chat application that processes files in the background, stores searchable vectors, and returns answers with source citations.

Overview

Document-grounded chat system with background file processing. Each upload triggers an async ingestion pipeline: parsing, chunking, embedding, and vector storage. Chat queries retrieve the most relevant chunks and answer with citations to them.
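
The retrieval side can be sketched as follows. This is a minimal illustration, not the project's actual code: the ScoredChunk shape, the selectContext and buildPrompt helpers, and the prompt wording are all assumptions. The defaults mirror the top-k of 5 and the 0.7 score threshold listed under How It Was Built. Qdrant can apply both limits server-side; they are applied in plain TypeScript here so the example runs on its own.

```typescript
// Hypothetical shape of a search hit returned by the vector store.
interface ScoredChunk {
  text: string;   // chunk content
  score: number;  // similarity score, higher is more relevant
  source: string; // file name of the original document
}

// Keep only hits at or above the threshold, best first, at most topK of them.
function selectContext(
  hits: ScoredChunk[],
  topK = 5,
  minScore = 0.7,
): ScoredChunk[] {
  return hits
    .filter((hit) => hit.score >= minScore) // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Number each source so the model can cite it as [1], [2], ...
function buildPrompt(question: string, context: ScoredChunk[]): string {
  const sources = context
    .map((chunk, i) => `[${i + 1}] (${chunk.source}) ${chunk.text}`)
    .join("\n");
  return [
    "Answer using only the numbered sources and cite them like [1].",
    "",
    "Sources:",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the sources in the prompt gives the model a stable label to cite, which a UI could then map back to a document link.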

Next.js frontend, Express API, BullMQ workers for async processing. OpenAI for embeddings and generation. Qdrant for vector storage with HNSW indexing. Redis for job queue.

How It Was Built

The main technical choices behind the product, from system design to the parts that make it work day to day.

Async ingestion pipeline: BullMQ workers process PDF, DOCX, and XLSX files. PDF.js for PDF extraction, mammoth.js for Word, exceljs for spreadsheets.
Chunking strategy: 512-token chunks with 50-token overlap, recursive character splitting preserving paragraph boundaries.
Vector search: OpenAI text-embedding-3-small (1536 dims), stored in Qdrant with HNSW index, top-k=5 retrieval with score threshold 0.7.
RAG pipeline: query → embedding → vector search → context assembly → GPT-4 with source citations. Processing status shown via WebSocket.
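
As an illustration of the chunking step, the sketch below packs whole paragraphs into chunks of at most 512 tokens and carries the last 50 tokens into the next chunk. It is a simplified two-level stand-in for recursive splitting (paragraphs first, then words), and it approximates tokens with whitespace-separated words; a real pipeline would count tokens with the embedding model's tokenizer. The function name chunkText is hypothetical.

```typescript
// Sketch only: "tokens" here are whitespace-separated words (an assumption).
function chunkText(text: string, chunkSize = 512, overlap = 50): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require 0 <= overlap < chunkSize");
  }

  // Split into paragraphs on blank lines, then into words.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.split(/\s+/).filter((w) => w.length > 0))
    .filter((words) => words.length > 0);

  const chunks: string[] = [];
  let current: string[] = []; // words of the chunk being built
  let fresh = 0;              // words in `current` not yet emitted in any chunk

  const flush = (): void => {
    if (fresh === 0) return; // nothing new, avoid emitting an overlap-only chunk
    chunks.push(current.join(" "));
    // Carry the tail of this chunk into the next one.
    current = overlap > 0 ? current.slice(-overlap) : [];
    fresh = 0;
  };

  for (const para of paragraphs) {
    // Prefer to break at a paragraph boundary when the paragraph will not fit.
    if (current.length + para.length > chunkSize) flush();
    // Paragraphs longer than chunkSize are split at word level.
    for (const word of para) {
      if (current.length === chunkSize) flush();
      current.push(word);
      fresh++;
    }
  }
  flush();
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.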

Impact

Async processing keeps chat responsive under heavy file uploads (tested with 100 MB PDFs) by offloading ingestion to BullMQ workers.
Source citations with direct document links let users check each answer against the original text, which makes hallucinations easier to catch.
Modular architecture allows adding new file formats (images, audio) by extending the worker pipeline without touching chat logic.
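
One way such extensibility could look is a parser registry keyed by file extension, so a new format only needs a new registration. This is a hedged sketch rather than the project's real worker code: the Parser type and the registerParser, resolveParser, and extractText helpers are hypothetical names, and the plain-text parser stands in for real extractors such as PDF.js or mammoth.js.

```typescript
// A parser turns raw file bytes into plain text for chunking.
type Parser = (data: Uint8Array) => Promise<string>;

// Registry of parsers keyed by lowercase file extension (hypothetical design).
const parsers = new Map<string, Parser>();

function registerParser(extension: string, parser: Parser): void {
  parsers.set(extension.toLowerCase(), parser);
}

// Look up the parser for a file name, or fail loudly for unknown formats.
function resolveParser(fileName: string): Parser {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  const parser = parsers.get(ext);
  if (!parser) {
    throw new Error(`No parser registered for ".${ext}" files`);
  }
  return parser;
}

// The worker only calls this; it never needs to know which formats exist.
async function extractText(fileName: string, data: Uint8Array): Promise<string> {
  return resolveParser(fileName)(data);
}

// Stand-in parser: treats each byte as one character, so it is only
// correct for ASCII text. Real formats would register their own extractor.
registerParser("txt", async (data) => {
  let out = "";
  for (let i = 0; i < data.length; i++) {
    out += String.fromCharCode(data[i]);
  }
  return out;
});
```

Under this design, supporting audio would mean registering a parser for an extension such as "mp3" that returns a transcript; the chunking, embedding, and chat code would stay unchanged.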

Tech Stack

Next.js, TypeScript, OpenAI, LangChain, Qdrant, Docker, BullMQ, Redis

More Projects

Additional work across AI products, developer tooling, and full-stack systems.

Next.js 16

Edward

An AI coding workspace where developers can describe apps in plain language, generate production-ready code, inspect and edit files in real-time, run projects in isolated Docker environments, publish live previews, and sync everything directly to GitHub without leaving the product.

Next.js 16

Agentic chat

An AI chat platform that routes each request through the right context — memory, documents, tools, or research — and acts on the answer through connected apps.

Next.js

Bonkers by Foyer

Creative production system built at Foyer Tech. Led the v2→v3 rebuild with reusable templates, multi-model routing, and faster repeat workflows for high-quality visual asset creation.
