Agentic chat
Agentic chat is an AI workspace that combines document search, memory, tools, and research flows so it can answer with richer context and reduce the cost of repeated queries.
Overview
Agentic chat routes requests through different execution paths based on intent: simple queries use cached responses, document questions trigger RAG, complex tasks use LangGraph research flows, and tool calls go through Google Workspace integration.
The stack includes OpenAI for generation, Mem0 for long-term memory, PostgreSQL for vector storage, LangGraph for multi-step planning, and a custom router that selects the optimal path per request.
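The routing idea above can be sketched as a small dispatcher: classify the request, then pick one of the four execution paths. This is a minimal illustration, not the product's actual classifier; the rule-based signals, path names, and function names are all assumptions.

```python
# Hypothetical sketch of the per-request router: classify intent, then
# dispatch to one of four execution paths. The keyword rules below are
# placeholders for a real intent classifier.
from enum import Enum

class Path(Enum):
    CACHED = "cached"        # serve a previously computed answer
    RAG = "rag"              # retrieve document chunks, then generate
    RESEARCH = "research"    # multi-step LangGraph-style agent
    TOOL = "tool"            # direct Google Workspace action

def classify_intent(query: str, cache: dict) -> Path:
    """Pick an execution path from cheap surface signals (placeholder logic)."""
    q = query.lower().strip()
    if q in cache:                                      # exact repeat
        return Path.CACHED
    if any(w in q for w in ("schedule", "email", "calendar")):
        return Path.TOOL                                # action verbs
    if any(w in q for w in ("document", "report", "pdf")):
        return Path.RAG                                 # doc-grounded question
    return Path.RESEARCH                                # everything else

cache = {"what is agentic chat": "An AI workspace that ..."}
print(classify_intent("What is agentic chat", cache))   # Path.CACHED
```

In a production router the keyword checks would be replaced by a learned classifier, but the dispatch shape stays the same: one cheap decision up front so that expensive paths only run when they are needed.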
How It Was Built
The main technical choices behind the product, from system design to the parts that make it work day to day.
- Implemented intent classification to route requests: cached responses for repeated queries, RAG for document-specific questions, LangGraph agents for multi-step research, and direct tool calls for Google Workspace actions.
- Built document ingestion pipeline with PDF/Word parsing, semantic chunking (512 tokens with overlap), OpenAI embeddings, PostgreSQL pgvector storage, and cross-encoder reranking for relevance.
- Created LangGraph research agents with explicit planning phases, parallel task execution, result synthesis, and self-correction loops for complex queries.
- Added conversation branching, export to markdown, and Google Workspace integration (Gmail, Calendar, Docs) so users can take action on AI-generated insights.
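The chunking step in the ingestion pipeline (512-token windows with overlap) can be sketched as below. Whitespace tokens stand in for a real tokenizer, and the 64-token overlap is an assumption; the point is that a sentence straddling a boundary lands in two adjacent chunks.

```python
# Illustrative sketch of fixed-size chunking with overlap, as used in the
# ingestion pipeline described above. Function name and the overlap value
# are assumptions; a real pipeline would count model tokens, not words.
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split a token list into overlapping windows of `size` tokens."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):   # last window reached the end
            break
    return chunks

tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
# the tail of each chunk repeats at the head of the next, so boundary
# sentences are embedded in both windows
assert chunks[0][-64:] == chunks[1][:64]
```

Each chunk would then be embedded and stored in pgvector; the overlap trades a little extra storage for retrieval that does not miss boundary-spanning content.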
Impact
- Semantic caching reduced API costs by 40% because similar requests stopped repeating the same expensive work.
- Answers improved because the product chooses the right context for each request instead of pushing everything through one path.
- Memory, branching, and bring-your-own-key support made the product easier to use for ongoing work instead of one-off demos.
Highlights
- Semantic caching reduced API costs by 40% by avoiding redundant embeddings and model calls.
- Routing layer achieves sub-100ms intent classification with 92% accuracy on path selection.
- LangGraph research agents handle 8-step planning with parallel execution in under 30 seconds.
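The plan-then-parallel-execute-then-synthesize loop behind the research agents can be sketched with plain asyncio. This is only the control flow, not LangGraph's actual API; `plan()`, `run_task()`, and `synthesize()` are illustrative stand-ins.

```python
# Control-flow sketch of the research agent: an explicit planning phase,
# parallel subtask execution, then synthesis into one answer. Uses plain
# asyncio rather than LangGraph; all function names are hypothetical.
import asyncio

def plan(query: str) -> list[str]:
    """Break a query into independent subtasks (placeholder planner)."""
    return [f"research: {query} (angle {i})" for i in range(1, 4)]

async def run_task(task: str) -> str:
    await asyncio.sleep(0.01)            # stands in for a model or tool call
    return f"result for {task}"

def synthesize(results: list[str]) -> str:
    """Merge subtask results into a single answer (placeholder)."""
    return " | ".join(results)

async def research(query: str) -> str:
    tasks = plan(query)                                            # plan
    results = await asyncio.gather(*(run_task(t) for t in tasks))  # parallel
    return synthesize(results)                                     # synthesize

print(asyncio.run(research("compare vector stores")))
```

Running subtasks concurrently is what keeps multi-step plans within the latency budget: wall-clock time is bounded by the slowest subtask rather than the sum of all of them.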
Tech Stack
OpenAI, Mem0, PostgreSQL (pgvector), LangGraph
More Projects
Additional work across AI products, developer tooling, and full-stack systems.
- Edward (Next.js 16): An AI coding workspace where developers can describe apps in plain language, generate production-ready code, inspect and edit files in real time, run projects in isolated Docker environments, publish live previews, and sync everything directly to GitHub without leaving the product.
- Bonkers by Foyer (Next.js): Creative production system for making and reusing high-quality visual assets.
- DeployNinja (AWS: ECS, ECR, S3): GitHub-native deployment platform for automated builds, live logs, and repeatable releases.