Self-host a chat-with-your-media app for PDFs, audio, and video
Build a study tool that jumps a player to cited timestamps
Prototype a two-phase upload flow that only embeds confirmed files
Test semantic Redis caching against a real RAG workload
Requires MongoDB, Redis, Supabase Storage, plus paid OpenAI/Gemini and Deepgram keys before any upload works end-to-end.
InsightFlow is a web application that lets you upload PDFs, audio recordings, and MP4 videos, then ask questions about their contents in a chat window. The answers come from an AI model but are grounded in the files you uploaded, a technique known as Retrieval-Augmented Generation, or RAG. When an answer refers to a moment in an audio or video file, the chat shows a clickable timestamp like [0:05 to 1:52] that jumps the in-page media player to that exact second.
The backend is built in Python with FastAPI and uses LangGraph to manage the back-and-forth of a conversation. File uploads go through a two-step process. First, the file is stored temporarily in Supabase Storage and its text is extracted (for PDFs) or transcribed by the Deepgram API (for audio and video). Only after the user confirms the upload does the system generate embeddings, using either OpenAI or Google Gemini, and index them in a FAISS vector store. This avoids paying for embeddings on files that the user cancels. MongoDB, accessed through the Beanie library, stores session records and chat history.
Querying works by turning the user's question into an embedding, finding the most relevant chunks in FAISS, and feeding them to the chosen LLM along with the conversation history. For long files, the system creates a structured summary in two passes: one summary per chunk, then a final pass that combines them. A Redis cache compares each new question to previous ones by vector similarity, and if a match scores above 95 percent, the cached answer is streamed back immediately instead of calling the LLM again. Responses are pushed to the browser as they are generated through Server-Sent Events.
The frontend is written in React 19 with TypeScript, Vite, and Tailwind CSS, plus animation and component libraries like Framer Motion and Shadcn UI. It has a collapsible sidebar for managing chat sessions, a media player based on plyr-react, and a background token refresh hook for keeping the user signed in.
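
The timestamp citations described above, such as [0:05 to 1:52], can be illustrated with a small Python sketch that formats a span of seconds into that notation and parses it back into seek positions for a player. This is not taken from the repository: the function names are hypothetical, and the H:MM:SS form for media longer than an hour is an assumption, since the description only shows M:SS.

```python
import re

def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS once the value reaches an hour (assumed format)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

def format_citation(start: float, end: float) -> str:
    """Build a citation in the '[start to end]' style shown in the description."""
    return f"[{format_timestamp(start)} to {format_timestamp(end)}]"

# Matches '[M:SS to M:SS]' and the assumed '[H:MM:SS to H:MM:SS]' variant.
_CITATION = re.compile(r"\[(\d+(?::\d{2}){1,2}) to (\d+(?::\d{2}){1,2})\]")

def to_seconds(stamp: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into a whole number of seconds."""
    total = 0
    for part in stamp.split(":"):
        total = total * 60 + int(part)
    return total

def parse_citations(text: str) -> list[tuple[int, int]]:
    """Return (start_seconds, end_seconds) for every citation found in an answer."""
    return [(to_seconds(a), to_seconds(b)) for a, b in _CITATION.findall(text)]
```

A client could call `parse_citations` on each answer and seek the player to the first value of a pair when the citation is clicked.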
Authentication uses JWT access and refresh tokens, and login endpoints are rate limited. The README documents two run paths, local development and Docker Compose.
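
The semantic cache lookup described earlier can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: an in-memory list stands in for Redis, cosine similarity is assumed as the "vector similarity" measure, the `SemanticCache` class and its method names are hypothetical, and the toy two-dimensional vectors stand in for real OpenAI or Gemini embeddings.

```python
import math
from typing import Optional, Sequence

SIMILARITY_THRESHOLD = 0.95  # the 95 percent cutoff mentioned in the description

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

class SemanticCache:
    """In-memory stand-in for a Redis-backed cache (hypothetical class)."""

    def __init__(self, threshold: float = SIMILARITY_THRESHOLD) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], str]] = []

    def store(self, embedding: Sequence[float], answer: str) -> None:
        """Remember the answer under the embedding of the question that produced it."""
        self._entries.append((list(embedding), answer))

    def lookup(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the best-matching cached answer, or None if nothing clears the threshold."""
        best_score, best_answer = 0.0, None
        for cached, answer in self._entries:
            score = cosine_similarity(embedding, cached)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score > self.threshold else None
```

On a miss, the caller would run the normal retrieval and generation path and then `store` the result. A linear scan is only workable for a handful of entries; a real deployment would rely on an indexed vector search.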
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.