explaingit

surajpatel04/ai-multimedia-rag-app

0 · Python · Audience · developer · Complexity · 4/5 · Active · Setup · hard

TLDR

A full-stack RAG app where you upload PDFs, audio, or video and chat about the content, with clickable timestamps that jump an embedded player to the cited moment.

Mindmap

mindmap
  root((InsightFlow))
    Inputs
      PDFs
      Audio files
      Video files
      User questions
    Outputs
      Streamed answers
      Clickable timestamps
      File summaries
    Use Cases
      Search inside lectures
      Chat over podcasts
      Query meeting recordings
      Build a media notebook
    Tech Stack
      FastAPI
      LangGraph
      FAISS
      MongoDB
      React
      Redis

Things people build with this

USE CASE 1

Self-host a chat-with-your-media app for PDFs, audio, and video

USE CASE 2

Build a study tool that jumps a player to cited timestamps

USE CASE 3

Prototype a two-phase upload flow that only embeds confirmed files

USE CASE 4

Test semantic Redis caching against a real RAG workload

Tech stack

Python · FastAPI · LangGraph · FAISS · MongoDB · Redis · React · TypeScript

Getting it running

Difficulty · hard
Time to first run · 1h+

Requires MongoDB, Redis, Supabase Storage, plus paid OpenAI/Gemini and Deepgram keys before any upload works end-to-end.

In plain English

InsightFlow is a web application that lets you upload PDFs, audio recordings, and MP4 videos, then ask questions about their contents in a chat window. The answers come from an AI model but are grounded in the files you uploaded, a technique known as Retrieval-Augmented Generation, or RAG. When the answer refers to a moment in an audio or video file, the chat shows a clickable timestamp like [0:05 to 1:52] that jumps the in-page media player to that exact second.

The backend is built in Python with FastAPI and uses LangGraph to manage the back-and-forth of a conversation. File uploads go through a two-step process: first the file is stored temporarily in Supabase Storage and its text is extracted (for PDFs) or transcribed by the Deepgram API (for audio and video). Only after the user confirms the upload does the system generate embeddings, using either OpenAI or Google Gemini, and index them in a FAISS vector store. This avoids paying for embeddings on files that the user cancels. MongoDB, accessed through the Beanie library, stores session records and chat history.

Querying works by turning the user's question into an embedding, finding the most relevant chunks in FAISS, and feeding them to the chosen LLM along with the conversation history. For long files, the system creates a structured summary in two passes: one summary per chunk, then a final pass that combines them. A Redis cache compares new questions to previous ones by vector similarity, and if a match scores above 95 percent, the cached answer is streamed back instantly. Responses are streamed to the browser over Server-Sent Events as they are generated.

The frontend is written in React 19 with TypeScript, Vite, and Tailwind CSS, plus animation and component libraries like Framer Motion and Shadcn UI. It has a collapsible sidebar for managing chat sessions, a media player based on plyr-react, and a background token refresh hook for keeping the user signed in.

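The confirm-before-embedding upload flow described above can be illustrated with a minimal Python sketch. It is not taken from the repository: the class `TwoPhaseUploader`, its method names, and the fixed-size character chunking are assumptions made for illustration, and a simple counter stands in for paid embedding calls. Only the `temp_id` name echoes the project's own vocabulary.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TwoPhaseUploader:
    """Toy model of a stage-then-confirm upload flow (all names hypothetical)."""

    staged: dict = field(default_factory=dict)   # temp_id -> extracted text
    indexed: dict = field(default_factory=dict)  # temp_id -> list of chunks
    embed_calls: int = 0                         # stand-in for paid embedding calls

    def stage(self, extracted_text: str) -> str:
        """Phase 1: keep the extracted text or transcript, embed nothing yet."""
        temp_id = uuid.uuid4().hex
        self.staged[temp_id] = extracted_text
        return temp_id

    def cancel(self, temp_id: str) -> None:
        """Drop a staged file; no embedding cost was ever incurred for it."""
        self.staged.pop(temp_id, None)

    def confirm(self, temp_id: str, chunk_size: int = 200) -> int:
        """Phase 2: chunk and 'embed' only after the user confirms."""
        text = self.staged.pop(temp_id)  # raises KeyError if unknown or cancelled
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        self.embed_calls += len(chunks)  # a real system would call an embedding API here
        self.indexed[temp_id] = chunks
        return len(chunks)


uploader = TwoPhaseUploader()
kept = uploader.stage("lecture transcript " * 30)
dropped = uploader.stage("file the user changes their mind about")
uploader.cancel(dropped)
print(uploader.confirm(kept), "chunks embedded;", uploader.embed_calls, "embedding calls")
```

The point of the split is that `cancel` never touches the embedding counter, which mirrors the cost-saving argument in the description.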
Authentication uses JWT access and refresh tokens, and login endpoints are rate-limited. The README documents two run paths: local development and Docker Compose.
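As a rough illustration of the semantic cache check described in this section, the sketch below compares a new question's embedding with stored ones and returns a cached answer only when the best score exceeds 0.95. The `SemanticCache` class and its in-memory list are hypothetical stand-ins for the Redis-backed store, and cosine similarity is an assumption, since the description only says "vector similarity".

```python
import math
from typing import List, Optional, Tuple

SIMILARITY_THRESHOLD = 0.95  # the "95 percent" figure from the description


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


class SemanticCache:
    """In-memory stand-in for a Redis-backed semantic cache (hypothetical)."""

    def __init__(self, threshold: float = SIMILARITY_THRESHOLD) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, question_vec: List[float], answer: str) -> None:
        """Remember the answer under the embedding of the question that produced it."""
        self._entries.append((question_vec, answer))

    def get(self, question_vec: List[float]) -> Optional[str]:
        """Return the best-matching cached answer, or None on a cache miss."""
        best_score, best_answer = 0.0, None
        for vec, answer in self._entries:
            score = cosine_similarity(question_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score > self.threshold else None


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer")
print(cache.get([0.99, 0.05]))  # near-duplicate question: served from the cache
print(cache.get([1.0, 0.5]))    # related but below the threshold: falls through to the LLM
```

Lowering `threshold` raises the hit rate but also the chance of returning an answer to a question that only looks similar.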

Copy-paste prompts

Prompt 1
Walk me through running InsightFlow with Docker Compose and the minimum set of env keys I actually need to make uploads work
Prompt 2
Show me how the LangGraph node in InsightFlow injects retrieved chunks and chat history into the OpenAI call
Prompt 3
Help me replace Deepgram in InsightFlow with a local Whisper transcription step while keeping the timestamp metadata
Prompt 4
Explain how the Redis semantic cache in InsightFlow decides a 95 percent similarity hit and how I would lower that threshold
Prompt 5
Add a new file type, say DOCX, to InsightFlow ingestion and wire it through the temp_id flow

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.