explaingit

haimantika/talkie

1 · TypeScript · Audience: developer · Complexity: 4/5 · Active · Setup: hard

TLDR

Android floating-bubble app that listens to a spoken request in Bengali, reads the screen with GPT-4.1 Vision, and performs taps and form fills via the accessibility service.

Mindmap

mindmap
  root((talkie))
    Inputs
      Bengali speech
      Screen contents
      Sarvam API key
    Outputs
      Taps and gestures
      Bengali voice reply
      Spoken guide steps
    Use Cases
      Voice-driven phone use
      Phone literacy training
      Local-language assistant
    Tech Stack
      TypeScript
      React Native
      Expo
      Sarvam AI
      OpenAI Vision
      Android Accessibility

Things people build with this

USE CASE 1

Let a Bengali-speaking user run any Android app by voice, hands-free.

USE CASE 2

Build a phone-literacy tutor that tells the user where to tap instead of doing it for them.

USE CASE 3

Prototype a vision-driven phone agent for another low-resource language by swapping the Sarvam models.

USE CASE 4

Demo a screen-aware AI agent that uses accessibility services for real-device automation.

Tech stack

TypeScript · React Native · Expo · Sarvam · OpenAI · Android

Getting it running

Difficulty: hard · Time to first run: 1h+

The floating bubble needs a native Android build. Before anything works you also need a Sarvam API key (an OpenAI key is optional) and three Android permissions granted on the device.
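The prerequisites above can be expressed as a small readiness check. This is a minimal sketch, not code from the repo: the `SetupState` shape, its field names, and `missingSetupSteps` are all hypothetical, and the three permission flags mirror the ones this summary lists (drawing over other apps, the accessibility service, and the microphone).

```typescript
// Hypothetical setup state; field names are illustrative, not from the Talkie repo.
interface SetupState {
  sarvamApiKey?: string;      // required
  openAiApiKey?: string;      // optional, so it is never reported as missing
  canDrawOverApps: boolean;
  accessibilityEnabled: boolean;
  microphoneGranted: boolean;
}

// Returns the human-readable steps the user still has to complete.
function missingSetupSteps(state: SetupState): string[] {
  const missing: string[] = [];
  if (!state.sarvamApiKey?.trim()) missing.push("Sarvam API key");
  if (!state.canDrawOverApps) missing.push("Draw over other apps");
  if (!state.accessibilityEnabled) missing.push("Accessibility service");
  if (!state.microphoneGranted) missing.push("Microphone");
  return missing;
}
```

An empty result would mean the bubble can start; anything else could be shown to the user as a checklist.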

In plain English

Talkie is an Android app that puts a floating bubble on top of whatever else you have open on your phone. You press and hold the bubble, speak a request in your own local language, and Talkie figures out what you want to do and does it for you. According to the README, that includes things like tapping buttons, filling out forms, scrolling, and opening apps. The example language used throughout the README is Bengali.

To use it you need an Android phone running Android 10 or newer and an API key from Sarvam, which is a service that handles Bengali speech-to-text and text-to-speech. The Sarvam key is required, with a free tier mentioned. An OpenAI key is optional. With the OpenAI key Talkie can look at your screen and take more precise actions; without it, the README says Talkie still works for apps it already knows about by using their deep links.

On first launch you have to give Talkie three permissions in Android settings: drawing over other apps, the accessibility service, and the microphone. Then you enter your API keys in the Talkie settings screen. After that, the floating bubble is visible everywhere, and the interaction loop is to hold the bubble, speak, and let Talkie work.

There is a setting called Guide mode. With Guide mode off, Talkie does the task itself. With Guide mode on, Talkie does not act for you; instead it tells you in Bengali what to tap or do, which the README pitches as a way for someone to learn how to use a phone or an app.

The README also includes a short diagram of what happens under the hood. Your speech goes to a Sarvam transcription model called Saaras, then GPT-4.1 Vision reads a screenshot of your screen and decides on an action such as a tap at a given coordinate, then the Android accessibility service performs that tap, and finally a Sarvam voice model called Bulbul speaks the result back in Bengali.

The project is built with TypeScript and there is a quick test path using Expo Go, though the floating bubble itself needs a native build.
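The loop described above (transcribe, read the screen, decide, act, speak) can be sketched as one function with each stage injected. This is an illustration under stated assumptions, not the repo's implementation: every type and function name here (`Action`, `TurnDeps`, `runTurn`, and so on) is hypothetical, and the injected functions stand in for the Sarvam, OpenAI, and accessibility-service calls without modelling their real APIs. It also shows where Guide mode and the no-OpenAI-key deep-link fallback would plausibly branch.

```typescript
// All names below are hypothetical; none are taken from the Talkie repo.
type Action =
  | { kind: "tap"; x: number; y: number }
  | { kind: "openDeepLink"; uri: string }
  | { kind: "none" };

// Each stage is injected so the loop can run without real services.
interface TurnDeps {
  transcribe(audio: Uint8Array): Promise<string>;            // stands in for Saaras
  captureScreen(): Promise<Uint8Array>;                      // screenshot bytes
  decideFromScreen(request: string, screenshot: Uint8Array): Promise<Action>; // vision model
  deepLinkFor(request: string): string | undefined;          // known-app lookup
  perform(action: Action): Promise<void>;                    // accessibility service
  describe(action: Action, performed: boolean): string;      // reply or instruction text
  speak(text: string): Promise<void>;                        // stands in for Bulbul
}

interface TurnOptions {
  guideMode: boolean;    // true: only tell the user what to do
  hasVisionKey: boolean; // false: fall back to deep links
}

interface TurnResult {
  request: string;
  action: Action;
  performed: boolean;
}

async function runTurn(
  audio: Uint8Array,
  deps: TurnDeps,
  opts: TurnOptions
): Promise<TurnResult> {
  const request = await deps.transcribe(audio);

  let action: Action;
  if (opts.hasVisionKey) {
    // Screen-aware path: let the vision model pick an action.
    const screenshot = await deps.captureScreen();
    action = await deps.decideFromScreen(request, screenshot);
  } else {
    // Fallback path: only apps with a known deep link can be handled.
    const uri = deps.deepLinkFor(request);
    action = uri ? { kind: "openDeepLink", uri } : { kind: "none" };
  }

  // In Guide mode nothing is executed; the user is told what to do instead.
  const performed = !opts.guideMode && action.kind !== "none";
  if (performed) {
    await deps.perform(action);
  }

  await deps.speak(deps.describe(action, performed));
  return { request, action, performed };
}
```

Keeping the stages behind an interface means the branching can be tested with stubs, and it gives one obvious place to add logging around each `await`.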

Copy-paste prompts

Prompt 1
Set up Talkie on my Android 13 phone, including Sarvam key, OpenAI key, and the three permissions it needs.
Prompt 2
Modify Talkie to support Hindi instead of Bengali by swapping the Sarvam Saaras and Bulbul voice configs.
Prompt 3
Replace GPT-4.1 Vision in Talkie with Claude Sonnet vision and update the action-decision prompt accordingly.
Prompt 4
Add a Guide mode toggle to the bubble UI in Talkie and persist the choice across launches.
Prompt 5
Trace the action loop in Talkie from microphone capture to accessibility-service tap and tell me where I would add logging.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.