explaingit

clarkluoluo/clark-utov

14 · Python · Audience · researcher · Complexity · 5/5 · Setup · hard

TLDR

A Python tool for reverse-engineering hidden algorithms inside Android apps by analyzing instruction traces from ARM64 native code protected with VMP and OLLVM obfuscation, using a multi-stage pipeline designed to work with AI agents that have limited memory.

Mindmap

mindmap
  root((clark-utov))
    What it does
      Reverse-engineer algorithms
      Decode obfuscated code
      Identify crypto functions
    How it works
      Instruction trace input
      5-stage analysis pipeline
      Hypothesis ledger
    Tech stack
      Python
      Triton symbolic execution
      LLM agent support
    Target code
      ARM64 native libs
      VMP obfuscation
      OLLVM obfuscation
    Use cases
      Android app analysis
      Security research
      AI-driven code analysis


Things people build with this

USE CASE 1

Discover which cryptographic algorithm (for example, SM3) a protected Android app function is secretly using.

USE CASE 2

Analyze obfuscated ARM64 native libraries from Android apps to recover their original logic.

USE CASE 3

Drive long-running reverse-engineering analysis sessions using an AI agent with limited context memory.

USE CASE 4

Audit the reasoning trail for algorithm identification via the hypothesis ledger and pipeline state.
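A common first step for use case 1 is constant fingerprinting: many hash functions begin from well-known initial values or round constants, and these can survive obfuscation as literal 32-bit values in a trace. The sketch below is not taken from clark-utov. The trace line format ("addr: insn ; reg=value"), the `fingerprint` function and the `KNOWN_CONSTANTS` table are hypothetical, and the sketch assumes the trace records full register values, because ARM64 code often builds a 32-bit constant from two 16-bit `mov`/`movk` halves.

```python
import re
from collections import defaultdict

# Hypothetical fingerprint table: initial hash values and round constants
# that often survive obfuscation as literal 32-bit values in a trace.
KNOWN_CONSTANTS = {
    "SM3": {
        0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,  # SM3 IV
        0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E,
        0x79CC4519, 0x7A879D8A,                          # SM3 T_j constants
    },
    "SHA-256": {
        0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,  # SHA-256 IV
        0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19,
    },
}

# Matches any hex literal such as "0x7380166f" in a trace line.
HEX_RE = re.compile(r"0x([0-9a-fA-F]+)")


def fingerprint(trace_lines):
    """Map each algorithm to the sorted known constants seen in the trace."""
    hits = defaultdict(set)
    for line in trace_lines:
        for match in HEX_RE.finditer(line):
            value = int(match.group(1), 16)
            for algo, constants in KNOWN_CONSTANTS.items():
                if value in constants:
                    hits[algo].add(value)
    return {algo: sorted(values) for algo, values in hits.items()}


# Hypothetical trace excerpt in an assumed "addr: insn ; reg=value" format.
trace = [
    "0x1000: mov w8, #0x166f ; w8=0x166f",
    "0x1004: movk w8, #0x7380, lsl #16 ; w8=0x7380166f",
    "0x1008: eor w9, w9, w10 ; w9=0x79cc4519",
]
print(fingerprint(trace))  # only "SM3" is reported, with two matched constants
```

A constant hit is only a lead. VMP can compute or disguise such constants at runtime, so a match is better treated as a hypothesis to test than as a conclusion.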

Tech stack

Python · Triton · ARM64 · LLM API · Symbolic Execution

Getting it running

Difficulty · hard · Time to first run · 1 day+

Requires Python and the Triton symbolic execution library. Documentation is primarily in Chinese. Optional LLM API integration needs separate configuration. Designed for security researchers with reverse-engineering experience.

License not mentioned in the explanation.

In plain English

clark-utov is a tool for reverse-engineering algorithms that have been deliberately hidden inside Android apps. Specifically, it targets native code libraries compiled for ARM64 processors and protected with obfuscation systems called VMP and OLLVM, which scramble a program's instructions to make them very hard to read. The goal is to figure out what a protected function actually does, for example to discover that a particular routine implements a specific cryptographic algorithm such as SM3.

The tool works by consuming an instruction trace: a recording of every low-level operation the target function performed during execution. That trace is fed through a multi-stage analysis pipeline labeled S1 through S5. Each stage narrows down the possibilities, and the results are stored in what the project calls a hypothesis ledger. This is an auditable log of conclusions about what the algorithm is and how it works, along with the evidence supporting each conclusion.

A significant part of the design is aimed at making the tool work well when driven by an AI language model agent, particularly one that can hold only a limited amount of information in its context at once. Instead of requiring the agent to remember dozens of steps, clark-utov externalizes all the tracking into the ledger and structured pipeline state, so the agent only needs to make one bounded decision at a time based on what the tool surfaces to it. The README describes this as a way to let narrow-context agents do long-running analytical work without losing track of where they are.

The pipeline also includes a symbolic execution component built on the Triton library, a blue-team review step, parity checks to catch incorrect conclusions, and an optional mode that calls a language model API to generate and test hypotheses. The documentation is written primarily in Chinese, while the codebase is in Python.
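To make the ledger, parity-check and bounded-decision ideas concrete, here is a minimal sketch. It is not clark-utov's implementation. The project's real ledger format and stage interfaces are not described here, so every name below (`Hypothesis`, `Ledger`, `parity_check`, `next_decision`) is hypothetical. A hypothesis is only marked confirmed when a candidate implementation reproduces every input/output pair observed in the trace. SHA-256 stands in for SM3 because Python's `hashlib` always provides it, while SM3 support depends on the underlying OpenSSL build.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Hypothesis:
    hid: str
    claim: str
    stage: str                      # pipeline stage that raised it, e.g. "S3"
    status: str = "open"            # "open", "confirmed", or "refuted"
    evidence: List[str] = field(default_factory=list)


class Ledger:
    """Illustrative ledger: every conclusion carries its supporting evidence."""

    def __init__(self) -> None:
        self.entries: dict = {}

    def propose(self, hid: str, claim: str, stage: str) -> None:
        self.entries[hid] = Hypothesis(hid, claim, stage)

    def parity_check(
        self,
        hid: str,
        candidate: Callable[[bytes], bytes],
        samples: List[Tuple[bytes, bytes]],
    ) -> bool:
        """Confirm only if the candidate reproduces every observed I/O pair."""
        if not samples:
            raise ValueError("parity check needs at least one observed sample")
        hyp = self.entries[hid]
        for inp, observed in samples:
            if candidate(inp) != observed:
                hyp.status = "refuted"
                hyp.evidence.append(f"output mismatch on input {inp.hex()}")
                return False
        hyp.status = "confirmed"
        hyp.evidence.append(f"parity held on {len(samples)} sample(s)")
        return True

    def next_decision(self) -> Optional[Hypothesis]:
        """Surface a single open hypothesis: one bounded choice for the agent."""
        for hyp in self.entries.values():
            if hyp.status == "open":
                return hyp
        return None

    def dump(self) -> str:
        """Serialize the ledger so the reasoning trail can be audited later."""
        return json.dumps([asdict(h) for h in self.entries.values()], indent=2)


# Pretend this (input, output) pair was recovered from the instruction trace.
observed = [(b"abc", hashlib.sha256(b"abc").digest())]

ledger = Ledger()
ledger.propose("H1", "routine is MD5", "S3")
ledger.propose("H2", "routine is SHA-256", "S3")

first = ledger.next_decision()       # the agent only ever sees this one item
ledger.parity_check(first.hid, lambda m: hashlib.md5(m).digest(), observed)
second = ledger.next_decision()
ledger.parity_check(second.hid, lambda m: hashlib.sha256(m).digest(), observed)
print(ledger.dump())
```

Because all state lives in the ledger, an agent that forgets earlier steps can resume by calling `next_decision()` again. The serialized dump doubles as the audit trail described in use case 4.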

Copy-paste prompts

Prompt 1
I have an instruction trace from an ARM64 Android native library protected with OLLVM. Using clark-utov, walk me through running the S1-S5 pipeline stages on this trace to identify the hidden algorithm.
Prompt 2
Explain how to use clark-utov's hypothesis ledger to track conclusions about an obfuscated function. What gets recorded at each pipeline stage and how do I read the output?
Prompt 3
I want to use an AI agent with limited context to analyze a VMP-protected Android function with clark-utov. How do I set up the pipeline so the agent only needs to make one bounded decision at a time?
Prompt 4
How do I use Triton symbolic execution inside clark-utov to verify whether a detected algorithm is actually SM3 or another cryptographic function?
Prompt 5
Set up clark-utov to call an LLM API for hypothesis generation on an obfuscated Android native library. What configuration and input format does it expect?


Verify against the repo before relying on details.