Skip to content

Analyze the so file in APK through LLM+Capstone to determine the main intention of the so file and the developer (company)

License

Notifications You must be signed in to change notification settings

argus-sight/BinSight

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

BinSight

BinSight is an advanced APK analysis tool that dissects native libraries (.so files) and leverages the power of Large Language Models (LLMs)—augmented by live web search results—to determine their purpose, developer, and potential security implications. It automates the reverse engineering process of binary analysis, making it faster, more accurate, and more accessible.

Features

  • APK Extraction: Automatically extracts all .so native libraries from a given Android APK.
  • Live Web Search: Performs a web search for each library to gather real-time, public information about its developer and purpose.
  • Multi-Arch Disassembly: Uses pyelftools and capstone to disassemble code for ARM, ARM64, x86, and x86-64 architectures.
  • Rich Data Extraction: Pulls not just assembly code, but also function names and embedded strings for a more context-rich analysis.
  • Flexible LLM Integration: Powered by litellm, it supports over 100 LLMs from various providers (OpenAI, Google, Anthropic, Cohere, and any OpenAI-compatible API).
  • Configurable & Easy to Use: Simple command-line interface allows you to specify the target APK, choose your LLM, and configure custom API endpoints and keys.

Installation

  1. Clone the repository (optional): If you have the project files, you can skip this step.

    git clone <repository_url>
    cd binsight-project
  2. Install dependencies: Ensure you have Python 3.6+ installed. Then, install the required packages from requirements.txt.

    pip install -r requirements.txt

Usage

The script is run from the command line with several options to customize its behavior.

Command Syntax

python binsight.py <target_path> [options]

Arguments

  • target_path: (Required) The path to a single .apk file or a directory containing multiple .apk files.
  • --model: The LLM model to use, in litellm format (e.g., gemini/gemini-1.5-flash).
  • --api_key: Your API key for the chosen provider. If not set, the tool will look for a corresponding environment variable (e.g., GOOGLE_API_KEY, OPENAI_API_KEY).
  • --api_base: The API base URL for non-standard providers like SiliconFlow or a self-hosted model.

Examples

Example 1: Standard Analysis with Gemini

This is the simplest use case. It assumes you have your Google API key set in the environment.

  1. Set the environment variable:

    export GOOGLE_API_KEY="your_google_api_key"
  2. Run the analysis:

    python binsight.py /path/to/your_app.apk --model "gemini/gemini-1.5-flash"

Example 2: Custom OpenAI-Compatible Provider (e.g., SiliconFlow)

This example shows how to use an OpenAI-compatible endpoint, like SiliconFlow. Based on the official LiteLLM Documentation, you must prefix the model name with openai/ to route the request correctly.

python binsight.py /path/to/your_app.apk \
  --model "openai/Qwen/Qwen3-235B-A22B" \
  --api_base "https://api.siliconflow.cn/v1" \
  --api_key "your_siliconflow_api_key"

Note: The openai/ prefix is required for litellm to use its standard OpenAI-compatible client.

How It Works

  1. Extract: The input APK is unzipped, and all .so files are copied to a temporary location.
  2. Web Search: For each .so file, BinSight performs a web search to find its likely purpose and developer.
  3. Disassemble: The tool identifies the library's architecture, locates the executable .text section, and disassembles the machine code into human-readable assembly instructions.
  4. Analyze: A detailed prompt containing the web search results, filename, assembly code, function names, and strings is sent to the configured LLM via litellm.
  5. Report: The LLM's conclusion about each library's purpose and developer is collected and printed in a final summary report.
  6. Clean Up: All temporary files are deleted.

Supported Models

This tool uses litellm to interact with language models. This means you can use any of the 100+ models supported by litellm.

Core Workflow

  1. Universal LLM Interface: BinSight uses litellm as a unified gateway to over 100 LLM providers. This removes the need for provider-specific code and allows for seamless integration of new models.
  2. Dynamic Analysis Pipeline:
    • APK Deconstruction: Extracts all unique .so libraries from the target APK.
    • Metadata Extraction: Uses pyelftools and Capstone to get assembly code, function names, and embedded strings from each library.
    • Intelligent Analysis via LLM: Sends this rich metadata package to the user-selected LLM. The prompt directs the model to act as a security expert, first identifying the library by name using its internal knowledge, then corroborating that with the provided binary evidence.
  3. Unified Results: It presents a clear, concise summary of the likely purpose for each analyzed library.

Setup

  1. Clone the repository.

  2. Install Dependencies:

    pip install -r requirements.txt
  3. Set API Keys (Environment Variables): litellm automatically finds API keys set as environment variables. Set the key for the provider you intend to use.

    # For OpenAI models (gpt-4o, gpt-4-turbo, etc.)
    export OPENAI_API_KEY="YOUR_OPENAI_KEY"
    
    # For Google models (gemini/gemini-1.5-pro, etc.)
    export GEMINI_API_KEY="YOUR_GEMINI_KEY"
    
    # For SiliconFlow models
    export SILICONFLOW_API_KEY="YOUR_SILICONFLOW_KEY"

Usage

Run BinSight against a single APK file or an entire directory. The --model argument is now the central piece of the command.

Model Selection

You specify the model using the format recognized by litellm. Here are some common examples:

  • Gemini: gemini/gemini-2.5-pro
  • SiliconFlow: openai/Qwen/Qwen3-32B

Command Examples

# Analyze with OpenAI's GPT-4o (requires OPENAI_API_KEY)
python binsight.py /path/to/app.apk --model gpt-4o --api_key "YOUR_KEY"

# Analyze with Google's Gemini Pro (requires GEMINI_API_KEY)
python binsight.py /path/to/app.apk --model gemini/gemini-2.5-flash --api_key "YOUR_KEY" 

# Analyze with SiliconFlow's Qwen/Qwen3-32B, providing the key directly
python binsight.py /path/to/app.apk --model "openai/Qwen/Qwen3-32B" --api_base "https://api.siliconflow.cn/v1" --api_key "YOUR_KEY" 

All Arguments

  • input_path: Required. Path to an APK file or a directory of APKs.
  • --model: Required. The model identifier for litellm.
  • --api_key: Optional. Provide the API key directly. Overrides environment variables.
  • --api_base: Optional. The API base URL for custom providers (e.g., SiliconFlow, local models).

Demo Analysis Result

Here is a sample output from analyzing an APK using openai/Qwen/Qwen3-32B.

$ python binsight.py /path/to/some.apk --model openai/Qwen/Qwen3-32B --api_base "https://api.siliconflow.cn/v1"  --api_key "sk-xxx"

==================================================
Starting analysis for: some.apk
==================================================

--- Comprehensive Analysis (Disassembly + LLM) ---

[*] Processing: libflutter.so (from lib/arm64-v8a/libflutter.so)
  -> Analyzing with litellm (model: openai/Qwen/Qwen3-32B, attempt: 1/3)...
  [+] LLM analysis result: Intent: Google Flutter UI Framework | Confidence: High | Evidence: Known library name confirmed by numerous 'flutter::' and 'dart::' function names and strings like 'Flutter Engine'.

------------------------------------
           Final Analysis Summary for some.apk
------------------------------------

Analysis complete! Found 1 SDKs or code intents:

- Intent Analysis - libflutter.so
  Intent: Google Flutter UI Framework | Confidence: High | Evidence: Known library name confirmed by numerous 'flutter::' and 'dart::' function names and strings like 'Flutter Engine'.

About

Analyze the so file in APK through LLM+Capstone to determine the main intention of the so file and the developer (company)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages