A semantic search engine for Lean 4 projects.
Also see Herald for the idea used to translate formal statements into natural language.
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt-
Download PostgreSQL (you can find the installation guide here).
-
Create database:
createdb my_database_nameMemorize the database name, you will later need to set it in your
.envfile.
- Clone the jixia repo:
git clone [email protected]:frenzymath/jixia.git;cd jixia - Make sure
lean-toolchainin jixia andlean-toolchainin the project you will be indexing match.
If Lean versions don't match, you will get"... failed to read file ..., invalid header"error when you try to index the project. - Build jixia:
lake build(should take around 70s)
-
Copy the
.env.examplefile to.env:cp .env.example .env
-
Edit the
.envfile and set the required variables according to your setup.
Note
We strongly recommend using DeepSeek v3 model for a balance between quality and cost.
In this case, OPENAI_API_KEY should be set to your DeepSeek api key, OPENAI_BASE_URL should be set to https://api.deepseek.com, and OPENAI_MODEL should be set to deepseek-chat.
-
Index your Lean project (uses jixia, puts results into PostgreSQL)
python -m database jixia <project root> <prefixes>
Options:
project root: Path to the project to index. This is where thelakefile.tomlorlakefile.leanis located.prefixes: Comma-separated list of module prefixes. A module is indexed only if its module path starts with one of prefixes listed here. For example,Init,Lean,Mathlibwill include onlyInit.*,Lean.*, andMathlib.*modules.
Note: to check what modules are available in your project, and to determine how prefixes work, you can use
python -m prefix --project_root <project_root> --prefixes <prefixes>helper command. -
Create informal descriptions (uses DeepSeek api, puts results into PostgreSQL)
python -m database informal
Natural-language descriptions can be created using any OpenAI-compatible API, above we advise DeepSeek.
-
Create embeddings (uses locally-downloaded
e5-mistral-7b-instructmodel, puts results into Chromadb)python -m database vector-db
Note that indexing a large project like Mathlib requires a significant amount of both API calls (to create informal descriptions) and computational power (to compute the semantic embedding). Use with caution.
To search the database, run:
python search.py <query1> <query2> ...Note that queries containing whitespaces must be quoted, e.g., python search.py "Hello world".