This is a repository for the rebuilding anew bots (bot-nim).
For understanding the updated version, try 
# Create a virtual environment
$ python3 -m venv venv
$ source venv/bin/activate
# Install the package
$ pip install -U -e .
$ botnim --helpfor development:
$ pip install -U -e .[dev]To support different Elasticsearch clusters for local development, staging, and production, you must set the following environment variables:
Required for Production:
- ES_HOST_PRODUCTION: Elasticsearch host URL for production (e.g.,- https://prod-es.example.com:9200)
- ES_USERNAME_PRODUCTION: Username for production Elasticsearch
- ES_PASSWORD_PRODUCTIONor- ELASTIC_PASSWORD_PRODUCTION: Password for production Elasticsearch
- ES_CA_CERT_PRODUCTION: Path to CA certificate file for production (optional, for SSL verification)
Required for Staging:
- ES_HOST_STAGING: Elasticsearch host URL for staging (e.g.,- https://staging-es.example.com:9200)
- ES_USERNAME_STAGING: Username for staging Elasticsearch
- ES_PASSWORD_STAGINGor- ELASTIC_PASSWORD_STAGING: Password for staging Elasticsearch
- ES_CA_CERT_STAGING: Path to CA certificate file for staging (optional, for SSL verification)
Required for Local Development:
- ES_HOST_LOCAL: Elasticsearch host URL for local development (defaults to- https://localhost:9200)
- ES_USERNAME_LOCAL: Username for local Elasticsearch
- ES_PASSWORD_LOCALor- ELASTIC_PASSWORD_LOCAL: Password for local Elasticsearch
- ES_CA_CERT_LOCAL: Path to CA certificate file for local (optional, for SSL verification)
Optional Fallback:
- ES_CA_CERT: Generic CA certificate path (used as fallback if environment-specific CA cert is not set)
Note: You must explicitly specify the environment when running commands. The application will use the environment-specific variables based on your choice. If any of these variables are missing for the environment you're using, the application will show a clear error message indicating which variables need to be set.
Important: The application no longer has default fallback environments. All commands and scripts require explicit environment specification to prevent accidental deployments to the wrong environment.
Example .env file:
# Production
ES_HOST_PRODUCTION=https://prod-es.example.com:9200
ES_USERNAME_PRODUCTION=prod_user
ES_PASSWORD_PRODUCTION=prod_pass
ES_CA_CERT_PRODUCTION=/path/to/prod-ca.crt
# Staging
ES_HOST_STAGING=https://staging-es.example.com:9200
ES_USERNAME_STAGING=staging_user
ES_PASSWORD_STAGING=staging_pass
ES_CA_CERT_STAGING=/path/to/staging-ca.crt
# Local Development
ES_HOST_LOCAL=https://localhost:9200
ES_USERNAME_LOCAL=elastic
ES_PASSWORD_LOCAL=changeme
ES_CA_CERT_LOCAL=/path/to/local-ca.crt
# Optional: Generic fallback CA certificate
ES_CA_CERT=/path/to/generic-ca.crtUsage Examples:
# Local development
botnim sync local takanon --backend es
# Staging
botnim sync staging takanon --backend es
# Production
botnim sync production takanon --backend es
# Demo scripts (environment is required)
python backend/es/demo-load-data-to-es.py local
python backend/es/demo-query-es.py "your query" local
**Note:** The demo scripts use direct imports from the `botnim` package. Make sure the package is installed in development mode (`pip install -e .`) to run these scripts.- .env.sample: Sample environment file for the benchmarking scripts.
- botnim/: Main package directory.- __init__.py: Package initialization file.
- cli.py: Command line interface for the bots.
- sync.py: Script for syncing the specifications with the OpenAI account.
- collect_sources.py: Module to collect and process the sources for the bots.
- vector_store/: Vector store management package.- __init__.py: Package initialization.
- vector_store_base.py: Abstract base class for vector store implementations.
- vector_store_openai.py: OpenAI Vector Store implementation.
- vector_store_es.py: Elasticsearch Vector Store implementation- see the backend/esdirectory for examples
- run pytestto test the Elasticsearch Vector Store.
 
- see the 
 
- benchmark/: Benchmarking scripts for the bots. Copy this file to- .envand fill in the necessary values.- run-benchmark.py: Main benchmarking script.
- assistant_loop.py: Local assistant loop tool.
 
 
- specs/: Specifications for the bots.- budgetkey/: Specifications for the budgetkey bot.- config.yaml: Agent configuration file.
- agent.txt: Agent instructions.
 
- takanon/: Specifications for the takanon bot.- config.yaml: Agent configuration file.
- agent.txt: Agent instructions.
- extraction/: Extracted and processed text from the Knesset Takanon
 
- openapi/: OpenAPI definitions of the BudgetKey (and other deprecated) APIs.
 
- botnim/document_parser/: Document extraction and processing tools (formerly takanon_extractions/)- dynamic_extractions/: Main extraction pipeline and utilities- process_document.py: Full document processing pipeline (now accessible via- botnim process-document)
- extract_structure.py: Structure extraction (now accessible via- botnim extract-structure)
- extract_content.py: Content extraction (now accessible via- botnim extract-content)
- generate_markdown_files.py: Markdown generation (now accessible via- botnim generate-markdown-files)
- logs/: Intermediate and output files (structure.json, pipeline metadata, markdown chunks)
 
 
- ui/: DEPRECATED: User interface for the bots.
The botnim query command provides several ways to interact with the vector store:
# Search in the vector store
botnim query search staging takanon common_knowledge "מה עושה יושב ראש הכנסת?"
botnim query search staging takanon common_knowledge --num-results 5 "your query here"
botnim query search staging takanon common_knowledge -n 5 "your query here"
# Show full content of search results
botnim query search staging takanon common_knowledge "your query here" --full
# or use the short flag
botnim query search staging takanon common_knowledge "your query here" -f
# Display results in right-to-left order
botnim query search staging takanon common_knowledge "your query here" --rtl
# List all available indexes
botnim query list-indexes staging --bot budgetkey
botnim query list-indexes staging
# Display indexes in right-to-left order
botnim query list-indexes staging --rtl
# Show fields/structure of an index
botnim query show-fields staging budgetkey common_knowledge
# Display fields in right-to-left order
botnim query show-fields staging budgetkey common_knowledge --rtl
# List all available search modes
botnim query list-modesAvailable query commands:
- search: Search the vector store with semantic or specialized search- Options:
- --num-results,- -n: Number of results to return (default: depends on search mode)
- --search-mode: Use a specific search mode (see- list-modesfor available modes)
- --full,- -f: Show full content of results instead of just summaries
- --rtl: Display results in right-to-left order
 
 
- Options:
- list-indexes: Show all available Elasticsearch indexes- Options:
- --rtl: Display indexes in right-to-left order
 
 
- Options:
- show-fields: Display the structure and field types of an index- Options:
- --rtl: Display fields in right-to-left order
 
 
- Options:
- list-modes: List all available search modes and their default settings
The vector store supports multiple search modes to optimize query results based on the context of the search. Each mode has its own default for num_results, which can be overridden with --num-results/-n.
To see all available search modes and their defaults, run:
botnim query list-modes# Find specific sections
botnim query search staging takanon legal_text "סעיף 12" --search-mode SECTION_NUMBER
# Browse committee decisions and legal advisor documents
botnim query search staging takanon legal_text "החלטות ועדת הכנסת" --search-mode METADATA_BROWSE
# Browse ethics committee decisions
botnim query search staging takanon ethics_decisions "ניגוד עניינים" --search-mode METADATA_BROWSE- REGULAR: Standard semantic search across all main fields. Default num_results: 7
- SECTION_NUMBER: Specialized search mode for finding Takanon sections by their number (e.g. 'סעיף 12'). Default num_results: 3
- METADATA_BROWSE: Browse mode for exploring committee decisions, legal advisor documents, and ethics decisions with metadata summaries instead of full content. Default num_results: 25
(For a full, up-to-date list, use botnim query list-modes)
To update or add content to the vector store:
# Update all contexts for the takanon bot (complete rebuild)
botnim sync staging takanon --backend es --replace-context all
# Update only a specific context without rebuilding others
botnim sync staging takanon --backend es --replace-context <context name>
# Check the content after updating
botnim query search staging takanon ethics_rules "<query>" --num-results 3The botnim evaluate command allows you to evaluate the performance of queries against a vector store:
# Basic usage
botnim evaluate takanon legal_text staging path/to/query_evaluations.csv
# With custom parameters
botnim evaluate takanon legal_text staging path/to/query_evaluations.csv --max-results 30 --adjusted-f1-limit 10Required CSV columns:
- question_id: Unique identifier for each question
- question_text: The actual question text
- doc_filename: The filename of the expected document
Options:
- --max-results: Maximum number of results to retrieve per query (default: 20)
- --adjusted-f1-limit: Number of documents to consider for adjusted F1 score calculation (default: 7)
The command will:
- Run each query against the vector store
- Compare retrieved documents with expected documents
- Calculate regular and adjusted F1 scores
- Save results to a new CSV file with the suffix '_results'
- Print summary statistics
To find available contexts for a bot, use:
botnim query list-indexes staging --bot takanon- Edit the specifications in the specs/directory.
- If using external sources (e.g., Google Spreadsheets):
- Configure the source URL in the bot's config.yaml
- The content will be automatically downloaded during sync Either:
 
- Configure the source URL in the bot's 
- botnim sync {staging/production} {budgetkey/takanon} --backend {openai/es}to sync the specifications with the OpenAI account.- Use --replace-contextflag to force a complete rebuild of the vector store (useful when context files have been modified)
- Use --context-to-updateoption to update only a specific context (e.g.,--context-to-update ethics_rules) without replacing all contexts Or
 
- Use 
- Commit the changes to the repository
- Run the 'Sync' action from the GitHub Actions tab.
Running the benchmark in production is best done using the action in the GitHub Actions tab.
For running locally:
botnim benchmarks {staging/production} {budgetkey/takanon} {TRUE/FALSE whether to save results locally}
- The document processing pipeline now saves only the final *_structure_content.jsonfiles in thespecs/takanon/extraction/folder. These are the only files required for sync/ingestion.
- All intermediate files (structure.json, pipeline metadata, and markdown files if generated) are saved in botnim/document_parser/dynamic_extractions/logs/.
- Markdown files are not required for sync—they are only for manual inspection.
- To generate markdown files for manual inspection, use the --generate-markdownflag with the pipeline:
botnim process-document botnim/document_parser/extract_sources/חוק הכנסת.html specs/takanon/extraction/ --generate-markdown- For advanced/manual markdown generation, you can use the CLI directly:
botnim generate-markdown-files specs/takanon/extraction/חוק הכנסת_structure_content.json --write-files --output-dir botnim/document_parser/dynamic_extractions/logs/chunks/- 
Use --write-filesto actually write files to disk.
- 
Use --dry-runto preview what would be generated, without writing files.
- 
If neither flag is provided, markdown content is generated in memory (for programmatic use). 
- 
The pipeline and CLI both support in-memory markdown generation for direct ingestion or further processing in automated workflows. 
specs/takanon/extraction/
    תקנון הכנסת_structure_content.json
    חוק_רציפות_הדיון_בהצעות_חוק_structure_content.json
botnim/document_parser/dynamic_extractions/logs/
    תקנון הכנסת_structure.json
    תקנון הכנסת_pipeline_metadata.json
    חוק_רציפות_הדיון_בהצעות_חוק_structure.json
    חוק_רציפות_הדיון_בהצעות_חוק_pipeline_metadata.json
    chunks/
        תקנון הכנסת_סעיף_1.md
        חוק_רציפות_הדיון_בהצעות_חוק_סעיף_1.md
        ...
- In your config.yaml, usetype: splitfor each JSON structure content file:sources: - type: split source: extraction/תקנון הכנסת_structure_content.json - type: split source: extraction/חוק_רציפות_הדיון_בהצעות_חוק_structure_content.json 
- To generate markdown files for manual review, run:
botnim process-document botnim/document_parser/extract_sources/חוק הכנסת.html specs/takanon/extraction/ --generate-markdown 
- Markdown files will be written to botnim/document_parser/dynamic_extractions/logs/chunks/.
- Sources can be configured in config.yamlwith different types:context: - name: "Knowledge Base Name" type: "files" # Multiple separate files source: "*.md" # File pattern to match - name: "Spreadsheet Knowledge" type: "google-spreadsheet" # Google spreadsheet source source: "https://..." # Spreadsheet URL - name: "Split Content" type: "split_file" # Single file that needs splitting source: "path/to/file.md" 
- botnim/cli_assistant.py: Interactive CLI tool for chatting with OpenAI assistants (supports RTL languages)
The botnim assistant command provides an interactive chat interface with OpenAI assistants:
botnim assistantbotnim assistant --assistant-id <assistant-id>botnim assistant --rtlbotnim assistant --environment production  # or staging (default)