🧠 Hulo Knowledge Base

An intelligent knowledge base system for the Hulo programming language, powered by RAG (Retrieval-Augmented Generation), vector databases, and large language models.


🌟 Features

  • πŸ” Semantic Search: Advanced vector-based document search with multi-round retrieval
  • πŸ€– AI-Powered Q&A: Get intelligent answers based on Hulo documentation
  • πŸ“š Multi-language Support: Handle both English and Chinese documentation
  • πŸ—οΈ Modular Architecture: Pluggable vector stores (Memory/ChromaDB) and LLM providers
  • ⚑ High Performance: Async processing with FastAPI
  • πŸ”’ Privacy-First: Support for local LLM deployment with Ollama
  • πŸ“Š Rich Analytics: Document statistics and search insights
  • πŸ› οΈ Developer Friendly: RESTful API with automatic documentation

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Document      β”‚    β”‚   Vector Store  β”‚    β”‚   LLM Service   β”‚
β”‚   Processor     │───▢│   (Memory/      │───▢│   (OpenAI/      β”‚
β”‚                 β”‚    β”‚    ChromaDB)    β”‚    β”‚    Ollama)      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                       β”‚                       β”‚
         β–Ό                       β–Ό                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Knowledge     β”‚    β”‚   FastAPI       β”‚    β”‚   Multi-round   β”‚
β”‚   Service       │◀───│   HTTP API      │◀───│   Retrieval     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Quick Start

Prerequisites

  • Python 3.9+
  • uv (recommended) or pip
  • Ollama (optional, for local LLM)

Installation

  1. Clone the repository

    git clone https://github.com/hulo-lang/hulomind.git
    cd hulomind
  2. Initialize the documentation submodule

    git submodule update --init --recursive
  3. Install dependencies

    uv sync
  4. Set up environment variables

    cp env.example .env
    # Edit .env with your configuration
  5. Initialize the knowledge base

    uv run python scripts/init_knowledge_base.py
  6. Start the server

    uv run python -m src.main
  7. Access the API

     Open http://localhost:8000/docs for the interactive FastAPI documentation (the server listens on http://localhost:8000 by default).

πŸ“– Usage

API Endpoints

πŸ” Search Documents

curl -X POST "http://localhost:8000/api/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "function definition in Hulo"}'

πŸ€– Ask Questions

curl -X POST "http://localhost:8000/api/ask" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the difference between let and var in Hulo?"}'

πŸ“Š Get Statistics

curl "http://localhost:8000/api/stats"

πŸ“š List Documents

curl "http://localhost:8000/api/documents?category=grammar&language=en&limit=10"

Python Client Example

Run the bundled API test client:

uv run python -m tests.test_api

Or call the API directly with requests:

import requests

# Search for documents
response = requests.post("http://localhost:8000/api/search", json={
    "query": "variable declaration"
})
results = response.json()
print(f"Found {results['total_results']} documents")

# Ask a question
response = requests.post("http://localhost:8000/api/ask", json={
    "query": "How do I create a function in Hulo?"
})
answer = response.json()
print(f"Answer: {answer['answer']}")

βš™οΈ Configuration

Environment Variables

Variable                 | Description                        | Default
------------------------ | ---------------------------------- | ---------------------------------------
API_HOST                 | API server host                    | 0.0.0.0
API_PORT                 | API server port                    | 8000
DEBUG                    | Enable debug mode                  | false
OPENAI_API_KEY           | OpenAI API key                     | -
DASHSCOPE_API_KEY        | Qwen API key                       | -
DEFAULT_VECTOR_STORE     | Vector store type (memory/chroma)  | memory
CHROMA_PERSIST_DIRECTORY | ChromaDB persistence directory     | ./data/chroma
EMBEDDING_MODEL          | Sentence transformer model         | sentence-transformers/all-MiniLM-L6-v2
DOCS_PATH                | Documentation path                 | ./docs/src
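
For reference, a minimal .env based on the defaults above might look like this (values are illustrative; API keys are only needed for the providers you actually use):

API_HOST=0.0.0.0
API_PORT=8000
DEBUG=false
DEFAULT_VECTOR_STORE=memory
CHROMA_PERSIST_DIRECTORY=./data/chroma
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
DOCS_PATH=./docs/src
# OPENAI_API_KEY=...
# DASHSCOPE_API_KEY=...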

LLM Providers

The system supports multiple LLM providers with automatic fallback:

  1. OpenAI (requires OPENAI_API_KEY)
  2. Qwen API (requires DASHSCOPE_API_KEY)
  3. Local Ollama (requires Ollama running with qwen2.5:7b)
  4. Mock Service (fallback for testing)
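
The priority order above can be pictured as a simple fallback loop. The sketch below is illustrative only and does not reflect the project's actual class names; it assumes each provider exposes is_available() and generate():

from typing import Optional, Protocol

class LLMProvider(Protocol):
    def is_available(self) -> bool: ...
    def generate(self, prompt: str) -> str: ...

class MockProvider:
    """Last-resort provider that always returns a canned answer (hypothetical)."""
    def is_available(self) -> bool:
        return True
    def generate(self, prompt: str) -> str:
        return f"[mock answer for: {prompt}]"

def generate_with_fallback(providers: list, prompt: str) -> str:
    """Try providers in priority order (OpenAI -> Qwen -> Ollama -> Mock)."""
    last_error: Optional[Exception] = None
    for provider in providers:
        if not provider.is_available():
            continue
        try:
            return provider.generate(prompt)
        except Exception as exc:  # e.g. network or quota errors
            last_error = exc
    raise RuntimeError(f"No LLM provider produced an answer: {last_error}")

# In a real deployment the OpenAI/Qwen/Ollama clients would come before the mock.
print(generate_with_fallback([MockProvider()], "What is Hulo?"))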

Vector Stores

  • Memory Store: Fast, in-memory storage (default)
  • ChromaDB: Persistent, production-ready storage

πŸ§ͺ Development

Running Tests

uv run pytest

Code Formatting

uv run black src/
uv run isort src/

Type Checking

uv run mypy src/

Linting

uv run ruff check src/

πŸ“ Project Structure

hulo-knowledge-base/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ config/           # Configuration management
β”‚   β”œβ”€β”€ models/           # Data models
β”‚   β”œβ”€β”€ processors/       # Document processing
β”‚   β”œβ”€β”€ vectorstore/      # Vector storage backends
β”‚   β”œβ”€β”€ services/         # Business logic services
β”‚   β”œβ”€β”€ mcp/              # MCP protocol support
β”‚   └── main.py           # FastAPI application
β”œβ”€β”€ scripts/             # Utility scripts
β”œβ”€β”€ tests/               # Test suite
β”œβ”€β”€ docs/                # Documentation (submodule)
β”œβ”€β”€ data/                # Data storage
└── pyproject.toml       # Project configuration

πŸ”§ Advanced Features

Multi-round Retrieval

The system retrieves documents in two rounds and then merges the results (a sketch follows the list):

  1. Broad Search: Low threshold (0.3) to capture more candidates
  2. Refined Search: High threshold (0.7) to filter high-quality results
  3. Merge & Deduplicate: Combine results and remove duplicates
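
A minimal sketch of this flow, assuming a search(query, threshold) callable that returns (doc_id, score) pairs; the names here are illustrative, not the project's internals:

def multi_round_retrieve(search, query, broad_threshold=0.3, refined_threshold=0.7):
    """Illustrative two-round retrieval: broad recall, then a high-precision pass."""
    broad = search(query, threshold=broad_threshold)      # round 1: many candidates
    refined = search(query, threshold=refined_threshold)  # round 2: high-quality hits

    # Merge both result sets, keeping the best score per document id.
    merged = {}
    for doc_id, score in refined + broad:
        if doc_id not in merged or score > merged[doc_id]:
            merged[doc_id] = score
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)

# Tiny stand-in "search" over pre-scored documents, just to show the call shape:
corpus = {"doc-let-var": 0.82, "doc-functions": 0.45, "doc-intro": 0.31}
fake_search = lambda q, threshold: [(d, s) for d, s in corpus.items() if s >= threshold]
print(multi_round_retrieve(fake_search, "let vs var"))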

Smart Document Chunking

Documents are intelligently split based on Markdown headers rather than fixed line counts, preserving semantic integrity.
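
A standalone sketch of header-based splitting (the real processing lives under src/processors/; this is only meant to illustrate the idea):

import re

def split_by_headers(markdown: str) -> list:
    """Split a Markdown document into chunks at header lines (#, ##, ...)."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]

doc = "# Functions\nfn greet() {}\n\n## Parameters\nDetails here."
print(split_by_headers(doc))  # two chunks: '# Functions ...' and '## Parameters ...'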

Vector Store Abstraction

from src.vectorstore import VectorStoreFactory

# Use memory store
memory_store = VectorStoreFactory.create("memory")

# Use ChromaDB store
chroma_store = VectorStoreFactory.create("chroma")

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.
