An intelligent knowledge base system for the Hulo programming language, powered by RAG (Retrieval-Augmented Generation), vector databases, and large language models.
- Semantic Search: Advanced vector-based document search with multi-round retrieval
- AI-Powered Q&A: Get intelligent answers based on Hulo documentation
- Multi-language Support: Handle both English and Chinese documentation
- Modular Architecture: Pluggable vector stores (Memory/ChromaDB) and LLM providers
- High Performance: Async processing with FastAPI
- Privacy-First: Support for local LLM deployment with Ollama
- Rich Analytics: Document statistics and search insights
- Developer Friendly: RESTful API with automatic documentation
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Document     │    │  Vector Store   │    │   LLM Service   │
│    Processor    │───▶│    (Memory/     │───▶│    (OpenAI/     │
│                 │    │    ChromaDB)    │    │     Ollama)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Knowledge    │    │     FastAPI     │    │   Multi-round   │
│    Service      │◀───│    HTTP API     │◀───│    Retrieval    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
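The data flow above can be sketched as a minimal wiring of the services. This is an illustrative toy, not the project's actual API: the class names, `add`, `search`, and `ask` methods are stand-ins, and the "search" here is plain substring matching rather than vector similarity.

```python
# Hypothetical sketch of the architecture: a vector store feeding a
# knowledge service, which the HTTP API would call. Names are illustrative.
from dataclasses import dataclass, field


@dataclass
class MemoryVectorStore:
    docs: list = field(default_factory=list)

    def add(self, text: str) -> None:
        self.docs.append(text)

    def search(self, query: str) -> list:
        # Stand-in for similarity search: naive substring matching.
        return [d for d in self.docs if query in d]


@dataclass
class KnowledgeService:
    store: MemoryVectorStore

    def ask(self, query: str) -> dict:
        hits = self.store.search(query)
        return {"query": query, "sources": hits}


store = MemoryVectorStore()
store.add("Hulo functions are declared with fn")
service = KnowledgeService(store)
print(service.ask("fn"))
```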
1. Clone the repository

   ```bash
   git clone https://github.com/your-username/hulo-knowledge-base.git
   cd hulomind
   ```

2. Initialize the documentation submodule

   ```bash
   git submodule update --init --recursive
   ```

3. Install dependencies

   ```bash
   uv sync
   ```

4. Set up environment variables

   ```bash
   cp env.example .env
   # Edit .env with your configuration
   ```

5. Initialize the knowledge base

   ```bash
   uv run python scripts/init_knowledge_base.py
   ```

6. Start the server

   ```bash
   uv run python -m src.main
   ```

7. Access the API
   - API Documentation: http://localhost:8000/docs
   - Health Check: http://localhost:8000/health
```bash
# Search the documentation
curl -X POST "http://localhost:8000/api/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "function definition in Hulo"}'

# Ask a question
curl -X POST "http://localhost:8000/api/ask" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the difference between let and var in Hulo?"}'

# Knowledge-base statistics
curl "http://localhost:8000/api/stats"

# List documents with filters
curl "http://localhost:8000/api/documents?category=grammar&language=en&limit=10"

# Run the API test suite
uv run python -m tests.test_api
```
```python
import requests

# Search for documents
response = requests.post("http://localhost:8000/api/search", json={
    "query": "variable declaration"
})
results = response.json()
print(f"Found {results['total_results']} documents")

# Ask a question
response = requests.post("http://localhost:8000/api/ask", json={
    "query": "How do I create a function in Hulo?"
})
answer = response.json()
print(f"Answer: {answer['answer']}")
```
| Variable | Description | Default |
|---|---|---|
| `API_HOST` | API server host | `0.0.0.0` |
| `API_PORT` | API server port | `8000` |
| `DEBUG` | Enable debug mode | `false` |
| `OPENAI_API_KEY` | OpenAI API key | - |
| `DASHSCOPE_API_KEY` | Qwen API key | - |
| `DEFAULT_VECTOR_STORE` | Vector store type (`memory`/`chroma`) | `memory` |
| `CHROMA_PERSIST_DIRECTORY` | ChromaDB persistence directory | `./data/chroma` |
| `EMBEDDING_MODEL` | Sentence transformer model | `sentence-transformers/all-MiniLM-L6-v2` |
| `DOCS_PATH` | Documentation path | `./docs/src` |
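As a rough illustration of how the settings above could be read from the environment, here is a standard-library sketch. The project's actual config layer lives in `src/config/` and may use a different mechanism (for example, pydantic settings); the `Settings` class below is an assumption, not the real implementation.

```python
# Illustrative settings loader for the variables in the table above.
# This is a sketch using only the standard library; the project's real
# configuration code in src/config/ may differ.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_host: str = os.getenv("API_HOST", "0.0.0.0")
    api_port: int = int(os.getenv("API_PORT", "8000"))
    debug: bool = os.getenv("DEBUG", "false").lower() == "true"
    default_vector_store: str = os.getenv("DEFAULT_VECTOR_STORE", "memory")
    chroma_persist_directory: str = os.getenv(
        "CHROMA_PERSIST_DIRECTORY", "./data/chroma"
    )
    embedding_model: str = os.getenv(
        "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
    )
    docs_path: str = os.getenv("DOCS_PATH", "./docs/src")


settings = Settings()
print(settings.api_host, settings.api_port)
```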
The system supports multiple LLM providers with automatic fallback:
- OpenAI (requires `OPENAI_API_KEY`)
- Qwen API (requires `DASHSCOPE_API_KEY`)
- Local Ollama (requires Ollama running with `qwen2.5:7b`)
- Mock Service (fallback for testing)
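The fallback order above can be pictured as a simple priority chain: try each provider in turn and fall through when one is unavailable. The sketch below is a hedged illustration under that assumption; the provider classes, `complete` method, and `ProviderUnavailable` exception are stand-ins, not the project's real LLM clients.

```python
# Hedged sketch of automatic provider fallback. A provider that raises
# ProviderUnavailable (e.g. missing API key) is skipped and the next one
# in priority order is tried.
class ProviderUnavailable(Exception):
    pass


class FlakyLLM:
    """Simulates a provider whose credentials are missing."""
    name = "openai"

    def complete(self, prompt: str) -> str:
        raise ProviderUnavailable("OPENAI_API_KEY not set")


class MockLLM:
    """Last-resort provider that always answers."""
    name = "mock"

    def complete(self, prompt: str) -> str:
        return f"[mock answer to: {prompt}]"


def complete_with_fallback(providers, prompt):
    # Try each provider in priority order; fall through on failure.
    last_error = None
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderUnavailable as exc:
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


name, answer = complete_with_fallback([FlakyLLM(), MockLLM()], "What is Hulo?")
print(name, answer)
```

Because the first provider fails, the chain falls back to the mock service, which is what the real system does when no API keys are configured.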
- Memory Store: Fast, in-memory storage (default)
- ChromaDB: Persistent, production-ready storage
```bash
uv run pytest
uv run black src/
uv run isort src/
uv run mypy src/
uv run ruff check src/
```
```
hulo-knowledge-base/
├── src/
│   ├── config/          # Configuration management
│   ├── models/          # Data models
│   ├── processors/      # Document processing
│   ├── vectorstore/     # Vector storage backends
│   ├── services/        # Business logic services
│   ├── mcp/             # MCP protocol support
│   └── main.py          # FastAPI application
├── scripts/             # Utility scripts
├── tests/               # Test suite
├── docs/                # Documentation (submodule)
├── data/                # Data storage
└── pyproject.toml       # Project configuration
```
The system uses a multi-round retrieval process: two search passes followed by a merge step:
- Broad Search: Low threshold (0.3) to capture more candidates
- Refined Search: High threshold (0.7) to filter high-quality results
- Merge & Deduplicate: Combine results and remove duplicates
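The three steps above can be sketched over a list of pre-scored documents. The thresholds match the values stated above, but the function itself is an illustrative stand-in: the real system scores documents with vector similarity, not a precomputed list.

```python
# Illustrative multi-round retrieval: a broad pass (low threshold), a
# refined pass (high threshold), then a merge that puts high-quality hits
# first and removes duplicates.
def two_stage_retrieve(scored_docs, broad=0.3, refined=0.7):
    """scored_docs: list of (doc_id, similarity) pairs."""
    # Round 1: broad search keeps anything above the low threshold.
    candidates = [(d, s) for d, s in scored_docs if s >= broad]
    # Round 2: refined search keeps only high-confidence hits.
    strong = [(d, s) for d, s in candidates if s >= refined]
    # Merge: strong hits first, then remaining candidates, deduplicated.
    merged, seen = [], set()
    for d, s in strong + candidates:
        if d not in seen:
            seen.add(d)
            merged.append((d, s))
    return merged


docs = [("a", 0.9), ("b", 0.5), ("c", 0.2), ("a", 0.9)]
print(two_stage_retrieve(docs))  # → [('a', 0.9), ('b', 0.5)]
```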
Documents are intelligently split based on Markdown headers rather than fixed line counts, preserving semantic integrity.
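A minimal sketch of header-based splitting, assuming chunks begin at each Markdown heading; the function name and exact splitting rules are illustrative, not the project's actual processor.

```python
# Illustrative header-based chunking: start a new chunk at every Markdown
# heading instead of cutting at fixed line counts, so each chunk keeps a
# heading together with its body text.
import re


def split_by_headers(markdown: str):
    chunks, current = [], []
    for line in markdown.splitlines():
        # A heading starts a new chunk, unless it is the very first line.
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks


doc = "# Functions\nfn adds a function.\n## Parameters\nTyped params.\n"
print(split_by_headers(doc))
```

Each resulting chunk is a heading plus its body, so the embedding for a chunk stays tied to one coherent topic.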
```python
from src.vectorstore import VectorStoreFactory

# Use memory store
memory_store = VectorStoreFactory.create("memory")

# Use ChromaDB store
chroma_store = VectorStoreFactory.create("chroma")
```
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.