A RAG-based chatbot with memory for the openBIS documentation, powered by LangGraph and Ollama.
This project provides an intelligent chatbot that answers questions about openBIS using Retrieval-Augmented Generation (RAG) with conversation memory. The chatbot remembers previous interactions within a session and provides contextually aware responses.
Features:

- RAG-powered responses: Uses openBIS documentation for accurate, up-to-date answers
- Conversation memory: Remembers user names, previous questions, and context using LangGraph
- Session management: Maintains separate conversations with unique session IDs
- Clean responses: Filters out internal reasoning for user-friendly output
- Multi-interface: Available as both CLI and web interface
- Persistent storage: Conversation history stored in SQLite database
Main components:

- Scraper: Scrapes content from the openBIS documentation website
- Processor: Processes the scraped content for use in RAG
- Multi-Agent Conversation Engine: LangGraph-based engine with:
  - RAG Agent: Answers documentation questions using retrieval-augmented generation
  - Function Calling Agent: Executes pybis functions for direct openBIS interaction
  - Router Agent: Intelligently routes queries to the appropriate agent
  - Memory Management: Persistent conversation history across sessions
- Web Interface: Browser-based chat interface with session management
- CLI Interface: Command-line chat interface with memory
- pybis Integration: Direct integration with pybis for openBIS operations
Requirements:

- Python 3.8 or higher
- Ollama with the following models:
  - `nomic-embed-text` (for embeddings)
  - `qwen3` (for chat)
The project uses the following key dependencies:
- LangGraph: For conversation flow and memory management
- LangChain: For LLM integration and message handling
- Flask: For the web interface
- SQLite: For persistent conversation storage
- Ollama: For local LLM inference
- pybis: For direct openBIS integration and function calling
To install from source:

git clone https://github.com/yourusername/openbis-chatbot.git
cd openbis-chatbot
pip install -e .
This installs the package in development mode, allowing you to make changes to the code and have them reflected immediately.
Alternatively, install from PyPI:

pip install openbis-chatbot
Note: This option will be available once the package is published to PyPI.
The simplest way to use the chatbot is with a single command:
python -m openbis_chatbot
This will:
- Check if processed data already exists in the `data/processed` directory
- If it exists, start the chatbot with that data
- If not, automatically scrape the openBIS documentation, process it, and then start the chatbot
If you need more control, you can still run each component separately:
python -m openbis_chatbot scrape --url https://openbis.readthedocs.io/en/latest/ --output ./data/raw
python -m openbis_chatbot process --input ./data/raw --output ./data/processed
python -m openbis_chatbot query --data ./data/processed
The CLI now includes conversation memory features:
- Remembers your name and previous questions within a session
- Type `clear` to start a new conversation
- Type `exit` or `quit` to end the session
- Use `--session-id <id>` to continue a previous conversation
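For example, to resume an earlier conversation (the session ID here is illustrative):

python -m openbis_chatbot query --data ./data/processed --session-id my-session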
To run the web interface:

python -m openbis_chatbot --web
This will start a web server on http://localhost:5000 where you can interact with the chatbot through a browser.
The web interface includes:
- Session persistence: Conversations continue across page refreshes
- Clear chat button: Start fresh conversations anytime
- Memory indicators: See conversation length and token usage in browser console
- Responsive design: Works on desktop and mobile devices
Alternatively, you can use the provided script:
python scripts/run_web.py
Or customize the web interface with additional parameters:
python -m openbis_chatbot.web.cli --data ./data/processed --host 127.0.0.1 --port 5000
Scraper options:

--url URL The base URL of the ReadtheDocs site
--output OUTPUT The directory to save the scraped content to
--version VERSION The specific version to scrape (e.g., 'en/latest')
--delay DELAY The delay between requests in seconds (default: 0.5)
--max-pages MAX_PAGES The maximum number of pages to scrape
--verbose Enable verbose logging
Processor options:

--input INPUT The directory containing the scraped content
--output OUTPUT The directory to save the processed content to
--min-chunk-size MIN_CHUNK_SIZE
The minimum size of a chunk in characters (default: 100)
--max-chunk-size MAX_CHUNK_SIZE
The maximum size of a chunk in characters (default: 1000)
--chunk-overlap CHUNK_OVERLAP
The overlap between chunks in characters (default: 50)
--verbose Enable verbose logging
Query options:

--data DATA The directory containing the processed content
--model MODEL The Ollama model to use for chat (default: qwen3)
--memory-db PATH Path to SQLite database for conversation memory
--session-id ID Session ID to continue a previous conversation
--verbose Enable verbose logging
Web interface options:

--data DATA The directory containing the processed content (default: ./data/processed)
--host HOST The host to run the web interface on (default: 0.0.0.0)
--port PORT The port to run the web interface on (default: 5000)
--model MODEL The Ollama model to use for chat (default: qwen3)
--top-k TOP_K The number of chunks to retrieve (default: 5)
--debug Enable debug mode
The scraper works by:
- Starting from the base URL of the openBIS documentation site
- Downloading the HTML content of each page
- Extracting links to other pages on the same domain
- Following those links to scrape more pages
- Saving the content of each page to a text file
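A minimal sketch of this crawl loop, assuming `requests` and `beautifulsoup4`; the function and file-naming scheme are illustrative, not the project's actual implementation:

```python
from collections import deque
from pathlib import Path
from urllib.parse import urljoin, urlparse
import time

import requests
from bs4 import BeautifulSoup

def crawl(base_url: str, output_dir: str, delay: float = 0.5, max_pages: int = 50) -> None:
    """Breadth-first crawl of a docs site, saving each page's text to a file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    domain = urlparse(base_url).netloc
    queue, seen, fetched = deque([base_url]), {base_url}, 0

    while queue and fetched < max_pages:
        url = queue.popleft()
        resp = requests.get(url, timeout=30)
        if resp.status_code != 200:
            continue
        fetched += 1
        soup = BeautifulSoup(resp.text, "html.parser")
        # Save the page text under a filename derived from its URL path
        name = urlparse(url).path.strip("/").replace("/", "_") or "index"
        (out / f"{name}.txt").write_text(soup.get_text(separator="\n"), encoding="utf-8")
        # Queue links that stay on the same domain and haven't been seen yet
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # polite delay between requests (mirrors --delay)
```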
The processor works by:
- Reading the scraped content from text files
- Chunking the content into smaller pieces
- Generating embeddings for each chunk using Ollama's embedding model
- Saving the chunks and their embeddings to JSON and CSV files
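A condensed sketch of that pipeline, assuming the official `ollama` Python client is installed and the `nomic-embed-text` model is pulled. Defaults mirror the processor options above; the actual processor also writes CSV output:

```python
import json
from pathlib import Path
from typing import List

import ollama  # official Ollama Python client

def chunk_text(text: str, max_size: int = 1000, overlap: int = 50, min_size: int = 100) -> List[str]:
    """Split text into overlapping character windows (defaults mirror the CLI)."""
    chunks, start = [], 0
    while start < len(text):
        piece = text[start:start + max_size]
        if len(piece) >= min_size:
            chunks.append(piece)
        start += max_size - overlap
    return chunks

def process(input_dir: str, output_file: str) -> None:
    records = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        for chunk in chunk_text(path.read_text(encoding="utf-8")):
            # One embedding per chunk, using the same model the chatbot retrieves with
            resp = ollama.embeddings(model="nomic-embed-text", prompt=chunk)
            records.append({"source": path.name, "text": chunk, "embedding": resp["embedding"]})
    Path(output_file).write_text(json.dumps(records), encoding="utf-8")
```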
The conversation engine works by:
- State Management: Maintains conversation state using LangGraph's StateGraph
- Memory Persistence: Stores conversation history in SQLite using LangGraph checkpoints
- RAG Integration: Retrieves relevant chunks based on user queries
- Context Assembly: Combines conversation history, RAG context, and current query
- Response Generation: Uses Ollama's chat model with full conversation context
- Response Cleaning: Removes internal reasoning tags for clean user output
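A minimal sketch of this pattern, assuming the `langchain-ollama` and `langgraph-checkpoint-sqlite` packages; it collapses the routing and RAG steps into a single node and is not the project's actual engine:

```python
import sqlite3

from langchain_ollama import ChatOllama
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import END, START, MessagesState, StateGraph

llm = ChatOllama(model="qwen3")

def chat_node(state: MessagesState) -> dict:
    history = state["messages"][-20:]  # mirrors the 20-message conversation limit
    # The real engine would prepend retrieved RAG context before calling the model
    return {"messages": [llm.invoke(history)]}

builder = StateGraph(MessagesState)
builder.add_node("chat", chat_node)
builder.add_edge(START, "chat")
builder.add_edge("chat", END)

# A SQLite-backed checkpointer persists each session's history across restarts
checkpointer = SqliteSaver(sqlite3.connect("memory.db", check_same_thread=False))
graph = builder.compile(checkpointer=checkpointer)

config = {"configurable": {"thread_id": "session-123"}}  # one thread per session
graph.invoke({"messages": [("user", "Hi, my name is Ada.")]}, config)
```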
Key memory features:

- Session Management: Maintains separate conversations with unique session IDs
- Conversation History: Remembers both user messages and assistant responses
- Session Isolation: Different sessions don't share memory
- Token Management: Automatically limits conversation length (20 messages max)
- Persistent Storage: Conversations survive application restarts
- Context Awareness: Assistant remembers its own previous offers and responses
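Continuing the sketch above, session isolation follows directly from the `thread_id`: each ID gets its own checkpointed history, so one session cannot see another's messages.

```python
alice = {"configurable": {"thread_id": "alice"}}
bob = {"configurable": {"thread_id": "bob"}}

graph.invoke({"messages": [("user", "My name is Alice.")]}, alice)
# Bob's thread has no record of Alice's message, so the model cannot answer this:
graph.invoke({"messages": [("user", "What is my name?")]}, bob)
```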
The chatbot features a multi-agent system that can both answer questions and perform actions:

- Router Agent:
  - Analyzes user queries to determine intent
  - Routes to the appropriate agent based on keywords and context (a toy routing sketch follows the function list below)
  - Supports three decision types: `rag`, `function_call`, and `conversation`
- RAG Agent:
  - Handles questions about openBIS documentation
  - Uses retrieval-augmented generation for accurate responses
  - Examples: "What is openBIS?", "How do I create a sample?"
- Function Calling Agent:
  - Executes actual operations on openBIS instances
  - Supports connection management, sample/dataset operations, and space/project management
  - Examples: "Connect to openBIS", "List all samples", "Create a new sample"
Supported pybis functions:

- Connection: `connect_to_openbis`, `disconnect_from_openbis`, `check_openbis_connection`
- Samples: `list_samples`, `get_sample`, `create_sample`
- Datasets: `list_datasets`, `get_dataset`
- Spaces/Projects: `list_spaces`, `list_projects`
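The routing sketch promised above: a toy keyword-based router that reproduces the decisions in the examples below. The project's actual router may use richer rules or the LLM itself:

```python
def route(query: str) -> str:
    """Return one of the three decision types: rag, function_call, or conversation."""
    q = query.lower().strip()
    # Question-style phrasing points at the documentation (RAG)
    if q.endswith("?") or q.startswith(("what", "how", "why", "where", "when")):
        return "rag"
    # Imperative action verbs suggest a pybis function call
    if q.startswith(("connect", "disconnect", "list", "create", "get ", "show")):
        return "function_call"
    return "conversation"

assert route("What is openBIS?") == "rag"
assert route("How do I list samples?") == "rag"
assert route("List samples in space LAB") == "function_call"
assert route("Connect to openBIS at https://my-server.com") == "function_call"
```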
Example interactions:

# Documentation query (RAG Agent)
User: "What is openBIS?"
Response: [Detailed explanation from documentation]
# Function call (Function Calling Agent)
User: "Connect to openBIS at https://my-server.com"
Response: "Successfully connected to openBIS..."
# Mixed scenarios handled intelligently
User: "How do I list samples?" → RAG (documentation)
User: "List samples in space LAB" → Function Call (execution)
The web interface works by:
- Starting a Flask web server
- Serving a responsive HTML/CSS/JavaScript chat interface
- Handling API requests from the frontend
- Using the multi-agent conversation engine to generate responses
- Returning the responses to the frontend in JSON format
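The request/response shape can be sketched with a few lines of Flask. The endpoint name, payload fields, and `EchoEngine` stand-in are illustrative; the real app wires in the multi-agent engine:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

class EchoEngine:
    """Stand-in for the multi-agent conversation engine."""
    def chat(self, message: str, session_id: str) -> str:
        return f"(echo from {session_id}) {message}"

engine = EchoEngine()

@app.route("/api/chat", methods=["POST"])
def chat():
    payload = request.get_json()
    session_id = payload.get("session_id", "default")
    answer = engine.chat(payload["message"], session_id=session_id)
    # The frontend reads both fields and keeps the session ID across refreshes
    return jsonify({"response": answer, "session_id": session_id})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```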
Project structure:

chatBIS/
├── src/chatBIS/ # Main package
│ ├── scraper/ # Web scraping components
│ ├── processor/ # Content processing components
│ ├── query/ # Query and conversation engine
│ │ ├── conversation_engine.py # Multi-agent conversation engine
│ │ ├── query.py # RAG query engine
│ │ └── cli.py # CLI interface with memory
│ ├── tools/ # Function calling tools
│ │ └── pybis_tools.py # pybis integration tools
│ ├── web/ # Web interface
│ └── utils/ # Utility functions
├── tests/ # Test suite
├── examples/ # Example scripts
│ └── multi_agent_demo.py # Multi-agent demo
├── scripts/ # Utility scripts
├── data/ # Data directory
│ ├── raw/ # Scraped content
│ └── processed/ # Processed chunks and embeddings
├── docs/ # Documentation
│ ├── multi_agent_architecture.md # Multi-agent documentation
│ └── presentations/ # Project presentations
├── test_multi_agent.py # Multi-agent test script
└── requirements.txt # Python dependencies
Typical resource usage:

- Average tokens per exchange: ~800-900 tokens
- Memory overhead: ~100-200 tokens for conversation history
- RAG context: ~500-600 tokens per query
- Conversation limit: 20 messages (10 exchanges) per session
- Storage: SQLite database for persistent conversation history
This project is licensed under the MIT License - see the LICENSE file for details.