feat: Added LLM & RAG answer comparison! #24
base: main
Conversation
There is still some work needed on the `_compare_answers` prompt to produce a better comparison.
Walkthrough: The changes update the query handling logic in the agent module by introducing a parallel processing flow. The existing "Q&A Bot" is redefined as a "RAG Bot" focused on community-specific data, while a new "Direct LLM" agent relies on internal knowledge. Two tasks, `rag_task` and `llm_task`, run in parallel, and their outputs are compared via `_compare_answers` to select the best answer.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant U as User Query
    participant H as AgenticHivemindFlow
    participant R as RAG Bot (rag_task)
    participant D as Direct LLM (llm_task)
    participant A as Answer Comparator
    U->>H: Submit Query
    H->>R: Process via RAG Bot
    H->>D: Process via Direct LLM
    R-->>H: Return rag_answer
    D-->>H: Return llm_answer
    H->>A: Compare answers (_compare_answers)
    A-->>H: Return best answer
    H->>U: Deliver final output
```
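The fan-out/fan-in flow in the diagram can be sketched with Python's standard library. This is a minimal illustration, not the project's actual crew setup: `rag_fn` and `llm_fn` are hypothetical callables standing in for `rag_task` and `llm_task`.

```python
from concurrent.futures import ThreadPoolExecutor


def answer_in_parallel(rag_fn, llm_fn, query):
    """Run the RAG and direct-LLM answer functions concurrently.

    rag_fn and llm_fn are placeholder callables; each takes the query
    string and returns an answer string.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        rag_future = pool.submit(rag_fn, query)
        llm_future = pool.submit(llm_fn, query)
        # Both answers are collected before comparison, as in the diagram.
        return rag_future.result(), llm_future.result()
```

Both results are gathered before the comparator runs, so a slow branch delays only the final comparison step, not the other branch.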
Actionable comments posted: 0
🧹 Nitpick comments (2)
tasks/hivemind/agent.py (2)
Lines 72-83: Enforce response length limit more robustly.
The inline instruction (“Your final response must not exceed 250 words”) depends on the LLM effectively truncating or summarizing. Consider adding a final post-processing check (e.g., a word-count safeguard) before returning the result to ensure compliance with the specified limit.
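A post-processing safeguard of the kind suggested could look like the following sketch. The function name and the 250-word default mirror the prompt's stated limit; they are illustrative, not part of the PR.

```python
def enforce_word_limit(text: str, max_words: int = 250) -> str:
    """Hard safeguard: truncate the response to at most max_words words.

    Applied after the LLM call, so compliance no longer depends on the
    model honoring the inline instruction.
    """
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])
```

A summarization pass would read better than blunt truncation, but a word-count cutoff guarantees the limit is never exceeded.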
Lines 139-168: Avoid relying on string detection of "RAG" in the comparator result.
If the comparator's response does not include the keyword "RAG" (e.g., "Choose the first answer" or "Pick the community-based response"), this logic might fail. A structured output format (e.g., JSON with a field indicating the chosen answer) would be more robust. Current logic:

```python
if "RAG" in comparison_result.raw.upper():
    return rag_answer
else:
    return llm_answer
```

Here's an example of returning a simpler structured response:

```python
# Expect the comparator agent to return JSON in the format:
# {"chosen": "RAG" or "LLM"}
import json

try:
    parsed = json.loads(comparison_result.raw)
except (json.JSONDecodeError, TypeError):
    return llm_answer  # fallback in case parsing fails

if parsed.get("chosen") == "RAG":
    return rag_answer
else:
    return llm_answer
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tasks/hivemind/agent.py (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
tasks/hivemind/agent.py (1)
tasks/hivemind/query_data_sources.py (2)
- RAGPipelineTool (73-114)
- setup_tools (82-88)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: ci / test / Test
- GitHub Check: ci / lint / Lint
🔇 Additional comments (4)
tasks/hivemind/agent.py (4)
Line 2: Importing `TaskOutput` is appropriate.
No issues observed here; adding `TaskOutput` aligns with its subsequent usage in this file.
Lines 88-94: "Direct LLM" agent instantiation looks sound.
This new agent with a strictly “internal knowledge” role is clear and consistent with your parallel approach. No immediate concerns.
Lines 117-122: Parallel processing approach is clear.
Creating a `Crew` to run tasks in parallel can improve efficiency. Ensure any global state is thread-safe if accessed concurrently. Otherwise, looks good.
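If the parallel tasks ever write to shared state, a lock is the simplest protection. This is a generic standard-library sketch of the thread-safety point above, not code from the PR:

```python
import threading


class SharedResults:
    """Thread-safe container for outputs written by concurrent tasks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, item):
        # The lock serializes writes coming from parallel tasks.
        with self._lock:
            self._items.append(item)

    def snapshot(self):
        with self._lock:
            return list(self._items)
```

If the tasks only return values and never mutate shared objects, no locking is needed.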
Lines 127-131: Guard against potential indexing issues.
The calls `crew_outputs.tasks_output[0]` and `[1]` assume both tasks completed and produced outputs. Consider verifying `len(crew_outputs.tasks_output) >= 2` or handling unexpected failures to avoid out-of-bounds errors. Would you like to confirm that each parallel task always returns valid output, possibly by examining logs or adding checks?