"For LLMs, reasoning is always better than no reasoning... Aggregating multiple answers is better than one answer... Retrieval plus reasoning is better than reasoning only." - Denny Zhou, Stanford CS25
This project is a practical implementation of the key techniques for enhancing Large Language Model (LLM) reasoning, as presented in Denny Zhou's talk at Stanford's CS25. It translates theoretical concepts into a working Python application, demonstrating a structured approach to building more reliable and verifiable AI systems.
This repository explores and implements a pipeline of advanced reasoning techniques:
- Chain-of-Thought (CoT) Prompting: Moving beyond simple queries to encourage models to "think step by step," improving their accuracy on multi-step problems.
- Robust Answer Parsing: A test-driven parser to reliably extract final answers from an LLM's verbose, natural language output.
- Self-Consistency: A powerful technique to improve accuracy by generating multiple diverse reasoning paths and selecting the most frequent answer (majority vote); see the sketch after this list.
- Self-Improvement Data Generation: A pipeline that simulates Reinforcement Learning from AI Feedback (RLAIF) by using a verifier to filter for high-quality, correct reasoning paths, which can then be used for fine-tuning.
- Dockerization: The entire application is containerized with Docker, ensuring a portable, reproducible, and easy-to-run environment.
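To give a concrete feel for the first three techniques, here is a minimal, self-contained sketch of chain-of-thought prompting, regex-based answer parsing, and self-consistency voting. It is illustrative only: the `ask_llm` callable is a hypothetical stand-in for the project's LLM client wrapper, and the real implementations live under `cognition_synthesis/`.

```python
import re
from collections import Counter

# A chain-of-thought prompt: ask the model to reason before answering.
COT_PROMPT = "Q: {question}\nLet's think step by step."

def parse_final_answer(text: str) -> str | None:
    """Extract the last number from the model's verbose output (a simple heuristic)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None

def self_consistency(ask_llm, question: str, n_samples: int = 8) -> str | None:
    """Sample several diverse reasoning paths and majority-vote on the parsed answers.

    `ask_llm` is a hypothetical callable (prompt -> completion text) that should be
    sampled at a non-zero temperature so the reasoning paths differ between calls.
    """
    prompt = COT_PROMPT.format(question=question)
    answers = [parse_final_answer(ask_llm(prompt)) for _ in range(n_samples)]
    answers = [a for a in answers if a is not None]
    if not answers:
        return None
    # Majority vote: the most frequent final answer wins.
    return Counter(answers).most_common(1)[0][0]
```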
For a deeper dive into the architecture, including Class and Sequence diagrams, please see the Architectural Design Document.
You can run this project either locally with a Python virtual environment or using Docker (recommended). You will need:
- Python 3.12+
- Docker Desktop (for the containerized approach)
- An OpenAI API Key
First, clone the repository:

```bash
git clone https://github.com/zhu-weijie/cognition-synthesis.git
cd cognition-synthesis
```

Create a `.env` file in the project root by copying the example:

```bash
cp .env.example .env
```

Now, edit the `.env` file and add your OpenAI API key:

```
OPENAI_API_KEY="your-api-key-goes-here"
```
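For context on how the key is consumed: inside the container the variables come from `--env-file .env`, while a local run would typically load them with `python-dotenv` (an assumption about this project's internals, not a documented detail):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is among the dependencies

load_dotenv()  # reads .env from the project root into the process environment
api_key = os.environ["OPENAI_API_KEY"]
```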
Running with Docker is the simplest and most reliable way to run the project.
1. Build the Docker image:

```bash
docker build -t cognition-synthesis .
```

2. Run the application:

The command below runs the full demonstration and uses a volume (`-v`) to save the generated `training_data.jsonl` file to your local directory.

```bash
docker run --rm --env-file .env -v "$(pwd):/app" cognition-synthesis
```

To run the project locally instead:

1. Create and activate a virtual environment:
```bash
# Create the environment
python3 -m venv venv

# Activate it (on macOS/Linux)
source venv/bin/activate
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

3. Run the application:

```bash
python main.py
```

Executing `main.py` (either locally or via Docker) will run a full demonstration of all the implemented techniques in sequence:
- Basic & Chain-of-Thought Tasks: Demonstrates the difference between direct queries and CoT prompting.
- Self-Consistency Task: Shows how majority voting over multiple reasoning paths can correct errors and improve reliability.
- Data Generation Pipeline: Simulates a self-improvement loop by:
  - Taking problems with known answers from a `ProblemBank`.
  - Generating 8 diverse reasoning paths for each problem.
  - Using a `Verifier` to check which paths lead to the correct answer.
  - Saving the correct `(problem, reasoning_path)` pairs to `training_data.jsonl`.
The final output is a high-quality, AI-generated dataset ready for fine-tuning; the sketch below shows the shape of this loop.
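As a rough illustration of that loop, here is a hedged sketch. The `generate_reasoning_path` and `parse_final_answer` helpers and the shape of the problem bank are assumptions made for the example; the actual orchestration lives in `cognition_synthesis/pipelines/`.

```python
import json

def generate_training_data(generate_reasoning_path, parse_final_answer,
                           problem_bank, n_paths: int = 8,
                           out_path: str = "training_data.jsonl") -> None:
    """Keep only the reasoning paths whose parsed answer matches the known answer.

    `problem_bank` is assumed to yield (problem, known_answer) pairs, and
    `generate_reasoning_path` to return one sampled chain-of-thought completion.
    """
    with open(out_path, "w") as f:
        for problem, known_answer in problem_bank:
            for _ in range(n_paths):
                path = generate_reasoning_path(problem)
                # The verifier step: accept the path only if it reaches the known answer.
                if parse_final_answer(path) == known_answer:
                    f.write(json.dumps({"problem": problem,
                                        "reasoning_path": path}) + "\n")
```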
```
cognition-synthesis/
├── .dockerignore            # Excludes files from the Docker image
├── .env                     # Stores your API key (gitignored)
├── .env.example             # An example environment file
├── Dockerfile               # Blueprint for the Docker container
├── main.py                  # Main entry point for the application
├── requirements.txt         # Project dependencies
├── cognition_synthesis/     # Main application source code
│   ├── llm/                 # LLM client wrapper
│   ├── parsing/             # Answer parsing logic
│   ├── pipelines/           # Data generation pipeline orchestrator
│   ├── prompts/             # Prompt management and formatting
│   ├── reasoning/           # Core reasoning techniques (e.g., SelfConsistency)
│   └── verification/        # Verifier and ProblemBank
├── docs/
│   └── design.md            # Detailed architectural diagrams
└── tests/                   # Unit tests for the project
```