A comprehensive solution for extracting, parsing, and analyzing certification exam questions from PDF documents with LLM-powered analysis and interactive web interface

fxerkan/examiner

exaMiner

A comprehensive solution for extracting, parsing, and analyzing ExamTopics certification exam questions, answers, and community comments from PDF documents, with LLM-powered analysis, and turning them into an interactive exam-preparation web UI.

🚀 Features

Core Processing

  • PDF Processing: Extract text from multiple PDF files with intelligent question boundary detection
  • Question Parsing: Parse question structures, answer options, and community responses
  • LLM Analysis: Real-time Claude AI analysis for expert-level answer determination
  • Text Enhancement: Advanced OCR error correction and text cleaning
  • Auto-increment Numbering: Unique primary key system for all questions
  • Multiple Output Formats: CSV, Markdown Tables, JSON Data, Clean JSON format for Web UI

Interactive Web UI

  • 🔍 Smart Search & Filtering: Search across all content with multiple filter options
  • 📝 Answer Marking: Mark your answers with persistent storage across sessions
  • ✨ Answer Highlighting: Show/hide correct answers with visual indicators
  • 📊 Exam Evaluation: Comprehensive scoring system with 70% GCP passing threshold
  • 📈 Detailed Analytics: Wrong answer review, skipped questions tracking
  • ✏️ Question Editing: Edit questions, answers, and mark correct solutions
  • ⚠️ Quality Warnings: Comprehensive extraction warnings with verbose error analysis and clickable navigation

📁 Project Structure

examiner/
├── src/                          # Core application modules
│   ├── robust_question_parser.py # Main extraction script
│   ├── pdf_processor.py          # PDF text extraction and processing
│   ├── question_parser.py        # Question structure parsing
│   ├── llm_integrator.py         # Claude API integration
│   ├── text_enhancer.py          # Text cleaning and enhancement
│   └── output_generator.py       # Output generation (CSV/MD/JSON)
├── config/                       # Configuration files
│   ├── api_config.json           # Claude API configuration
│   ├── processing_config.json    # Processing parameters
│   └── prompts.json              # LLM prompt templates
├── data/
│   ├── input/                    # Source PDF files
│   └── output/                   # Generated output files
├── web_ui/                       # Interactive web interface
│   └── index.html                # Main web UI application
├── logs/                         # Application logs
├── tests/                        # Test files
├── samples/                      # Sample PDF files for testing
└── assets/                       # Screenshots and documentation images

🛠️ Installation

  1. Clone the repository:

    git clone https://github.com/fxerkan/examiner.git
    cd examiner
  2. Create virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure Claude API:

    • Open config/api_config.json
    • Add your Claude API key
    {
      "claude": {
        "api_key": "your-claude-api-key-here"
      }
    }
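Before running the extractor, it can help to confirm that the key is actually readable. The following is a minimal sketch, assuming the file has exactly the `claude.api_key` shape shown above. The helper name `load_api_key` is hypothetical and not part of the project.

```python
import json
from pathlib import Path

# Placeholder value shown in the configuration example above
PLACEHOLDER = "your-claude-api-key-here"


def load_api_key(config_path="config/api_config.json"):
    """Return the Claude API key from the config file, or raise ValueError."""
    path = Path(config_path)
    if not path.is_file():
        raise ValueError(f"Config file not found: {path}")
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path} is not valid JSON: {exc}") from exc

    # Tolerate a missing or malformed "claude" section instead of crashing
    claude = config.get("claude") if isinstance(config, dict) else None
    key = claude.get("api_key") if isinstance(claude, dict) else None
    if not key or key == PLACEHOLDER:
        raise ValueError(f"Set claude.api_key in {path}")
    return key
```

Calling `load_api_key()` from the repository root either returns the key or raises an error naming what is missing.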

🚀 Quick Start

Method 1: Manual Processing (Recommended)

  1. Place PDF files in the data/input/ directory:

    data/input/
    ├── Questions_1.pdf
    ├── Questions_2.pdf
    └── Questions_3.pdf
    
  2. Run the extractor:

    source venv/bin/activate
    python src/robust_question_parser.py
  3. Start web UI:

    cd web_ui
    python -m http.server 9000
  4. Open browser: http://localhost:9000

Future Feature: PDF Upload Interface

PDF upload functionality is currently in development and temporarily disabled in the web interface. The manual file placement method above is the only workflow currently supported.

🔧 Command Line Usage

# Run the main extraction process
python src/robust_question_parser.py

# Alternative: Use main.py (legacy)
python src/main.py

📊 Quality Assurance

The extractor includes comprehensive quality control:

  • Confidence Scoring: Each question gets a confidence score based on completeness
  • Duplicate Detection: Identifies potential duplicate questions
  • Error Tracking: Comprehensive logging and error reporting
  • Data Validation: Input validation and structure checking
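The README does not say how duplicates are identified. One simple approach, shown here purely as an illustration, compares normalized question text pairwise with `difflib.SequenceMatcher` from the Python standard library. The function name and the 0.9 threshold are assumptions, not the project's actual settings.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_potential_duplicates(questions, threshold=0.9):
    """Return (id_a, id_b, ratio) for question pairs with near-identical text.

    `questions` maps a question ID to its text.
    """
    # Lowercase and collapse whitespace so formatting differences are ignored
    normalized = {
        qid: " ".join(text.lower().split()) for qid, text in questions.items()
    }
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(normalized.items(), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b, round(ratio, 3)))
    return pairs
```

The comparison is quadratic in the number of questions, which is acceptable for a few hundred questions but would need a cheaper pre-filter for much larger sets.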

Sample Quality Metrics

📊 Processing Summary:
- Total Questions: 150
- High Confidence (≥0.8): 142 (94.7%)
- Medium Confidence (0.5-0.8): 6 (4.0%)
- Low Confidence (<0.5): 2 (1.3%)
- Claude Analysis Success: 148 (98.7%)
- Potential Duplicates: 3 pairs identified
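The percentages in this summary follow directly from the bucket counts. Below is a sketch of that arithmetic, assuming the thresholds shown in the summary (high at 0.8 or above, low below 0.5). The names `bucket` and `summarize_confidence` are hypothetical helpers.

```python
from collections import Counter


def bucket(score):
    """Map a confidence score to the bucket names used in the summary."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


def summarize_confidence(scores):
    """Return {bucket: (count, percent)} with percent rounded to one decimal."""
    counts = Counter(bucket(s) for s in scores)
    total = len(scores)
    return {
        name: (counts[name], round(100 * counts[name] / total, 1) if total else 0.0)
        for name in ("high", "medium", "low")
    }


# Reproduce the sample figures: 142 high, 6 medium, 2 low out of 150
scores = [0.9] * 142 + [0.6] * 6 + [0.3] * 2
print(summarize_confidence(scores))
# → {'high': (142, 94.7), 'medium': (6, 4.0), 'low': (2, 1.3)}
```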

🧪 Testing

# Run basic project tests  
python tests/test_basic_functionality.py

# Check specific module
python -c "from src.question_parser import QuestionParser; print('✅ Import successful')"

πŸ› Troubleshooting

Common Issues

  1. PDF Processing Errors:

    • Ensure PDF files are not password-protected
    • Check file permissions
    • Verify PDF files are text-based (not scanned images)
  2. Claude API Issues:

    • Verify API key in config/api_config.json
    • Check rate limits and quotas
    • Ensure internet connection
  3. Web UI Not Loading:

    • Ensure data/output/questions_web_data.json exists
    • Check browser console for JavaScript errors
    • Verify local server is running
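The file-level checks in this list can be scripted. Here is a minimal sketch using only the standard library. The paths come from this README, while the `preflight` function and its messages are hypothetical. It is meant to be run from the repository root and does not check API connectivity or whether a PDF is text-based.

```python
import json
from pathlib import Path


def preflight(root="."):
    """Return a list of problems found; an empty list means every check passed."""
    root = Path(root)
    problems = []

    # Source PDFs are expected in data/input/
    input_dir = root / "data" / "input"
    if not input_dir.is_dir() or not any(input_dir.glob("*.pdf")):
        problems.append(f"No PDF files found in {input_dir}")

    # The Claude API key is configured in this file
    config_file = root / "config" / "api_config.json"
    if not config_file.is_file():
        problems.append(f"Missing {config_file}")

    # The web UI troubleshooting step asks for this file to exist
    web_data = root / "data" / "output" / "questions_web_data.json"
    if not web_data.is_file():
        problems.append(f"Missing {web_data}")
    else:
        try:
            json.loads(web_data.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{web_data} is not valid JSON: {exc}")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```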

Logging

Check logs in logs/examiner_processor.log for detailed error information:

tail -f logs/examiner_processor.log

🎯 Example Output

The system processes questions like this:

Input (from PDF):

Question #1 Topic 1

Your company wants to migrate a large, monolithic application to Google Cloud Platform...

A. Rehost (Lift and Shift) the entire application to Compute Engine
B. Refactor the application into microservices and deploy on GKE
C. Rebuild the application as a serverless solution using Cloud Functions
D. Replace with SaaS solutions wherever possible

Selected Answer: A
Highly Voted: A  
Most Recent: B

Output (Structured):

  • ✅ Question ID: Q1_1
  • 📝 Description: Clean, enhanced question text
  • 🔤 Options: A, B, C, D with full text
  • 👥 Community: A (Highly Voted: A, Most Recent: B)
  • 🤖 Claude AI: B (with detailed reasoning)
  • 🎯 Confidence: 0.95 (High)
  • 📍 Metadata: Topic 1, Page 12, Questions_1.pdf
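To make the mapping from raw text to structured fields concrete, here is a small regex-based sketch that handles only the sample input above. The patterns and the `parse_question` function are illustrative assumptions, not the project's parser, which also has to cope with extraction noise and page boundaries. The sketch does not build the question ID, since the README does not spell out how `Q1_1` is composed.

```python
import re

# Patterns written against the sample input only (assumed, not the project's)
HEADER_RE = re.compile(r"Question #(\d+)\s+Topic (\d+)")
OPTION_RE = re.compile(r"^([A-F])\.[ \t]+(.+)$", re.MULTILINE)
COMMUNITY_RE = re.compile(
    r"^(Selected Answer|Highly Voted|Most Recent):[ \t]*([A-F]+)[ \t]*$",
    re.MULTILINE,
)


def parse_question(raw):
    """Split one raw question block into number, topic, text, options, community."""
    header = HEADER_RE.search(raw)
    if header is None:
        raise ValueError("No 'Question #N Topic M' header found")

    # The question text runs from the header to the first answer option
    first_option = OPTION_RE.search(raw, header.end())
    body_end = first_option.start() if first_option else len(raw)

    return {
        "number": int(header.group(1)),
        "topic": int(header.group(2)),
        "text": raw[header.end():body_end].strip(),
        "options": {k: v.strip() for k, v in OPTION_RE.findall(raw)},
        "community": dict(COMMUNITY_RE.findall(raw)),
    }
```

Applied to the sample, this yields question number 1, topic 1, four options keyed A through D, and the three community fields (`Selected Answer`, `Highly Voted`, `Most Recent`).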

📜 License

This project is provided as-is for educational and professional development purposes.

🙏 Acknowledgments

AI Assistant & Development Tools

MCP Agents & Specialized Tools

  • Playwright MCP - Browser automation and web UI testing
  • General-Purpose Agent - Complex task orchestration and multi-step workflows
  • Code-Reviewer Agent - Code quality analysis and review
  • Test-Automator Agent - Test suite creation and automation infrastructure

Data Source

  • ExamTopics - Community-driven exam questions and discussions

Technologies & Libraries

  • Python - Core application development
  • JavaScript/HTML/CSS - Interactive web interface
  • PDF Processing - Text extraction and parsing
  • JSON/CSV - Data serialization and export formats

Developed with ❤️ and 🤖 (Claude Code) 🚀
