A comprehensive solution for extracting, parsing, and analyzing ExamTopics certification exam questions, answers, and community comments from PDF documents, using LLM-powered analysis, and turning them into an interactive exam preparation web UI.
- PDF Processing: Extract text from multiple PDF files with intelligent question boundary detection
- Question Parsing: Parse question structures, answer options, and community responses
- LLM Analysis: Real-time Claude AI analysis for expert-level answer determination
- Text Enhancement: Advanced OCR error correction and text cleaning
- Auto-increment Numbering: Unique primary key system for all questions
- Multiple Output Formats: CSV, Markdown Tables, JSON Data, Clean JSON format for Web UI
- Smart Search & Filtering: Search across all content with multiple filter options
- Answer Marking: Mark your answers with persistent storage across sessions
- ✨ Answer Highlighting: Show/hide correct answers with visual indicators
- Exam Evaluation: Comprehensive scoring system with a 70% GCP passing threshold
- Detailed Analytics: Review wrong answers and track skipped questions
- ✏️ Question Editing: Edit questions and answers, and mark correct solutions
- ⚠️ Quality Warnings: Comprehensive extraction warnings with verbose error analysis and clickable navigation
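The exam evaluation feature boils down to comparing marked answers with the answer key and checking the ratio against the 70% threshold. The sketch below is an illustration in Python, not the web UI's actual code (which lives in web_ui/index.html); the `evaluate_exam` function, the `ExamResult` type, and the choice to count skipped questions toward the total but not the score are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PASSING_THRESHOLD = 0.70  # 70% passing threshold from the feature list


@dataclass
class ExamResult:
    correct: int
    total: int
    wrong: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)

    @property
    def score(self) -> float:
        # Skipped questions stay in the denominator (assumed policy).
        return self.correct / self.total if self.total else 0.0

    @property
    def passed(self) -> bool:
        return self.score >= PASSING_THRESHOLD


def evaluate_exam(answer_key: dict[str, str], marked: dict[str, str]) -> ExamResult:
    """Compare marked answers with the answer key (hypothetical data structures)."""
    result = ExamResult(correct=0, total=len(answer_key))
    for qid, correct_answer in answer_key.items():
        given = marked.get(qid)
        if given is None:
            result.skipped.append(qid)
        elif given == correct_answer:
            result.correct += 1
        else:
            result.wrong.append(qid)
    return result


if __name__ == "__main__":
    key = {"Q1_1": "B", "Q1_2": "A", "Q1_3": "C", "Q1_4": "D"}
    mine = {"Q1_1": "B", "Q1_2": "A", "Q1_3": "D"}  # Q1_4 left unanswered
    res = evaluate_exam(key, mine)
    print(f"{res.score:.0%} passed={res.passed} wrong={res.wrong} skipped={res.skipped}")
```

Keeping wrong and skipped question IDs in separate lists is what makes the "wrong answer review" and "skipped questions tracking" views straightforward to build on top of the score.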
examiner/
├── src/                            # Core application modules
│   ├── robust_question_parser.py   # Main extraction script
│   ├── pdf_processor.py            # PDF text extraction and processing
│   ├── question_parser.py          # Question structure parsing
│   ├── llm_integrator.py           # Claude API integration
│   ├── text_enhancer.py            # Text cleaning and enhancement
│   └── output_generator.py         # Output generation (CSV/MD/JSON)
├── config/                         # Configuration files
│   ├── api_config.json             # Claude API configuration
│   ├── processing_config.json      # Processing parameters
│   └── prompts.json                # LLM prompt templates
├── data/
│   ├── input/                      # Source PDF files
│   └── output/                     # Generated output files
├── web_ui/                         # Interactive web interface
│   └── index.html                  # Main web UI application
├── logs/                           # Application logs
├── tests/                          # Test files
├── samples/                        # Sample PDF files for testing
└── assets/                         # Screenshots and documentation images
- Clone the repository:
  git clone https://github.com/fxerkan/examiner.git
  cd examiner
- Create a virtual environment:
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Configure the Claude API: open config/api_config.json and add your Claude API key:
  { "claude": { "api_key": "your-claude-api-key-here" } }
- Place your PDF files in the data/input/ directory:
  data/input/
  ├── Questions_1.pdf
  ├── Questions_2.pdf
  └── Questions_3.pdf
- Run the extractor:
  source venv/bin/activate
  python src/robust_question_parser.py
- Start the web UI:
  cd web_ui
  python -m http.server 9000
- Open http://localhost:9000 in your browser.
PDF upload is still in development and is temporarily disabled in the web interface. Placing files manually in data/input/, as described above, is currently the only supported workflow.
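Before running the extractor, it can help to confirm that the API key was actually filled in. The helper below is a hypothetical pre-flight check, not part of the project; it only assumes the `{"claude": {"api_key": ...}}` layout shown in the configuration step and rejects the placeholder value.

```python
import json
from pathlib import Path

# Placeholder value from the configuration example; it must be replaced.
PLACEHOLDER_KEY = "your-claude-api-key-here"


def load_claude_api_key(config_path: Path = Path("config/api_config.json")) -> str:
    """Return claude.api_key from the config file, rejecting empty or placeholder values.

    Hypothetical helper: assumes only the {"claude": {"api_key": ...}} layout.
    """
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    api_key = str(config.get("claude", {}).get("api_key", "")).strip()
    if not api_key or api_key == PLACEHOLDER_KEY:
        raise ValueError(f"Set claude.api_key in {config_path} before running the extractor")
    return api_key


if __name__ == "__main__":
    try:
        load_claude_api_key()
        print("Claude API key found")
    except (FileNotFoundError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so malformed JSON lands here too.
        print(f"Configuration problem: {exc}")
```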
# Run the main extraction process
python src/robust_question_parser.py
# Alternative: Use main.py (legacy)
python src/main.py
The extractor includes comprehensive quality control:
- Confidence Scoring: Each question gets a confidence score based on completeness
- Duplicate Detection: Identifies potential duplicate questions
- Error Tracking: Comprehensive logging and error reporting
- Data Validation: Input validation and structure checking
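Two of these checks can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the extractor's actual logic: the four completeness checks, the record keys (`text`, `options`, `community_answer`, `claude_answer`, `id`), and the 0.9 similarity threshold are hypothetical. Duplicate detection here uses `difflib.SequenceMatcher` from the standard library, which may differ from what the project uses.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import combinations
from typing import Any, Callable

# Hypothetical completeness checks: each one passed adds an equal share of the score.
COMPLETENESS_CHECKS: dict[str, Callable[[dict[str, Any]], bool]] = {
    "has_text": lambda q: bool(q.get("text", "").strip()),
    "has_options": lambda q: len(q.get("options", {})) >= 2,
    "has_community_answer": lambda q: bool(q.get("community_answer")),
    "has_llm_answer": lambda q: bool(q.get("claude_answer")),
}


def confidence_score(question: dict[str, Any]) -> float:
    """Fraction of completeness checks the question passes (0.0 to 1.0)."""
    passed = sum(1 for check in COMPLETENESS_CHECKS.values() if check(question))
    return passed / len(COMPLETENESS_CHECKS)


def find_potential_duplicates(
    questions: list[dict[str, Any]], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Flag question pairs whose lowercased text similarity reaches the threshold.

    Compares every pair (O(n^2)), which is acceptable for a few hundred questions.
    """
    pairs = []
    for a, b in combinations(questions, 2):
        ratio = SequenceMatcher(None, a["text"].lower(), b["text"].lower()).ratio()
        if ratio >= threshold:
            pairs.append((a["id"], b["id"], round(ratio, 2)))
    return pairs
```

Returning pairs rather than deleting questions matches the "potential duplicates" wording: a human still decides whether two similar questions are really the same.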
Processing Summary:
- Total Questions: 150
- High Confidence (≥0.8): 142 (94.7%)
- Medium Confidence (0.5-0.8): 6 (4.0%)
- Low Confidence (<0.5): 2 (1.3%)
- Claude Analysis Success: 148 (98.7%)
- Potential Duplicates: 3 pairs identified
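Each percentage in the summary is the bucket count divided by the total number of questions (for example, 142 / 150 ≈ 94.7%). A minimal sketch of that bucketing, assuming half-open ranges so that a score of exactly 0.8 counts as high and exactly 0.5 as medium; the function names are hypothetical. Running it on 142 high, 6 medium, and 2 low scores reproduces the 94.7% / 4.0% / 1.3% split shown above.

```python
from __future__ import annotations

from collections import Counter


def confidence_bucket(score: float) -> str:
    """Map a score to the summary buckets: >=0.8 high, 0.5-0.8 medium, <0.5 low."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


def summarize(scores: list[float]) -> dict[str, tuple[int, float]]:
    """Count questions per bucket and express each count as a percentage of the total."""
    total = len(scores)
    counts = Counter(confidence_bucket(s) for s in scores)
    return {
        bucket: (counts[bucket], round(100 * counts[bucket] / total, 1) if total else 0.0)
        for bucket in ("high", "medium", "low")
    }


if __name__ == "__main__":
    # Synthetic scores matching the bucket counts in the summary (150 questions).
    scores = [0.9] * 142 + [0.6] * 6 + [0.3] * 2
    for bucket, (count, pct) in summarize(scores).items():
        print(f"{bucket}: {count} ({pct}%)")
```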
# Run basic project tests
python tests/test_basic_functionality.py
# Check specific module
python -c "from src.question_parser import QuestionParser; print('✅ Import successful')"
PDF Processing Errors:
- Ensure PDF files are not password-protected
- Check file permissions
- Verify PDF files are text-based (not scanned images)
Claude API Issues:
- Verify the API key in config/api_config.json
- Check rate limits and quotas
- Ensure you have a working internet connection
Web UI Not Loading:
- Ensure data/output/questions_web_data.json exists
- Check the browser console for JavaScript errors
- Verify that the local server is running
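A quick way to rule out the first cause is to check that the web data file exists and parses as JSON. The function below is a hypothetical diagnostic, not shipped with the project; because the file's internal structure is not documented here, it only checks existence, JSON validity, and non-emptiness.

```python
import json
from pathlib import Path


def check_web_data(path: Path = Path("data/output/questions_web_data.json")) -> list[str]:
    """Return problems found with the web UI data file (an empty list means it looks usable)."""
    if not path.is_file():
        return [f"{path} not found; run python src/robust_question_parser.py first"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not data:
        # An empty list or object usually means extraction produced no questions.
        return [f"{path} contains no data"]
    return []


if __name__ == "__main__":
    issues = check_web_data()
    print("\n".join(issues) if issues else "Web UI data file looks OK")
```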
Check logs/examiner_processor.log for detailed error information:
tail -f logs/examiner_processor.log
The system processes questions like this:
Input (from PDF):
Question #1 Topic 1
Your company wants to migrate a large, monolithic application to Google Cloud Platform...
A. Rehost (Lift and Shift) the entire application to Compute Engine
B. Refactor the application into microservices and deploy on GKE
C. Rebuild the application as a serverless solution using Cloud Functions
D. Replace with SaaS solutions wherever possible
Selected Answer: A
Highly Voted: A
Most Recent: B
Output (Structured):
- Question ID: Q1_1
- Description: Clean, enhanced question text
- Options: A, B, C, D with full text
- Community: A (Highly Voted: A, Most Recent: B)
- Claude AI: B (with detailed reasoning)
- Confidence: 0.95 (High)
- Metadata: Topic 1, Page 12, Questions_1.pdf
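The question boundary detection and parsing steps can be approximated with a few regular expressions. The sketch below is not the project's parser (src/question_parser.py may work quite differently); the regex patterns, the record keys, and the `Q<number>_<topic>` ID scheme are assumptions chosen so that the sample input produces a record resembling the structured output. It covers only the structural parsing, not text enhancement or Claude analysis.

```python
from __future__ import annotations

import re
from typing import Any, Iterator

# Header line that starts each question, e.g. "Question #1 Topic 1".
HEADER_RE = re.compile(r"^Question #(\d+)[ \t]+Topic (\d+)[ \t]*$", re.MULTILINE)
# Answer option lines, e.g. "A. Rehost (Lift and Shift) ...".
OPTION_RE = re.compile(r"^([A-F])\.[ \t]+(.+?)[ \t]*$", re.MULTILINE)
# Community vote lines, e.g. "Highly Voted: A".
VOTE_RE = re.compile(
    r"^(Selected Answer|Highly Voted|Most Recent):[ \t]*([A-F]+)[ \t]*$", re.MULTILINE
)

SAMPLE = """Question #1 Topic 1
Your company wants to migrate a large, monolithic application to Google Cloud Platform...
A. Rehost (Lift and Shift) the entire application to Compute Engine
B. Refactor the application into microservices and deploy on GKE
C. Rebuild the application as a serverless solution using Cloud Functions
D. Replace with SaaS solutions wherever possible
Selected Answer: A
Highly Voted: A
Most Recent: B
"""


def split_questions(text: str) -> Iterator[tuple[int, int, str]]:
    """Yield (number, topic, body) per question, using header lines as boundaries."""
    headers = list(HEADER_RE.finditer(text))
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        yield int(header.group(1)), int(header.group(2)), text[header.end():end]


def parse_question(number: int, topic: int, body: str) -> dict[str, Any]:
    """Turn one question body into a structured record."""
    first_option = OPTION_RE.search(body)
    description = body[: first_option.start()] if first_option else body
    return {
        "id": f"Q{number}_{topic}",  # assumed ID scheme
        "topic": topic,
        "description": " ".join(description.split()),  # collapse PDF line breaks
        "options": dict(OPTION_RE.findall(body)),
        "community": dict(VOTE_RE.findall(body)),
    }


if __name__ == "__main__":
    for number, topic, body in split_questions(SAMPLE):
        print(parse_question(number, topic, body))
```

Anchoring on the header line rather than on page breaks lets a question that spans two PDF pages stay in one chunk, which is the point of boundary detection.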
This project is provided as-is for educational and professional development purposes.
- Claude Code - Primary AI assistant for development
- Anthropic Claude - LLM integration for question analysis
- Claude Sonnet 4 - Advanced reasoning and code generation
- Playwright MCP - Browser automation and web UI testing
- General-Purpose Agent - Complex task orchestration and multi-step workflows
- Code-Reviewer Agent - Code quality analysis and review
- Test-Automator Agent - Test suite creation and automation infrastructure
- ExamTopics - Community-driven exam questions and discussions
- Python - Core application development
- JavaScript/HTML/CSS - Interactive web interface
- PDF Processing - Text extraction and parsing
- JSON/CSV - Data serialization and export formats
Developed with ❤️ and Claude Code
