🏅 Certificate Segregator

An intelligent certificate management tool that automatically categorizes and organizes PDF certificates by company names using Google's Gemini AI. Perfect for professionals, students, and organizations managing large collections of digital certificates.

🎯 Overview

Certificate Segregator leverages cutting-edge AI technology to solve the common problem of certificate organization. Instead of manually sorting through dozens or hundreds of certificates, this tool automatically reads, analyzes, and categorizes your certificates by company name in seconds.

✨ Features

🤖 AI-Powered Analysis: Uses Google Gemini 1.5 Flash to extract company names from certificate PDFs with high accuracy
📁 Automatic Organization: Creates folders and sorts certificates by company automatically - no manual work required
⚡ Batch Processing: Upload and process multiple certificates simultaneously for maximum efficiency
🎨 User-Friendly Interface: Clean, intuitive Streamlit web interface accessible from any browser
📄 PDF Support: Converts PDF certificates to images for optimal AI processing
🛡️ Error Handling: Robust error handling with informative messages and graceful failure recovery
💾 Safe Storage: Preserves original certificate quality while organizing files systematically
🔍 Smart Recognition: Handles various certificate formats and layouts intelligently

🎬 Demo

Upload your certificates, and watch as they get automatically organized by company in real-time!

🔥 Why Choose Certificate Segregator?

Problem	Solution
📚 Hundreds of unsorted certificates	⚡ Instant AI-powered organization
⏰ Hours of manual sorting	🚀 Process multiple files in seconds
😵 Difficult to find specific certificates	🎯 Clear company-based folder structure
🤔 Inconsistent naming conventions	🤖 AI extracts accurate company names
💼 Professional portfolio management	📊 Clean, systematic organization

🚀 Quick Start

⚡ TL;DR - Get Started in 5 Minutes

# 1. Clone and navigate
git clone https://github.com/harshajustin/Certificate-Clustering.git
cd Certificate-Clustering

# 2. Install dependencies
pip install -r requirements.txt

# 3. Install Poppler (macOS)
brew install poppler

# 4. Create .env file with your Gemini API key
echo "key=YOUR_GEMINI_API_KEY" > .env

# 5. Run the app
streamlit run main.py

📋 Prerequisites

Requirement	Version	Purpose
Python	3.7+	Core runtime environment
Google AI API Key	Latest	Gemini AI access
Poppler	Latest	PDF processing library
Web Browser	Modern	Streamlit interface

Installation

Clone the repository

git clone https://github.com/harshajustin/Certificate-Clustering.git
cd Certificate-Clustering

Install required packages
```
pip install -r requirements.txt
```
Install Poppler (required for pdf2image)

On macOS:
```
brew install poppler
```
On Ubuntu/Debian:
```
sudo apt-get install poppler-utils
```
On Windows:
- Download from poppler for Windows
- Add to PATH
Set up environment variables

Create a .env file in the project root:
```
key=your_google_gemini_api_key_here
```
To get a Google AI API key:
- Visit Google AI Studio
- Create a new API key
- Copy and paste it into your .env file

Running the Application

streamlit run main.py

The application will open in your default web browser at http://localhost:8501

🐳 Docker Deployment

For easy deployment using Docker:

Quick Docker Setup

# 1. Clone and navigate
git clone https://github.com/harshajustin/Certificate-Clustering.git
cd Certificate-Clustering

# 2. Set up environment
cp .env.example .env
# Edit .env and add your Gemini API key

# 3. Run with Docker Compose
docker-compose up -d

# 4. Access at http://localhost:8501

Alternative Docker Commands

# Build the image
docker build -t certificate-segregator .

# Run the container
docker run -d -p 8501:8501 --env-file .env certificate-segregator

📖 For detailed Docker instructions, see DOCKER.md

📖 How to Use

Launch the Application: Run the Streamlit app using the command above
Upload Certificates: Click "Browse files" and select one or more PDF certificates
Process: Click the "Submit" button to start processing
View Results: The app will:
- Extract company names from each certificate
- Create folders named after each company
- Save certificates in their respective company folders
- Display success/error messages for each file

📁 Project Structure

Certificate-Clustering/
├── main.py                 # Main application file
├── requirements.txt        # Python dependencies
├── .env                   # Environment variables (create this)
├── .gitignore            # Git ignore file
├── README.md             # This file
└── certificates/         # Auto-created folder for organized certificates
    ├── Company1/
    │   └── Company1_certificate.pdf
    ├── Company2/
    │   └── Company2_certificate.pdf
    └── ...

🔧 Technical Details

🏗️ Architecture

graph TD
    A[PDF Upload] --> B[PDF to Image Conversion]
    B --> C[Base64 Encoding]
    C --> D[Gemini AI Analysis]
    D --> E[Company Name Extraction]
    E --> F[Folder Creation]
    F --> G[Certificate Organization]
    G --> H[Success Notification]

🧠 Core Functions

Function	Purpose	Key Features
`process_uploaded_pdf()`	PDF Processing	Converts PDF to base64-encoded images, handles multiple pages
`get_company_name_from_pdf()`	AI Analysis	Uses Gemini AI to extract company names with context awareness
`save_certificate_to_company_folder()`	File Organization	Creates company folders and saves certificates systematically
`create_streamlit_ui()`	User Interface	Provides intuitive web interface with progress indicators

📦 Dependencies Deep Dive

Package	Version	Purpose	Key Features
`streamlit`	Latest	Web interface framework	Interactive UI, file uploads, real-time feedback
`google-generativeai`	Latest	Google Gemini AI integration	Text extraction, company name recognition
`pdf2image`	Latest	PDF to image conversion	High-quality rendering, multi-page support
`python-dotenv`	Latest	Environment variable management	Secure API key handling
`pillow`	Latest	Image processing support	Format conversion, optimization

🎯 Use Cases

👨‍💼 Professionals

HR Departments: Organize employee training certificates
Consultants: Manage client project certificates
Freelancers: Maintain professional certification portfolio

🎓 Students

Course Completion: Sort online learning certificates
Academic Records: Organize educational achievements
Skill Development: Track certification progress

🏢 Organizations

Compliance Teams: Manage regulatory certificates
Training Departments: Track employee certifications
Quality Assurance: Organize vendor certificates

📊 Performance Metrics

Metric	Performance
Processing Speed	~2-3 seconds per certificate
Accuracy Rate	95%+ company name extraction
Supported Formats	PDF (all versions)
Batch Size	Unlimited (memory dependent)
File Size Limit	Up to 200MB per file

🛠️ Configuration

🔐 Environment Variables

Variable	Description	Required	Example
`key`	Google Gemini API key	✅ Yes	`AIzaSyD...`

📄 Supported File Types

Format	Extension	Max Size	Notes
PDF	`.pdf`	200MB	All PDF versions supported

⚙️ Advanced Configuration

Create a config.yaml file for advanced settings:

# Advanced Configuration (Optional)
processing:
  max_file_size: 200MB
  timeout: 30s
  retry_attempts: 3

ai_settings:
  model: "gemini-1.5-flash"
  temperature: 0.1
  max_tokens: 1000

folders:
  base_path: "./certificates"
  naming_convention: "{company_name}_certificate"
  create_subfolders: true

🔄 Workflow

sequenceDiagram
    participant User
    participant App
    participant Gemini
    participant FileSystem

    User->>App: Upload PDF certificates
    App->>App: Convert PDF to images
    App->>Gemini: Send image for analysis
    Gemini->>App: Return company name
    App->>FileSystem: Create company folder
    App->>FileSystem: Save certificate
    App->>User: Display success message

🐛 Troubleshooting

🚨 Common Issues & Solutions

🔑 API Key Issues

Problem: "Google API Key not found" Error

Solutions:

✅ Ensure your .env file exists in the project root
✅ Verify the key is named exactly key in the .env file
✅ Check for extra spaces or quotes around the API key
✅ Verify your API key is active at Google AI Studio

# Correct .env format
key=AIzaSyD1234567890abcdef

📄 PDF Processing Errors

Problem: PDF files won't process

Solutions:

✅ Ensure Poppler is installed correctly
✅ Check that uploaded files are valid PDF documents
✅ Verify file size is under 200MB
✅ Try with a different PDF to isolate the issue

# Test Poppler installation
pdftoppm -h

📁 File Organization Issues

Problem: Certificates not saving properly

Solutions:

✅ Ensure write permissions in the project directory
✅ Check available disk space
✅ Verify the certificates/ folder can be created
✅ Close any open certificate files

🤖 AI Extraction Issues

Problem: Company names not extracted correctly

Solutions:

✅ Some certificates may have unclear text or unusual formatting
✅ Try preprocessing the PDF (ensure text is selectable)
✅ Check if the certificate contains readable text
✅ Verify your API quota hasn't been exceeded

📊 Diagnostic Commands

# Check Python version
python --version

# Verify package installation
pip list | grep -E "(streamlit|google-generativeai|pdf2image)"

# Test Poppler
pdftoppm -v

# Check file permissions
ls -la certificates/

🤝 Contributing

We welcome contributions from the community! Here's how you can help make Certificate Segregator even better:

🚀 Quick Contribution Guide

🍴 Fork the repository
🌿 Create a feature branch
```
git checkout -b feature/amazing-feature
```
💻 Make your changes
✅ Test thoroughly

📝 Commit with descriptive messages

git commit -m 'Add: Enhanced AI accuracy for handwritten certificates'

📤 Push to your branch
```
git push origin feature/amazing-feature
```
🔀 Open a Pull Request

🎯 Areas for Contribution

Area	Description	Difficulty
🤖 AI Improvements	Enhance company name extraction accuracy	Advanced
🎨 UI/UX	Improve interface design and user experience	Intermediate
📊 Analytics	Add processing statistics and insights	Intermediate
🔧 Performance	Optimize processing speed and memory usage	Advanced
📚 Documentation	Improve docs, add tutorials, create videos	Beginner
🧪 Testing	Add unit tests, integration tests	Intermediate
🌍 Localization	Add multi-language support	Intermediate

📋 Development Setup

# 1. Clone your fork
git clone https://github.com/YOUR_USERNAME/Certificate-Clustering.git
cd Certificate-Clustering

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install development dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt  # If available

# 4. Set up pre-commit hooks
pre-commit install

# 5. Run tests
python -m pytest

🐛 Bug Reports

Found a bug? Please create an issue with:

📝 Clear description of the problem
🔄 Steps to reproduce
💻 System information (OS, Python version)
📎 Sample files (if applicable)
📷 Screenshots (if relevant)

💡 Feature Requests

Have an idea? We'd love to hear it! Include:

🎯 Clear description of the feature
🤔 Why it would be valuable
💭 Possible implementation approach
📊 Expected impact

🏆 Contributors

Thanks to all the amazing people who have contributed to this project!

🗺️ Roadmap

🎯 Upcoming Features

Feature	Status	ETA	Priority
🔍 Advanced Search	🔄 In Progress	Q3 2025	High
📊 Analytics Dashboard	📋 Planned	Q4 2025	Medium
🌍 Multi-language Support	💭 Concept	2026	Low
📱 Mobile App	💭 Concept	TBD	Medium
☁️ Cloud Integration	💭 Concept	TBD	High

🎁 Version History

v1.0.0 (Current) - Initial release with core functionality
v0.9.0 - Beta testing phase
v0.1.0 - Alpha prototype

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2025 Harsha Justin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software...

🙏 Acknowledgments

Special thanks to the amazing technologies and communities that made this project possible:

🤖 Google AI - For the powerful Gemini API
🎨 Streamlit - For the incredible web framework
📄 pdf2image - For seamless PDF processing
🐍 Python Community - For the amazing ecosystem
🌟 Open Source Community - For inspiration and collaboration
💡 Contributors - For making this project better every day

🏢 Powered By

📞 Support & Community

Need help or want to connect with other users?

🆘 Get Help

Channel	Purpose	Response Time
🐛 GitHub Issues	Bug reports, feature requests	24-48 hours
💬 Discussions	Questions, ideas, showcase	24-48 hours
📧 Email	Private inquiries	2-3 business days

📝 Before Asking for Help

✅ Check the FAQ section
✅ Search existing issues
✅ Read the documentation thoroughly
✅ Try the troubleshooting steps

🐛 Reporting Issues

When reporting bugs, please include:

**Environment:**
- OS: [e.g., macOS 12.0, Windows 11, Ubuntu 20.04]
- Python version: [e.g., 3.9.7]
- Package versions: [run `pip list`]

**Steps to reproduce:**
1. Go to '...'
2. Click on '....'
3. Upload file '....'
4. See error

**Expected behavior:**
A clear description of what you expected to happen.

**Actual behavior:**
A clear description of what actually happened.

**Additional context:**
Add any other context about the problem here.

🌟 Star History

🎉 Happy Certificate Organizing! 🎉

Made with ❤️ by Harsha Justin

If this project helped you, please consider giving it a ⭐ on GitHub!

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.do		.do
.streamlit		.streamlit
certificates/ServiceNow		certificates/ServiceNow
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
DEPLOYMENT.md		DEPLOYMENT.md
DEPLOYMENT_CHECKLIST.md		DEPLOYMENT_CHECKLIST.md
DIGITALOCEAN_READY.md		DIGITALOCEAN_READY.md
DOCKER.md		DOCKER.md
Dockerfile		Dockerfile
Makefile		Makefile
Procfile		Procfile
README.md		README.md
docker-compose.yml		docker-compose.yml
main.py		main.py
requirements.txt		requirements.txt
runtime.txt		runtime.txt
start.sh		start.sh

CognicAI/Certificate-Clustering

Folders and files

Latest commit

History

Repository files navigation

🏅 Certificate Segregator

🎯 Overview

✨ Features

🎬 Demo

🔥 Why Choose Certificate Segregator?

🚀 Quick Start

⚡ TL;DR - Get Started in 5 Minutes

📋 Prerequisites

Installation

Running the Application

🐳 Docker Deployment

Quick Docker Setup

Alternative Docker Commands

📖 How to Use

📁 Project Structure

🔧 Technical Details

🏗️ Architecture

🧠 Core Functions

📦 Dependencies Deep Dive

🎯 Use Cases

👨‍💼 Professionals

🎓 Students

🏢 Organizations

📊 Performance Metrics

🛠️ Configuration

🔐 Environment Variables

📄 Supported File Types

⚙️ Advanced Configuration

🔄 Workflow

🐛 Troubleshooting

🚨 Common Issues & Solutions

📊 Diagnostic Commands

🤝 Contributing

🚀 Quick Contribution Guide

🎯 Areas for Contribution

📋 Development Setup

🐛 Bug Reports

💡 Feature Requests

🏆 Contributors

🗺️ Roadmap

🎯 Upcoming Features

🎁 Version History

📄 License

🙏 Acknowledgments

🏢 Powered By

📞 Support & Community

🆘 Get Help

📝 Before Asking for Help

🐛 Reporting Issues

🌟 Star History

🎉 Happy Certificate Organizing! 🎉

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages