
GBM Model API

This repository contains a production-ready Gradient Boosting Machine (GBM) model deployed as an API service using Docker.

Important Notes

  1. The model has been approved by the business partner and should not be modified
  2. Do not add or use additional data
  3. The API should be able to handle both single requests and batch requests of up to 10,000 rows
  4. Code should work with Python 3.10 and 3.11

Business Requirements

Classification Threshold

  • A value of 1 is assigned if the predicted probability meets or exceeds the 0.75 threshold
  • A value of 0 is assigned if the predicted probability falls below the threshold
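
A minimal sketch of this rule (the 0.75 threshold comes from the business requirement; the function name is illustrative, not the actual implementation):

def apply_threshold(probability: float, threshold: float = 0.75) -> int:
    """Map a predicted probability to the business outcome."""
    # 1 if the probability meets or exceeds the threshold, otherwise 0
    return 1 if probability >= threshold else 0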

API Response Format

The output JSON contains the following fields:

  • business_outcome: 0 or 1, based on the 0.75 classification threshold
  • prediction: The predicted probability
  • feature_inputs: The specific feature input values used in the classification
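
As an illustration, the response could be modeled with a Pydantic schema like the one below; the class name is an assumption, and only the three field names above come from the requirement:

from pydantic import BaseModel

class PredictionResponse(BaseModel):
    """Schema for a single prediction in the API response."""
    business_outcome: int           # 0 or 1, based on the 0.75 threshold
    prediction: float               # predicted probability from the GBM model
    feature_inputs: dict[str, str]  # feature values echoed back from the request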

Technical Requirements

  • Python 3.10 or 3.11
  • Docker
  • Required Python packages listed in requirements.txt
  • Port 8080 for API service

Project Structure

.
├── src/               # Source code for the API and model
│   ├── api.py        # FastAPI implementation
│   ├── model.py      # Model loading and prediction logic
│   └── utils.py      # Helper functions
├── tests/            # Unit tests
├── docs/             # Additional documentation
├── output/           # Model predictions
│   └── verify_test_inference.json  # Test output file
└── requirements.txt  # Python dependencies

Setup and Installation

  1. Clone the repository:
git clone <repository-url>
cd gbm-model-api
  2. Build and run the Docker container:
./run_api.sh

API Usage

The API accepts both single and batch prediction requests on port 8080 and is designed to handle up to 10,000 rows, either in a single batch call or as individual calls.
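
One way src/api.py could expose this behavior is sketched below: the handler accepts either a single JSON object or a JSON array and always works on a list internally. The score_row helper is a hypothetical wrapper around the GBM model, not the exact implementation:

from typing import Union
from fastapi import FastAPI

app = FastAPI()

@app.post("/predict")
async def predict(payload: Union[dict, list[dict]]):
    """Score one row or a batch of rows (up to 10,000) with the GBM model."""
    # Normalize a single record into a one-element batch
    rows = payload if isinstance(payload, list) else [payload]
    results = []
    for row in rows:
        probability = score_row(row)  # hypothetical helper wrapping the GBM model
        results.append({
            "business_outcome": 1 if probability >= 0.75 else 0,
            "prediction": probability,
            "feature_inputs": row,
        })
    # Return a single object for single requests, a list for batches
    return results if isinstance(payload, list) else results[0]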

Single Prediction Request

curl --request POST \
  --url http://localhost:8080/predict \
  --header 'content-type: application/json' \
  --data '{"x_0": "81.81515", "x_1": "0.66585831", ..., "x_99": "0.055098569"}'

Batch Prediction Request

curl --request POST \
  --url http://localhost:8080/predict \
  --header 'content-type: application/json' \
  --data '[{"x_0": "81.81", ...}, {"x_0": "-57.78", ...}]'

Example Response

{
  "business_outcome": 1,
  "prediction": 0.85,
  "feature_inputs": {
    "x_0": "81.81515",
    "x_1": "0.66585831"
  }
}
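
The same calls can also be made from Python. A small client sketch using the requests library is shown below; the feature payload is truncated here for brevity, and a real request must include every feature (x_0 through x_99):

import requests

# Only two features are shown; a real row needs x_0 through x_99.
row = {"x_0": "81.81515", "x_1": "0.66585831"}

# Single prediction
single = requests.post("http://localhost:8080/predict", json=row, timeout=30)
print(single.json())

# Batch prediction: send a JSON array of rows (up to 10,000)
batch = requests.post("http://localhost:8080/predict", json=[row, row], timeout=30)
print(batch.json())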

Testing

Unit Testing

Run the test suite:

python -m pytest tests/
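
As an example of the kind of unit test that belongs in tests/, the sketch below exercises the thresholding rule. It assumes a helper like the apply_threshold sketch shown earlier lives in src/utils.py; adjust the import to match the real module layout:

import pytest

from src.utils import apply_threshold  # hypothetical location of the helper

@pytest.mark.parametrize(
    "probability, expected",
    [
        (0.75, 1),    # threshold met exactly -> 1
        (0.85, 1),    # above threshold -> 1
        (0.7499, 0),  # below threshold -> 0
    ],
)
def test_apply_threshold(probability, expected):
    assert apply_threshold(probability) == expected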

Integration Testing

Test your API with the included candidate_27_test_inference.json file:

python sample_api_invoke.py

The output will be saved as verify_test_inference.json in the output directory.
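
The invocation script roughly follows the pattern sketched below. This is an illustration of the flow rather than the script itself, and it assumes the test file is a JSON list of feature rows:

import json
import requests

# Load the provided test rows and send them to the running API
with open("candidate_27_test_inference.json") as f:
    rows = json.load(f)

response = requests.post("http://localhost:8080/predict", json=rows, timeout=120)
response.raise_for_status()

# Persist the predictions for submission
with open("output/verify_test_inference.json", "w") as f:
    json.dump(response.json(), f, indent=2)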

Enterprise Scalability Considerations

Current Implementation

  1. Docker Containerization

    • Ensures consistent deployment across environments
    • Enables easy scaling through container orchestration
  2. FastAPI Framework

    • Async support for high-performance concurrent requests
    • Automatic OpenAPI documentation
    • Built-in validation and serialization
  3. Batch Processing

    • Supports both single and batch predictions
    • Optimized for handling large datasets efficiently
    • Capable of processing 10,000 rows per batch

Future Scalability Enhancements

  1. Load Balancing

    • Implement Kubernetes for container orchestration
    • Use horizontal pod autoscaling based on CPU/memory metrics
  2. Caching Layer

    • Add Redis for caching frequent predictions (see the sketch after this list)
    • Implement model prediction caching strategy
  3. Monitoring and Logging

    • Implement Prometheus metrics
    • Set up ELK stack for centralized logging
    • Add detailed performance monitoring
  4. API Gateway

    • Rate limiting
    • Request throttling
    • Authentication/Authorization
  5. Model Serving Optimization

    • Model quantization for faster inference
    • GPU support for larger batch processing
    • Model versioning and A/B testing capability
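
To make the caching layer enhancement (item 2 above) concrete, one possible strategy is to key the cache on a hash of the sorted feature row. Everything below (host, key prefix, TTL, and function names) is an assumption for illustration, not part of the current implementation:

import hashlib
import json

import redis

cache = redis.Redis(host="localhost", port=6379)  # assumed Redis location

def _row_key(row: dict) -> str:
    """Build a deterministic cache key from the feature values."""
    return "gbm:" + hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def cached_probability(row: dict) -> float | None:
    """Return a cached prediction for this exact feature row, if one exists."""
    hit = cache.get(_row_key(row))
    return float(hit) if hit is not None else None

def store_probability(row: dict, probability: float, ttl_seconds: int = 3600) -> None:
    """Cache the prediction so identical rows can skip model inference."""
    cache.set(_row_key(row), probability, ex=ttl_seconds)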

Development Notes

Code Quality Standards

  • PEP 8 compliant
  • Type hints for better code maintainability
  • Comprehensive docstrings
  • Unit test coverage for critical components

Security Considerations

  • Input validation and sanitization
  • Rate limiting
  • Error handling without exposing internal details
  • Secure dependencies with regular updates

Monitoring and Logging

  • Request/response logging
  • Performance metrics
  • Error tracking
  • Resource utilization monitoring

Deployment Instructions

  1. Ensure Docker is installed and running
  2. Clone the repository
  3. Navigate to the project directory
  4. Run the deployment script:
./run_api.sh

File Submission Guidelines

Important: For security scanning during submission, rename code file extensions:

  • Python files: .py → .txt (e.g., preprocess.py → preprocess.txt)
  • Shell scripts: .sh → .txt (e.g., run_api.sh → run_api.txt)
  • Model files: .pkl → .txt (e.g., model.pkl → model.txt)

Required submission files:

  1. Python files for API, data prep, tests
  2. Pickle files (if applicable)
  3. Dockerfile
  4. requirements.txt
  5. run_api.sh startup script
  6. verify_test_inference.json output file

Troubleshooting

Common issues and solutions:

  1. Port 8080 already in use:

    • Change the port in Docker configuration
    • Stop any existing services using port 8080
  2. Memory issues during batch processing:

    • Adjust batch size in the API configuration (see the chunking sketch after this list)
    • Increase Docker container memory limits
  3. Slow prediction times:

    • Check system resources
    • Monitor API metrics
    • Consider scaling horizontally
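
If a 10,000-row batch exceeds the container's memory, one mitigation (referenced in item 2 above) is to score the batch in smaller chunks. The score_rows helper below is a hypothetical wrapper around the model, and the chunk size is an illustrative default:

from typing import Iterator

def chunked(rows: list[dict], chunk_size: int = 1000) -> Iterator[list[dict]]:
    """Yield the batch in fixed-size chunks so memory use stays bounded."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

def predict_in_chunks(rows: list[dict], chunk_size: int = 1000) -> list[dict]:
    """Score a large batch chunk by chunk instead of all at once."""
    results: list[dict] = []
    for chunk in chunked(rows, chunk_size):
        results.extend(score_rows(chunk))  # hypothetical helper calling the GBM model
    return results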

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

[Your License Here]
