Customer Churn Prediction for a Telecom Company

Project Overview

This project aims to develop a predictive model for identifying the probability of customer churn in a telecom company. Using historical customer data, the project leverages exploratory data analysis (EDA), data preprocessing, and machine learning techniques to build an accurate churn prediction model.

Features

Exploratory Data Analysis (EDA):

Dataset Overview:
- The project analyzes a dataset of over 70,000 unique customers from an internet service provider.
- Includes customer-specific attributes such as subscription age, average bill amount, service failures, and download/upload speeds.
Data Visualization:
- Histograms for numeric variables (e.g., subscription_age, bill_avg) to understand their distribution.
- Correlation heatmaps to identify relationships between features, such as service_failure_count and churn.
- Bar plots for binary features like is_tv_subscriber and is_movie_package_subscriber to observe their impact on churn.
Insights Derived:
- Customers with higher service_failure_count show a strong likelihood of churning.
- A longer remaining_contract correlates with reduced churn likelihood.
- Excessive download_over_limit is a potential churn driver.
Feature Importance:
- The analysis highlights key predictors of churn, including service_failure_count, remaining_contract, and bill_avg.
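The two core EDA checks described above (churn rate per binary feature, and correlation of each feature with churn) can be sketched roughly as follows. The eight-row DataFrame is made up purely for illustration; only the column names come from this README, and the real notebooks may compute these figures differently.

```python
import pandas as pd

# Tiny made-up sample (assumption); the real data lives in internet_service_churn.csv.
df = pd.DataFrame({
    "is_tv_subscriber":      [1, 1, 0, 0, 1, 0, 1, 0],
    "service_failure_count": [0, 1, 3, 2, 0, 4, 1, 0],
    "bill_avg":              [25, 30, 18, 22, 27, 15, 31, 20],
    "churn":                 [0, 0, 1, 1, 0, 1, 0, 1],
})

# Share of churned customers for each value of a binary feature
# (the quantity a churn bar plot would display).
churn_by_tv = df.groupby("is_tv_subscriber")["churn"].mean()

# Pearson correlation of every other column with the target
# (one column of a correlation heatmap).
corr_with_churn = df.corr()["churn"].drop("churn").sort_values()

print(churn_by_tv)
print(corr_with_churn)
```

Correlation only captures linear association, so it is a first screening step rather than a measure of feature importance.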

Data Preprocessing:

Processing Missing Values:
- Columns like remaining_contract, download_avg, and upload_avg were analyzed.
  - remaining_contract: Missing values (or zeros) are concentrated among churned customers, which suggests the value was cleared when the contract was terminated. This column was dropped because of its high missing rate (52.5%).
  - download_avg and upload_avg: Outliers were identified, and missing values were replaced with the median, which is less sensitive to those outliers than the mean.
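A minimal sketch of this missing-value handling, assuming pandas and a small made-up DataFrame (the values are illustrative and not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up rows (assumption); column names follow the README.
df = pd.DataFrame({
    "remaining_contract": [0.5, np.nan, np.nan, 1.2, np.nan],
    "download_avg":       [10.0, np.nan, 30.0, 20.0, 500.0],  # 500.0 plays the outlier
    "upload_avg":         [1.0, 2.0, np.nan, 3.0, 40.0],
})

# Fraction of missing values per column, used to decide what to drop.
missing_share = df.isna().mean()

# Drop the column whose missing rate is too high to impute sensibly.
df = df.drop(columns=["remaining_contract"])

# Fill the remaining gaps with the column median, which the outlier barely moves.
for col in ["download_avg", "upload_avg"]:
    df[col] = df[col].fillna(df[col].median())

print(missing_share)
print(df)
```

Here the median of download_avg is 25.0, while the mean would be pulled up to 140.0 by the single outlier, which is why the median is the more conservative fill value.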
Encoding Categorical Variables:
- The categorical variables (is_tv_subscriber, is_movie_package_subscriber) were already binary, so no additional encoding was needed.
Normalization of Features:
- Standardization was applied to numerical features to ensure consistency in model input.
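Standardization rescales each numeric feature to zero mean and unit variance. The sketch below assumes scikit-learn's StandardScaler; the README does not say which scaler the project used, and the numbers are invented.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up numeric features (assumption), e.g. subscription_age and bill_avg.
X = np.array([
    [12.0, 20.0],
    [24.0, 25.0],
    [36.0, 30.0],
])

# Each column becomes (x - mean) / std, computed per feature.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```

In a train/test workflow the scaler is normally fitted on the training split only and then applied to the test split with `scaler.transform`, so that test statistics do not leak into training.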

Model Development:

Models Tried:
- Logistic Regression
- Decision Tree
- Random Forest
- Gradient Boosting
Gradient Boosting Performance:
- This model demonstrated the best performance among all tested models.
- Key Metrics:
  - Precision: 0.83 on average, with better performance on the positive class.
  - Recall: 0.83 for both classes, reflecting balanced detection capability.
  - F1-Score: Strong average of 0.83, indicating reliability.
  - ROC-AUC: High value of 0.91, showing excellent class separation capability.
  - Confusion Matrix: (figure not reproduced here)
- Gradient Boosting was chosen for its ability to capture complex patterns and deliver high precision and recall, making it ideal for real-world applications.
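A comparison of the four model families on ROC-AUC might look like the sketch below. It runs on synthetic data from `make_classification` and uses assumed hyperparameters, so its scores will not match the figures reported above; it only illustrates the evaluation procedure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed churn dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters here are illustrative, not the project's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and score it on the held-out split using churn probabilities.
auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    churn_proba = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, churn_proba)

# Per-class precision, recall, and F1 for the model with the highest ROC-AUC.
best_name = max(auc_by_model, key=auc_by_model.get)
report = classification_report(y_test, models[best_name].predict(X_test))

print(auc_by_model)
print(best_name)
print(report)
```

ROC-AUC is computed from predicted probabilities rather than hard labels, which is why `predict_proba` is used for the ranking while `predict` feeds the classification report.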

Deployment:

Dockerized Model:
- The project includes a Dockerized environment for easy deployment and reproducibility across different systems.
Streamlit App:
- A Streamlit web application is integrated into the project to provide an interactive interface for churn prediction.
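The repository ships a serialized model (gradient_boosting_model.pkl) that the app presumably loads to score a single customer. The sketch below shows that round trip with `pickle` on a toy model. The feature subset, the training rows, and the `predict_churn` helper are all hypothetical, and the actual app.py may load and call the model differently.

```python
import pickle

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical subset of the model's input features (assumption).
FEATURES = ["subscription_age", "bill_avg", "service_failure_count"]

# Toy training data (assumption) so the example is self-contained.
train = pd.DataFrame({
    "subscription_age":      [0.5, 3.0, 1.0, 4.5, 0.2, 2.8],
    "bill_avg":              [30, 18, 25, 20, 35, 15],
    "service_failure_count": [3, 0, 2, 0, 4, 1],
})
labels = [1, 0, 1, 0, 1, 0]
model = GradientBoostingClassifier(random_state=0).fit(train[FEATURES], labels)

# Serialize and restore in memory; a real app would read the .pkl file instead.
blob = pickle.dumps(model)
restored = pickle.loads(blob)


def predict_churn(fitted_model, attributes):
    """Return the churn probability for one customer given as a dict (hypothetical helper)."""
    row = pd.DataFrame([attributes], columns=FEATURES)
    return float(fitted_model.predict_proba(row)[0, 1])


customer = {"subscription_age": 0.5, "bill_avg": 30, "service_failure_count": 3}
probability = predict_churn(restored, customer)
print(probability)
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version used for loading should match the one used for training.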

Technologies Used

Languages: Python
Libraries:
- Pandas, NumPy for data manipulation.
- Matplotlib, Seaborn for data visualization.
- Scikit-learn for machine learning.
- Streamlit for building an interactive web application to visualize data and make predictions.
Tools:
- Jupyter Notebook for development and analysis.
- Git & GitHub for version control and collaboration.
- Docker for containerization.

Installation

To set up the environment and run the project, follow these steps:

Clone the repository:

git clone https://github.com/jamleston/telecom-project
cd telecom-project

Run the application:

docker-compose up --build

Access the application: open your browser and go to the application link.
Alternatively, you can view the hosted version of the project at: https://projectgoit11.streamlit.app/

Usage

Use an input form to simulate customer scenarios by entering individual customer attributes.
Receive predictions on whether a customer is likely to churn.
View the impact of feature values on churn predictions in real time.

Repository Structure

├── internet_service_churn.csv       # Original dataset
├── preprocessed_dataset.csv         # Preprocessed dataset used for modeling
├── analysis/                        # Jupyter notebooks for exploratory data analysis
│   ├── analisis_K.ipynb             # EDA by Anastasya
│   └── analysis_artem.ipynb         # EDA by Artem
├── images/                          # Directory for storing visualizations
├── models/                          # Model development notebooks
│   ├── model_decision_tree.ipynb    # Decision Tree model training
│   ├── model_logistic_regression.ipynb # Logistic Regression model training
│   └── model_RF.ipynb               # Random Forest model training
├── analysis_yuli.ipynb              # Chosen EDA by Yuli
├── preprocessing.ipynb              # Data preprocessing
├── model_GB.ipynb                   # Gradient Boosting model training
├── gradient_boosting_model.pkl      # Serialized Gradient Boosting model
├── app.py                           # Streamlit application for churn prediction
├── Dockerfile                       # Docker setup for containerization
├── docker-compose.yml               # Docker Compose file for deployment
├── requirements.txt                 # Python dependencies
└── README.md                        # Project documentation
