This mini-project demonstrates how Machine Learning can assist farmers in choosing the most suitable crop for their field based on soil metrics. The goal is to predict the best crop using a simple dataset containing soil nutrient levels and pH values.
The objective is to build a multi-class classification model that predicts the optimal crop for a field from the following features:
- N: Nitrogen content in the soil
- P: Phosphorus content in the soil
- K: Potassium content in the soil
- pH: Acidity level of the soil
We also identify the single most predictive feature by training a separate logistic regression model on each feature.
Filename: soil_measures.csv
Each row in the dataset represents soil conditions of a field and the optimal crop to grow.
Columns:
- N: Nitrogen ratio
- P: Phosphorus ratio
- K: Potassium ratio
- ph: pH value of the soil
- crop: Target variable (crop name)
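As a minimal sketch of how the file described above might be loaded and checked, the snippet below reads a CSV and verifies that the five documented columns are present. The helper name `load_soil_data` and the two sample rows are hypothetical and exist only for illustration; with the real data you would pass the path `soil_measures.csv` instead of the in-memory sample.

```python
import io

import pandas as pd

# Column names as documented for soil_measures.csv.
EXPECTED_COLUMNS = ["N", "P", "K", "ph", "crop"]


def load_soil_data(source):
    """Read the soil CSV (path or file-like object) and check its columns."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df


# Made-up rows standing in for the real file, for illustration only.
sample_csv = io.StringIO(
    "N,P,K,ph,crop\n"
    "90,42,43,6.5,rice\n"
    "20,67,20,5.8,lentil\n"
)
sample = load_soil_data(sample_csv)

print(sample.dtypes)                   # column types inferred by pandas
print(sample["crop"].value_counts())   # number of rows per crop label
```

Checking the columns up front gives a clear error if the file layout differs from the description, rather than a `KeyError` later during training.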
- Algorithm Used: Logistic Regression (Multinomial)
- Task: Multi-class Classification
- Evaluation Metric: Weighted F1 Score
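The setup listed above could look roughly like the following sketch. It is not the project's actual script: the data is synthetic (random values, three made-up crop labels derived from K), and the split ratio, `random_state`, and `max_iter` are assumptions. In recent scikit-learn releases, `LogisticRegression` with its default solver fits a multinomial model when the target has more than two classes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for soil_measures.csv (made-up values and crop names).
rng = np.random.default_rng(0)
n_rows = 300
df = pd.DataFrame({
    "N": rng.uniform(0, 140, n_rows),
    "P": rng.uniform(5, 145, n_rows),
    "K": rng.uniform(5, 205, n_rows),
    "ph": rng.uniform(3.5, 9.5, n_rows),
})
# In this toy data the crop label depends only on K.
df["crop"] = pd.cut(
    df["K"], bins=[0, 70, 140, 210], labels=["crop_a", "crop_b", "crop_c"]
).astype(str)

X = df[["N", "P", "K", "ph"]]
y = df["crop"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Multi-class logistic regression; max_iter is raised because the
# features are not scaled and the default may not be enough to converge.
model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Weighted F1: per-class F1 averaged with weights equal to class support.
weighted_f1 = f1_score(y_test, y_pred, average="weighted")
print(f"weighted F1: {weighted_f1:.3f}")
```

The weighted average is a reasonable choice when crops are unevenly represented, since each class contributes in proportion to its number of test samples.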
We trained a separate model on each feature to evaluate its individual contribution, measured by weighted F1 score.
Feature | F1 Score |
---|---|
N | 0.091 |
P | 0.148 |
K | 0.239 |
pH | 0.045 |
Potassium (K) turned out to be the best single feature for predicting crop type.
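The per-feature comparison can be sketched as a loop that fits one logistic regression per column and records its weighted F1 score. The function name `single_feature_f1`, the split settings, and the synthetic data are assumptions for illustration, so the scores it prints will not match the table above; in this toy data the label is constructed from K, which is why K comes out on top.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def single_feature_f1(df, features, target="crop", seed=42):
    """Fit one logistic regression per feature; return weighted F1 per feature."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=0.2, random_state=seed
    )
    scores = {}
    for feature in features:
        model = LogisticRegression(max_iter=2000)
        # Double brackets keep the input two-dimensional, as scikit-learn expects.
        model.fit(X_train[[feature]], y_train)
        y_pred = model.predict(X_test[[feature]])
        scores[feature] = f1_score(y_test, y_pred, average="weighted")
    return scores


# Synthetic stand-in for soil_measures.csv (made-up values and crop names).
rng = np.random.default_rng(0)
n_rows = 300
df = pd.DataFrame({
    "N": rng.uniform(0, 140, n_rows),
    "P": rng.uniform(5, 145, n_rows),
    "K": rng.uniform(5, 205, n_rows),
    "ph": rng.uniform(3.5, 9.5, n_rows),
})
df["crop"] = pd.cut(
    df["K"], bins=[0, 70, 140, 210], labels=["crop_a", "crop_b", "crop_c"]
).astype(str)

scores = single_feature_f1(df, ["N", "P", "K", "ph"])
best_feature = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best single feature:", best_feature)
```

All four models share the same train/test split, so differences in score reflect the feature rather than the sampling.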
- Even with minimal features, machine learning models can provide useful insights for agriculture.
- Among the soil metrics, Potassium (K) had the highest predictive power in isolation.
- Further improvement is possible by using all features together, tuning hyperparameters, and trying more advanced classifiers (e.g., Random Forest, XGBoost).
- Build a full pipeline using all features
- Apply advanced models like Decision Trees or Random Forests
- Add feature importance analysis using SHAP or permutation importance
- Integrate real-time soil sensor data (IoT)
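Two of the items above (a Random Forest on all features, and permutation importance) might be combined as in the sketch below. This is one possible direction, not part of the current project: the data is synthetic with made-up crop labels, and the hyperparameters (`n_estimators`, `n_repeats`, the split) are arbitrary assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for soil_measures.csv (made-up values and crop names).
rng = np.random.default_rng(0)
n_rows = 300
df = pd.DataFrame({
    "N": rng.uniform(0, 140, n_rows),
    "P": rng.uniform(5, 145, n_rows),
    "K": rng.uniform(5, 205, n_rows),
    "ph": rng.uniform(3.5, 9.5, n_rows),
})
df["crop"] = pd.cut(
    df["K"], bins=[0, 70, 140, 210], labels=["crop_a", "crop_b", "crop_c"]
).astype(str)

X = df[["N", "P", "K", "ph"]]
y = df["crop"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train one model on all four features together.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Permutation importance: shuffle one column at a time on held-out data
# and measure how much the weighted F1 score drops.
result = permutation_importance(
    forest, X_test, y_test, scoring="f1_weighted", n_repeats=10, random_state=0
)
importances = pd.Series(
    result.importances_mean, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Unlike the single-feature models, this measures each feature's contribution in the presence of the others, which can give a different ranking on real data.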
- Python 3.x
- pandas
- scikit-learn
Install required packages:
pip install pandas scikit-learn
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
Make sure `soil_measures.csv` is in the same directory as your script or Jupyter notebook.