HistoART is a Python-based toolkit for researchers and professionals working with digital pathology images. Its primary goal is to support the detection, classification, and quantification of artifacts in histopathology slide datasets using both deep learning and hand-crafted feature approaches. The toolkit is aimed at pathologists, data scientists, and students who want to streamline their analysis workflows.
We encourage open-source contributions and collaborations. For further information or contributions, please contact the repository maintainer.
HistoArt comprises several integrated modules:
- Dataset Handling (datasets.py):
  - Loads and preprocesses clean and artifact-labeled images.
  - Manages combined dataset classes for training and validation, for both deep learning and hand-crafted-feature-based models.
- Feature Extraction (analysis.py):
  - Prints the percentage of artifact and artifact-free images in your dataset.
- Visualization (visualize.py):
  - Tools for visualizing histograms, boxplots, and other statistical representations of image features and artifacts.
  - Interactive tools for exploring distributions and feature correlations.
- Model Execution (model_execution.py):
  - Handles the logic for executing the three core models of this tool: Foundation, Deep Learning, and Knowledge-Based.
- Metrics and Performance Assessment (metrics.ipynb):
  - Evaluates classification metrics such as accuracy, recall, precision, F1, and AUC to assess model performance.
- End-to-End Pipeline (histoart.ipynb):
  - Loads and preprocesses images.
  - Implements and executes the classification models for artifact detection.
  - Prints the analysis.
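To make the metrics named above concrete, here is a minimal, dependency-free sketch of how accuracy, precision, recall, F1, and AUC can be computed for a binary artifact classifier. It is not taken from metrics.ipynb: the function names `binary_metrics` and `roc_auc` and the label convention (1 = artifact, 0 = clean) are assumptions made for illustration. In practice, scikit-learn (listed among the dependencies) provides equivalent functions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1, assuming 1 = artifact and 0 = clean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a sketch but slow for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels, predictions, and scores for six image patches.
    labels = [1, 0, 1, 1, 0, 0]
    preds = [1, 0, 0, 1, 1, 0]
    probs = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
    print(binary_metrics(labels, preds))
    print("auc:", roc_auc(labels, probs))
```

Note that accuracy, precision, recall, and F1 are computed from hard predictions, while AUC needs the continuous scores (for example, predicted artifact probabilities).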
To set up the HistoArt environment, first clone this repository and navigate to the project directory:
git clone https://github.com/mousavikahaki/HistoART.git
cd HistoART
Create a virtual environment and install dependencies from the provided requirements.txt:
python3 -m venv histoart_env
source histoart_env/bin/activate
pip install -r requirements.txt
Tested Environment:
- Linux (Ubuntu 22.04 LTS recommended)
- Python 3.10+
Some key dependencies include:
numpy==2.1.2
opencv-python==4.11.0.86
scikit-image==0.25.2
scikit-learn==1.6.1
matplotlib==3.10.1
pyfeats==1.0.1
mahotas==1.4.18
torch==2.5.1
torchvision==0.20.1
(See requirements.txt for the full list.)
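After installation, it can be useful to confirm that the active environment matches the pinned versions. The following is a small sketch using only the standard library; the helper `check_pins` and the `PINNED` dictionary are hypothetical names introduced here, and the pins are copied from the key dependencies listed in this README rather than from the full requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the key dependencies listed in this README (not the full file).
PINNED = {
    "numpy": "2.1.2",
    "opencv-python": "4.11.0.86",
    "scikit-image": "0.25.2",
    "scikit-learn": "1.6.1",
    "matplotlib": "3.10.1",
    "pyfeats": "1.0.1",
    "mahotas": "1.4.18",
    "torch": "2.5.1",
    "torchvision": "0.20.1",
}


def check_pins(pins, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    if not problems:
        print("All listed dependencies match their pinned versions.")
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

A version mismatch is not necessarily an error, but it is a reasonable first thing to check if results differ from those obtained in the tested environment.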
Several Jupyter notebooks and scripts are provided to quickly familiarize you with the capabilities and usage of HistoArt:
- Dataset Preparation and Loading (utils/datasets.py)
- Artifact Feature Extraction (utils/analysis.py)
- Model Execution and Evaluation (utils/model_execution.py)
- Metrics Calculation (metrics.ipynb)
- Visualization Examples (utils/visualize.py)
- End-to-End Artifact Analysis (histoart.ipynb)
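As a rough illustration of the kind of summary that utils/analysis.py is described as printing (the share of artifact and artifact-free images), the sketch below computes per-label percentages from a list of labels. It does not use or reproduce the HistoArt API: the function name `label_percentages` and the label strings are assumptions chosen for the example.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def label_percentages(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Return the percentage of samples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty dataset: nothing to report
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical labels for four image patches.
    labels = ["artifact", "clean", "clean", "clean"]
    for label, pct in label_percentages(labels).items():
        print(f"{label}: {pct:.1f}%")
```

Checking this balance before training is worthwhile because a heavily skewed dataset makes accuracy alone a misleading measure of performance.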
If you utilize HistoArt in your research or applications, please cite the repository:
@misc{HistoArt2025,
author = {Seyed M. Kahaki and Alexander R. Webber},
title = {HistoArt},
year = {2025},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/DIDSR/HistoArt}},
}
- HistoArt (https://zenodo.org/records/10809442)
- TCGA@Focus (https://zenodo.org/records/3910757)
- FMA Binary and Multiclass
- DLA Binary and Multiclass
- KBA Binary and Multiclass
For any inquiries, suggestions, or collaborative opportunities, please contact Seyed Kahaki or Alex Webber either via this GitHub repo or via email ([email protected];[email protected]).
We warmly welcome pull requests and issues to enhance the project's capabilities and documentation.
This project was supported in part by an appointment to the ORISE Research Participation Program at the Center for Devices and Radiological Health, U.S. Food and Drug Administration, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and FDA/CDRH.