This tutorial walks you through generating protein embeddings with ProtT5 and classifying DNA-binding sequences with a multilayer perceptron (MLP) classifier. The repository contains a single script with detailed explanations of each step.
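As a rough illustration of the embedding step, the sketch below prepares a sequence the way ProtT5 expects (rare residues mapped to `X`, residues separated by spaces) and averages per-residue vectors into one vector per protein. It is not taken from the tutorial script. The checkpoint name, the use of the `transformers` library (which also needs `sentencepiece` for the tokenizer), mean pooling, and the helper names `preprocess`, `mean_pool`, and `embed` are all assumptions made for this example.

```python
import re

import numpy as np

# Assumed checkpoint name; the tutorial script may use a different ProtT5 variant.
MODEL_NAME = "Rostlab/prot_t5_xl_half_uniref50-enc"


def preprocess(sequence: str) -> str:
    """Map rare residues (U, Z, O, B) to X and separate residues with spaces."""
    return " ".join(re.sub(r"[UZOB]", "X", sequence.upper()))


def mean_pool(residue_embeddings: np.ndarray) -> np.ndarray:
    """Average per-residue vectors (L, D) into one per-protein vector (D,)."""
    return residue_embeddings.mean(axis=0)


def embed(sequence: str) -> np.ndarray:
    """Return a per-protein embedding; downloads the model on first use."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(MODEL_NAME).eval()
    inputs = tokenizer(preprocess(sequence), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, L + 1, D)
    # Drop the trailing end-of-sequence token before pooling (assumed choice).
    return mean_pool(hidden[0, :-1].float().numpy())
```

Calling `embed("MKTAYIAKQR")` would return a 1024-dimensional vector for this checkpoint. Mean pooling is only one common way to get a fixed-length representation; the script may pool differently.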
The dataset is available at Dataset Link.
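Once each sequence in the dataset has an embedding, the classification step could look roughly like the following scikit-learn sketch. Random vectors stand in for real ProtT5 embeddings here, and the hidden layer size, iteration count, and train/test split are hypothetical choices for illustration, not the settings used in the script.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Placeholder data: 200 fake "embeddings" of width 1024 with binary labels.
# Replace X and y with real ProtT5 embeddings and DNA-binding labels.
X = rng.normal(size=(200, 1024))
y = rng.integers(0, 2, size=200)
X[y == 1] += 0.5  # shift one class so the toy problem is learnable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical hyperparameters chosen only for this example.
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about performance on the real dataset.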
- torch
- numpy
- ProtT5 model from Hugging Face
- scikit-learn
- Biopython (imported as `Bio`)
- regex
- protobuf
[1] Elnaggar, Ahmed, et al. "ProtTrans: Toward understanding the language of life through self-supervised learning." IEEE Transactions on Pattern Analysis and Machine Intelligence 44.10 (2021): 7112-7127.