Sciwhylab/protein-embedding-and-classification

ProtT5 Embedding and Classification Tutorial

This tutorial walks through generating protein embeddings with ProtT5 and classifying DNA-binding protein sequences with a multilayer perceptron (MLP) classifier. The repository contains a single script with detailed explanations of each step.
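The embedding step can be sketched as follows. This is a minimal illustration, not the repository's script: it assumes the Hugging Face `transformers` library and the encoder-only checkpoint `Rostlab/prot_t5_xl_half_uniref50-enc`, and it assumes mean pooling over residues to get one vector per protein. The function names `preprocess` and `embed_sequences` are hypothetical.

```python
import re


def preprocess(sequence: str) -> str:
    """Map rare or ambiguous residues (U, Z, O, B) to X and put a space
    between residues, which is the input format ProtT5 expects."""
    return " ".join(re.sub(r"[UZOB]", "X", sequence.upper()))


def embed_sequences(sequences, model_name="Rostlab/prot_t5_xl_half_uniref50-enc"):
    """Return one mean-pooled embedding per sequence.

    The checkpoint name and the mean pooling are assumptions made for this
    sketch; the tutorial script may use a different checkpoint or pooling.
    """
    # Heavy imports stay local so preprocess() is usable without them.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name).to(device)
    # The checkpoint is stored in half precision; fall back to float32 on CPU.
    model = model.half() if device.type == "cuda" else model.float()
    model.eval()

    embeddings = []
    with torch.no_grad():
        for seq in sequences:
            batch = tokenizer(
                preprocess(seq), add_special_tokens=True, return_tensors="pt"
            ).to(device)
            hidden = model(**batch).last_hidden_state  # (1, residues + 1, hidden)
            # Drop the trailing end-of-sequence token, then average over residues.
            per_residue = hidden[0, : len(seq)]
            embeddings.append(per_residue.mean(dim=0).float().cpu().numpy())
    return embeddings
```

Calling `embed_sequences(["MKTAYIAKQR"])` downloads the model on first use and returns a list with one NumPy vector per input sequence. Processing one sequence at a time avoids padding and keeps the pooling simple, at the cost of throughput.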

Dataset

The dataset is available at Dataset Link.
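Once every sequence in the dataset has an embedding and a binding or non-binding label, the classification step might look like the sketch below. It assumes scikit-learn's `MLPClassifier`; the tutorial script may define its MLP differently (for example in torch). The data here is synthetic and only stands in for real ProtT5 embeddings; the layer size, split ratio, and iteration count are illustrative choices, not values from the repository.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for embeddings: two Gaussian clusters in 1024 dimensions.
# Replace X and y with real embeddings and DNA-binding labels.
rng = np.random.default_rng(0)
n_per_class, dim = 100, 1024
X = np.vstack([
    rng.normal(0.0, 1.0, (n_per_class, dim)),  # class 0 (non-binding)
    rng.normal(0.5, 1.0, (n_per_class, dim)),  # class 1 (binding)
])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Hold out 20% of the data, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One hidden layer; the size is an arbitrary choice for this sketch.
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, a stratified split matters when binding and non-binding proteins are imbalanced, and accuracy alone can be misleading in that case.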

Requirements

  • torch
  • numpy
  • ProtT5 model from Hugging Face
  • scikit-learn
  • Bio (Biopython)
  • regex
  • protobuf
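A quick way to check whether the requirements are installed is to look up each import name without importing it. This is a small sketch; the helper `missing_modules` is hypothetical. Note that some import names differ from the package names (`sklearn` for scikit-learn, `google.protobuf` for protobuf), and that `transformers` and `sentencepiece` are assumptions here: they are what the Hugging Face T5 tokenizer normally needs, but the list above does not name them.

```python
from importlib.util import find_spec

# Import names for the requirements listed above.
# "transformers" and "sentencepiece" are assumptions, not taken from the list.
REQUIRED_MODULES = [
    "torch",
    "numpy",
    "transformers",
    "sentencepiece",
    "sklearn",          # installed as scikit-learn
    "Bio",              # installed as biopython
    "regex",
    "google.protobuf",  # installed as protobuf
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is itself missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_modules(REQUIRED_MODULES) or "none")
```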

References

[1] Elnaggar, Ahmed, et al. "ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning." IEEE Transactions on Pattern Analysis and Machine Intelligence 44.10 (2021): 7112-7127.
