Protein Structure Classifier

Custom protein structure classification, based on PDB and CATH data.

The core task is predicting protein architecture based on its amino acid sequence and 3D structure.

Description

This project develops a ML model to categorize protein structures into predefined architectural classes. The model based on Protein Structure Transformer architecture to leverage both the sequence and structural information of the protein.

The input data for the model consists of protein sequences and corresponding 3D structural information (coordinates from PDB files).

The structural data preprocessed into a format suitable for the Protein Structure Transformer (feature vectors describing secondary structure elements). CATH data is used as a ground truth reference for labeling and validating the architectural classification.

architecture_names = {(1, 10): "Mainly Alpha: Orthogonal Bundle",
                      (1, 20): "Mainly Alpha: Up-down Bundle",
                      (2, 30): "Mainly Beta: Roll",
                      (2, 40): "Mainly Beta: Beta Barrel",
                      (2, 60): "Mainly Beta: Sandwich",
                      (3, 10): "Alpha Beta: Roll",
                      (3, 20): "Alpha Beta: Alpha-Beta Barrel",
                      (3, 30): "Alpha Beta: 2-Layer Sandwich",
                      (3, 40): "Alpha Beta: 3-Layer(aba) Sandwich",
                      (3, 90): "Alpha Beta: Alpha-Beta Complex"}

This dictionary indicates the hierarchical classification of protein architectures based on a combination of secondary structure elements (alpha-helices and beta-sheets) and overall fold patterns (e.g., bundles, barrels, sandwiches).

The numerical keys within the dictionary represent specific criteria for defining the architectural types (e.g., ratios of alpha-helices, arrangement of beta-sheets).

How to run

(Coming soon)

Name		Name	Last commit message	Last commit date
Latest commit History 79 Commits
PST @ 57d9dcd		PST @ 57d9dcd
imgs		imgs
models		models
notebooks		notebooks
weights		weights
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Protein Structure Classifier

Description

How to run

About

Uh oh!

Releases

Packages

Uh oh!

Languages

xmarva/protein-structure-transformer

Folders and files

Latest commit

History

Repository files navigation

Protein Structure Classifier

Description

How to run

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages