35 commits
3bd9f31
Create README.md
KirPetrikov Sep 25, 2023
2657157
Create Protein_suite.py
KirPetrikov Sep 25, 2023
065e4ce
Update README.md
KirPetrikov Sep 26, 2023
6d644d2
Update README.md
KirPetrikov Sep 26, 2023
5ca47ee
Add func: length, longest seq, 1to3 letters
YuryPopov Sep 26, 2023
b1c1858
Add calc_gravy function
KirPetrikov Sep 26, 2023
e0da3fa
Rename script
KirPetrikov Sep 26, 2023
1786d8a
Add calc_total_charge function
KirPetrikov Sep 26, 2023
78ce026
Add calc_iso_point function
KirPetrikov Sep 26, 2023
f92e9e0
Edit docstrings
KirPetrikov Sep 26, 2023
c52a25c
Add funcs: protein mass, the hightest and lightest protein
RovshanMuradov Sep 28, 2023
a3ebaa9
Rename script
KirPetrikov Sep 28, 2023
9185621
Code Style minor eds
KirPetrikov Sep 28, 2023
d9b28e4
Combine heaviest and lightest functions
RovshanMuradov Sep 29, 2023
b42cffb
Copied heaviest and lightest funcs to ProtSeqO
KirPetrikov Sep 29, 2023
e9b1c35
Add FUNC_DICT_FOR_LIST_RETURN = {'gravy': calc_gravy, 'iso': calc_iso…
KirPetrikov Sep 29, 2023
5e52e87
Add check_sequences func
KirPetrikov Sep 29, 2023
df0ba32
Add process_seqs func
KirPetrikov Sep 29, 2023
6396992
Code Style minor eds
KirPetrikov Sep 29, 2023
a49dbe7
Correct calc_iso_point
KirPetrikov Sep 29, 2023
52d72b5
formatted script
YuryPopov Sep 30, 2023
7508ab3
Add readme
YuryPopov Sep 30, 2023
e8c5942
Change heaviest and lightest func& Add func for the same mass
Gulya-mur Sep 30, 2023
0ae464f
Merge pull request #1 from KirPetrikov/popov_branch
KirPetrikov Sep 30, 2023
1e7eb30
Edit Code Style
KirPetrikov Sep 30, 2023
f1d3732
Remove longest_seq func
KirPetrikov Sep 30, 2023
5bcb364
Edit Code Style
KirPetrikov Sep 30, 2023
d27cf98
Edit check_sequences func
KirPetrikov Sep 30, 2023
b8c1c15
Edit case sensitivity issue in process_seqs
KirPetrikov Sep 30, 2023
1f337dd
Update README.md
KirPetrikov Sep 30, 2023
47885a5
Delete HW4_Petrikov/Protein_suite.py
KirPetrikov Sep 30, 2023
bb3afee
Delete HW4_Petrikov/ProSeqO.py
KirPetrikov Sep 30, 2023
a278f9b
Delete .DS_Store
KirPetrikov Sep 30, 2023
76b32f8
Update README.md
KirPetrikov Sep 30, 2023
5891ff1
Update ProtSeqO.py - remove test lines
KirPetrikov Sep 30, 2023
225 changes: 225 additions & 0 deletions HW4_Petrikov/ProtSeqO.py
@@ -0,0 +1,225 @@
AMINO_ACIDS_NAMES = {'A': 'Ala',
                     'R': 'Arg',
                     'N': 'Asn',
                     'D': 'Asp',
                     'V': 'Val',
                     'H': 'His',
                     'G': 'Gly',
                     'Q': 'Gln',
                     'E': 'Glu',
                     'I': 'Ile',
                     'L': 'Leu',
                     'K': 'Lys',
                     'M': 'Met',
                     'P': 'Pro',
                     'S': 'Ser',
                     'Y': 'Tyr',
                     'T': 'Thr',
                     'W': 'Trp',
                     'F': 'Phe',
                     'C': 'Cys'}

GRAVY_AA_VALUES = {'L': 3.8,
                   'K': -3.9,
                   'M': 1.9,
                   'F': 2.8,
                   'P': -1.6,
                   'S': -0.8,
                   'T': -0.7,
                   'W': -0.9,
                   'Y': -1.3,
                   'V': 4.2,
                   'A': 1.8,
                   'R': -4.5,
                   'N': -3.5,
                   'D': -3.5,
                   'C': 2.5,
                   'Q': -3.5,
                   'E': -3.5,
                   'G': -0.4,
                   'H': -3.2,
                   'I': 4.5}

VALID_SYMBOLS = set(AMINO_ACIDS_NAMES)


def calc_gravy(seq: str) -> float:
    """
    Calculate GRAVY (grand average of hydropathy) value
    of given amino acids sequence
    """
    gravy_aa_sum = 0
    for amino_ac in seq:
        gravy_aa_sum += GRAVY_AA_VALUES[amino_ac]
    return round(gravy_aa_sum / len(seq), 3)
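A minimal sketch of the same GRAVY calculation written with `sum()` over a generator. It assumes the Kyte-Doolittle hydropathy table from `GRAVY_AA_VALUES` above (copied here so the snippet runs on its own); `calc_gravy_sketch` is a hypothetical name, not part of ProtSeqO.

```python
# Kyte-Doolittle hydropathy values, same numbers as GRAVY_AA_VALUES above
GRAVY_AA_VALUES = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
                   'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
                   'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
                   'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2}


def calc_gravy_sketch(seq: str) -> float:
    """Return the mean hydropathy (GRAVY) of an upper-case seq, rounded to 3 decimals."""
    # Sum the per-residue values, then divide by the sequence length
    total = sum(GRAVY_AA_VALUES[amino_ac] for amino_ac in seq)
    return round(total / len(seq), 3)


print(calc_gravy_sketch('ILATTWP'))  # prints 0.886, as in the README example
```

Like the original, this expects upper-case input (process_seqs upper-cases before calling it) and raises ZeroDivisionError for an empty sequence.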


def calc_total_charge(charged_amino_ac_numbers_list: list,
                      ph_value: float) -> float:
    """
    Calculate the approximate total charge of some amino acid sequence
    for given pH value
    based only on a list of the number of key charged amino acids.
    """
    n_terminal_charge = 1 / (1 + 10 ** (ph_value - 8.2))
    c_terminal_charge = -1 / (1 + 10 ** (3.65 - ph_value))
    cys_charge = -charged_amino_ac_numbers_list[0] / (1 + 10 ** (8.18 - ph_value))
    asp_charge = -charged_amino_ac_numbers_list[1] / (1 + 10 ** (3.9 - ph_value))
    glu_charge = -charged_amino_ac_numbers_list[2] / (1 + 10 ** (4.07 - ph_value))
    tyr_charge = -charged_amino_ac_numbers_list[3] / (1 + 10 ** (10.46 - ph_value))
    his_charge = charged_amino_ac_numbers_list[4] / (1 + 10 ** (ph_value - 6.04))
    lys_charge = charged_amino_ac_numbers_list[5] / (1 + 10 ** (ph_value - 10.54))
    arg_charge = charged_amino_ac_numbers_list[6] / (1 + 10 ** (ph_value - 12.48))
    total_charge = (n_terminal_charge +
                    c_terminal_charge +
                    cys_charge +
                    asp_charge +
                    glu_charge +
                    tyr_charge +
                    his_charge +
                    lys_charge +
                    arg_charge)
    return total_charge
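Written out, `calc_total_charge` sums Henderson-Hasselbalch terms for the two termini and the seven ionizable side chains, using the pKa values hard-coded above. With $n_X$ the count of residue $X$ (list order C, D, E, Y, H, K, R in `charged_amino_ac_numbers_list`), the function computes:

$$
Z(\mathrm{pH}) = \frac{1}{1 + 10^{\mathrm{pH} - 8.2}} - \frac{1}{1 + 10^{3.65 - \mathrm{pH}}}
+ \sum_{X \in \{H, K, R\}} \frac{n_X}{1 + 10^{\mathrm{pH} - \mathrm{p}K_X}}
- \sum_{X \in \{C, D, E, Y\}} \frac{n_X}{1 + 10^{\mathrm{p}K_X - \mathrm{pH}}}
$$

with $\mathrm{p}K_C = 8.18$, $\mathrm{p}K_D = 3.9$, $\mathrm{p}K_E = 4.07$, $\mathrm{p}K_Y = 10.46$, $\mathrm{p}K_H = 6.04$, $\mathrm{p}K_K = 10.54$, $\mathrm{p}K_R = 12.48$.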


def calc_iso_point(seq: str):
    """
    Calculate approximate isoelectric point of given amino acids sequence
    """
    charged_amino_ac_numbers = []
    for amino_ac in ("C", "D", "E", "Y", "H", "K", "R"):
        charged_amino_ac_numbers.append(seq.count(amino_ac))
Comment on lines +89 to +91

Using a list here isn't a great idea. It would be better to do it like this:

    charged_amino_ac_numbers = {
        "C": 0, "D": 0, "E": 0, "Y": 0, "H": 0, "K": 0, "R": 0
    }
    for amino_ac in seq:
        if amino_ac in charged_amino_ac_numbers:
            charged_amino_ac_numbers[amino_ac] += 1

    total_charge_tmp = 1
    ph_iso_point = -0.1
    while total_charge_tmp > 0:
        ph_iso_point += 0.1
        total_charge_tmp = calc_total_charge(
            charged_amino_ac_numbers,
            ph_iso_point)
    return round(ph_iso_point, 1)
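A minimal sketch of the reviewer's dict-based counting wired into the same pKa values as `calc_total_charge`, assuming the pKa values move into one table keyed by residue. `SIDE_CHAIN_PKA`, `count_charged`, `total_charge` and `calc_iso_point_sketch` are hypothetical names. It keeps the original stopping rule (the first pH on a 0.1 grid, starting from 0, where the charge is no longer positive) but steps with an integer counter (`step / 10`) so that repeated `+= 0.1` cannot accumulate floating-point error.

```python
# pKa values copied from calc_total_charge; the second item is the charge
# sign of the ionized side chain (+1 for bases, -1 for acids)
SIDE_CHAIN_PKA = {'C': (8.18, -1), 'D': (3.9, -1), 'E': (4.07, -1),
                  'Y': (10.46, -1), 'H': (6.04, 1), 'K': (10.54, 1),
                  'R': (12.48, 1)}
N_TERM_PKA = 8.2
C_TERM_PKA = 3.65


def count_charged(seq: str) -> dict:
    """Count each ionizable residue in seq (the reviewer's dict approach)."""
    counts = {amino_ac: 0 for amino_ac in SIDE_CHAIN_PKA}
    for amino_ac in seq:
        if amino_ac in counts:
            counts[amino_ac] += 1
    return counts


def total_charge(counts: dict, ph_value: float) -> float:
    """Approximate net charge at ph_value from Henderson-Hasselbalch terms."""
    charge = 1 / (1 + 10 ** (ph_value - N_TERM_PKA))
    charge -= 1 / (1 + 10 ** (C_TERM_PKA - ph_value))
    for amino_ac, (pka, sign) in SIDE_CHAIN_PKA.items():
        if sign > 0:
            charge += counts[amino_ac] / (1 + 10 ** (ph_value - pka))
        else:
            charge -= counts[amino_ac] / (1 + 10 ** (pka - ph_value))
    return charge


def calc_iso_point_sketch(seq: str) -> float:
    """First pH on a 0.1 grid (from 0) where the net charge is <= 0."""
    counts = count_charged(seq)
    step = 0
    while total_charge(counts, step / 10) > 0:
        step += 1
    return step / 10


print(calc_iso_point_sketch('ACGTWWA'), calc_iso_point_sketch('ILATTWP'))
# prints 5.8 6.0, matching the README example
```

A dict makes each count self-describing, so the charge function no longer depends on the positional order C, D, E, Y, H, K, R.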


def transform_to_three_letters(seq: str) -> str:

IMHO, it would be better to make the separator a parameter with a default value.

    """
    Transform 1-letter aminoacid symbols in
    sequence to 3-letter symbols separated by
    hyphens.
    """
    new_name = ''

Doesn't seem like a very good name ))
maybe at least three_letter_seq

    for amino_acid in seq:
        new_name += AMINO_ACIDS_NAMES[amino_acid] + '-'
    return new_name[:-1]
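A sketch combining the two review suggestions for this function: the separator becomes a keyword parameter defaulting to `'-'`, the accumulator gets a descriptive name, and `str.join` replaces the trailing-hyphen trimming. `transform_to_three_letters_sketch` and `sep` are hypothetical names; the mapping is copied from `AMINO_ACIDS_NAMES` so the snippet runs alone.

```python
# Same mapping as AMINO_ACIDS_NAMES above
AMINO_ACIDS_NAMES = {'A': 'Ala', 'R': 'Arg', 'N': 'Asn', 'D': 'Asp',
                     'V': 'Val', 'H': 'His', 'G': 'Gly', 'Q': 'Gln',
                     'E': 'Glu', 'I': 'Ile', 'L': 'Leu', 'K': 'Lys',
                     'M': 'Met', 'P': 'Pro', 'S': 'Ser', 'Y': 'Tyr',
                     'T': 'Thr', 'W': 'Trp', 'F': 'Phe', 'C': 'Cys'}


def transform_to_three_letters_sketch(seq: str, sep: str = '-') -> str:
    """Rewrite an upper-case 1-letter sequence in 3-letter code joined by sep."""
    # join() puts sep only between items, so no trailing separator to trim
    three_letter_seq = sep.join(AMINO_ACIDS_NAMES[amino_ac] for amino_ac in seq)
    return three_letter_seq


print(transform_to_three_letters_sketch('ACGTWWA'))
# prints Ala-Cys-Gly-Thr-Trp-Trp-Ala
print(transform_to_three_letters_sketch('ACG', sep=' '))
# prints Ala Cys Gly
```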


def sequence_length(seq: str) -> int:
    """
    Function counts number of aminoacids in
    given sequence
    """
    return len(seq)


def calc_protein_mass(seq: str) -> int:
    """
    Calculate protein molecular weight using the average
    molecular weight of amino acid - 110 Da
    """
    return len(seq) * 110

You already have a sequence_length function... why didn't you use it?

    return sequence_length(seq) * 110



def find_heaviest_proteins(sequence: list):
    """
    Return the sequence of the heaviest protein from list
    """
    protein_mass = {}
    list_of_protein = sequence
    for i in list_of_protein:
        protein_mass[i] = calc_protein_mass(i)
    return count_uniq_max_mass(protein_mass)
Comment on lines +130 to +138

All the same comments as for the find_lightest_proteins function ))



def count_uniq_max_mass(protein_mass):
    """
    Count amount of proteins with the same maximum mass and return them
    """
    max_weight = max(protein_mass.values())
    count_protein = 0
    proteins = []
    for i in protein_mass:
        if protein_mass[i] == max_weight:
            count_protein += 1
            if count_protein >= 1:
                proteins.append(i)

    return f'{proteins} - {max_weight}'
Comment on lines +141 to +154

All the same comments as for the count_uniq_min_mass function ))



def find_lightest_proteins(sequence: list):
    """
    Return the sequence of the lightest protein from list
    """
Comment on lines +158 to +160

Again, the count_uniq_min_mass function clearly does not return a list of protein sequences :(

    protein_mass = {}
    list_of_protein = sequence
    for i in list_of_protein:
Comment on lines +162 to +163

Oh boy )))

  1. Why list_of_protein = sequence? )) Why not make this the function parameter right away:
def find_lightest_proteins(list_of_proteins: List[str]) -> List[str]:  # Although what you return here isn't a list :(
  2. Why i? What's wrong with for protein in list_of_proteins
  3. Why not just call it proteins? ))

In the end:
def find_lightest_proteins(proteins: List[str]) -> List[str]:
    """
    ...
    """
    protein_mass = {}
    for protein in proteins:
        protein_mass[protein] = calc_protein_mass(protein)

        protein_mass[i] = calc_protein_mass(i)
    return count_uniq_min_mass(protein_mass)


def count_uniq_min_mass(protein_mass):

Judging by the name, I'd expect the function to return a number )))
That is, literally "the count of unique mass minima"... but for some reason it returns a string with a list of sequences :/

    """
    Count amount of proteins with the same minimum mass and return them
    """
Comment on lines +168 to +171

There is no description of what the function actually takes (neither in the type hint nor in the docstring). Literally, what is in there? From the name it is not at all obvious why it's a dictionary... What is in the key, and what is in the value?

"...and return them"

No )))
You return a string... So this list of proteins can't really be used afterwards )))

    min_weight = min(protein_mass.values())
    count_protein = 0

And why is this here?

    proteins = []
    for i in protein_mass:
        if protein_mass[i] == min_weight:
Comment on lines +175 to +176

Why i?
And why look values up by key at all, when you could have done something like:

for protein, weight in protein_mass.items():
    if weight == min_weight:

            count_protein += 1
            if count_protein >= 1:
Comment on lines +177 to +178

Why? 0_0
This variable starts at 0. If we got into the if block, it becomes 1 here. And after that it will obviously be >= 1... In other words, this condition is always True ))

                proteins.append(i)
    return f'{proteins} - {min_weight}'

Here you should have simply done return proteins
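A minimal sketch of where the review comments on `find_lightest_proteins` and `count_uniq_min_mass` point: the mass reuses `sequence_length`, the loop unpacks `.items()`, the redundant counter is gone, and the function returns the list of sequences itself rather than a formatted string, so callers can keep using it. `find_extreme_proteins` and its `pick` parameter are hypothetical; the 110 Da average residue mass is taken from `calc_protein_mass`.

```python
from typing import Callable, Dict, List


def sequence_length(seq: str) -> int:
    """Number of amino acids in seq."""
    return len(seq)


def calc_protein_mass(seq: str) -> int:
    """Approximate mass in Da, assuming an average of 110 Da per residue."""
    return sequence_length(seq) * 110


def find_extreme_proteins(proteins: List[str],
                          pick: Callable = min) -> List[str]:
    """
    Return every sequence whose mass equals the extreme chosen by pick
    (min for the lightest, max for the heaviest).
    """
    # Keys are sequences, values are their masses in Da
    protein_mass: Dict[str, int] = {}
    for protein in proteins:
        protein_mass[protein] = calc_protein_mass(protein)
    extreme_mass = pick(protein_mass.values())
    return [protein for protein, mass in protein_mass.items()
            if mass == extreme_mass]


print(find_extreme_proteins(['ILATTWP', 'ACGTwwa', 'AC']))  # ['AC']
print(find_extreme_proteins(['ILATTWP', 'ACGTwwa', 'AC'], pick=max))
# ['ILATTWP', 'ACGTwwa']
```

Passing `min` or `max` lets one function replace the duplicated heaviest/lightest pair; the weight can still be recovered with `calc_protein_mass` on any returned sequence.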



def check_sequences(seqs: list):

def check_sequences(seqs: List[str]):

    """
    Raise ValueError if at least one sequence
    contains non valid symbols
    """
    if not (isinstance(seqs, list)):
        raise ValueError("Enter valid protein sequence")
    for seq in seqs:
        if (not (isinstance(seq, str))) or (not (set(seq.upper()).issubset(VALID_SYMBOLS))):

        if not isinstance(seq, str) or not set(seq.upper()).issubset(VALID_SYMBOLS):

            raise ValueError("Enter valid protein sequence")
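A sketch of `check_sequences` with both review suggestions applied: the `List[str]` hint (which needs `from typing import List`) and the simplified condition without redundant parentheses. It is meant to behave like the original; `VALID_SYMBOLS` is rebuilt here from the 20 one-letter codes so the snippet runs on its own.

```python
from typing import List

# The 20 standard one-letter codes, same set as VALID_SYMBOLS above
VALID_SYMBOLS = set('ARNDVHGQEILKMPSYTWFC')


def check_sequences(seqs: List[str]) -> None:
    """
    Raise ValueError unless seqs is a list of strings made only of
    the 20 standard one-letter amino acid codes (any case).
    """
    if not isinstance(seqs, list):
        raise ValueError("Enter valid protein sequence")
    for seq in seqs:
        # isinstance is checked first, so .upper() never runs on a non-string
        if not isinstance(seq, str) or not set(seq.upper()).issubset(VALID_SYMBOLS):
            raise ValueError("Enter valid protein sequence")


check_sequences(['ACGTwwa', 'ILATTWP'])  # passes silently
try:
    check_sequences(['ACGU'])  # 'U' (selenocysteine) is rejected
except ValueError as err:
    print(err)  # prints: Enter valid protein sequence
```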


# Didn't place at the beginning because the functions are defined above
FUNC_STR_INPUT = {
    'gravy': calc_gravy,
    'iso': calc_iso_point,
    'rename': transform_to_three_letters,
    'lengths': sequence_length,
    'molw': calc_protein_mass}

FUNC_LIST_INPUT = {
    'heavy': find_heaviest_proteins,
    'light': find_lightest_proteins}


def process_seqs(option: str, seqs: list):

def process_seqs(option: str, seqs: List[str]):

    """
    Perform some simple operations on amino acids sequences.
    """
Comment on lines +209 to +211

Since this is the main function (to some extent, the entry point), it would be good to give more description here...

    if isinstance(seqs, str):
        seq_tmp = seqs
        seqs = [seq_tmp]
Comment on lines +212 to +214

In principle this should never happen, since you explicitly said that seqs is a list )))
It's not clear why you check it for being a string...

    check_sequences(seqs)
    if option in FUNC_STR_INPUT.keys():
        results = []
        for seq in seqs:
            result_tmp = FUNC_STR_INPUT[option](seq.upper())
            results.append(result_tmp)
        return results
    elif option in FUNC_LIST_INPUT.keys():
        return FUNC_LIST_INPUT[option](seqs)
Comment on lines +222 to +223

So why not make everything STR_INPUT? ))
I mean, if all the helper functions took a string... then this wouldn't be needed

    else:
        raise ValueError("Enter valid operation")
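One possible reading of the last review comment, sketched under assumptions: every helper takes a single string, and `process_seqs` itself does the list-level selection for `'heavy'` and `'light'`, so the second dispatch dict is no longer needed. It also carries the fuller docstring the reviewer asked for. To stay short, only `'lengths'` and `'molw'` are wired in as stand-in lambdas (not the real ProtSeqO helpers), validation is omitted, and all sequences are upper-cased up front, so `'heavy'`/`'light'` return upper-case sequences, unlike the original.

```python
from typing import Callable, Dict, List, Union

# Stand-in per-sequence helpers (hypothetical; ProtSeqO has its own)
PER_SEQUENCE: Dict[str, Callable[[str], Union[int, float, str]]] = {
    'lengths': len,
    'molw': lambda seq: len(seq) * 110,
}
# List-level options built on top of the per-sequence mass
SELECTORS = {'heavy': max, 'light': min}


def process_seqs(option: str, seqs: List[str]) -> list:
    """
    Apply one operation to a list of 1-letter amino acid sequences.

    option: 'lengths' or 'molw' (one value per sequence, in input order),
        or 'heavy' / 'light' (the sequences sharing the maximum or
        minimum mass).
    seqs: list of sequences; case is ignored.

    Raises ValueError for an unknown option.
    """
    seqs = [seq.upper() for seq in seqs]
    if option in PER_SEQUENCE:
        return [PER_SEQUENCE[option](seq) for seq in seqs]
    if option in SELECTORS:
        masses = [PER_SEQUENCE['molw'](seq) for seq in seqs]
        target = SELECTORS[option](masses)
        return [seq for seq, mass in zip(seqs, masses) if mass == target]
    raise ValueError("Enter valid operation")


print(process_seqs('molw', ['ACGTWWA', 'AC']))  # [770, 220]
print(process_seqs('heavy', ['ILATTWP', 'ACGTwwa', 'AC']))
# ['ILATTWP', 'ACGTWWA']
```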
55 changes: 55 additions & 0 deletions HW4_Petrikov/README.md
@@ -0,0 +1,55 @@
# ProtSeqO

## Tool for PROtein SEQuences Operation

*This is the repo for the fourth homework of the BI Python 2023 course*

This tool can perform some simple operations on amino acid sequences:
* calculate protein lengths, molecular weights, isoelectric points and GRAVY values
* find and show the heaviest and lightest proteins
* rewrite 1-letter sequences in 3-letter code

## How to use ProtSeqO
Start Python in the directory that contains the script:
```bash
python3
>>> from ProtSeqO import process_seqs
>>> print(process_seqs(__command__, __sequence or list of sequences__))
```

You can pass `process_seqs()` either a single sequence as a string or a list of sequence strings. __Pay attention__: your sequence(s) must consist of the 1-letter symbols (case does not matter) of the 20 common amino acids ('U' for selenocysteine and 'O' for pyrrolysine are not allowed).

The command must be a string with one of the following options.

## ProtSeqO options
* 'lengths' - return a list with the number of AA in each sequence
* 'molw' - return a list of protein molecular weights (using the average molecular weight of AA, 110 Da)
* 'iso' - return a list of approximate isoelectric points of the given sequences
* 'gravy' - return a list of GRAVY (grand average of hydropathy) values
* 'rename' - return a list of sequences in 3-letter AA code (AA separated by hyphens)
* 'heavy' - return the sequence(s) with the maximum molecular weight and the weight value, as a single string
* 'light' - return the sequence(s) with the minimum molecular weight and the weight value, as a single string

## ProtSeqO usage examples
```python
python3
>>> from ProtSeqO import process_seqs
>>> print(process_seqs('iso', ['ACGTWWA', 'ILATTWP']))
### [5.8, 6.0]
>>> print(process_seqs('gravy', 'ilattwp'))
### [0.886]
>>> print(process_seqs('rename', ['ACGTwwa']))
### ['Ala-Cys-Gly-Thr-Trp-Trp-Ala']
>>> print(process_seqs('heavy', ['ILATTWP', 'ACGTwwa']))
### ['ILATTWP', 'ACGTwwa'] - 770
```

## In case of problems, contact us on GitHub
___Developers___:
* Petrikov Kirill
* Muradova Gulgaz
* Yury Popov

![Developers](https://github.com/KirPetrikov/HW4_Functions2/blob/HW4_Petrikov/HW4_Petrikov/images/pic.jpg "We are here")

I don't quite get who the 4th developer is )))
And judging by his GitHub, he seems to write in Go )))



Binary file added HW4_Petrikov/images/pic.jpg