
HW4 Zolotikov #10

Open: wants to merge 27 commits into main

27 commits
b22ea48 Create HW4_Zolotikov (glitchheadgit, Sep 27, 2023)
69aab65 Add to_rna function (Sep 28, 2023)
ae5c637 Add define_charge function (Sep 28, 2023)
9b314eb Create README.md to_rna and define_charge parts (Sep 28, 2023)
86127fe Add to_dna, define_polarity functions (glitchheadgit, Sep 29, 2023)
86d1d11 Merge branch 'HW4_Zolotikov' into HW4_Zolotikov (glitchheadgit, Sep 29, 2023)
59e1fdc Merge pull request #1 from dorzhi-b/HW4_Zolotikov (glitchheadgit, Sep 29, 2023)
1b80718 Fix define_polarity function (glitchheadgit, Sep 29, 2023)
2a9e3b4 Add main, change abbreveation and check sequence function (BeskrovnaiaM, Sep 29, 2023)
2c68ee9 Merge pull request #2 from BeskrovnaiaM/HW4_Zolotikov (glitchheadgit, Sep 30, 2023)
25e3f80 Modify to_dna, define_polarity functions (glitchheadgit, Sep 30, 2023)
65d915f Modify to_dna, define_polarity functions (glitchheadgit, Sep 30, 2023)
b0a4d82 Update README.md (glitchheadgit, Sep 30, 2023)
952fae5 Update HW4_Zolotikov (glitchheadgit, Sep 30, 2023)
185e764 Update README.md (BeskrovnaiaM, Sep 30, 2023)
4efc0e8 Add team's photo (BeskrovnaiaM, Sep 30, 2023)
5c0687a Update README.md with photo (BeskrovnaiaM, Sep 30, 2023)
b6c003c Merge pull request #3 from BeskrovnaiaM/HW4_Zolotikov (glitchheadgit, Sep 30, 2023)
85382ee Create dir HW4_Zolotikov, rename HW4_Zolotikov to protein_tool.py and… (BeskrovnaiaM, Sep 30, 2023)
df3007d Update README.md (BeskrovnaiaM, Sep 30, 2023)
648e276 Merge branch 'glitchheadgit:HW4_Zolotikov' into HW4_Zolotikov (BeskrovnaiaM, Sep 30, 2023)
0e82b72 Update README.md (glitchheadgit, Sep 30, 2023)
e193533 Update README.md (glitchheadgit, Sep 30, 2023)
f76d7a2 Update README.md (glitchheadgit, Sep 30, 2023)
9e59966 Merge pull request #4 from BeskrovnaiaM/HW4_Zolotikov (glitchheadgit, Sep 30, 2023)
a537b98 Update typing protein_tool.py (glitchheadgit, Sep 30, 2023)
d5716f4 Update README.md (glitchheadgit, Sep 30, 2023)
200 changes: 200 additions & 0 deletions HW4_Zolotikov/protein_tool.py
@@ -0,0 +1,200 @@
from typing import Dict, List, Union

# Dorzhi
def to_rna(seq: str, rna_dict: Dict[str, str] = {'F': 'UUY', 'L': 'YUN', 'I': 'AUH', 'M': 'AUG',

Review comment: The dictionary would be better moved out into a constant.

                                                 'V': 'GUN', 'S': 'WSN', 'P': 'CCN', 'T': 'ACN',
                                                 'A': 'GCN', 'Y': 'UAY', 'H': 'CAY', 'Q': 'CAR',
                                                 'N': 'AAY', 'K': 'AAR', 'D': 'GAY', 'E': 'GAR',
                                                 'C': 'UGY', 'R': 'MGN', 'G': 'GGN', 'W': 'UGG'}) -> str:
"""
Converts an amino acid sequence into an RNA sequence.

Parameters
----------
seq : str
Amino acid sequence.
rna_dict : dict
Dictionary defining the correspondence of amino acids
to RNA triplets (default, standard code).
Returns
-------
str
RNA sequence.

"""
result = ''.join(rna_dict[base] for base in seq)
return result
Comment on lines +5 to +26

Review comment: 👍👍👍 for the gist of the operation,
but what happened to the indentation?
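A minimal sketch of the reviewer's suggestion, assuming a hypothetical module-level constant `RNA_CODONS` that holds the same IUPAC degenerate codons as the default argument above. Moving the table out of the signature also removes the mutable default argument.

```python
from typing import Dict

# Hypothetical constant name; same degenerate codons as the PR's default dict.
RNA_CODONS: Dict[str, str] = {
    'F': 'UUY', 'L': 'YUN', 'I': 'AUH', 'M': 'AUG', 'V': 'GUN',
    'S': 'WSN', 'P': 'CCN', 'T': 'ACN', 'A': 'GCN', 'Y': 'UAY',
    'H': 'CAY', 'Q': 'CAR', 'N': 'AAY', 'K': 'AAR', 'D': 'GAY',
    'E': 'GAR', 'C': 'UGY', 'R': 'MGN', 'G': 'GGN', 'W': 'UGG',
}


def to_rna(seq: str) -> str:
    """Back-translate a one-letter amino acid sequence into degenerate RNA codons."""
    # Each residue is looked up in the module-level table; an unknown letter raises KeyError.
    return ''.join(RNA_CODONS[aa] for aa in seq)


print(to_rna('MW'))  # AUGUGG
```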



def define_charge(seq: str, positive_charge: List[str] = ['R', 'K', 'H'],

Review comment: Here, too, the default value is a constant; why redefine it on every function call?

                  negative_charge: List[str] = ['D', 'E']) -> Dict[str, int]:
    """
    Counts the positively charged, negatively charged,
    and neutral amino acids in the sequence.

    Parameters
    ----------
    seq : str
        Amino acid sequence (string).
    positive_charge : list
        List of amino acids with positive charge (default is ['R', 'K', 'H']).
    negative_charge : list
        List of amino acids with negative charge (default is ['D', 'E']).
Comment on lines +40 to +42

Review comment: Using a list for membership lookups is slower than using a set. Besides, it is unclear why these should be function arguments at all; a user is unlikely to want to supply their own, different values.


    Returns
    -------
    dict
        A dictionary containing the counts of amino acids and their labels:
        - 'Positive' for amino acids with positive charge.
        - 'Negative' for amino acids with negative charge.
        - 'Neutral' for neutral amino acids.
    """
    positive_count = 0
    negative_count = 0
    neutral_count = 0

    for aa in seq:
        if aa in positive_charge:
            positive_count += 1
        elif aa in negative_charge:
            negative_count += 1
        else:
            neutral_count += 1

    result = {
        'Positive': positive_count,
        'Negative': negative_count,
        'Neutral': neutral_count
    }
    return result
Comment on lines +52 to +69

Review comment: Nice use of blank lines to split the code into logical sections, and a good output.
There is a problem with the indentation.
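A minimal sketch of what the reviewers suggest for `define_charge`, assuming hypothetical module-level frozensets `POSITIVE_AA` and `NEGATIVE_AA`. These replace the mutable list defaults (which Python evaluates once, at definition time, and shares across calls), and membership tests become hash lookups.

```python
from typing import Dict

# Hypothetical constant names; frozensets give O(1) average membership tests
# and cannot be mutated by accident.
POSITIVE_AA = frozenset({'R', 'K', 'H'})
NEGATIVE_AA = frozenset({'D', 'E'})


def define_charge(seq: str) -> Dict[str, int]:
    """Count positively charged, negatively charged and neutral residues."""
    counts = {'Positive': 0, 'Negative': 0, 'Neutral': 0}
    for aa in seq:
        if aa in POSITIVE_AA:
            counts['Positive'] += 1
        elif aa in NEGATIVE_AA:
            counts['Negative'] += 1
        else:
            counts['Neutral'] += 1
    return counts


print(define_charge('RKDEAG'))  # {'Positive': 2, 'Negative': 2, 'Neutral': 2}
```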



# Ustin
POLAR_AA = {'D', 'E', 'R', 'K', 'H', 'N', 'Q', 'S', 'T', 'Y', 'C'}
NONPOLAR_AA = {'A', 'G', 'V', 'L', 'I', 'P', 'F', 'M', 'W'}
Comment on lines +73 to +74

Review comment: Great that these variables are constants and are named correctly! 🔥
By the way, sets are a better fit than lists for lookup tasks 🔥🔥🔥

Actually, you could simply define sets of positively charged, negatively charged, and neutral polar amino acids, plus a separate set of nonpolar amino acids, as constants.

The previous function could use these constants effectively as well, instead of multiplying entities in the code.
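The decomposition proposed in the comment above could look like this, assuming hypothetical names for the three polar subgroups. `POLAR_AA` is then derived by union instead of being listed a second time, and the same charged sets could serve `define_charge`.

```python
# Hypothetical constant names for the split the reviewer proposes.
POSITIVE_AA = frozenset('RKH')
NEGATIVE_AA = frozenset('DE')
NEUTRAL_POLAR_AA = frozenset('NQSTYC')
NONPOLAR_AA = frozenset('AGVLIPFMW')

# Polar residues are derived from the charged and neutral polar groups.
POLAR_AA = POSITIVE_AA | NEGATIVE_AA | NEUTRAL_POLAR_AA
ALL_AA = POLAR_AA | NONPOLAR_AA

# Sanity checks: the groups do not overlap and together cover all 20 residues.
assert POLAR_AA.isdisjoint(NONPOLAR_AA)
assert len(ALL_AA) == 20
```

The derived `POLAR_AA` contains exactly the same 11 residues as the hand-written set in the PR.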

DNA_AA = {'F': 'TTY', 'L': '(TTR or CTN)', 'I': 'ATH', 'M': 'ATG', 'V': 'GTN', 'S': '(TCN or AGY)', 'P': 'CCN', 'T': 'ACN', 'A': 'GCN',
          'Y': 'TAY', 'H': 'CAY', 'Q': 'CAR', 'N': 'AAY', 'K': 'AAR', 'D': 'GAY', 'E': 'GAR', 'C': 'TGY', 'W': 'TGG', 'R': '(CGN or AGR)', 'G': 'GGN'}


def define_polarity(seq: str) -> Dict[str, int]:
"""
Counts polar and nonpolar aminoacids in aminoacid sequences.

Arguments:
str: sequence to count polar and nonpolar aminoacids.

Return:
Dict[str, int]:
Dictionary with keys 'Polar', 'Nonpolar' and values of quantity of according groups in sequence.
"""
polarity_count = {'Polar': 0, 'Nonpolar': 0}
for aminoacid in seq:
if aminoacid in POLAR_AA:
polarity_count['Polar'] += 1
else:
polarity_count['Nonpolar'] += 1
return polarity_count
Comment on lines +90 to +96

Review comment: 🔥



def to_dna(seq: str) -> str:
"""
Transforms aminoacid sequence to DNA sequence

Arguments
---------
str: aminoacid sequence to transform to DNA sequence.

Return
------
str: according DNA sequence.
"""
sequence_dna = []
for aminoacid in seq:
sequence_dna.append(DNA_AA[aminoacid])
return ''.join(sequence_dna)
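One way to keep the RNA and DNA back-translation tables consistent by construction, sketched under the assumption that the DNA output may use the same IUPAC degenerate notation as `to_rna` rather than the `(X or Y)` strings in `DNA_AA`: derive the DNA table from the RNA table by replacing U with T. `RNA_CODONS` and `DNA_CODONS` are hypothetical names, and the derived table inherits whatever degenerate codes the RNA table uses.

```python
from typing import Dict

# Hypothetical RNA table (same degenerate codons as the to_rna default).
RNA_CODONS: Dict[str, str] = {
    'F': 'UUY', 'L': 'YUN', 'I': 'AUH', 'M': 'AUG', 'V': 'GUN',
    'S': 'WSN', 'P': 'CCN', 'T': 'ACN', 'A': 'GCN', 'Y': 'UAY',
    'H': 'CAY', 'Q': 'CAR', 'N': 'AAY', 'K': 'AAR', 'D': 'GAY',
    'E': 'GAR', 'C': 'UGY', 'R': 'MGN', 'G': 'GGN', 'W': 'UGG',
}

# In DNA, thymine (T) takes the place of uracil (U).
DNA_CODONS: Dict[str, str] = {aa: codon.replace('U', 'T') for aa, codon in RNA_CODONS.items()}


def to_dna(seq: str) -> str:
    """Back-translate a one-letter amino acid sequence into degenerate DNA codons."""
    return ''.join(DNA_CODONS[aa] for aa in seq)


print(to_dna('MW'))  # ATGTGG
```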


# Margarita
ABBREVIATION_THREE_TO_ONE = {'ALA':'A', 'CYS':'C', 'ASP':'D', 'GLU':'E', 'PHE':'F',
                             'GLY':'G', 'HIS':'H', 'ILE':'I', 'LYS':'K', 'LEU':'L',
                             'MET':'M', 'ASN':'N', 'PRO':'P', 'GLN':'Q', 'ARG':'R',
                             'SER':'S', 'THR':'T', 'VAL':'V', 'TRP':'W', 'TYR':'Y'}
AMINO_ACIDS_ONE_LETTER = {'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K',

Review comment: You could also combine the sets :)

Suggested change
AMINO_ACIDS_ONE_LETTER = {'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K',
AMINO_ACIDS_ONE_LETTER = POLAR_AA.union(NONPOLAR_AA)

                          'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V',
                          'W', 'Y'}
AMINO_ACIDS_THREE_LETTER = {'ALA', 'CYS', 'ASP', 'GLU', 'PHE',

Review comment: Or, to get both this set and the one above, take the keys and the values of the ABBREVIATION_THREE_TO_ONE dictionary separately.

Why type everything by hand when you can code it?)

                            'GLY', 'HIS', 'ILE', 'LYS', 'LEU',
                            'MET', 'ASN', 'PRO', 'GLN', 'ARG',
                            'SER', 'THR', 'VAL', 'TRP', 'TYR'}
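Following the reviewer's idea, both lookup sets can be derived from the abbreviation table instead of being typed out by hand. A minimal sketch; threonine is written as 'THR', its standard three-letter code.

```python
# The three-letter to one-letter mapping from the PR (threonine as 'THR').
ABBREVIATION_THREE_TO_ONE = {
    'ALA': 'A', 'CYS': 'C', 'ASP': 'D', 'GLU': 'E', 'PHE': 'F',
    'GLY': 'G', 'HIS': 'H', 'ILE': 'I', 'LYS': 'K', 'LEU': 'L',
    'MET': 'M', 'ASN': 'N', 'PRO': 'P', 'GLN': 'Q', 'ARG': 'R',
    'SER': 'S', 'THR': 'T', 'VAL': 'V', 'TRP': 'W', 'TYR': 'Y',
}

# Keys give the three-letter codes, values give the one-letter codes.
AMINO_ACIDS_THREE_LETTER = frozenset(ABBREVIATION_THREE_TO_ONE.keys())
AMINO_ACIDS_ONE_LETTER = frozenset(ABBREVIATION_THREE_TO_ONE.values())

print(len(AMINO_ACIDS_THREE_LETTER), len(AMINO_ACIDS_ONE_LETTER))  # 20 20
```

Because both sets come from one dictionary, a typo in a code can no longer make the sets and the mapping disagree.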

import sys

Review comment: All imports should be at the top.


def change_abbreviation(seq: str) -> str:
"""
Changes the amino acid abbreviation from three-letter to one-letter.

Parametrs
----------
seq : str
Amino acid sequence in three-letter form.
Returns
-------
str
Amino acid sequence in one-letter form

"""
one_letter_seq = [ABBREVIATION_THREE_TO_ONE[amino_acid] for amino_acid in seq.split("-")]
return "".join(one_letter_seq)

def is_correct_seq(seq: str) -> bool:
"""
Check the sequence for extraneous characters.

Parametrs
----------
seq : str
Amino acid sequence.
Returns
-------
bool
TRUE - if there is no extraneous characters, FALSE - if there is extraneous characters.

"""
unique_amino_acids = set(seq)
unique_amino_acids_three = set(seq.split("-"))
check = unique_amino_acids <= AMINO_ACIDS_ONE_LETTER or unique_amino_acids_three <= AMINO_ACIDS_THREE_LETTER
return check
Comment on lines +163 to +166

Review comment: Indentation problem;
I recommend using set methods.

Suggested change
    unique_amino_acids = set(seq)
    unique_amino_acids_three = set(seq.split("-"))
    check = unique_amino_acids <= AMINO_ACIDS_ONE_LETTER or unique_amino_acids_three <= AMINO_ACIDS_THREE_LETTER
    return check
    unique_amino_acids = set(seq)
    unique_amino_acids_three = set(seq.split("-"))
    check = unique_amino_acids.issubset(AMINO_ACIDS_ONE_LETTER) or unique_amino_acids_three.issubset(AMINO_ACIDS_THREE_LETTER)
    return check
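With the set-method suggestion applied so that the one-letter characters are checked against `AMINO_ACIDS_ONE_LETTER` and the hyphen-separated tokens against `AMINO_ACIDS_THREE_LETTER`, the validator and `change_abbreviation` can be exercised together. A self-contained sketch; the constants are rebuilt from the abbreviation table so the block runs on its own.

```python
ABBREVIATION_THREE_TO_ONE = {
    'ALA': 'A', 'CYS': 'C', 'ASP': 'D', 'GLU': 'E', 'PHE': 'F',
    'GLY': 'G', 'HIS': 'H', 'ILE': 'I', 'LYS': 'K', 'LEU': 'L',
    'MET': 'M', 'ASN': 'N', 'PRO': 'P', 'GLN': 'Q', 'ARG': 'R',
    'SER': 'S', 'THR': 'T', 'VAL': 'V', 'TRP': 'W', 'TYR': 'Y',
}
AMINO_ACIDS_THREE_LETTER = frozenset(ABBREVIATION_THREE_TO_ONE)
AMINO_ACIDS_ONE_LETTER = frozenset(ABBREVIATION_THREE_TO_ONE.values())


def is_correct_seq(seq: str) -> bool:
    """Return True for a valid one-letter or hyphen-separated three-letter sequence."""
    one_letter_ok = set(seq).issubset(AMINO_ACIDS_ONE_LETTER)
    three_letter_ok = set(seq.split('-')).issubset(AMINO_ACIDS_THREE_LETTER)
    return one_letter_ok or three_letter_ok


def change_abbreviation(seq: str) -> str:
    """Convert a hyphen-separated three-letter sequence into one-letter form."""
    return ''.join(ABBREVIATION_THREE_TO_ONE[aa] for aa in seq.split('-'))


print(is_correct_seq('MKTW'))              # True
print(is_correct_seq('ALA-GLY-THR'))       # True
print(is_correct_seq('MKTZ'))              # False
print(change_abbreviation('ALA-GLY-THR'))  # AGT
```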


def protein_tool(*args: str) -> Union[str, List[Union[Dict[str, int], str]]]:
"""
Receives a request from the user and runs the desired function.

Parametrs
----------
seq : str
Amino acid sequences.
operation : str
Type of user's request.
Returns
-------
str
If a single sequence is supplied, outputs the result as a string or or identify a problem with a specific sequence.
list
If several sequences are supplied, outputs the result as a list.

"""
*seqs, operation = args

Review comment: Ideally, the operation on the sequence could be made a keyword argument; then you could do without the extra complexity of unpacking)

    operations = {'one letter':change_abbreviation, 'RNA':to_rna, 'DNA':to_dna, 'charge':define_charge, 'polarity':define_polarity}

Review comment: This should also be moved out as a constant

(the indentation is sad)

    output = []
    for seq in seqs:
        answer = is_correct_seq(seq.upper())

Review comment: Naming: this is a check rather than an answer, so it would be better to call the variable input_check

        if answer:
            function_output = operations[operation](seq.upper())
            output.append(function_output)
        else:
            print(f'Something wrong with {seq}', file=sys.stderr)
            continue
    if len(output) == 1 and (operation == 'RNA' or operation == 'DNA' or operation == 'one letter'):
        return ''.join(output)
    else:
        return output
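The three review comments on `protein_tool` (keyword argument instead of unpacking, the dispatch table as a constant, `input_check` naming) can be combined in one sketch. It is not the authors' code: `OPERATIONS`, `STRING_OPERATIONS` and the trimmed stand-ins for `to_rna` and `define_polarity` are hypothetical, and validation is simplified to one-letter input only.

```python
import sys
from typing import Callable, Dict, List, Union

# Hypothetical, trimmed-down stand-ins for the PR's functions so the sketch runs alone.
RNA_CODONS = {
    'F': 'UUY', 'L': 'YUN', 'I': 'AUH', 'M': 'AUG', 'V': 'GUN',
    'S': 'WSN', 'P': 'CCN', 'T': 'ACN', 'A': 'GCN', 'Y': 'UAY',
    'H': 'CAY', 'Q': 'CAR', 'N': 'AAY', 'K': 'AAR', 'D': 'GAY',
    'E': 'GAR', 'C': 'UGY', 'R': 'MGN', 'G': 'GGN', 'W': 'UGG',
}
POLAR_AA = frozenset('DERKHNQSTYC')
ONE_LETTER_AA = frozenset(RNA_CODONS)


def to_rna(seq: str) -> str:
    return ''.join(RNA_CODONS[aa] for aa in seq)


def define_polarity(seq: str) -> Dict[str, int]:
    polar = sum(aa in POLAR_AA for aa in seq)
    return {'Polar': polar, 'Nonpolar': len(seq) - polar}


# Dispatch table at module level instead of being rebuilt on every call.
OPERATIONS: Dict[str, Callable[[str], object]] = {'RNA': to_rna, 'polarity': define_polarity}
# Operations whose single result is returned as a bare string.
STRING_OPERATIONS = frozenset({'RNA'})


def protein_tool(*seqs: str, operation: str) -> Union[str, List[object]]:
    """Apply `operation` to every valid sequence; `operation` is keyword-only."""
    func = OPERATIONS[operation]  # an unknown operation raises KeyError
    output = []
    for seq in seqs:
        upper_seq = seq.upper()
        input_check = set(upper_seq).issubset(ONE_LETTER_AA)
        if not input_check:
            print(f'Something wrong with {seq}', file=sys.stderr)
            continue
        output.append(func(upper_seq))
    if len(output) == 1 and operation in STRING_OPERATIONS:
        return output[0]
    return output


print(protein_tool('mw', operation='RNA'))  # AUGUGG
print(protein_tool('MW', 'DE', operation='polarity'))
```

Because `operation` follows `*seqs` in the signature, Python only accepts it by keyword, so the last-argument unpacking is no longer needed.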
Binary file added HW4_Zolotikov/team-HW4.jpg