Bioinformatics questions sequence
3.8 years ago
anapaolavi • 0
YKYRYLRHGKLRPFERDI
YKYRYLKHGKLRPFERDI
YKYRYLXHGKLRPFERDI
YKYRSLRHGKLRPFERDI
YKYRCLRHGKLRPFERDI
YKFRYLRHGKLRPFERDI
YKHRYLRHGKLRPFERDI
YKXRYLRHGKLRPFERDI
YLYRWVRRSKLNPYERDL
FYYRLFRHGKIKPYERDI
FFYRRFRHGKIKPYGRDL
FYYRLFRHGKIKPYGRDL
YYYRIWRSEKLRPFERDI
YYYRSHRKTKLKPFERDL
YFYRSHRSTKLKPFERDL
YFYRSHRSSKLKPFERDL
YYYRSSRKTKLKPFERDL
YYYRSYRKEKLKPFERDL

Write a regular expression that describes the alignment in the box.

Find 5 protein sequences from different organisms or strains that contain the pattern described by the regular expression from Q1. List the ID, name, size, source, and function of each protein.

Find 2 proteins with known structures that contain the pattern described by the regular expression from Q1. List the IDs of found protein structures.

Build a multiple sequence alignment for all protein sequences from Q2 and Q3.

Identify the conserved regions in the alignment from Q4 and explore their biological significance.

Evaluate statistical parameters of the regular expression from Q1 based on similar expressions in the Prosite database.

sequence • 1.3k views

Please change your title, "Bioinformatics questions sequence". Of course it is a question about bioinformatics...


Looks like homework. What have you tried so far?


not sure where to start


The first question is asking to write a regular expression that captures those sequences. Depending on what language you are writing this in there will be regex tutorials that you should go through.


would this be correct? regex = ([A-Z])+


Yes, but it looks like it's an amino-acid alphabet (not A to Z) with a specific length...


what would you recommend then?


would this be correct? regex = ([A-Z])+

Well, yes, but it also covers the sequences A and AA and AAA and every other sequence of alphabetical uppercase characters that is conceivable (including all sequences that contain non-amino acid letters).

You need to find one that covers (exactly) the given alignment. So best to look at the individual columns of the alignment and see what amino acids they're composed of. This should then give you an idea of how to build the regex.
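The column-by-column idea can be sketched in Python roughly as follows. This is only an illustration: the function name `column_regex` is made up here, only three of the aligned sequences are used to keep it short, and it assumes all sequences have the same length and contain no gap characters.

```python
def column_regex(seqs):
    """Build a regex with one character class per alignment column."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    parts = []
    # zip(*seqs) walks the alignment column by column
    for column in zip(*seqs):
        residues = sorted(set(column))
        if len(residues) == 1:
            # fully conserved column: a plain literal is enough
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


# Three rows taken from the alignment in the question
example = [
    "YKYRYLRHGKLRPFERDI",
    "YLYRWVRRSKLNPYERDL",
    "FYYRLFRHGKIKPYERDI",
]
print(column_regex(example))
```

Feeding in all rows of the alignment instead of three gives a pattern that covers the whole box. Note that a literal X inside a character class only matches the letter X, not "any residue".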

3.8 years ago
Mensur Dlakic ★ 28k

This is clearly a homework assignment, and you should ask your instructor for details. It defeats the educational goals of your instructor if we show you exactly how to do this. That said, here are a couple of hints.

I am guessing that a regular expression assignment is about individual columns in your alignment rather than a full set of sequences. For example, this is a regular expression of the last 4 columns in your alignment:

[EG]-R-D-[IL]

This means that the last column is either I or L, the next-to-last is always D, the one before it is always R, and the one before that is either E or G. You should check with your instructor, but I think that your assignment is to find this pattern across all columns, and then search the database for proteins that match the pattern you found.
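If you want to try such a dash-separated pattern in code, a minimal sketch of a converter to a Python regex might look like this. The helper `prosite_to_regex` is hypothetical and handles only plain letters, bracketed classes, and a lone `x` wildcard; it ignores the rest of the PROSITE syntax (exclusions in braces, repeat counts, anchors).

```python
import re


def prosite_to_regex(pattern):
    """Convert a very simple dash-separated pattern to a Python regex.

    Assumption: elements are single letters, [..] classes, or 'x' (any residue).
    """
    parts = []
    for element in pattern.split("-"):
        parts.append("." if element == "x" else element)
    return "".join(parts)


# Anchor with '$' because the hint describes the LAST four columns
tail = re.compile(prosite_to_regex("[EG]-R-D-[IL]") + "$")

# Two rows from the alignment in the question
for seq in ("YKYRYLRHGKLRPFERDI", "FFYRRFRHGKIKPYGRDL"):
    print(seq, bool(tail.search(seq)))
```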

For example, here is one protein that matches the whole pattern (the match, YKYRYLRHGKLRPFERDI, was highlighted in red in the original post):

MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFLPFYSNV TGFHTINHTFGNPVIPFKDGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNSTNVVIRACNFEL CDNPFFAVSKPMGTQTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFKHLREFVFKNKDGFLYVYK GYQPIDVVRDLPSGFNTLKPIFKLPLGINITNFRAILTAFSPAQDIWGTSAAAYFVGYLKPTTFML KYDENGTITDAVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFN ATKFPSVYAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQ IAPGQTGVIADYNYKLPDDFMGCVLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDISNVPFSP DGKPCTPPALNCYWPLNDYGFYTTTGIGYQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFN FNGLTGTGVLTPSSKRFQPFQQFGRDVSDFTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEV AVLYQDVNCTDVSTAIHADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSYECDIPIGAGICASY HTVSLLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVSMAKTSVDCNMYICG DSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQMYKTPTLKYFGGFNFSQILPDPL KPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAA LVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTS TALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYV TQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERN FTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGNCDVVIGIINNTVYDP LQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKY EQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHY T


Would this be correct for the regular expression for the whole alignment? [YF][KLYF][YFHX]R[YWSCRLI][LYWFVHS][RKX][HRKS][GSTE]K[LI][RNK][P][FY][EG]RD[LI]
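One way to check a candidate like this yourself is to test it against every row of the alignment with `re.fullmatch`, which requires the pattern to cover the whole 18-residue string. A short sketch (the variable names are arbitrary):

```python
import re

# Candidate pattern exactly as posted above
CANDIDATE = re.compile(
    r"[YF][KLYF][YFHX]R[YWSCRLI][LYWFVHS][RKX][HRKS][GSTE]K[LI][RNK][P][FY][EG]RD[LI]"
)

# All rows of the alignment from the question
ALIGNMENT = """
YKYRYLRHGKLRPFERDI YKYRYLKHGKLRPFERDI YKYRYLXHGKLRPFERDI
YKYRSLRHGKLRPFERDI YKYRCLRHGKLRPFERDI YKFRYLRHGKLRPFERDI
YKHRYLRHGKLRPFERDI YKXRYLRHGKLRPFERDI YLYRWVRRSKLNPYERDL
FYYRLFRHGKIKPYERDI FFYRRFRHGKIKPYGRDL FYYRLFRHGKIKPYGRDL
YYYRIWRSEKLRPFERDI YYYRSHRKTKLKPFERDL YFYRSHRSTKLKPFERDL
YFYRSHRSSKLKPFERDL YYYRSSRKTKLKPFERDL YYYRSYRKEKLKPFERDL
""".split()

# Collect any row the pattern fails to cover completely
missing = [seq for seq in ALIGNMENT if CANDIDATE.fullmatch(seq) is None]
print("rows not matched:", missing)
```

An empty list means every row is covered. It does not tell you how many sequences outside the alignment the pattern also accepts, which is what the Prosite-style statistics in the last question are about.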


I only checked the first three brackets, but those look good.


Thank you for checking! My next question is how to find 5 protein sequences from different organisms or strains that contain the pattern described by the regular expression that I provided above. I have to list the ID, name, size, source, and function of each protein. How can I do that?


If you are doing this on the Linux command line, you can use grep with the regex. If you are using a programming language like Python or R, they have functions to search strings using regex. Refer to the documentation for those languages for more information.
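In Python, for instance, a rough sketch of scanning FASTA-formatted text with the `re` module could look like this. The record names `example_1` and `example_2` are invented for illustration; the first sequence is a short fragment of the protein quoted in the answer above, and the second is a made-up sequence without the motif. In practice you would read a FASTA file downloaded from a protein database instead of the inline string.

```python
import re

PATTERN = re.compile(
    r"[YF][KLYF][YFHX]R[YWSCRLI][LYWFVHS][RKX][HRKS][GSTE]K[LI][RNK][P][FY][EG]RD[LI]"
)


def read_fasta(text):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_hits(text):
    """Return (header, 1-based start, matched text) for each record with a hit."""
    hits = []
    for header, seq in read_fasta(text):
        match = PATTERN.search(seq)
        if match:
            hits.append((header, match.start() + 1, match.group()))
    return hits


fasta = """>example_1
NYNYKYRYLRHGKLRPFERDISNVPFSP
>example_2
MKTAYIAKQRQISFVKSHFSRQ
"""

for header, start, matched in find_hits(fasta):
    print(header, start, matched)
```

The sequence lines are joined before searching because a motif can span a line break in a wrapped FASTA file; a plain `grep -E` over the file works line by line and would miss such matches.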
