Here is my question:
I've got a file which looks like this:
103L Sequence: MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL Disorder: ----------------------------------XXXXXX-----------------------------------------------------------------------------------------------------------------------------XX
Each record contains a name (103L in this case), the protein sequence after the "Sequence:" label, and the disorder annotation after the "Disorder:" label. A "-" means that the position is ordered, and an "X" means that the position is disordered. For example, the final "XX" in the disorder string means that the last two positions of the protein sequence, "NL", are disordered. After I use the split method, it looks like this:
['>103L', 'Sequence:', 'MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL', 'Disorder:', '----------------------------------XXXXXX-----------------------------------------------------------------------------------------------------------------------------XX']
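For reference, the five tokens returned by split can be unpacked into named fields. This is only a sketch: the function name parse_record is hypothetical, the record in the usage line is shortened and made up, and stripping a leading ">" is an assumption based on the '>103L' token shown in the list above.

```python
def parse_record(line):
    """Split one record line into (name, sequence, disorder)."""
    tokens = line.split()
    # Expected layout: name, 'Sequence:', sequence, 'Disorder:', disorder mask
    if len(tokens) != 5 or tokens[1] != "Sequence:" or tokens[3] != "Disorder:":
        raise ValueError("unexpected record layout: %r" % line[:40])
    name = tokens[0].lstrip(">")  # assumption: drop a FASTA-style '>' if present
    sequence, disorder = tokens[2], tokens[4]
    # The mask only makes sense if it has one character per residue
    if len(sequence) != len(disorder):
        raise ValueError("sequence and disorder mask differ in length")
    return name, sequence, disorder


# Shortened, made-up record used only to illustrate the call
print(parse_record(">1ABC Sequence: MNIFEK Disorder: --XX-X"))
```

Checking the labels and lengths up front means a malformed line fails with a clear message instead of producing misaligned positions later.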
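I want to use Python to find the disordered residues and their positions. The final file should look something like this: the name and the sequence on the first line, followed by a "Disorder:" header with the columns position (Posi) and residue name (R), and then one line per disordered residue. Taking 103L as an example: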
103L Sequence: MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
Disorder: Posi R
34 K
35 S
36 P
37 S
38 L
39 N
165 N
166 L
I am new to Python and really hope someone can help me. Thank you so much!
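One possible approach, offered as a rough sketch rather than a definitive answer: walk through the sequence and the disorder string together with zip, and keep the positions where the mask is "X". The function names disordered_positions and format_record are made up for illustration. Positions are 0-based so that K is reported at 34 as in the example above; pass start=1 for 1-based numbering. To keep the listing short, the disorder string is rebuilt from run lengths, which assumes it is exactly as long as the 167-residue sequence.

```python
def disordered_positions(sequence, disorder, start=0):
    """Return (position, residue) pairs where the mask is 'X'."""
    return [
        (pos, residue)
        for pos, (residue, flag) in enumerate(zip(sequence, disorder), start)
        if flag == "X"
    ]


def format_record(name, sequence, disorder):
    """Build the report text in the layout described in the question."""
    lines = ["%s Sequence: %s" % (name, sequence), "Disorder: Posi R"]
    for pos, residue in disordered_positions(sequence, disorder):
        lines.append("%d %s" % (pos, residue))
    return "\n".join(lines)


# The 103L sequence from the question, split into chunks for readability
sequence = (
    "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLN"
    "SLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRG"
    "ILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNS"
    "LRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGT"
    "WDAYKNL"
)
# Assumption: 34 ordered, 6 disordered, 125 ordered, 2 disordered positions
disorder = "-" * 34 + "X" * 6 + "-" * 125 + "X" * 2

print(format_record("103L", sequence, disorder))
```

To save the result, the returned text can be written out with an ordinary open(path, "w") call; the output file name is up to you.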