Hello,
I have a fasta file with different amino acid sequences, for example:
>abc
HSTSDSAQTMFPVALLLLAAGSCVKGEQLTQPTSVTVQPGQRLTITCQVSYSLGTYFTAW
IRQPAGKGLEWIGMRSTGASYYKDSLKNKFSIDLDTSSKTVTLNGQNVQPEDTAVYYCAR
APSRGFDYWGKGTMVTITSATPKGPTVFPL
>def
TARQIQHKPCFL*LCCCWQLDHV*RVNS*HSRPL*LCSQVNV*PSPVRSLILLVPTSQLG
SDSLQEKDWSGLE*DLLELHTTKIH*RTSSVST*TLPAKL*L*MDRMCSLKTLLCITVPE
RPVGVLTTGGKAPWSPSPRPPQRDQLCFL*
>ghi
GSQHVRFSTNHVSCSSAAVGSWIMCEG*TVDTADLCDCAARSTSDHHLSGLLFSW*LLHS
LDQTACRKRTGVDWEQIYWSCILQRFIKEQVQYRLRHFQQNCDSKWTECAA*RHCCVLLC
QTTGSGSWLLGERHHGHHHLGHPKGTNCVSS
and I want to separate the "productive" sequences from the "non-productive" ones.
Additional info: I translated every DNA sequence into amino acid sequences in all 6 reading frames.
By "non-productive" I mean sequences that don't translate into proteins (they lack the amino acid M and/or contain too many stop codons, shown as * above). I would like to write these non-productive sequences to their own FASTA file.
As for the "productive" ones, I would also like to save each of them, keeping only the complete frame, in another FASTA file.
Is there a software tool that can do this? If not, I'm trying to do it in Python, but I'm stuck. Any ideas are welcome.
Thank you in advance
Please do not delete posts once they have at least one comment or an answer.
At the beginning of the sequence?
In general, throughout the whole sequence.
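A minimal sketch of such a filter in plain Python, offered as a starting point rather than a definitive solution. It assumes that "productive" means the sequence contains at least one M and no more than a chosen number of * characters anywhere in the sequence. The function names and the max_stops threshold are hypothetical choices for illustration, since the post does not say how many stop codons count as "too many".

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_productive(seq, max_stops=1):
    """Assumed rule: at least one M and at most max_stops '*' in the sequence."""
    return "M" in seq and seq.count("*") <= max_stops


def split_fasta(in_handle, productive_out, nonproductive_out, max_stops=1):
    """Write each record to one of two output handles, wrapped at 60 columns."""
    for header, seq in read_fasta(in_handle):
        out = productive_out if is_productive(seq, max_stops) else nonproductive_out
        out.write(f">{header}\n")
        for i in range(0, len(seq), 60):
            out.write(seq[i:i + 60] + "\n")
```

It could be called like this, where the three file names are placeholders:

```python
with open("translated.fasta") as src, \
     open("productive.fasta", "w") as good, \
     open("nonproductive.fasta", "w") as bad:
    split_fasta(src, good, bad, max_stops=1)
```

With max_stops=1, the example record abc (one M, no stops) would go to the productive file, while def and ghi (several stops each) would go to the non-productive file. Trimming a productive sequence down to its "complete frame" is not covered here, because that depends on how the frame boundaries are defined.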