23 months ago
Ayish
Hello,
I have a large FASTA file containing both nucleotide and protein sequences. I need to separate the sequences into two files based on the type of sequence. Is there a Python module that can do this?
Thanks in advance.
Do the sequence identifier lines tell you whether they're DNA or protein? Or are you going to have to guess, even for a peptide made only of glycine, alanine, cysteine and threonine (G, A, C, T)?
Unfortunately, no. It would be guesswork, I think.
Here is a snippet in Python. Or you can try out the Biopython module too. But be aware that this guesswork can go very wrong if you have IUPAC nucleotide symbols other than ATCG.
IUPAC codes: https://www.bioinformatics.org/sms/iupac.html
Snippet: https://colab.research.google.com/drive/1XSQBDoLIyQUGwUJvXRtZHkcXxsVU6oRH?usp=sharing
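For readers who cannot open the notebook, the guessing approach discussed in this thread can be sketched roughly as follows. This is not the linked snippet, just a minimal standard-library illustration. The function names (`read_fasta`, `guess_type`, `split_fasta`) and the 0.9 threshold are assumptions made for this example: a record is called nucleotide when at least 90% of its letters are A, C, G, T, U or N, and protein otherwise.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def guess_type(seq, threshold=0.9):
    """Heuristic guess: 'nucleotide', 'protein', or 'unknown' for empty input.

    Assumption: a sequence is nucleotide if at least `threshold` of its
    letters are A/C/G/T/U/N. Other IUPAC ambiguity codes (R, Y, K, M, ...)
    are also amino-acid letters, so they count against the nucleotide guess.
    """
    residues = [c for c in seq.upper() if c.isalpha()]
    if not residues:
        return "unknown"
    counts = Counter(residues)
    core = sum(counts[c] for c in "ACGTUN")
    return "nucleotide" if core / len(residues) >= threshold else "protein"


def split_fasta(in_path, nuc_path, prot_path, threshold=0.9):
    """Write each record to the nucleotide or protein file; skip empty records."""
    with open(in_path) as src, open(nuc_path, "w") as nuc, open(prot_path, "w") as prot:
        for header, seq in read_fasta(src):
            kind = guess_type(seq, threshold)
            if kind == "unknown":
                continue
            out = nuc if kind == "nucleotide" else prot
            out.write(f">{header}\n{seq}\n")
```

As noted above, a short peptide made only of G, A, C and T would be classified as nucleotide, and a DNA record rich in ambiguity codes could be classified as protein, so it is worth spot-checking the output. If Biopython is installed, `Bio.SeqIO.parse(handle, "fasta")` could replace the hand-written `read_fasta` parser.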