Hi,
Sorry if this question seems obvious; I am pretty new to Python. I need some help to complete this code. I have a FASTA file that looks like this:
>gene_name1|other
AAAAAAAATTTTTA
>gene_name2|other
TTTTTGGGGGAAA
>|gene_name3
TTTTTTTCCCCCCC
>|gene_name4
AAAAAATTTTTTTCC
....
Ideally I want to remove the | whenever it appears at the beginning of the ID, but not anywhere else. I wrote some Python code that does that, but I cannot get the output into a file. I can copy and paste the printed output and arrange it myself, but I would like to find a solution in Python.
My code so far:
from Bio import SeqIO

original_file = "sequences.fasta"
corrected_file = "./corrected.fasta"

with open(original_file) as original, open(corrected_file, 'w') as corrected:
    records = SeqIO.parse(original_file, 'fasta')
    for record in records:
        # Drop a leading "|" from the ID. Note: this only prints to the screen;
        # nothing is written to corrected_file yet.
        if record.id[0] == "|":
            print(">", record.id[1:], "\n", record.seq)
        else:
            print(">", record.id, "\n", record.seq)
result:
>gene_name1|other
AAAAAAAATTTTTA
>gene_name2|other
TTTTTGGGGGAAA
>gene_name3
TTTTTTTCCCCCCC
>gene_name4
AAAAAATTTTTTTCC
Can anyone help me correct the code and write the output to a FASTA file? Thanks!
Or, without using Python:
sed 's/^>|/>/' input.fa
Love it! But I am trying to learn Python. Still, super appreciated.
Check how to write to a file in Python: https://www.guru99.com/reading-and-writing-files-in-python.html
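For what it's worth, here is a minimal sketch of that plain file-writing approach, without Biopython. It assumes the only change needed is dropping a | that comes directly after the > on a header line, which is the same thing the sed one-liner does. The function name strip_leading_pipe is just a placeholder I made up.

```python
def strip_leading_pipe(in_path, out_path):
    """Copy a FASTA file, dropping a '|' that directly follows '>' in a header."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Only header lines that begin with ">|" are changed;
            # a "|" anywhere else on the line is left alone.
            if line.startswith(">|"):
                line = ">" + line[2:]
            # Lines keep their own trailing newline, so write them unchanged.
            dst.write(line)
```

With the file names from the question, the call would be strip_leading_pipe("sequences.fasta", "./corrected.fasta"). Each line is written as soon as it is read, so the whole file never sits in memory.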
You can use a list to collect all the corrected lines first, then write them into a file.
No need to store the reads; that is only going to use memory. You can write each record as soon as it has been corrected.
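To illustrate the streaming idea with Biopython, here is a rough sketch that passes a generator to SeqIO.write, so records are corrected and written one at a time. The function names fixed_records and write_corrected are hypothetical. One assumption worth flagging: the FASTA writer builds the header from both record.id and record.description, so the sketch trims the description as well as the ID.

```python
from Bio import SeqIO  # Biopython (third-party)


def fixed_records(in_path):
    """Yield records one at a time, with a leading '|' removed from the ID."""
    for record in SeqIO.parse(in_path, "fasta"):
        if record.id.startswith("|"):
            record.id = record.id[1:]
            # The description still starts with the old ID. Trim it too,
            # otherwise the header would come out as ">gene_name3 |gene_name3".
            if record.description.startswith("|"):
                record.description = record.description[1:]
        yield record


def write_corrected(in_path, out_path):
    """Write the corrected records to out_path and return how many were written."""
    # SeqIO.write accepts any iterable of records, so no list is needed.
    return SeqIO.write(fixed_records(in_path), out_path, "fasta")
```

With the file names from the question, this would be write_corrected("sequences.fasta", "./corrected.fasta"). Be aware that the "fasta" writer wraps long sequences at 60 characters per line, so the output layout can differ from the input even though the sequences are the same.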