Question

Remove character from Fasta IDs -- python

0

5.6 years ago

Amaranta_Remedios • 0

Hi,

Sorry if this question seems obvious; I am pretty new to Python and need some help to complete this code. I have a FASTA file that looks like this:

>gene_name1|other
AAAAAAAATTTTTA
>gene_name2|other
TTTTTGGGGGAAA
>|gene_name3
TTTTTTTCCCCCCC
>|gene_name4
AAAAAATTTTTTTCC
....

Ideally I want to remove | whenever it appears at the beginning of the ID, and nowhere else. I wrote some Python code that does that, but I cannot get the output into a file. I can copy/paste the printed output and arrange it myself, but I would like to find a solution in Python.

My code so far:

from Bio import SeqIO
original_file = "sequences.fasta"
corrected_file = "./corrected.fasta"

with open(original_file) as original, open(corrected_file, 'w') as corrected:
    records = SeqIO.parse(original_file, 'fasta')

for record in records:
    if record.id[0] == "|":
        print(">", record.id[1:], "\n",
              record.seq)
    else:
        print(">", record.id, "\n", record.seq)

result:

>gene_name1|other
AAAAAAAATTTTTA
>gene_name2|other
TTTTTGGGGGAAA
>gene_name3
TTTTTTTCCCCCCC
>gene_name4
AAAAAATTTTTTTCC

Can anyone help me correct the code so that it writes the output to a FASTA file? Thanks!

python • 2.5k views

updated 5.6 years ago by Corentin ▴ 620 • written 5.6 years ago by Amaranta_Remedios • 0

2

Or, without using Python: sed 's/^>|/>/' input.fa
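
For anyone who wants the same substitution in Python, here is a minimal sketch of what the sed expression does, using re.sub on one line at a time. The helper name strip_leading_pipe and the sample lines are only illustrative; the pattern is anchored at the start of the line, so a | elsewhere in the ID is left alone.

```python
import re


# Hypothetical helper; mirrors sed 's/^>|/>/' for a single line.
def strip_leading_pipe(line: str) -> str:
    # '^>\|' matches '>' immediately followed by '|' at the start of the line only.
    return re.sub(r"^>\|", ">", line)


# Sample lines taken from the question's example file.
lines = [">gene_name1|other", "AAAAAAAATTTTTA", ">|gene_name3", "TTTTTTTCCCCCCC"]
fixed = [strip_leading_pipe(line) for line in lines]
print("\n".join(fixed))
```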

5.6 years ago by Pierre Lindenbaum 166k

0

Love it! But I am trying to learn python. Still super appreciated.

5.6 years ago by Amaranta_Remedios • 0

1

See how to write to a file in Python: https://www.guru99.com/reading-and-writing-files-in-python.html

5.6 years ago by JC 13k

0

You can use a list to collect all the corrected lines first, then write them into a file.

5.6 years ago by shoujun.gu ▴ 350

0

No need to store the reads; that only uses extra memory.

5.6 years ago by WouterDeCoster 48k

score 1 · Answer 1 · 2019-11-06

1

5.6 years ago

Eric Lim ★ 2.2k

Since you're already using SeqIO to parse the incoming FASTA, the easiest and quickest way is to reuse each record and replace print with SeqIO.write. Everything else stays the same, as long as the loop runs inside the with block so that corrected is still open.

for record in records:
  if record.id.startswith('|'):
    record.id = record.id[1:]
    record.description = ''
  SeqIO.write(record, corrected, 'fasta')

You should still follow the link that @JC posted to learn about general file reading and writing in Python outside the Biopython ecosystem.
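
As a rough illustration of plain file reading and writing without Biopython, the sketch below streams the input line by line and writes each line straight to the output, so nothing is collected in memory. The function name fix_fasta_headers is hypothetical, and it assumes headers are lines beginning with > as in the question.

```python
def fix_fasta_headers(in_path: str, out_path: str) -> int:
    """Copy a FASTA file, dropping a '|' that directly follows '>' in a header.

    Returns the number of header lines that were changed.
    """
    changed = 0
    # Both handles stay open for the whole indented block.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:  # read one line at a time
            if line.startswith(">|"):
                line = ">" + line[2:]  # keep '>' and skip only the leading '|'
                changed += 1
            dst.write(line)  # each line still carries its own newline
    return changed
```

With the file names from the question, the call would be fix_fasta_headers("sequences.fasta", "./corrected.fasta"). Sequence lines are copied unchanged.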

5.6 years ago by Eric Lim ★ 2.2k

0

Thanks a lot, perfect! I added my else statement too. In case this can help others, I am posting the final version.

    for record in records:
        if record.id.startswith('|'):
            record.id = record.id[1:]
            record.description = ''
        else:
            print(record.id)
        SeqIO.write(record, corrected, 'fasta')

5.6 years ago by Amaranta_Remedios • 0

score 0 · Answer 2 · 2019-11-06

You are almost there. Instead of printing the results to the screen with "print()", you need to use the file handle you created, "corrected", to write to the file.

For this you can use the "write()" method, as described in the Python documentation (https://docs.python.org/3/tutorial/inputoutput.html):

f.write(string) writes the contents of string to the file, returning the number of characters written.

Here "f" is the file handle (the variable you create with "as x").

However, your current code does not use "with open()" correctly:

First, you never use the "original" file handle; you pass the filename instead (the variable "original_file" is the argument of the parse() method).
Second, when using "with open()", the code that works with the files must sit inside the indented block, for example:
```
with open("test.txt", "w") as f:
    f.write("this will be written in test.txt")
```

That is because leaving the indented "with open" block automatically closes the file handle, so you can no longer read from or write to the file.
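
To see this closing behaviour directly, here is a small sketch that writes inside the block and then checks the handle afterwards. The temporary path and the variable names are made up for the demo; f.closed and the ValueError raised by a late write are standard Python file behaviour.

```python
import os
import tempfile

# Hypothetical scratch file so the demo does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w") as f:
    # write() returns the number of characters written.
    written = f.write("this will be written in test.txt")
    closed_inside = f.closed  # the handle is still open inside the block

closed_after = f.closed  # leaving the block has closed the handle

try:
    f.write("too late")
    late_write_failed = False
except ValueError:
    # Writing to a closed file raises ValueError.
    late_write_failed = True

print(written, closed_inside, closed_after, late_write_failed)
```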