How to change SeqRecord IDs to numbers from a .txt file (Biopython)
8.5 years ago

Hi,

I am new to Biopython and programming in general. I am having difficulty writing a working script to change the IDs in my FASTA files to consecutive numbers. I have a text file containing:

1 2 3 4 etc...

and my code currently reads:

from Bio import SeqIO

lines_file = open("Numbers1_50.txt")
fout = open("output.fasta", "w")
handle = open("input.fasta", "r")

for seq_record in SeqIO.parse(handle, "fasta"):
    seq_record.id = lines_file[0]
    seq_record.description = ""
    print(seq_record.id)
    SeqIO.write(fout, lines_file[0]+".fasta","fasta")

lines_file.close()
fout.close()

However, I keep getting an error at the line containing: seq_record.id = lines_file[0]

TypeError: '_io.TextIOWrapper' object is not subscriptable

If anyone has a moment to explain my error, I would greatly appreciate it! Thanks.

8.5 years ago

You'll need to read the open file into a list before you try to access individual elements:

from Bio import SeqIO

lines_file = open("Numbers1_50.txt").readlines()
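Once the numbers are in a list, you can index that list by record position. Below is a minimal sketch of the complete script. It assumes Biopython is installed, that the numbers file holds whitespace-separated IDs (for example, one per line), and that there are at least as many IDs as FASTA records. The function name `renumber_from_file` is hypothetical.

```python
from Bio import SeqIO


def renumber_from_file(fasta_in, ids_path, fasta_out):
    """Replace each record ID with the matching entry from ids_path.

    Assumes ids_path holds whitespace-separated IDs (for example, one
    number per line). Returns the number of records written.
    """
    # A list, unlike an open file object, supports indexing.
    # split() also discards the newline characters.
    with open(ids_path) as ids_handle:
        new_ids = ids_handle.read().split()

    count = 0
    with open(fasta_out, "w") as fout:
        for i, record in enumerate(SeqIO.parse(fasta_in, "fasta")):
            record.id = new_ids[i]  # raises IndexError if the ID list runs out
            record.description = ""
            SeqIO.write(record, fout, "fasta")
            count += 1
    return count
```

For example, `renumber_from_file("input.fasta", "Numbers1_50.txt", "output.fasta")` would write the renumbered records and return how many it wrote. Running out of IDs raises an `IndexError` instead of silently producing bad headers.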

However, there is probably a better way to do this:

from Bio import SeqIO

with open("input.fasta", "r") as handle, open("output.fasta", "w") as fout:
    for i, seq_record in enumerate(SeqIO.parse(handle, "fasta")):
        seq_record.id = str(i + 1)  # IDs start at 1 and must be strings
        seq_record.description = ""
        SeqIO.write(seq_record, fout, "fasta")
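A variation on the same idea, sketched under the assumption that you want IDs starting at 1: `enumerate()` accepts a `start` argument, and `SeqIO.write()` accepts an iterable of records, so a generator can be written out in a single call. Both `SeqIO.parse()` and `SeqIO.write()` take either a filename or an open handle. The helper name `renumber_sequentially` is hypothetical.

```python
from Bio import SeqIO


def renumber_sequentially(fasta_in, fasta_out, start=1):
    """Give records consecutive string IDs: start, start + 1, ...

    Returns the number of records written.
    """
    def relabelled():
        for number, record in enumerate(SeqIO.parse(fasta_in, "fasta"), start=start):
            record.id = str(number)  # the FASTA writer needs a string, not an int
            record.description = ""
            yield record

    # One write call over a generator keeps only one record in memory at a time.
    return SeqIO.write(relabelled(), fasta_out, "fasta")
```

Calling `renumber_sequentially("input.fasta", "output.fasta")` would give the records the IDs 1, 2, 3 and so on.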

Thank you for your help! I tried enumerating the IDs as you suggested, but I received the following error:

return text.replace("\n", " ").replace("\r", " ").replace("  ", " ")
AttributeError: 'int' object has no attribute 'replace'

I am not sure what happened here.


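Edited. I should have converted the integer to a string with str(). As the traceback shows, the FASTA writer calls string methods such as replace() on the ID, so it fails on an int.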

8.5 years ago

Try this:

from Bio import SeqIO

fout = open("output.fasta", "w")
handle = open("input.fasta", "r")
new_id = 0

for seq_record in SeqIO.parse(handle, "fasta"):
    new_id += 1                     # increment first so the IDs start at 1
    seq_record.id = str(new_id)     # the FASTA writer expects a string ID
    seq_record.description = ""
    print(seq_record.id)
    SeqIO.write(seq_record, fout, "fasta")

handle.close()
fout.close()

It looks like you're currently trying to read the numbers from a file. When you write lines_file[0], you're asking Python to index into the open file object itself. File objects don't support indexing at all, and that is exactly what "object is not subscriptable" in your TypeError means. A file can be read line by line, but you can't jump to a row with [0] the way you could with a list. (I'm primarily a C++ guy, so take my Python with a grain of salt.)

The code above solves the problem by starting a counter at 0, incrementing it by 1 for each sequence record, and assigning it to seq_record.id inside the loop you already constructed. It doesn't cover the case where you want to read the IDs from a file and assign them, which might be what you're really after.

If you're trying to do that, you'll need to open the file (as you're currently doing). Instead of trying to access [0] on the file, call its readline() method, which returns the next line each time it is called. You can try code like this:

seq_record.id = lines_file.readline()

Alternatively you can use:

seq_record.id = lines_file.readline().strip()

which also removes the trailing newline that readline() keeps at the end of each line. You usually don't want that newline in a FASTA ID. Substitute one of these two lines for the line where you currently have:

seq_record.id = lines_file[0]

in your code.
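To make the difference concrete, here is a small standard-library sketch (no Biopython needed). `io.StringIO` stands in for the real numbers file, and `read_ids` is a hypothetical helper. The sketch shows that indexing a file object raises a TypeError and that readline() keeps the trailing newline until strip() removes it. It also shows that once the file runs out, readline() returns an empty string instead of raising an error, so any extra FASTA records would silently get empty IDs.

```python
import io


def read_ids(lines_file, n):
    """Read up to n IDs, one per line, from an open text file."""
    ids = []
    for _ in range(n):
        line = lines_file.readline()  # keeps the trailing "\n"; returns "" at end of file
        ids.append(line.strip())      # strip() drops the newline (and surrounding spaces)
    return ids


# io.StringIO stands in for open("Numbers1_50.txt"); both are text file objects.
demo = io.StringIO("1\n2\n")

try:
    demo[0]  # indexing the file object itself, as in the original question
except TypeError as err:
    print("TypeError:", err)

print(read_ids(demo, 3))  # the third read hits end of file: ['1', '2', '']
```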

I believe that should solve your issue, but I'm happy to help further if it doesn't quite get you there.


Thank you! I used seq_record.id = lines_file.readline() and it worked! My final code was:

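from Bio import SeqIO

with open("Numbers1_50.txt") as lines_file, open("input.fasta") as handle, \
        open("output.fasta", "w") as fout:
    for seq_record in SeqIO.parse(handle, "fasta"):
        seq_record.id = lines_file.readline()
        seq_record.description = ""
        SeqIO.write(seq_record, fout, "fasta")
        print(seq_record)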

Great! I'm glad it worked for you, especially since I don't use Python much :D
