Hi everyone,
I am sure this is not a very complicated problem, but for some reason I cannot find a straight answer (and/or I fail to see the logic in some of the answers I saw in other people's posts). As the title says, I am trying to find a way to open a multi-FASTA file, extract the first n bases of each read, and write all of that to a new FASTA file. Due to my inexperience I assumed it would be as easy as:
import Bio
reads = SeqIO.parse(file,"fasta")
end = []
n = 50
for record in reads:
    record[:n].append(end)
SeqIO.write(end, "end.fasta", "fasta")
But obviously there are several things wrong with my code and I don't really know how to fix them.
It seems I would need to construct a SeqRecord object with the new sliced read and then pass that to SeqIO.write, but I'm not sure how to get all the FASTA reads written to the same file at once without each record overwriting the last one. Any help or pointers would be greatly appreciated! Thanks in advance.
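For reference, here is a minimal sketch of one way the snippet above could be made to work. It assumes Biopython is installed; the function name write_prefixes and the file names in the usage comment are hypothetical. It relies on two behaviours of Biopython: slicing a SeqRecord returns a new SeqRecord, and SeqIO.write accepts any iterable of records, so all reads go to the same output in a single call.

```python
from Bio import SeqIO  # assumes Biopython is installed


def write_prefixes(in_handle, out_handle, n=50):
    """Write the first n bases of every FASTA record to out_handle.

    in_handle and out_handle may be file names or open handles.
    Returns the number of records written.
    """
    records = SeqIO.parse(in_handle, "fasta")
    # Slicing a SeqRecord gives a new SeqRecord that keeps the id and description.
    trimmed = (record[:n] for record in records)
    # SeqIO.write consumes the generator lazily, so no list is built in memory.
    return SeqIO.write(trimmed, out_handle, "fasta")


# Hypothetical usage:
# write_prefixes("reads.fasta", "end.fasta", n=50)
```

Reads shorter than n are simply written out in full, since slicing past the end of a sequence is not an error.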
Thanks for your reply.
I don't know if it is a feature of Jupyter Notebook or not, but just writing import Bio works as well as what you wrote. Though I would agree that importing just SeqIO would be faster than importing the whole Biopython package?
I ran several tests printing record[:n], and it was indeed printing the first n bases of the record's sequence. My description was a bit misleading though, sorry. I wanted to make my question simpler to understand, but what I actually want is to take the first n bases and write them to a file, and do the same for the last n bases. So in my first code I had two lists, one called end5 (for the first n bases) and one called end3 (for the last n bases). That's probably the main reason why it didn't seem to work.
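The two-ended version described here could be sketched roughly as follows. This is only an illustration, not the code from the answer: it assumes Biopython is installed, and the function name split_ends and the file names in the usage comment are hypothetical. Records are streamed one at a time and written to two handles that stay open, so nothing accumulates in a list and no record overwrites the previous one.

```python
from Bio import SeqIO  # assumes Biopython is installed


def split_ends(in_handle, out5_handle, out3_handle, n=50):
    """Stream FASTA records and write both ends of each read.

    The first n bases go to out5_handle and the last n bases to out3_handle.
    Both outputs must be open, writable handles; n must be a positive integer.
    Returns the number of records processed.
    """
    count = 0
    for record in SeqIO.parse(in_handle, "fasta"):
        # Writing to an already-open handle appends after the previous record.
        SeqIO.write(record[:n], out5_handle, "fasta")
        SeqIO.write(record[-n:], out3_handle, "fasta")
        count += 1
    return count


# Hypothetical usage:
# with open("end5.fasta", "w") as out5, open("end3.fasta", "w") as out3:
#     split_ends("reads.fasta", out5, out3, n=50)
```

Opening the output files once, outside the loop, is what keeps every record in the file. Reopening a file in "w" mode for each record would truncate it every time.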
I've got a file of 1.5 million sequences to slice, so writing everything to a list was definitely not a good idea, but I had no idea how to do it otherwise, so thanks a lot! I'm still teaching myself Python, so I make rookie mistakes.
I eventually figured it would be possible to do that but I must admit I am surprised there is no way to write a multifasta file at once.
Thanks again for your help. I will try that and come back to accept your answer when I have it working!
About importing Bio: Remarkable, at least in my python interpreter I get the following:
Anyway, that's not the main problem here.
You are absolutely right with my second comment, I messed up there.
At least now I understand better why I see from x import y in most of the code posted online when import x works totally fine for me. It's good to keep in mind though.
As for my problem here, your code helped me find a better-suited solution, which I post here in case anyone might need it:
Thanks again for the time you dedicated to me, I really appreciate it!