How can I translate multiple (more than 25,000) DNA sequences, each with its own frame, into protein sequences? Is there any program or Perl script I can use to do that? I am also not sure whether I can include all the sequences with their frames for translation at the same time. Please share any information on this. Thanks!
Thanks guys! @Pavel, I think it allows me to submit the same frame for multiple sequences, but how do I include multiple sequences with different frames in one batch submission? Also, is there a way to omit the sequences that have stop codons in the frame being translated? @Biolab, so the script you mentioned only works for frame 1? How do I translate the other frames? Sorry, I am a novice in Perl. Thanks a bunch!
These are good questions. You will need to do some work for that: extract and organize the sequences, compose the proper command lines, and so on. For sixpack you'll need to pre-process/split your dataset: sixpack will extract all of the possible ORFs from a single sequence and allows customization of that process. transeq will just translate the whole batch, placing stops (*) in the output, so you'll need to do post-processing. If the ORF positions are known, then transeq can take in the coordinates and translate.
Hi Youwanpras, I am also a Perl beginner. I wrote a script as follows. It works, but you'd better test it yourself. Note that each sequence should be on a single line (I'm not sure how to improve that). My script is not concise, so it would be helpful to ask others on BIOSTARS, as many experts are here. Hope it helps!
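For what it's worth, here is a minimal sketch of the per-sequence-frame idea in Python. It is not the Perl script mentioned above, just an illustration under some assumptions: the sequences are in FASTA format, the frame for each sequence (1, 2, 3, -1, -2 or -3) comes from a separate tab-separated table of ID and frame, the standard genetic code applies, and any translation containing a stop codon is dropped. All function names (read_fasta, read_frames, translate, translate_batch) are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Yield (id, sequence) pairs; a sequence may span several lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The ID is the first word of the header line.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_frames(lines):
    """Parse 'id<TAB>frame' lines (assumed format) into a dict."""
    frames = {}
    for line in lines:
        if line.strip():
            seq_id, frame = line.split()[:2]
            frames[seq_id] = int(frame)
    return frames


def translate(seq, frame):
    """Translate seq in frame 1, 2, 3 (forward) or -1, -2, -3 (reverse)."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError(f"invalid frame: {frame}")
    seq = seq.upper()
    if frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    start = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    # Codons with ambiguous bases (e.g. N) become X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_batch(fasta_lines, frames, skip_stops=True):
    """Yield (id, protein) for every sequence that has a frame listed."""
    for seq_id, seq in read_fasta(fasta_lines):
        if seq_id not in frames:
            continue
        protein = translate(seq, frames[seq_id])
        if skip_stops and "*" in protein:
            continue  # omit translations containing a stop codon
        yield seq_id, protein


if __name__ == "__main__":
    fasta = [">s1", "ATGGCC", "TAA", ">s2", "CATGGCC", ">s3", "GGCCAT"]
    frames = read_frames(["s1\t1", "s2\t2", "s3\t-1"])
    for seq_id, protein in translate_batch(fasta, frames):
        print(f">{seq_id}\n{protein}")
```

For real files, pass open file handles instead of the lists, e.g. translate_batch(open("seqs.fa"), read_frames(open("frames.tsv"))), where both file names are placeholders. If a stop codon at the very end of a translation should be allowed, the check in translate_batch would need to be relaxed accordingly.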
I am trying to translate a DNA sequence with your code, but it is giving the following warning messages
Could you please help me fix it?