Hello, I'm sorry if my question is very basic; the truth is that until starting my PhD I had had no contact with programming languages other than HTML, haha. I have a FASTA file that I created with EMBOSS by translating the six reading frames of a nucleotide FASTA file from a transcriptome, but now I need to filter those six frames to find out which one is optimal. I'm reading up on how to iterate in Python (yes, seriously, I'm just getting started with this) to see if that can help. Does anyone know whether I'm on the right track, or whether there is a more effective method? I'd appreciate any advice, no matter what language or packages I need to install; I'm a bit desperate, hahaha. Thank you, and sorry for the inconvenience.
What do you mean by "optimal"?
Wow, I didn't think I would get answers so fast, thank you very much!!! By "optimal" I meant the longest sequence.
If by 'optimal' you mean the frame that leads to a protein sequence, you will need to run some kind of ORF finder on it (or simply take the frame with the longest ORF, though that is a bit crude).
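For the crude "take the frame with the longest ORF" approach, here is a minimal Python sketch that works directly on an already-translated six-frame protein FASTA. It rests on a few assumptions that you should check against your own file: stop codons are written as `*`, and each record ID ends in `_1` to `_6` for the frame (which is how I'd expect EMBOSS transeq output to look). The function names are made up for this example, and the "ORF" here is just the longest stop-free stretch; it does not require a starting methionine.

```python
def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the ID (first word after '>').
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_open_stretch(protein):
    """Return the longest stop-free segment ('*' is assumed to mark a stop)."""
    return max(protein.split("*"), key=len)


def best_frame_per_transcript(records):
    """Map transcript ID -> (frame, longest stop-free stretch) over all frames."""
    best = {}
    for record_id, protein in records:
        # Assumption: IDs look like '<transcript>_<frame>'.
        transcript, _, frame = record_id.rpartition("_")
        stretch = longest_open_stretch(protein)
        if transcript not in best or len(stretch) > len(best[transcript][1]):
            best[transcript] = (frame, stretch)
    return best


# Tiny made-up example with three frames of one transcript.
example = """>tx1_1
MKT*LLV
>tx1_2
AAAAMKLLPQRS*G
>tx1_3
*P*Q*
""".splitlines()

result = best_frame_per_transcript(read_fasta(example))
for transcript, (frame, stretch) in result.items():
    print(transcript, frame, len(stretch), stretch)
```

For a real file you would pass an open file handle instead of `example`, e.g. `best_frame_per_transcript(read_fasta(open("frames.fasta")))`, where `frames.fasta` is a placeholder name.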
For ORF-finding, OP should try Borf if they have stranded data, but one can always use TransDecoder.
Some examples indeed. Many more are around as well: FrameD, est2orf, ORFfinder, ... They all likely perform more or less equally, though some have extra features, such as frame-shift correction in FrameD.
Oh, thank you very much lieven, I'm going to check those too.
Oooh, thank you thank you thank you, I'm already reading the documentation for the programs you recommended. Borf looks promising. Another noob question that is still not clear to me, sorry: does this type of program translate all six reading frames and return the longest ORF among them, or does it only translate the first frame and that's it?
Depends a bit on the software used.
In most cases they will report the longest ORF (and will thus have evaluated all six possible frames), as that one is likely the "correct" protein. This is not always the case, however, which is why some programs also use something called 'coding potential': they look for the reading frame that has both a substantial ORF and a high coding potential, and report that one (which is not necessarily the longest).
If you want the longest ORF regardless, make sure you use a program that does that (longestORF, for instance).
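To make the "evaluate all six frames, report the longest ORF" idea concrete, here is a small sketch in plain Python. It is not how any of the tools mentioned above is implemented; it assumes the standard genetic code, defines an ORF as running from the first `M` in a stop-free stretch to the next stop (or to the end of the sequence, so incomplete ORFs count too), and labels frames `+1` to `+3` and `-1` to `-3`, which is just a convention chosen for this example. It also ignores coding potential entirely.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Yield (frame_label, protein) for the three forward and three reverse frames."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            yield f"{strand}{offset + 1}", translate(seq[offset:])


def longest_orf(dna):
    """Return (frame_label, protein) of the longest M-initiated ORF in any frame."""
    best = ("", "")
    for frame, protein in six_frames(dna):
        for segment in protein.split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best[1]):
                best = (frame, segment[start:])
    return best


# Made-up toy sequence: the only sizeable ORF sits in the third forward frame.
frame, protein = longest_orf("CCATGAAATTTGGGTAACC")
print(frame, protein)
```

A real ORF finder adds quite a bit on top of this (minimum length cut-offs, handling of partial ORFs, strandedness, coding potential), so treat the sketch as a way to see what the tools are doing rather than as a replacement for them.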
I have to dive deep into that documentation. Thank you so much Lieven, I really appreciate your help :)
Again, I didn't think I would get answers so fast; for real, thank you very much!!! Yes, I was referring to the longest sequence. After searching a while longer I also realized that, as you say, there are ORF search packages.