Hi everyone, I am trying to figure out the easiest way to extract the longest stop-free stretch from translated DNA sequences. The sequences I have were translated using transeq, and now I need to extract the longest region between two stop codons (indicated by * in the sequences below). The translated sequences look like this:
>Hh_TY_4
KSHLATLH*LSQCH*NSMSISAF*VIYVYFE*S*HTFRIPFVKTF**FK**NSSIHNEKFVLFP**KDISSRWKSHSCCWRRERPKQFVCIWRWYSICQTSTRVRSRNNQENPVRN*HLQSQ*SEESQRCR
>Hh_TY_5
KSHLATLH*LSQCH*NSMSISAF*VIYVYFE*S*HTFRIPFVKTFFKNSSIHNEKFVLFPKDISENPVRN*HLQSQ*SEESQRCR
And the output I want is like this:
>Hh_TY_4
KDISSRWKSHSCCWRRERPKQFVCIWRWYSICQTSTRVRSRNNQENPVRN
>Hh_TY_5
HTFRIPFVKTFFKNSSIHNEKFVLFPKDISENPVRN
Any suggestions on this would be really appreciated. Thanks!
This is cool, I learned something new. I had to read perlvar and take a long look to understand this one because it's pretty esoteric. I like seeing different ways of solving problems, and this is very useful.
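If you would rather not rely on Perl's special variables, here is a minimal Python sketch of the same idea: split each translated sequence on `*` and keep the longest piece. The function names `read_fasta` and `longest_stop_free_segment` are made up for this example. It also assumes that the start and end of a sequence count as boundaries, so a leading or trailing stretch can win even though it is bounded by only one stop codon, and that ties go to the first longest piece.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_stop_free_segment(protein):
    """Return the longest piece of `protein` containing no '*' (stop).

    Assumption: sequence ends are treated like stop codons. On ties,
    max() returns the first piece of maximal length.
    """
    return max(protein.split("*"), key=len)


# The two records from the question, used as demo input.
example = """\
>Hh_TY_4
KSHLATLH*LSQCH*NSMSISAF*VIYVYFE*S*HTFRIPFVKTF**FK**NSSIHNEKFVLFP**KDISSRWKSHSCCWRRERPKQFVCIWRWYSICQTSTRVRSRNNQENPVRN*HLQSQ*SEESQRCR
>Hh_TY_5
KSHLATLH*LSQCH*NSMSISAF*VIYVYFE*S*HTFRIPFVKTFFKNSSIHNEKFVLFPKDISENPVRN*HLQSQ*SEESQRCR
""".splitlines()

for header, seq in read_fasta(example):
    print(f">{header}")
    print(longest_stop_free_segment(seq))
```

To run it on a real file, pass an open file handle to `read_fasta` instead of the `example` list, since it only needs something that yields lines.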