I have a couple hundred DNA sequences to translate to protein. I used transeq from EMBOSS, which is simple to run, but I could not get it to output only the ORF that runs from the start amino acid (methionine) to the stop codon. In the example below, frame >comp2_seq1_2 contains the best ORF, which is the one I want to select. How do I set the transeq parameters so that I get only this:
MEIKDLADLYGDELKLTKLIRKSSA
RAQEIAKRQELDSSDGQIIDDHDQFYKDHKLLLLLFRILGVMPIERGKIGRITFSWKSIP
MIYAYVFYAVMTVIVVFVGIERVDILLNKSKKFDEYIYSIIFIIFLVPHFWIPFVQKDID
NFCTGYIIAHYRRLWLELSELLQSIGNAYARTYSTYSLFMITNITVATYGFISEIMEHGI
TFSFKEMGLIVASAYCMVLLYIYCDCSHKASDNIALRIQRSLIEIDLTTINLDTGKEIDM
FLTAIRLNPPTVSLQGYSDVDRKLITSSVSTIAIYLIVLLQFKISLLNMKSIE
from the full frame translation (>comp2_seq1_2)? I was able to get the preferred coding region translated with sixpack, but as far as I can tell sixpack only translates one sequence at a time (please correct me if that is not the case). Here is the DNA contig I translated into the six frames:
>comp2_seq1
GTAGGGTCGAGTGGCCAGCTCTGCCGATTTCAACAGGGCTAGGGGTAGGTTTATGTTTTTGTCGGTGCTAATGGTATAGC
TTTTGGGTTGAAAAATTATATTCATCATGGAAATAAAAGACCTGGCAGATTTATATGGCGACGAACTTAAATTAACAAAA
CTGATCAGAAAAAGCTCGGCACGTGCTCAGGAAATTGCTAAAAGACAAGAATTGGATTCTTCCGATGGACAAATCATCGA
TGACCATGATCAATTTTACAAAGATCACAAGTTGCTTCTTCTATTATTTAGAATACTGGGTGTGATGCCCATCGAACGTG
GAAAAATTGGAAGAATAACTTTTAGCTGGAAAAGCATTCCGATGATCTACGCATACGTCTTTTATGCTGTCATGACAGTT
ATAGTCGTCTTTGTGGGGATTGAAAGAGTCGACATATTGCTGAACAAGAGCAAAAAGTTTGACGAATATATCTACTCCAT
TATCTTCATTATTTTCTTGGTACCGCACTTCTGGATACCGTTCGTCCAGAAGGACATAGATAACTTTTGCACCGGATACATAATAGCC
CACTATAGAAGACTATGGCTAGAACTAAGCGAGCTCCTCCAGTCTATAGGAAATGCTTACGCAAGAACGTATTCTACGTA
TTCGCTGTTTATGATCACCAACATCACAGTTGCGACGTACGGCTTTATATCAGAAATCATGGAGCACGGGATAACGTTTT
CTTTCAAAGAAATGGGCCTTATTGTAGCCAGCGCGTATTGCATGGTGCTTCTGTACATCTACTGCGATTGCTCACATAAA
GCCTCAGATAATATAGCTCTGAGGATCCAGAGATCGCTAATAGAAATTGATCTAACTACGATTAATCTAGACACAGGAAA
AGAGATTGATATGTTTTTGACAGCAATTCGTCTAAATCCTCCAACAGTGTCTTTACAAGGCTATTCTGATGTTGATAGAA
AACTTATAACTTCAAGTGTTTCCACCATAGCGATCTACCTAATTGTCCTGCTACAATTCAAGATAAGTTTACTCAACATG
AAATCTATAGAATAAAGCTTAAATGATATATTTCTAGATTAAAATGCTAGATTATAGATTAAAATAAGTATGTAGGCACA
AGTTAAATGTTATTTTTGTTACAGGTTGATCTAATAAAGTTATCAACATAGCAATTCGAACGTTACAGCTAGCGCGGACA
CATGTCACATGGTTTTTGATTTACTCGATCTGTCTTCTATAAT
Any help would be appreciated. Thanks!
Here are the translated six frames:
>comp2_seq1_1
VGSSGQLCRFQQG*G*VYVFVGANGIAFGLKNYIHHGNKRPGRFIWRRT*INKTDQKKLG
TCSGNC*KTRIGFFRWTNHR*P*SILQRSQVASSII*NTGCDAHRTWKNWKNNF*LEKHS
DDLRIRLLCCHDSYSRLCGD*KSRHIAEQEQKV*RIYLLHYLHYFLGTALLDTVRPEGHR
*LLHRIHNSPL*KTMARTKRAPPVYRKCLRKNVFYVFAVYDHQHHSCDVRLYIRNHGARD
NVFFQRNGPYCSQRVLHGASVHLLRLLT*SLR*YSSEDPEIANRN*SNYD*SRHRKRD*Y
VFDSNSSKSSNSVFTRLF*C**KTYNFKCFHHSDLPNCPATIQDKFTQHEIYRIKLK*YI
SRLKC*IID*NKYVGTS*MLFLLQVDLIKLST*QFERYS*RGHMSHGF*FTRSVFYN
>comp2_seq1_2
*GRVASSADFNRARGRFMFLSVLMV*LLG*KIIFIMEIKDLADLYGDELKLTKLIRKSSA
RAQEIAKRQELDSSDGQIIDDHDQFYKDHKLLLLLFRILGVMPIERGKIGRITFSWKSIP
MIYAYVFYAVMTVIVVFVGIERVDILLNKSKKFDEYIYSIIFIIFLVPHFWIPFVQKDID
NFCTGYIIAHYRRLWLELSELLQSIGNAYARTYSTYSLFMITNITVATYGFISEIMEHGI
TFSFKEMGLIVASAYCMVLLYIYCDCSHKASDNIALRIQRSLIEIDLTTINLDTGKEIDM
FLTAIRLNPPTVSLQGYSDVDRKLITSSVSTIAIYLIVLLQFKISLLNMKSIE*SLNDIF
LD*NARL*IKISM*AQVKCYFCYRLI**SYQHSNSNVTASADTCHMVFDLLDLSSIX
>comp2_seq1_3
RVEWPALPISTGLGVGLCFCRC*WYSFWVEKLYSSWK*KTWQIYMATNLN*QN*SEKARH
VLRKLLKDKNWILPMDKSSMTMINFTKITSCFFYYLEYWV*CPSNVEKLEE*LLAGKAFR
*STHTSFMLS*QL*SSLWGLKESTYC*TRAKSLTNISTPLSSLFSWYRTSGYRSSRRT*I
TFAPDT**PTIEDYG*N*ASSSSL*EMLTQERILRIRCL*SPTSQLRRTALYQKSWSTG*
RFLSKKWALL*PARIAWCFCTSTAIAHIKPQII*L*GSRDR**KLI*LRLI*TQEKRLIC
F*QQFV*ILQQCLYKAILMLIENL*LQVFPP*RST*LSCYNSR*VYST*NL*NKA*MIYF
*IKMLDYRLK*VCRHKLNVIFVTG*SNKVINIAIRTLQLARTHVTWFLIYSICLL*X
>comp2_seq1_4
IIEDRSSKSKTM*HVSALAVTFELLC**LY*INL*QK*HLTCAYILILIYNLAF*SRNIS
FKLYSIDFMLSKLILNCSRTIR*IAMVETLEVISFLSTSE*PCKDTVGGFRRIAVKNISI
SFPVSRLIVVRSISISDLWILRAILSEALCEQSQ*MYRSTMQYALATIRPISLKENVIPC
SMISDIKPYVATVMLVIINSEYVEYVLA*AFPIDWRSSLSSSHSLL*WAIMYPVQKLSMS
FWTNGIQKCGTKKIMKIME*IYSSNFLLLFSNMSTLSIPTKTTITVMTA*KTYA*IIGML
FQLKVILPIFPRSMGITPSILNNRRSNL*SL*N*SWSSMICPSEESNSCLLAIS*ARAEL
FLISFVNLSSSPYKSARSFISMMNIIFQPKSYTISTDKNINLPLALLKSAELATRPY
>comp2_seq1_5
YRRQIE*IKNHVTCVRASCNVRIAMLITLLDQPVTKITFNLCLHTYFNL*SSILI*KYII
*ALFYRFHVE*TYLEL*QDN*VDRYGGNT*SYKFSINIRIAL*RHCWRI*TNCCQKHINL
FSCV*INRS*INFY*RSLDPQSYII*GFM*AIAVDVQKHHAIRAGYNKAHFFERKRYPVL
HDF*YKAVRRNCDVGDHKQRIRRIRSCVSISYRLEELA*F*P*SSIVGYYVSGAKVIYVL
LDERYPEVRYQENNEDNGVDIFVKLFALVQQYVDSFNPHKDDYNCHDSIKDVCVDHRNAF
PAKSYSSNFSTFDGHHTQYSK**KKQLVIFVKLIMVIDDLSIGRIQFLSFSNFLSTCRAF
SDQFC*FKFVAI*ICQVFYFHDEYNFSTQKLYH*HRQKHKPTPSPVEIGRAGHSTLX
>comp2_seq1_6
L*KTDRVNQKPCDMCPR*L*RSNCYVDNFIRSTCNKNNI*LVPTYLF*SII*HFNLEIYH
LSFIL*ISC*VNLS*IVAGQLGRSLWWKHLKL*VFYQHQNSLVKTLLEDLDELLSKTYQS
LFLCLD*S*LDQFLLAISGSSELYYLRLYVSNRSRCTEAPCNTRWLQ*GPFL*KKTLSRA
P*FLI*SRTSQL*CW*S*TANT*NTFLRKHFL*TGGARLVLAIVFYSGLLCIRCKSYLCP
SGRTVSRSAVPRK**R*WSRYIRQTFCSCSAICRLFQSPQRRL*LS*QHKRRMRRSSECF
SS*KLFFQFFHVRWASHPVF*IIEEATCDLCKIDHGHR*FVHRKNPILVF*QFPEHVPSF
F*SVLLI*VRRHINLPGLLFP**I*FFNPKAIPLAPTKT*TYP*PC*NRQSWPLDPT
I think we need to see the DNA sequence of comp2_seq1_2 to answer this?
Thanks for replying. I have revised the question with more details and the DNA sequence I used to translate.
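One possible way to get there without changing the transeq run is to post-process its six-frame output: split each frame at the stop symbols (`*`), trim each piece to its first methionine, and keep the longest piece per contig. The following is only a minimal sketch under some assumptions: the frame names follow the `<contig>_<frame>` pattern shown above, and the helper names (`read_fasta`, `longest_orf`, `best_orf_per_contig`) are made up for illustration and are not part of EMBOSS. A piece that runs off the end of a frame without a `*` is skipped, since it has no stop. EMBOSS also ships getorf, whose `-find` option is documented as being able to report only regions between START and STOP codons, so it may be worth checking its documentation as well.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def longest_orf(protein):
    """Return the longest Met-to-stop stretch of one translated frame, or ''."""
    segments = protein.split("*")
    best = ""
    # Only segments followed by '*' end in a stop, so the last one is skipped.
    for segment in segments[:-1]:
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


def best_orf_per_contig(frames):
    """Keep the longest ORF per contig; frames are named <contig>_<frame>."""
    best = {}
    for name, protein in frames.items():
        contig = name.rsplit("_", 1)[0]
        orf = longest_orf(protein)
        if len(orf) > len(best.get(contig, ("", ""))[1]):
            best[contig] = (name, orf)
    return best


if __name__ == "__main__":
    # Toy input, not the real contig: two short frames of a made-up contig.
    toy = ">toy_1\nAAMKK*LL\n>toy_2\n*GRVMEIKDL*SLND\n"
    for contig, (frame, orf) in best_orf_per_contig(read_fasta(toy)).items():
        print(f">{contig} best_frame={frame}\n{orf}")
```

To use it on real data, read the transeq output file into a string and pass it to `read_fasta`. Because it works on the whole multi-FASTA file at once, it sidesteps the one-sequence-at-a-time limitation mentioned for sixpack.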