Hello everyone,
I would like to rename the genes in the headers of my FASTA file. Could anyone help me write a Python script for this, please?
Original names from the gene prediction tool:
unitig0_FGENESH: 7 7 exon (s) 66831 - 67787 71 aa, chain - MCIECVRVLHGFGLNAYIGGARETKRGLGCATVKPATSFFNKGKKEKGLREVFSERYYYYTPWLATPYIKL unitig0_FGENESH: 8 4 exon (s) 70846 - 72746 472 aa, chain + MHESDALDRGTDIRLSPSSTTNICDSKLPLDNESLSQASAAGQGLSEKERELFENAQWQIRSHYLTGSGVDEPGVVKLDDGAVLPLMKKRDIRKDRVSGSKEAKPIPVFPEDRVSEIEIHPDHHDLHKMNGKSNIFVLKTFRHPRDPEIAEEDFKAELLANRQLPRHERIVPLLAAFEFENEFHLIFPFAHQGDLESLWKRTKMLPKHPLPGWYSPRWLLRECLGIAEALVETHSPTLPNNGAGSTVWVPQLHADIKARNILCFQSNDQAPPLLKLADFGYSQRAGEDGALNINGGLPHTKTYRPPERDIENFVGLNYDVWCLGCLYLDFITWAIIGYKGIEDFNESRMEEKNDKYVSKARGNDAEDIFFKKLARLPRWYDASALRFHSQRTEKLYKNTAFAQRSFTFSRGVIKISCDIKISVIQVNGQCQPEFRKLLKIIENDMLVVERQKRASSSKIKSLLQEVIRPQGL unitig1_FGENESH: 201 6 exon (s) 834292 - 836282 511 aa, chain + MYSLITKTLRNEEPSEVELEVVATLVELLTIDLYNLRLSQIGHPMYANFEGVTFRGLSITLEELEEYKAILARPELAKRNFSVPLGLLSSSTDEKIMEEFSKSTHSDGKRPIQLHLTIHIHGLDPALLREYRRLYPDSVVSSICAMPVGHVSPHGEKEILLRGPFFHMISMSSGELNGRPCEKLVVVMMNANRDHGMEHASDQGAKERQRQCFLRAVSASKYEVCAAIAEKHAPWDANAYRTLQNNALQQLQDIDGIYAITDGHLAEHRAKNVATWLGGALRKSYPRYYAKKRVSWQTMIRDENWKEADKILRAEYEWKKRDWYNVGQLTDDGGEFVDDNLTLLHILATKPPPPSAETDRFECWQRLIDGACQPEVWTRRCGNEGLMERLQPRIVHALDPAALGDLELRLHKIMVEKFGQFLSEHIFHMPQLSVLTEMDIPELWIPIPLSYGGVFVTLSKPERIWGLDVVVYEVVVGQTWPKKTDCSLGIYDLARSPDLVYWSSIGDQSSA unitig1_FGENESH: 202 5 exon (s) 836941 - 841297 1319 aa, chain + 
MASTPFPGSQWPRETPSEDISDQRNNSFYRDRDSSRQTTWDTTDEQSSTEYKNETTSPDSVYHNGYKMWNSIWLQTQVLAGLLVLFVALFLVTILLYHFSEKNHGLSAEDATRQYGWRYGPTAFLTIILSLWVQIDFSNKILTPWQEMRQGHTTADRSVLLEYISPLMITSLWRALKNRHWAVTASGLGILLIQLATVFSTGLFVLQPTALEQDDIPVVVNSVFDGSDFHLTNTSSTIGTGPAILYYGTRVHGLDPLPGVDVSRGLVVPDFTPFTEKAMAGGTNYTAIVPGVESSLDCEYIPALTNATRTSLPWWSILSAFFVLNVTTPSCSISNIIVGQGPDHNIYHQPNATQAYQGYFGDYICDPNINYGFYELPDPSNTTLEHRIVMTMADLRFPPREPRGAGPAYIYIHNLTVAVCKAGYARADYEVTYREGIAGQTKSWTSNKLSTSSSEIPGFSSAQLGAAVHSSLDQAYLGTGGQDWVLSKQVPSFYQILSAMNNNVSIGHFMEPRNLIDSATEAFNGIATQLIYKHMMKPSNTTISGSLLYQEDRLWVRALSVGFMGAAFLLLAGLVIVLLIFRPWNVVPSDPGSIGATALILTESSALRDLLMGLGAARSSQIRHRLSSYNFRSVVSPGPRKTFTVVAIEHGQPTVHQDMLGCSPPQSEHWWVPSAVKWWFQFIAVLLPLVIIAVLEILQRLSDQNNGFVDLGPDGFASTHGLSTYVPAVVAFVVASMFASLQLAVCILAPWLALHKGSAPASRSLFLNLTNRLAPHRMFLAFKNGNLGEVLIMMATFLAAWLPILVSGLYVTIPGTTPQSVTLKQSDVFDFKLNNLFYDDHLAGTVAGLIAFDDLPYRQWTYGDLVFNQLETIDGPGNTAAGNEVPFTARLKATRPSLDCTVVPAHSTMASWDKKQTDYRSIPEDKVALNLTSSIPWMCERRNGNITSVPWFQGFALPKDGRPIYFGHASVLSWGGKVFGNRAIITDVNRPGATSFTPESVANWVGGYGCPSFAVTLGRGSAVSKASGNSTTYDFDIDVTSILCSQRMEVVDTDVTLTLPSLGVISHDTPPVPDESTARYLVNTLRNHTSQIFEFPLNNLLLTLAYGTGEIIIPTSDGEENQLDPFVQFLATVNVSSPIDSLVGRNNSQNLIDATNRLYKTYMPQAIDRNMRTKNLETEVATPDSANAKVEPIQFTTRPEFPGRLRLKQEAAPKIALQAILGFMVLCAILSRVLLKGIDKLVPHNPCSIAGRAALFADGEVSTRKLVPYGAEWRTESELSSAGVYAGWLFSLGWWESWGVYKYGVDIGWIDRGKAENQM unitig153_FGENESH: 674 5 exon (s) 2967149 - 2968758 393 aa, chain + MGVGAVKYTQVACTVCIVTGMLIGICGAKCAGKKTVARYLVEHHGFKSLHIENQAPDPIQNGISPSEASGTEASPGSHANTVEEENDANTRDLVIRPKNGAMRSLHIFESEGALLDFVTKHWRSRWVTTDIHSEAVLDALNRRPFFILISVDAPVTVRWRRHQARQKQVSRPRKGSTSFEDFVAESDAHLYAAHGGVLPLMSRAAIRLLNTSDSLAHLYATLGKLDLTNGDRLRPSWDSYFMALASLAARRCNCMKRAVGCVLVDSKRRVISTGYNGTPRNLTNCMEGGCPRCNSGDATSGVSLATCLCLHAEENALLEAGRERIRDGSVLYCNTCPCLTCSIKIVQVGISEVVYNQGYSMDGETARVFLSAGVKLRQFSPPADGLIHLEKTE unitig153_FGENESH: 675 3 exon (s) 2969361 - 2969732 65 aa, chain - MPAINTAVVARDTVHQLARRENWAQQEAGVIVVFAIVFVVGVGLISLWISKLLKKRKAKKAALGA unitig3_FGENESH: 655 3 exon (s) 2965882 - 2966129 12 aa, chain + MCSVPLLLTLLQ 
unitig3_FGENESH: 656 4 exon (s) 2973820 - 2977069 384 aa, chain + MQMDRDFEELKEGVKVVGAVILEVQHTVEECKGKISESGEKLEMVRVGLNDFATDMNTLVNSVEADGGSQQGPRPLLQGSLQGRIPQLENENTFLRKGVDTLQATIQNMQQKHAYELAARTSHLQKRDDRCHHQLDRQAELITNVIDTIYSVFADYKEELRLLTHVNAREHNNRAPATDEIRPLSHAAPPHNHFLHHLGGETQGNPILRPEEQPGHDSDTGSSVDFDHDFRQCLREHITEVLDWYQSTVVAAEDVKSLAQRVDHFIYIMCKYHAEQGKIPTIQDVQLGVRILPLPREILLTREGSSTIPDPGHMPIHEEETRRPASAENLAPRSASSSLTFVDEDKVGIEELDITESFVEEGTDAVSRGSDFSCSCEGLPSRFT
I would like to rename them like this:
XY000_2FG00001_00007 MCIECVRVLHGFGLNAYIGGARETKRGLGCATVKPATSFFNKGKKEKGLREVFSERYYYYTPWLATPYIKL XY000_1FG00002_00008 MHESDALDRGTDIRLSPSSTTNICDSKLPLDNESLSQASAAGQGLSEKERELFENAQWQIRSHYLTGSGVDEPGVVKLDDGAVLPLMKKRDIRKDRVSGSKEAKPIPVFPEDRVSEIEIHPDHHDLHKMNGKSNIFVLKTFRHPRDPEIAEEDFKAELLANRQLPRHERIVPLLAAFEFENEFHLIFPFAHQGDLESLWKRTKMLPKHPLPGWYSPRWLLRECLGIAEALVETHSPTLPNNGAGSTVWVPQLHADIKARNILCFQSNDQAPPLLKLADFGYSQRAGEDGALNINGGLPHTKTYRPPERDIENFVGLNYDVWCLGCLYLDFITWAIIGYKGIEDFNESRMEEKNDKYVSKARGNDAEDIFFKKLARLPRWYDASALRFHSQRTEKLYKNTAFAQRSFTFSRGVIKISCDIKISVIQVNGQCQPEFRKLLKIIENDMLVVERQKRASSSKIKSLLQEVIRPQGL XY001_1FG00003_00201 MYSLITKTLRNEEPSEVELEVVATLVELLTIDLYNLRLSQIGHPMYANFEGVTFRGLSITLEELEEYKAILARPELAKRNFSVPLGLLSSSTDEKIMEEFSKSTHSDGKRPIQLHLTIHIHGLDPALLREYRRLYPDSVVSSICAMPVGHVSPHGEKEILLRGPFFHMISMSSGELNGRPCEKLVVVMMNANRDHGMEHASDQGAKERQRQCFLRAVSASKYEVCAAIAEKHAPWDANAYRTLQNNALQQLQDIDGIYAITDGHLAEHRAKNVATWLGGALRKSYPRYYAKKRVSWQTMIRDENWKEADKILRAEYEWKKRDWYNVGQLTDDGGEFVDDNLTLLHILATKPPPPSAETDRFECWQRLIDGACQPEVWTRRCGNEGLMERLQPRIVHALDPAALGDLELRLHKIMVEKFGQFLSEHIFHMPQLSVLTEMDIPELWIPIPLSYGGVFVTLSKPERIWGLDVVVYEVVVGQTWPKKTDCSLGIYDLARSPDLVYWSSIGDQSSA XY001_1FG00004_00202 
MASTPFPGSQWPRETPSEDISDQRNNSFYRDRDSSRQTTWDTTDEQSSTEYKNETTSPDSVYHNGYKMWNSIWLQTQVLAGLLVLFVALFLVTILLYHFSEKNHGLSAEDATRQYGWRYGPTAFLTIILSLWVQIDFSNKILTPWQEMRQGHTTADRSVLLEYISPLMITSLWRALKNRHWAVTASGLGILLIQLATVFSTGLFVLQPTALEQDDIPVVVNSVFDGSDFHLTNTSSTIGTGPAILYYGTRVHGLDPLPGVDVSRGLVVPDFTPFTEKAMAGGTNYTAIVPGVESSLDCEYIPALTNATRTSLPWWSILSAFFVLNVTTPSCSISNIIVGQGPDHNIYHQPNATQAYQGYFGDYICDPNINYGFYELPDPSNTTLEHRIVMTMADLRFPPREPRGAGPAYIYIHNLTVAVCKAGYARADYEVTYREGIAGQTKSWTSNKLSTSSSEIPGFSSAQLGAAVHSSLDQAYLGTGGQDWVLSKQVPSFYQILSAMNNNVSIGHFMEPRNLIDSATEAFNGIATQLIYKHMMKPSNTTISGSLLYQEDRLWVRALSVGFMGAAFLLLAGLVIVLLIFRPWNVVPSDPGSIGATALILTESSALRDLLMGLGAARSSQIRHRLSSYNFRSVVSPGPRKTFTVVAIEHGQPTVHQDMLGCSPPQSEHWWVPSAVKWWFQFIAVLLPLVIIAVLEILQRLSDQNNGFVDLGPDGFASTHGLSTYVPAVVAFVVASMFASLQLAVCILAPWLALHKGSAPASRSLFLNLTNRLAPHRMFLAFKNGNLGEVLIMMATFLAAWLPILVSGLYVTIPGTTPQSVTLKQSDVFDFKLNNLFYDDHLAGTVAGLIAFDDLPYRQWTYGDLVFNQLETIDGPGNTAAGNEVPFTARLKATRPSLDCTVVPAHSTMASWDKKQTDYRSIPEDKVALNLTSSIPWMCERRNGNITSVPWFQGFALPKDGRPIYFGHASVLSWGGKVFGNRAIITDVNRPGATSFTPESVANWVGGYGCPSFAVTLGRGSAVSKASGNSTTYDFDIDVTSILCSQRMEVVDTDVTLTLPSLGVISHDTPPVPDESTARYLVNTLRNHTSQIFEFPLNNLLLTLAYGTGEIIIPTSDGEENQLDPFVQFLATVNVSSPIDSLVGRNNSQNLIDATNRLYKTYMPQAIDRNMRTKNLETEVATPDSANAKVEPIQFTTRPEFPGRLRLKQEAAPKIALQAILGFMVLCAILSRVLLKGIDKLVPHNPCSIAGRAALFADGEVSTRKLVPYGAEWRTESELSSAGVYAGWLFSLGWWESWGVYKYGVDIGWIDRGKAENQM XY153_1FG00005_00674 MGVGAVKYTQVACTVCIVTGMLIGICGAKCAGKKTVARYLVEHHGFKSLHIENQAPDPIQNGISPSEASGTEASPGSHANTVEEENDANTRDLVIRPKNGAMRSLHIFESEGALLDFVTKHWRSRWVTTDIHSEAVLDALNRRPFFILISVDAPVTVRWRRHQARQKQVSRPRKGSTSFEDFVAESDAHLYAAHGGVLPLMSRAAIRLLNTSDSLAHLYATLGKLDLTNGDRLRPSWDSYFMALASLAARRCNCMKRAVGCVLVDSKRRVISTGYNGTPRNLTNCMEGGCPRCNSGDATSGVSLATCLCLHAEENALLEAGRERIRDGSVLYCNTCPCLTCSIKIVQVGISEVVYNQGYSMDGETARVFLSAGVKLRQFSPPADGLIHLEKTE XY153_2FG00006_00675 MPAINTAVVARDTVHQLARRENWAQQEAGVIVVFAIVFVVGVGLISLWISKLLKKRKAKKAALGA XY003_1FG00007_00655 MCSVPLLLTLLQ XY003_1FG00008_00656 
MQMDRDFEELKEGVKVVGAVILEVQHTVEECKGKISESGEKLEMVRVGLNDFATDMNTLVNSVEADGGSQQGPRPLLQGSLQGRIPQLENENTFLRKGVDTLQATIQNMQQKHAYELAARTSHLQKRDDRCHHQLDRQAELITNVIDTIYSVFADYKEELRLLTHVNAREHNNRAPATDEIRPLSHAAPPHNHFLHHLGGETQGNPILRPEEQPGHDSDTGSSVDFDHDFRQCLREHITEVLDWYQSTVVAAEDVKSLAQRVDHFIYIMCKYHAEQGKIPTIQDVQLGVRILPLPREILLTREGSSTIPDPGHMPIHEEETRRPASAENLAPRSASSSLTFVDEDKVGIEELDITESFVEEGTDAVSRGSDFSCSCEGLPSRFT
The gene name XY000_2FG00001_00007 is built as follows:
XY = my species name
000 = number of the contig (unitig1, unitig2, unitig153, ...)
1/2 = strand (strand + = 1, strand - = 2)
FG = FGENESH gene prediction tool
00001 = running index of the gene among all genes (I have 12112 genes) [00000, 00001, 00002, 00003, ..., 12112]
00007 = gene number assigned by FGENESH during prediction [7, 8, 201, 202, 674, 675, 655, 656, ...]
If anyone knows how to write a Python script for this problem, please help me or give me a suggestion.
Thank you in advance.
Thank you very much 'guillaume.rbt', I will try this script.
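For reference, here is a minimal sketch of the renaming described in the question. It is only one possible approach and makes several assumptions: each header sits on its own line (with or without a leading ">") and follows the layout shown above, sequence lines are passed through unchanged, the running gene index starts at 00001 as in the desired output, and the new header is written with a leading ">" as FASTA expects. The function name `rename_headers`, the pattern `HEADER_RE`, and the `species` parameter are made up for this illustration.

```python
import re

# Assumed header layout, copied from the examples in the question, e.g.
# "unitig0_FGENESH: 7 7 exon (s) 66831 - 67787 71 aa, chain -"
HEADER_RE = re.compile(
    r"^>?\s*unitig(?P<contig>\d+)_FGENESH:\s*(?P<gene>\d+)\s+\d+\s+exon\s*\(s\)"
    r"\s+\d+\s*-\s*\d+\s+\d+\s+aa,\s*chain\s*(?P<strand>[+-])\s*$"
)


def rename_headers(lines, species="XY"):
    """Yield the input lines with FGENESH headers replaced by the new names.

    Lines that do not look like an FGENESH header (the sequences) are
    yielded unchanged, minus their trailing newline.
    """
    counter = 0  # running gene index over the whole file
    for line in lines:
        line = line.rstrip("\n")
        match = HEADER_RE.match(line)
        if match is None:
            yield line
            continue
        counter += 1
        contig = int(match.group("contig"))  # unitig0 -> 000, unitig153 -> 153
        gene = int(match.group("gene"))      # gene number given by FGENESH
        strand = 1 if match.group("strand") == "+" else 2
        yield f">{species}{contig:03d}_{strand}FG{counter:05d}_{gene:05d}"


# Small in-memory demo using two records from the question.
example = [
    "unitig153_FGENESH: 675 3 exon (s) 2969361 - 2969732 65 aa, chain -",
    "MPAINTAVVARDTVHQLARRENWAQQEAGVIVVFAIVFVVGVGLISLWISKLLKKRKAKKAALGA",
    "unitig3_FGENESH: 655 3 exon (s) 2965882 - 2966129 12 aa, chain +",
    "MCSVPLLLTLLQ",
]
for renamed in rename_headers(example):
    print(renamed)
```

To use it on a real file, pass the open input file as `lines` and write each yielded line, followed by a newline, to the output file. Because the running index counts headers in file order, the numbers in this two-record demo differ from the ones in the full example above.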