Hello,
I need to predict genes in several thousand files and then analyse the predicted proteins.
First, I used Augustus and getAnnoFasta.pl to get a FASTA file of proteins. getAnnoFasta.pl gives me a file with protein names like
>g1.t1
>g2.t1
>g3.t1
...
But I need to keep the DNA contig names in my protein sequence names, like
>dnacontig1.g1
>dnacontig1.g2
>dnacontig2.g1
or
>g1.dnacontig1
>g2.dnacontig1
>g1.dnacontig2
The format doesn't matter; I just need the original contig name in each protein sequence name, using the quickest method.
I am thinking of using bedtools to extract the sequences from the original files and then translating them. Alternatively, I could write my own Python script to extract the protein sequences from the Augustus output.
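For the homemade-script option, here is a minimal sketch of what I have in mind. It assumes the Augustus output was written with protein sequences switched on, so that each gene block contains comment lines of the form `# protein sequence = [...]`, and that the contig name is the first column of the preceding transcript line. The function name `augustus_proteins` and the way the header is built are my own choices, not part of Augustus.

```python
import re
import sys


def augustus_proteins(lines):
    """Yield (contig, transcript_id, protein) tuples from Augustus GFF output.

    Assumption: each protein appears as '# protein sequence = [....]' comment
    lines placed after the 'transcript' feature line of the same gene.
    """
    transcript = None   # (contig, transcript_id) of the last transcript line
    parts = None        # list of sequence chunks while inside a protein block
    for raw in lines:
        line = raw.strip()
        if parts is None:
            if line.startswith("# protein sequence = ["):
                parts = []
                line = line.split("[", 1)[1]
            else:
                if line and not line.startswith("#"):
                    cols = line.split("\t")
                    if len(cols) >= 9 and cols[2] in ("transcript", "mRNA"):
                        # Column 9 is 'g1.t1' by default, or
                        # 'ID=g1.t1;Parent=g1' in GFF3 mode.
                        m = re.search(r"ID=([^;]+)", cols[8])
                        tid = m.group(1) if m else cols[8].strip()
                        transcript = (cols[0], tid)
                continue
        else:
            # Continuation lines look like '# MSTK...'; drop the comment prefix.
            line = line.lstrip("# ")
        if line.endswith("]"):
            parts.append(line[:-1])
            if transcript is not None:
                yield transcript[0], transcript[1], "".join(parts)
            parts = None
        else:
            parts.append(line)


if __name__ == "__main__":
    # Usage (hypothetical file names): python script.py augustus_output.gff > proteins.fasta
    with open(sys.argv[1]) as handle:
        for contig, tid, protein in augustus_proteins(handle):
            print(f">{contig}.{tid}")
            print(protein)
```

This would write headers such as `>dnacontig1.g1.t1`, and it reads the Augustus output directly, so getAnnoFasta.pl would not be needed for this step.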
What is the best way? Thanks for your help.
Hi, did you find a solution? If so, kindly reply. Thanks.