Augustus output, fasta with original contig name

4.9 years ago

belliardocaro • 0

Hello,

I need to predict genes in several thousand files and then analyze the predicted proteins.

First, I used Augustus and getAnnoFasta.pl to get a FASTA file of proteins. getAnnoFasta.pl gives me a file with protein names like

>g1.t1 
>g2.t1
>g3.t1
..

However, I need to keep the DNA contig names in my protein sequence names, like

>dnacontig1.g1
>dnacontig1.g2
>dnacontig2.g1

or

>g1.dnacontig1
>g2.dnacontig1
>g1.dnacontig2

The format doesn't matter; I just need the original contig name in each protein sequence name, using the quickest method.

I am thinking of using bedtools to extract the sequences from the original files and then translating them. Alternatively, I could write my own Python script to extract the sequences from the Augustus output.
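For the homemade-script route, here is a rough sketch of what I have in mind. It assumes the Augustus output is the default GFF with protein sequences printed in comment lines (`# protein sequence = [...]`, possibly wrapped over several `#` lines), and that each transcript line carries the contig name in column 1 and the transcript ID in column 9. The function names `iter_proteins` and `to_fasta` and the sample records are made up for illustration; I have not checked this against every Augustus option.

```python
import re


def iter_proteins(lines):
    """Yield (contig, transcript_id, protein) tuples from Augustus GFF lines.

    Assumes protein sequences appear as '# protein sequence = [...]' comments
    placed after the transcript line they belong to.
    """
    contig = tx_id = None
    parts = None  # None while we are outside a protein block
    for raw in lines:
        line = raw.strip()
        if parts is None and line.startswith("# protein sequence = ["):
            parts = []
            line = line.split("[", 1)[1]  # keep only what follows '['
        elif parts is None:
            cols = line.split("\t")
            if (not line.startswith("#") and len(cols) >= 9
                    and cols[2] in ("transcript", "mRNA")):
                contig = cols[0]
                # GFF3-style 'ID=g1.t1;...' or plain 'g1.t1' in column 9
                m = re.search(r"ID=([^;]+)", cols[8])
                tx_id = m.group(1) if m else cols[8].strip()
            continue
        else:
            line = line.lstrip("# ")  # continuation line of the protein
        if line.endswith("]"):
            parts.append(line[:-1])
            yield contig, tx_id, "".join(parts)
            parts = None
        else:
            parts.append(line)


def to_fasta(records, width=60):
    """Format records as FASTA with headers like '>contig.g1.t1'."""
    out = []
    for contig, tx_id, protein in records:
        out.append(f">{contig}.{tx_id}")
        out.extend(protein[i:i + width] for i in range(0, len(protein), width))
    return "\n".join(out) + "\n"


# Tiny made-up example, not real Augustus output
sample = [
    "# start gene g1",
    "contigA\tAUGUSTUS\tgene\t1\t90\t0.9\t+\t.\tg1",
    "contigA\tAUGUSTUS\ttranscript\t1\t90\t0.9\t+\t.\tg1.t1",
    "# protein sequence = [MKTAYIAK",
    "# QRQISFVK]",
    "# end gene g1",
    "# start gene g2",
    "contigB\tAUGUSTUS\ttranscript\t5\t40\t0.8\t-\t.\tg2.t1",
    "# protein sequence = [MSL]",
    "# end gene g2",
]
fasta = to_fasta(iter_proteins(sample))
```

On a real file I would pass the open file handle instead of `sample`, for example `to_fasta(iter_proteins(open("augustus.gff")))`, where `augustus.gff` is just a placeholder name. This avoids a second pass with bedtools and a translation step, since Augustus already prints the protein.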

What is the best way? Thanks for your help.

gene • 1.1k views

updated 2.2 years ago by Ram 44k • written 4.9 years ago by belliardocaro • 0