11 weeks ago
eedeaguiar
Hi,
I assembled a de novo genome for a wild mammal and I want to extract only the protein sequences. I am running Augustus like this:
augustus --gff3=on --protein=on --codingseq=off --introns=off --start=off --stop=off --cds=off --species=human genome.fa > proteins.gff
Will this command output only the protein sequences predicted in my genome? That is the output I need, and I am worried I am using the flags incorrectly. Thank you.
What's stopping you from taking the normal output (including proteins, CDS, transcripts, exons, etc.) and just subsetting the GFF/FASTA to isolate the protein sequences? You're likely going to need to report the other data for your assembly anyway.
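To make the subsetting idea concrete, here is a minimal sketch in Python. It assumes the usual Augustus layout, where each predicted protein is written into the GFF as comment lines starting with `# protein sequence = [` and ending with `]`, possibly wrapped over several `#` lines, right after the transcript/mRNA feature it belongs to. The function name `extract_proteins` and the file names are just placeholders I made up, so check the layout against your own proteins.gff before relying on it. I believe Augustus also ships a helper script, getAnnoFasta.pl, for the same job, so that is worth looking at first.

```python
import re
import sys


def extract_proteins(lines):
    """Yield (transcript_id, protein) pairs from Augustus GFF output lines.

    Assumes proteins appear in '# protein sequence = [...]' comment blocks
    that follow the transcript (GTF-style) or mRNA (GFF3-style) feature line.
    """
    current_id = None
    buffer = None  # None while we are outside a protein comment block
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 9 and cols[2] in ("transcript", "mRNA"):
                # GFF3 attributes look like ID=g1.t1;Parent=g1,
                # GTF-style output puts the bare transcript ID here.
                match = re.search(r"ID=([^;]+)", cols[8])
                current_id = match.group(1) if match else cols[8].strip()
            continue
        text = line.lstrip("#").strip()
        if text.startswith("protein sequence = ["):
            buffer = text[len("protein sequence = ["):]
        elif buffer is not None:
            buffer += text  # continuation line of a wrapped sequence
        else:
            continue  # unrelated comment line
        if buffer.endswith("]"):
            yield current_id, buffer[:-1]
            buffer = None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_proteins.py proteins.gff proteins.fa
    with open(sys.argv[1]) as gff, open(sys.argv[2], "w") as fasta:
        for transcript_id, protein in extract_proteins(gff):
            fasta.write(f">{transcript_id}\n{protein}\n")
```

This way you keep the full Augustus output (genes, CDS, introns and so on) for later reporting and only derive the protein FASTA from it, instead of switching features off at prediction time.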