5.1 years ago
grey
Hi all. I'm looking for a handy tool to BLAST the amino acid sequences from Augustus annotation output against a protein database to identify the best ortholog. I can imagine doing this with a homemade script, but thought there might be a ready-made tool out there.
Thanks in advance!
Details:
The Augustus GFF output looks like this:
# start gene g7
SCF_1 AUGUSTUS gene 35727 36261 0.01 - . g7
SCF_1 AUGUSTUS transcript 35727 36261 0.01 - . g7.t1
SCF_1 AUGUSTUS tts 35727 35727 . - . transcript_id "g7.t1"; gene_id "g7";
SCF_1 AUGUSTUS exon 35727 35945 . - . transcript_id "g7.t1"; gene_id "g7";
SCF_1 AUGUSTUS stop_codon 35907 35909 . - 0 transcript_id "g7.t1"; gene_id "g7";
SCF_1 AUGUSTUS intron 35946 36036 0.39 - . transcript_id "g7.t1"; gene_id "g7";
SCF_1 AUGUSTUS CDS 35907 35945 0.39 - 0 transcript_id "g7.t1"; gene_id "g7";
SCF_1 AUGUSTUS CDS 36037 36228 0.39 - 0 transcript_id "g7.t1"; gene_id "g7";
SCF_1 AUGUSTUS exon 36037 36261 . - . transcript_id "g7.t1"; gene_id "g7";
SCF_1 AUGUSTUS start_codon 36226 36228 . - 0 transcript_id "g7.t1"; gene_id "g7";
SCF_1 AUGUSTUS tss 36261 36261 . - . transcript_id "g7.t1"; gene_id "g7";
# protein sequence = [MISTASVSGSVDLPRPMKIDSSASPEIESDPTPTSPEGSRTSGSPDRHDPSTSSPSPSRGGDNQNIGNYFVFQLRK]
I want to find the best protein match for the protein sequence at the end.
If the real issue is how to get the protein (amino acid) sequences out of a GFF file, I suggest rephrasing your question so that it is clear that's what you want to achieve.
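If it is the extraction step, here is a minimal sketch in Python of pulling the predicted proteins out of the Augustus comment lines and writing them as FASTA. It assumes the layout shown in the question: a `# start gene <id>` line, followed later by `# protein sequence = [...]`, where a long sequence may continue over several `#` lines until the closing `]`. The function names `augustus_proteins` and `to_fasta` are made up for illustration and are not part of Augustus.

```python
import re

def augustus_proteins(lines):
    """Yield (gene_id, protein) pairs from Augustus GFF comment lines."""
    gene_id = None
    buf = None  # sequence chunks collected while inside the [...] block
    for line in lines:
        line = line.strip()
        m = re.match(r"#\s*start gene\s+(\S+)", line)
        if m:
            gene_id = m.group(1)
            continue
        if buf is None:
            m = re.match(r"#\s*protein sequence\s*=\s*\[(.*)", line)
            if not m:
                continue  # ordinary GFF feature line or unrelated comment
            buf = []
            chunk = m.group(1)
        else:
            # continuation line: drop the leading '#' and whitespace
            chunk = line.lstrip("#").strip()
        if chunk.endswith("]"):
            buf.append(chunk[:-1])
            yield gene_id, "".join(buf)
            buf = None
        else:
            buf.append(chunk)

def to_fasta(records, width=60):
    """Format (id, sequence) pairs as FASTA text, wrapped at `width`."""
    out = []
    for gene_id, seq in records:
        out.append(f">{gene_id}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + ("\n" if out else "")

if __name__ == "__main__":
    import sys
    # usage (hypothetical file names): python extract.py augustus.gff > proteins.faa
    with open(sys.argv[1]) as handle:
        sys.stdout.write(to_fasta(augustus_proteins(handle)))
```

The resulting FASTA file can then be searched with `blastp -query proteins.faa -db <your_db> -outfmt 6 -max_target_seqs 1`, where the database name is a placeholder. Keep in mind that the top BLAST hit is only a candidate ortholog; a reciprocal best hit check is a common way to firm that up.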