Why is there an asterisk at the end of each protein sequence in the FASTA file from the BRAKER3 output?
8 months ago

Hi, I am doing a whole-genome analysis of a Dalbergia species. After genome assembly, I ran the BRAKER3 gene prediction tool available on the Galaxy web server, with the Viridiplantae protein sequences given as the training set. I then used the output GTF file together with the reference genome to generate the protein FASTA file (using this script: https://github.com/Gaius-Augustus/Augustus/blob/master/scripts/getAnnoFastaFromJoingenes.py). However, I noticed that there is an asterisk at the end of each protein sequence in the FASTA file. Is it acceptable to remove it using the 'sed' command, or how should I handle this? Your expertise and insights on this would be greatly appreciated.

Protein sequence output:

>g1.t1
MEGLVRSGINPVRVSGGRRHQSRFLDASTLHLRKRKSGFAVGIGNMKLSSPLVVAAASVG
GSKVVHFENTLPSKETLELWREGDAVCFDVDSTVCLDEGIDELAEFCGAGKAVAEWTARA
MGGSVPFEEALAARLKLFNPSLSQLQNFLEQKPPRLSPGIQELVKKLKANHIDVYLISGG
FRQMINPVASILGIPKENIFANQLLFGSSGEFLGFDENEPTSRSGGKATAVQQIKKAHGY
KALTMIGDGATDLEARRPGGADLFICYAGVQLREAVAAKADWLVFNFKDLINSLG*
>g2.t1
MQGLRRYPNDINPLATIRVYPTVNESDDHEIAALWNRTPALFIGGACVGWLESLVALHVS
GHLVSKLIQVGALWV*

>g3.t1
MVQACYDSFNYNPYCGSCKYPPEELFEALDLGHLGIWSERTNWEGYVTISDDEMSRKLGM
RDVAIVWRGTTPYTE*
gtf breaker3 protein-seq • 518 views
8 months ago
Juke34 8.9k

The asterisk corresponds to the stop codon, although in theory the GTF format does not include the stop codon in the CDS.
Keep in mind that stop codon read-through exists, with stop codons in the middle of a CDS (as in selenoproteins), so a blanket sed replacement may affect those as well.

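agat_sp_extract_sequences.pl from AGAT can help here. With -p it translates the extracted CDS into protein, --clean_final_stop removes the terminal asterisk, and --clean_internal_stop replaces internal stops with X: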

agat_sp_extract_sequences.pl -g infile.gff -f infile.fasta -p --clean_final_stop --clean_internal_stop
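
If you would rather post-process the existing protein FASTA, here is a minimal Python sketch that removes only the terminal asterisk of each record and leaves internal ones in place. Sequences are wrapped over several lines, so an internal stop can happen to sit at the end of a line; the sketch therefore joins each record before stripping. The helper names (read_fasta, strip_terminal_stop, format_record) and the 60-character line width are my own assumptions for illustration, not part of AGAT or BRAKER3.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA handle; blank lines are skipped."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def strip_terminal_stop(seq):
    """Remove a single trailing '*'; any internal '*' is kept as-is."""
    return seq[:-1] if seq.endswith("*") else seq


def format_record(header, seq, width=60):
    """Return one FASTA record as text, wrapped at `width` residues per line."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Small in-memory example using the g2.t1 record from the question.
    demo = io.StringIO(
        ">g2.t1\n"
        "MQGLRRYPNDINPLATIRVYPTVNESDDHEIAALWNRTPALFIGGACVGWLESLVALHVS\n"
        "GHLVSKLIQVGALWV*\n"
    )
    for name, protein in read_fasta(demo):
        print(format_record(name, strip_terminal_stop(protein)))
```

To run it on a real file, replace the in-memory example with an opened file handle. Records that still contain an asterisk afterwards are the ones with internal stops and are worth inspecting by hand.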