Hi, I have a question about the augustus (v3.0.2, v3.0.1) output, specifically the counts it reports for "hint groups fully obeyed" and "incompatible hint groups".
I used a test genomic sequence (~3 kb) that contains a typical two-exon gene; the sequence is available here.
I know exactly where the exons begin and end, so I put this information in a hints file (also pasted below, since it is straightforward to read).
scaffold_0 . start 101 103 10 . . pri=2;grp=start;src=M
scaffold_0 . CDSpart 101 379 10 . . pri=2;grp=cds1;src=M
scaffold_0 . CDSpart 3040 3210 10 . . pri=2;grp=cds2;src=M
scaffold_0 . intronpart 380 3039 10 . . pri=2;grp=intron;src=M
scaffold_0 . stop 3208 3210 10 . . pri=2;grp=stop;src=M
Then I ran augustus:
augustus --species=arabidopsis --hintsfile=hints.gff --gff3=on test.fas
which gives the following output:
##gff-version 3
# This output was generated with AUGUSTUS (version 3.0.2).
# ----- prediction on sequence number 1 (length = 3232, name = scaffold_0) -----
#
# Predicted genes for sequence number 1 on both strands
# start gene g1
scaffold_0 AUGUSTUS gene 76 3232 0.08 + . ID=g1
scaffold_0 AUGUSTUS transcript 76 3232 0.08 + . ID=g1.t1;Parent=g1
scaffold_0 AUGUSTUS transcription_start_site 76 76 . + . Parent=g1.t1
scaffold_0 AUGUSTUS exon 76 379 . + . Parent=g1.t1
scaffold_0 AUGUSTUS start_codon 101 103 . + 0 Parent=g1.t1
scaffold_0 AUGUSTUS intron 380 3039 1 + . Parent=g1.t1
scaffold_0 AUGUSTUS CDS 101 379 1 + 0 ID=g1.t1.cds;Parent=g1.t1
scaffold_0 AUGUSTUS CDS 3040 3210 1 + 0 ID=g1.t1.cds;Parent=g1.t1
scaffold_0 AUGUSTUS exon 3040 3232 . + . Parent=g1.t1
scaffold_0 AUGUSTUS stop_codon 3208 3210 . + 0 Parent=g1.t1
scaffold_0 AUGUSTUS transcription_end_site 3232 3232 . + . Parent=g1.t1
......
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 60
# CDS exons: 2/2
# M: 2
# CDS introns: 1/1
# M: 1
# 5'UTR exons and introns: 0/1
# 3'UTR exons and introns: 0/1
# hint groups fully obeyed: 1
# M: 1 (intron)
# incompatible hint groups: 4
# M: 4 (start,cds1,cds2,stop)
# end gene g1
###
As you can see, augustus predicts exactly the gene structure given in the hints file. However, it reports that only 1 hint group (intron) is fully obeyed and that the other 4 (start, cds1, cds2, stop) are incompatible. How is this possible?
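As a sanity check on the claim that the prediction matches the hints, here is a minimal Python sketch that compares the hint coordinates with the predicted features from the output above. The rules are my own assumption of what "consistent" should mean (exact match for start/stop, containment for CDSpart/intronpart); they ignore strand and frame and are not how augustus itself evaluates hints. The names `RULES`, `hint_is_consistent` and `check_hints` are made up for this illustration.

```python
# Hint intervals copied from the hints file: (feature, start, end, group).
HINTS = [
    ("start", 101, 103, "start"),
    ("CDSpart", 101, 379, "cds1"),
    ("CDSpart", 3040, 3210, "cds2"),
    ("intronpart", 380, 3039, "intron"),
    ("stop", 3208, 3210, "stop"),
]

# Predicted features copied from the augustus GFF3 output: (type, start, end).
PREDICTED = [
    ("start_codon", 101, 103),
    ("intron", 380, 3039),
    ("CDS", 101, 379),
    ("CDS", 3040, 3210),
    ("stop_codon", 3208, 3210),
]

# Assumed mapping: hint type -> (predicted feature type, exact match required?).
RULES = {
    "start": ("start_codon", True),
    "stop": ("stop_codon", True),
    "CDSpart": ("CDS", False),
    "intronpart": ("intron", False),
}


def hint_is_consistent(hint, predicted=PREDICTED):
    """Return True if some predicted feature satisfies the hint's coordinates."""
    feature, start, end, _group = hint
    target, exact = RULES[feature]
    for ptype, pstart, pend in predicted:
        if ptype != target:
            continue
        if exact and (pstart, pend) == (start, end):
            return True
        # "part" hints only need to lie inside the predicted feature.
        if not exact and pstart <= start and end <= pend:
            return True
    return False


def check_hints(hints=HINTS, predicted=PREDICTED):
    """Map each hint group name to whether the prediction is consistent with it."""
    return {h[3]: hint_is_consistent(h, predicted) for h in hints}


if __name__ == "__main__":
    for group, ok in check_hints().items():
        print(group, "consistent" if ok else "NOT consistent")
```

By these coordinate-only rules every hint group is consistent with the predicted transcript, which is why the "incompatible" count surprises me.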
PS: I changed "CDSpart" to "CDS", and after that change augustus does seem to obey the hints. But what if I don't know the exact boundaries of a CDS and have to use "CDSpart"?
PPS: I have tried augustus v3.0.2, v3.0.1 and v2.7.1, and the problem exists in all of them; v2.5.5, however, gave the correct summary report.
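For reference, the change described in the PS can be sketched in Python as below. It assumes the hints file has nine whitespace- or tab-separated columns and writes the result tab-separated; `cdspart_to_cds` is a hypothetical helper name. It only makes sense when the hint really covers the whole coding exon, which is exactly the limitation raised above.

```python
def cdspart_to_cds(line):
    """Rewrite the feature column of a hint line from CDSpart to CDS.

    Lines that are not nine-column CDSpart hints are returned unchanged
    (apart from the trailing newline being removed).
    """
    stripped = line.rstrip("\n")
    fields = stripped.split(None, 8)  # at most nine columns, attributes last
    if len(fields) == 9 and fields[2] == "CDSpart":
        fields[2] = "CDS"
        return "\t".join(fields)
    return stripped


if __name__ == "__main__":
    example = "scaffold_0 . CDSpart 101 379 10 . . pri=2;grp=cds1;src=M"
    print(cdspart_to_cds(example))
```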
Any comments would be greatly appreciated!