AUGUSTUS output: how are incompatible hint groups determined?
10.4 years ago
Orionzhou ▴ 10

Hi, I have a question about the AUGUSTUS (v3.0.2, v3.0.1) output, specifically the number of "hint groups fully obeyed" and "incompatible hint groups" that it reports.

I used a test genomic sequence (~3 kb) that contains a typical two-exon gene; the sequence is available here.

I know exactly where the exons begin and end, so I put this information in a hints file (pasted below, since it is short and straightforward).

scaffold_0      .       start   101     103     10      .       .       pri=2;grp=start;src=M
scaffold_0      .       CDSpart 101     379     10      .       .       pri=2;grp=cds1;src=M
scaffold_0      .       CDSpart 3040    3210    10      .       .       pri=2;grp=cds2;src=M
scaffold_0      .       intronpart      380     3039    10      .       .       pri=2;grp=intron;src=M
scaffold_0      .       stop    3208    3210    10      .       .       pri=2;grp=stop;src=M

Then I ran augustus:

augustus --species=arabidopsis --hintsfile=hints.gff --gff3=on test.fas

which gives the following output:

##gff-version 3
# This output was generated with AUGUSTUS (version 3.0.2).
# ----- prediction on sequence number 1 (length = 3232, name = scaffold_0) -----
#
# Predicted genes for sequence number 1 on both strands
# start gene g1
scaffold_0    AUGUSTUS    gene    76    3232    0.08    +    .    ID=g1
scaffold_0    AUGUSTUS    transcript    76    3232    0.08    +    .    ID=g1.t1;Parent=g1
scaffold_0    AUGUSTUS    transcription_start_site    76    76    .    +    .    Parent=g1.t1
scaffold_0    AUGUSTUS    exon    76    379    .    +    .    Parent=g1.t1
scaffold_0    AUGUSTUS    start_codon    101    103    .    +    0    Parent=g1.t1
scaffold_0    AUGUSTUS    intron    380    3039    1    +    .    Parent=g1.t1
scaffold_0    AUGUSTUS    CDS    101    379    1    +    0    ID=g1.t1.cds;Parent=g1.t1
scaffold_0    AUGUSTUS    CDS    3040    3210    1    +    0    ID=g1.t1.cds;Parent=g1.t1
scaffold_0    AUGUSTUS    exon    3040    3232    .    +    .    Parent=g1.t1
scaffold_0    AUGUSTUS    stop_codon    3208    3210    .    +    0    Parent=g1.t1
scaffold_0    AUGUSTUS    transcription_end_site    3232    3232    .    +    .    Parent=g1.t1
......

# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 60
# CDS exons: 2/2
#      M:   2
# CDS introns: 1/1
#      M:   1
# 5'UTR exons and introns: 0/1
# 3'UTR exons and introns: 0/1
# hint groups fully obeyed: 1
#      M:   1 (intron)
# incompatible hint groups: 4
#      M:   4 (start,cds1,cds2,stop)
# end gene g1
###
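
For anyone who wants to tabulate this summary across runs or versions, here is a minimal Python sketch that pulls the two hint-group counts out of the comment lines shown above. The function name `parse_hint_summary` and the regular expressions are my own assumptions, based only on the layout of this one output; other AUGUSTUS versions may format these lines differently.

```python
import re

# The relevant comment lines, copied from the output above.
SUMMARY = """\
# hint groups fully obeyed: 1
#      M:   1 (intron)
# incompatible hint groups: 4
#      M:   4 (start,cds1,cds2,stop)
"""

# Assumed patterns, derived from this single example output.
HEADER = re.compile(r"^#\s*(hint groups fully obeyed|incompatible hint groups):\s*(\d+)")
DETAIL = re.compile(r"^#\s+(\S+):\s+(\d+)\s+\(([^)]*)\)")


def parse_hint_summary(text):
    """Return {category: {"count": int, "groups": {source: [group names]}}}."""
    result = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            result[current] = {"count": int(header.group(2)), "groups": {}}
            continue
        detail = DETAIL.match(line)
        if detail and current is not None:
            # Per-source line, e.g. "M:   4 (start,cds1,cds2,stop)".
            result[current]["groups"][detail.group(1)] = detail.group(3).split(",")
        else:
            # Any other line ends the current category.
            current = None
    return result


if __name__ == "__main__":
    for category, info in parse_hint_summary(SUMMARY).items():
        print(category, info["count"], info["groups"])
```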

As you can see, AUGUSTUS predicts exactly the gene structure given in the hints file. However, it reports that it obeys only one hint group (intron) and is "incompatible" with all the others (start codon, stop codon, and the two CDS hints). How is this possible?
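
To make the claim that the prediction matches the hints easy to verify, here is a small Python sketch that checks whether each hint interval lies inside a predicted feature of the corresponding type. The coordinates are copied from the hints file and the GFF3 output above. The mapping `HINT_TO_FEATURE` and the containment test are my own simplifications, not the rule AUGUSTUS uses internally to decide compatibility.

```python
# Hints copied from the post: (hint type, start, end, group name).
HINTS = [
    ("start", 101, 103, "start"),
    ("CDSpart", 101, 379, "cds1"),
    ("CDSpart", 3040, 3210, "cds2"),
    ("intronpart", 380, 3039, "intron"),
    ("stop", 3208, 3210, "stop"),
]

# Predicted features copied from the GFF3 output: (feature type, start, end).
PREDICTED = [
    ("start_codon", 101, 103),
    ("intron", 380, 3039),
    ("CDS", 101, 379),
    ("CDS", 3040, 3210),
    ("stop_codon", 3208, 3210),
]

# Assumption: which predicted feature type each hint type is compared with.
HINT_TO_FEATURE = {
    "start": "start_codon",
    "stop": "stop_codon",
    "CDSpart": "CDS",
    "intronpart": "intron",
}


def hint_is_contained(hint, predicted):
    """True if a predicted feature of the matching type contains the hint interval."""
    hint_type, start, end, _group = hint
    wanted = HINT_TO_FEATURE[hint_type]
    return any(
        ftype == wanted and fstart <= start and end <= fend
        for ftype, fstart, fend in predicted
    )


def check_hints(hints, predicted):
    """Map each hint group name to whether its interval is contained in the prediction."""
    return {hint[3]: hint_is_contained(hint, predicted) for hint in hints}


if __name__ == "__main__":
    for group, ok in check_hints(HINTS, PREDICTED).items():
        print(group, "contained" if ok else "NOT contained")
```

Under this simple criterion all five hint groups agree with the predicted transcript, which is why the "incompatible hint groups: 4" line is surprising.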

PS: When I changed "CDSpart" to "CDS", AUGUSTUS did report the hints as obeyed. But what if I don't know the exact boundaries of a CDS and have to use "CDSpart"?
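
For reference, the substitution described in the PS can be scripted. This is only a sketch: the helper name `cdspart_to_cds` is made up for illustration, and it assumes the hints file is tab-separated with nine columns, as GFF normally is. It is only appropriate when the hint coordinates are known to be exact CDS boundaries.

```python
def cdspart_to_cds(lines):
    """Yield hint lines with the feature column changed from CDSpart to CDS."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) of a GFF line holds the feature type.
        if len(fields) >= 9 and fields[2] == "CDSpart":
            fields[2] = "CDS"
        yield "\t".join(fields)


if __name__ == "__main__":
    example = "scaffold_0\t.\tCDSpart\t101\t379\t10\t.\t.\tpri=2;grp=cds1;src=M"
    for converted in cdspart_to_cds([example]):
        print(converted)
```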

PPS: I have tried AUGUSTUS v3.0.2, v3.0.1 and v2.7.1, and the problem exists in all of them; v2.5.5, however, gave the correct summary report.

Any comments would be greatly appreciated!

augustus hints gene-prediction • 4.7k views
5.8 years ago
dukecomeback ▴ 40

I'm a little disappointed with AUGUSTUS. Sometimes I know the hint is correct, but no matter how hard I try, AUGUSTUS just does not obey it.

I feel like I can't love it anymore.
