How can you measure the completeness of an annotation process?
4.1 years ago
eennadi ▴ 40

I am annotating a plant genome using Maker-P with EST and transcriptome data. I reduced the redundancy in the ESTs with CD-HIT. After three rounds of MAKER (est2genome and protein2genome, followed by training SNAP twice and training Augustus twice), I now have a full gene set. I expected more genes than I have, although this is a novel genome with no reference.

How can I tell if my annotation is complete?

Thanks

What is your expectation based on? You could compare with related species.

Closely related species have gene counts of about 26,857, 23,197 and 22,427, but the paper that reported these counts gave a CEGMA completeness (Complete % of CEGs) of only 86.29%.

And how many do you have?

I have 17,973 genes, with a BUSCO score of C:68.4%[S:64.5%,D:3.9%],F:6.0%,M:25.6%,n:1440.

The BUSCO score for the genome assembly is 93.7%.
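The numbers quoted so far already allow a rough sanity check. This is a minimal sketch that compares the 17,973 predicted genes with the mean of the three related-species counts. It is plain arithmetic, not a statistical test, and gene counts legitimately vary between species; `compare_gene_count` is a hypothetical helper name.

```python
def compare_gene_count(my_count, related_counts):
    """Return (mean of related counts, shortfall vs mean, my_count / mean)."""
    mean_related = sum(related_counts) / len(related_counts)
    shortfall = mean_related - my_count
    return mean_related, shortfall, my_count / mean_related


# Gene counts quoted in this thread
related = [26857, 23197, 22427]
mine = 17973

mean_related, shortfall, fraction = compare_gene_count(mine, related)
print(f"mean of related species: {mean_related:.0f}")
print(f"shortfall vs mean:       {shortfall:.0f} genes")
print(f"annotation / mean:       {fraction:.1%}")
```

This prints a mean of about 24,160, a shortfall of about 6,187 genes and a ratio of 74.4%. Even against the smallest related count (22,427) the annotation is about 4,450 genes short. A shortfall of roughly a quarter is similar in size to the gap between the assembly (93.7%) and annotation (68.4%) BUSCO scores.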

I ran BUSCO with this command line:

python /mnt/bin/busco/scripts/run_BUSCO.py -i ~.maker.transcripts.fasta -o output -l ${LINEAGE} -m transcriptome -c 15 -sp my_species -z --augustus_parameters='--progress=true'

You lost about 25 percentage points of BUSCO completeness (93.7% for the assembly vs 68.4% for the annotation) during the annotation process. That is not good.

I am trying to use BRAKER to re-annotate and evaluate, but BRAKER has been very difficult to use. It keeps dying without any error message.

Do you have any suggestions on how to recover the lost 25% of BUSCOs?

Did you activate the keep_preds parameter?

No, I did not activate keep_preds. When I set keep_preds=1, it gives proteins with an AED of 1, for example:
mRNA-1 protein AED:1.00 eAED:1.00 QI:0|0|0|0|1|1|6|0|661
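A header like the one above can be unpacked to see what support each model has. This is a minimal sketch that assumes the 9-value QI layout described in the MAKER documentation (5' UTR length; fraction of splice sites confirmed by ESTs; fraction of exons overlapping ESTs; fraction of exons overlapping ESTs or proteins; fraction of splice sites confirmed by SNAP; fraction of exons overlapping SNAP; number of exons; 3' UTR length; protein length). Check this order against your MAKER version. The field names and helper functions are hypothetical.

```python
import re

# Hypothetical names for MAKER's 9 QI values, in documented order
QI_FIELDS = (
    "utr5_len",
    "frac_splice_est",
    "frac_exon_est",
    "frac_exon_est_or_protein",
    "frac_splice_abinitio",
    "frac_exon_abinitio",
    "n_exons",
    "utr3_len",
    "protein_len",
)

HEADER_RE = re.compile(
    r"AED:(?P<aed>[\d.]+)\s+eAED:(?P<eaed>[\d.]+)\s+QI:(?P<qi>\S+)"
)


def parse_maker_header(header):
    """Extract AED, eAED and the QI fields from a MAKER FASTA header."""
    m = HEADER_RE.search(header)
    if m is None:
        raise ValueError(f"no AED/eAED/QI fields in: {header!r}")
    qi_values = [float(v) for v in m.group("qi").split("|")]
    if len(qi_values) != len(QI_FIELDS):
        raise ValueError(f"expected {len(QI_FIELDS)} QI values, got {len(qi_values)}")
    record = dict(zip(QI_FIELDS, qi_values))
    record["aed"] = float(m.group("aed"))
    record["eaed"] = float(m.group("eaed"))
    return record


def has_evidence_support(record):
    """AED == 1.0 means the model does not overlap any EST/protein evidence."""
    return record["aed"] < 1.0


rec = parse_maker_header("mRNA-1 protein AED:1.00 eAED:1.00 QI:0|0|0|0|1|1|6|0|661")
print(rec["n_exons"], rec["protein_len"], has_evidence_support(rec))
```

For the example header, all evidence fractions are 0, the ab initio fractions are 1, and the model has 6 exons and a 661-residue protein, so it is an unsupported ab initio prediction.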

That is normal: it adds predictions that have no support from the evidence (EST or protein).

Can one proceed with these unsupported predictions?

So run with keep_preds. If you end up with between 25,000 and 30,000 genes, that is fine, and your BUSCO score will be much better. You can then also try without SNAP and check the BUSCO score again; deactivating it can give better results.
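To check whether a rerun lands in that range, count gene features rather than proteins, because isoforms add extra mRNAs per gene. This is a minimal sketch that assumes a merged MAKER GFF3 in which final gene models carry `maker` in column 2 (evidence alignments use other sources) and sequences may be appended after a `##FASTA` line. The function name and the file name in the usage line are hypothetical.

```python
import sys
from collections import Counter


def count_feature_types(gff_lines, source=None):
    """Count GFF3 feature types (column 3), optionally for one source (column 2)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed 9-column feature line
        if source is not None and cols[1] != source:
            continue
        counts[cols[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        counts = count_feature_types(fh, source="maker")
    print(f"genes: {counts['gene']}  mRNAs: {counts['mRNA']}")
```

Usage, with a hypothetical file name: `python count_gff.py genome.all.gff`. Comparing the gene count with and without keep_preds (and with and without SNAP) shows how many models each setting adds.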

4.1 years ago
Juke34 8.9k

Run BUSCO on your assembly. Then take the proteins you predicted (all of them, including isoforms) and run BUSCO in protein mode. Compare the overall completeness (ignore the duplicated category). The two scores should be fairly close. If the BUSCO score on the proteins is far below that of the assembly, you have a problem in the annotation steps.
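The comparison described in this answer can be scripted by parsing BUSCO's one-line summary. This is a minimal sketch that assumes the notation shown in this thread (other BUSCO versions may format it slightly differently). It compares complete BUSCOs (C = S + D) so that duplicates caused by isoforms do not matter. All function names are hypothetical.

```python
import re

# Matches summaries like "C:68.4%[S:64.5%,D:3.9%],F:6.0%,M:25.6%,n:1440"
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Parse a BUSCO one-line summary into percentages plus n."""
    m = BUSCO_RE.search(summary)
    if m is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    result = {k: float(v) for k, v in m.groupdict().items() if k != "n"}
    result["n"] = int(m.group("n"))
    return result


def approx_counts(busco):
    """Convert percentages back to approximate numbers of BUSCO genes."""
    return {k: round(busco[k] / 100 * busco["n"]) for k in ("C", "S", "D", "F", "M")}


def completeness_gap(assembly_c, annotation_summary):
    """Percentage-point drop in complete (S + D) BUSCOs, assembly vs annotation.

    Duplicated hits are folded into C on purpose: with all isoforms in the
    protein set, many BUSCOs are expected to be reported as duplicated.
    """
    return assembly_c - parse_busco(annotation_summary)["C"]


annotation = "C:68.4%[S:64.5%,D:3.9%],F:6.0%,M:25.6%,n:1440"
print(approx_counts(parse_busco(annotation)))
print(f"gap: {completeness_gap(93.7, annotation):.1f} points")
```

For the numbers in this thread it prints `{'C': 985, 'S': 929, 'D': 56, 'F': 86, 'M': 369}` and a gap of 25.3 points. In other words, roughly 370 of the 1,440 BUSCO genes are missing from the annotation.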
