Question

BUSCO did not find any match. Retraining did not complete correctly. BUSCO analysis failed!

9 months ago

Sony

Hi everyone,

I am doing gene prediction and annotation of non-reference sequences, following the MAKER annotation pipeline.

However, when I train Augustus with BUSCO 5.7.1 (installed via conda), I encounter this error:

busco -i /opt/data/sony/thesis/pan_novoseq_maker/round1/novo_pan_seq_rnd1.maker.output/snap1/non-ref_rnd1.all.maker.transcripts1000.fasta -o non-ref_rnd1_maker -l embryophyta_odb10 -m genome -c 8 -f --long --augustus --augustus_species rice --augustus_parameters='--progress=true'

2024-07-09 15:27:09 INFO:       [augustus]      13 of 64 task(s) completed
2024-07-09 15:27:10 INFO:       [augustus]      20 of 64 task(s) completed
2024-07-09 15:27:11 INFO:       [augustus]      26 of 64 task(s) completed
2024-07-09 15:27:12 INFO:       [augustus]      33 of 64 task(s) completed
2024-07-09 15:27:14 INFO:       [augustus]      39 of 64 task(s) completed
2024-07-09 15:27:15 INFO:       [augustus]      45 of 64 task(s) completed
2024-07-09 15:27:16 INFO:       [augustus]      52 of 64 task(s) completed
2024-07-09 15:27:18 INFO:       [augustus]      58 of 64 task(s) completed
2024-07-09 15:27:22 INFO:       [augustus]      64 of 64 task(s) completed
2024-07-09 15:27:22 INFO:       Extracting predicted proteins...
2024-07-09 15:27:22 INFO:       ***** Run HMMER on gene sequences *****
2024-07-09 15:27:22 INFO:       Running 61 job(s) on hmmsearch, starting at 07/09/2024 15:27:22
2024-07-09 15:27:23 INFO:       [hmmsearch]     7 of 61 task(s) completed
2024-07-09 15:27:23 INFO:       [hmmsearch]     13 of 61 task(s) completed
2024-07-09 15:27:23 INFO:       [hmmsearch]     19 of 61 task(s) completed
2024-07-09 15:27:23 INFO:       [hmmsearch]     25 of 61 task(s) completed
2024-07-09 15:27:23 INFO:       [hmmsearch]     31 of 61 task(s) completed
2024-07-09 15:27:23 INFO:       [hmmsearch]     37 of 61 task(s) completed
2024-07-09 15:27:23 INFO:       [hmmsearch]     43 of 61 task(s) completed
2024-07-09 15:27:23 INFO:       [hmmsearch]     49 of 61 task(s) completed
2024-07-09 15:27:23 INFO:       [hmmsearch]     55 of 61 task(s) completed
2024-07-09 15:27:23 INFO:       [hmmsearch]     61 of 61 task(s) completed
2024-07-09 15:27:23 WARNING:    BUSCO did not find any match. Make sure to check the log files if this is unexpected.
2024-07-09 15:27:23 INFO:       Starting second step of analysis. The gene predictor Augustus is retrained using the results from the initial run to yield more accurate results.
2024-07-09 15:27:23 INFO:       Extracting missing and fragmented buscos from the file ancestral_variants...
2024-07-09 15:27:25 INFO:       Running a BLAST search for BUSCOs against created database
2024-07-09 15:27:25 INFO:       Running 1 job(s) on tblastn, starting at 07/09/2024 15:27:25
2024-07-09 15:27:33 INFO:       [tblastn]       1 of 1 task(s) completed
2024-07-09 15:27:33 INFO:       Converting predicted genes to short genbank files
2024-07-09 15:27:33 WARNING:    No jobs to run on gff2gbSmallDNA.pl
2024-07-09 15:27:33 INFO:       All files converted to short genbank files, now training Augustus using Single-Copy Complete BUSCOs
2024-07-09 15:27:33 INFO:       Running 1 job(s) on new_species.pl, starting at 07/09/2024 15:27:33
2024-07-09 15:27:33 INFO:       [new_species.pl]        1 of 1 task(s) completed
2024-07-09 15:27:33 INFO:       Running 1 job(s) on etraining, starting at 07/09/2024 15:27:33
2024-07-09 15:27:34 INFO:       [etraining]     1 of 1 task(s) completed
2024-07-09 15:27:34 ERROR:      Retraining did not complete correctly. Check your Augustus config path environment variable.
2024-07-09 15:27:34 ERROR:      BUSCO analysis failed!
2024-07-09 15:27:34 ERROR:      Check the logs, read the user guide (https://busco.ezlab.org/busco_userguide.html), and check the BUSCO issue board on https://gitlab.com/ezlab/busco/issues

I trained Augustus the same way for another rice accession and it worked well. However, I get the error above when I train Augustus with BUSCO on the non-reference sequences (the sequences assembled from reads that did not map to the reference genome).

Has anyone run into this, and how can I troubleshoot this error? Thank you, everyone.

Augustus pipeline MAKER annotation BUSCO

written 9 months ago by Sony • updated 9 months ago by GenoMax
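
The log suggests why retraining fails: the initial run reports "BUSCO did not find any match", gff2gbSmallDNA.pl then has no jobs to run, and Augustus is retrained on Single-Copy Complete BUSCOs, of which there appear to be none. The "Check your Augustus config path" message may therefore be a generic error rather than the real cause. One plausible (unconfirmed) explanation is that conserved single-copy genes are usually present in the reference genome, so their reads map and end up outside the non-reference assembly. A minimal sketch for checking this before retraining counts the statuses in BUSCO's full_table.tsv. It assumes the BUSCO 5 layout (table at <output>/run_<lineage>/full_table.tsv, "#" comment lines, BUSCO id in column 1 and status in column 2) and that the run actually wrote the table; a run without --long skips the retraining step and should produce one.

```python
import sys


def busco_status_counts(lines):
    """Count unique BUSCO ids per status in the lines of a BUSCO full_table.tsv.

    Assumes the BUSCO 5 layout: comment lines start with '#', the first
    tab-separated column is the BUSCO id and the second is the status
    (Complete, Duplicated, Fragmented or Missing). Duplicated BUSCOs occupy
    one row per copy, so ids are de-duplicated before counting.
    """
    ids_by_status = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows without a status column
        busco_id, status = fields[0], fields[1]
        ids_by_status.setdefault(status, set()).add(busco_id)
    return {status: len(ids) for status, ids in ids_by_status.items()}


def main(path):
    with open(path) as handle:
        counts = busco_status_counts(handle)
    for status in ("Complete", "Duplicated", "Fragmented", "Missing"):
        print(f"{status}\t{counts.get(status, 0)}")
    # Retraining uses single-copy Complete BUSCOs; with none, etraining has no genes.
    if counts.get("Complete", 0) == 0:
        print("No single-copy Complete BUSCOs found: nothing to retrain Augustus on.",
              file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Saved under a hypothetical name such as busco_counts.py, it could be run as `python busco_counts.py non-ref_rnd1_maker/run_embryophyta_odb10/full_table.tsv`; a non-zero exit code means there were no Complete BUSCOs to train on.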

Non-reference sequences are the assembled sequences from unmapped reads

Have you considered the possibility that those reads are not really part of the genome and you are trying to stuff them in? Are those reads "blast"ing to something that makes sense? At this point there should be enough rice genomes available that big chunks of the genome are unlikely to be missing from databases.

9 months ago by GenoMax
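
Alongside a BLAST check, it can help to see how much sequence BUSCO is actually searching, since short, fragmented contigs are less likely to carry the complete gene models that BUSCO counts. The sketch below is a hypothetical helper (not part of BUSCO or MAKER) that reports the number of sequences, total length, longest sequence and N50 of a FASTA file. It assumes a plain, uncompressed FASTA file with ">" header lines.

```python
import sys


def read_fasta_lengths(lines):
    """Return the length of each record in a FASTA file (headers start with '>')."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequence may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def main(path):
    with open(path) as handle:
        lengths = read_fasta_lengths(handle)
    print(f"sequences\t{len(lengths)}")
    print(f"total_bp\t{sum(lengths)}")
    print(f"longest\t{max(lengths, default=0)}")
    print(f"N50\t{n50(lengths)}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a hypothetical name such as fasta_stats.py, it could be run on the BUSCO input from the question, e.g. `python fasta_stats.py non-ref_rnd1.all.maker.transcripts1000.fasta`. These numbers are only a rough guide and do not replace checking what the sequences actually match.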

I have already screened those non-reference sequences with NCBI's FCS-GX tool and removed the contaminant sequences.

9 months ago by Sony