MAKER run has many FAILED contigs

Posted 9 months ago by san96

Hi everyone,

I'm trying to annotate a rather large plant genome (4 Gb), but I've run into problems with the first round of MAKER. I first identified repeats in my genome with EarlGrey, a pipeline that runs RepeatModeler and RepeatMasker internally. This gave me a softmasked genome, plus a set of output files from which I obtained a GFF of repeats:

plant_primary_round3_sort.fasta.prep.cat.gz
plant_primary_round3_sort.fasta.prep.masked
plant_primary_round3_sort.fasta.prep.out
plant_primary_round3_sort.fasta.prep.tbl
...
plant_primary_round3_sort.fasta.prep.out.complex.gff3
plant_primary_round3_sort.fasta.prep.out.complex.reformat.gff3
plant_primary_round3_sort.fasta.prep.out.gff3

In my first round I am using the unmasked (non-softmasked) genome, the repeat GFF, a protein set and a Trinity assembly. However, many contigs end up FAILED:

*I don't know whether this could be caused by my repeat GFF file.


scaffold_1      plant_round1_normal_datastore/49/CD/scaffold_1/    STARTED
scaffold_1      plant_round1_normal_datastore/49/CD/scaffold_1/    FINISHED
scaffold_2      plant_round1_normal_datastore/87/E3/scaffold_2/    STARTED
scaffold_2      plant_round1_normal_datastore/87/E3/scaffold_2/    FINISHED
scaffold_3      plant_round1_normal_datastore/47/19/scaffold_3/    STARTED
scaffold_3      plant_round1_normal_datastore/47/19/scaffold_3/    FINISHED
....
scaffold_9      plant_round1_normal_datastore/F3/F3/scaffold_9/    STARTED
scaffold_9      plant_round1_normal_datastore/F3/F3/scaffold_9/    FAILED
scaffold_10     plant_round1_normal_datastore/C3/86/scaffold_10/   STARTED
scaffold_10     plant_round1_normal_datastore/C3/86/scaffold_10/   FAILED
....
scaffold_4000     plant_round1_normal_datastore/3A/5C/scaffold_13/   STARTED
scaffold_4000     plant_round1_normal_datastore/3A/5C/scaffold_13/   FAILED
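A quick way to see how many contigs failed, and which ones, is to summarize MAKER's datastore index log (the file these lines come from, typically named `*_master_datastore_index.log` in the output directory). The minimal Python sketch below assumes each line has the three whitespace-separated columns shown above (contig, datastore directory, status) and that the last status recorded for a contig is its final one; the function names are hypothetical.

```python
from collections import Counter


def final_status(log_lines):
    """Map each contig to the last status recorded for it in the index log."""
    status = {}
    for line in log_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and anything not shaped like 'contig dir status'
        contig, _datastore_dir, state = fields
        status[contig] = state  # later entries (e.g. FAILED after STARTED) win
    return status


def summarize(log_lines):
    """Return (Counter of final statuses, sorted list of FAILED contigs)."""
    status = final_status(log_lines)
    counts = Counter(status.values())
    failed = sorted(contig for contig, state in status.items() if state == "FAILED")
    return counts, failed


def summarize_file(path):
    """Hypothetical convenience wrapper that reads the index log from disk."""
    with open(path) as handle:
        return summarize(handle)
```

Calling `summarize_file(path_to_index_log)` would give the status counts plus the list of FAILED contigs, which can then be compared against contig sizes or checked in their datastore directories.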

Could someone help me with this problem, or suggest another way to annotate with MAKER, perhaps using the softmasked genome?

My maker_opts.ctl (round1) is:

#-----Genome (these are always required)
genome=/primary/Acam_primary_round3_sort.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/primary/anotacion_plant/maker_anotacion/evidence/Trinity_90.fasta #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/primary/anotacion_plant/maker_anotacion/evidence/Viridiplantae.fa #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=simple #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/data/software/maker-2.31/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff=/EarlGrey/Acam_primary_round3_sort.fasta.prep.out.complex.reformat.gff3 #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP=/scratch #specify a directory other than the system default temporary directory for temporary files
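Since the repeat GFF is a suspect, one thing worth ruling out is that its sequence IDs (column 1) do not match the FASTA headers of the genome given in `genome=`, or that some data lines are not valid 9-column, tab-separated GFF3. The sketch below is a minimal, hypothetical checker in Python; it only tests those two properties and does not validate the feature types or attributes MAKER expects from `rm_gff`.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def check_gff(lines, genome_ids):
    """Return (bad_line_numbers, unknown_seqids) for GFF3 lines.

    Blank lines and '#' comment/directive lines are ignored, and scanning
    stops at an embedded '##FASTA' section. A data line is flagged if it
    does not have exactly 9 tab-separated columns; its seqid (column 1)
    is reported if it is absent from genome_ids.
    """
    bad_lines = []
    unknown = set()
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows; not feature lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            bad_lines.append(number)
            continue
        if cols[0] not in genome_ids:
            unknown.add(cols[0])
    return bad_lines, unknown
```

One would pass the lines of the genome FASTA and of the `rm_gff` file; an empty `unknown` set and no flagged line numbers would rule out these two particular problems, nothing more.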

Thanks so much.

Tags: genome, softmask, MAKER, annotation

Is there any additional information in the logs? Are those contigs large, and could they be failing because of insufficient memory?

Reply by GenoMax, 9 months ago
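To test the memory hypothesis, one could compare contig lengths between FAILED and FINISHED contigs. This hedged sketch assumes lengths come from a `samtools faidx` index (`.fai`, whose first two tab-separated columns are the sequence name and its length) and statuses from a contig-to-final-status mapping such as one parsed from the datastore index log; the function names are hypothetical.

```python
from statistics import median


def read_fai_lengths(fai_lines):
    """Read {sequence name: length in bp} from samtools .fai index lines."""
    lengths = {}
    for line in fai_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2:
            lengths[cols[0]] = int(cols[1])
    return lengths


def length_by_status(lengths, status):
    """Summarize contig lengths per MAKER status.

    Returns {status: (number of contigs, median length, max length)}.
    Contigs without a known length are ignored.
    """
    groups = {}
    for contig, state in status.items():
        if contig in lengths:
            groups.setdefault(state, []).append(lengths[contig])
    return {state: (len(v), median(v), max(v)) for state, v in groups.items()}
```

If the FAILED group were consistently much longer than the FINISHED one, that would support the memory explanation; similar distributions would point elsewhere.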

Hi GenoMax

The run finishes without throwing any error. I even increased the RAM, but still had no success. Would you recommend another way of running MAKER?

Reply by san96, 9 months ago

This may sound stupid, but let us go in the other direction. Have the contigs been filtered and QC'ed (for minimum length and content, e.g. no poly-G runs)?

Reply by GenoMax, 9 months ago
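The QC question about contig length and content can be answered with a short scan of the genome FASTA. The minimal Python sketch below reports, per contig, its length, its fraction of N bases and its longest single-base G or C run, and flags contigs below a length cutoff or with a long homopolymer. Both thresholds are illustrative assumptions: the 10 kb default only echoes the comment on `min_contig` in the config above, and the 50 bp homopolymer limit is arbitrary.

```python
import re


def read_fasta(lines):
    """Yield (seq_id, sequence) pairs from FASTA-formatted lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line.upper())  # uppercase so softmasked bases count too
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def contig_report(lines, min_len=10_000, max_homopolymer=50):
    """Return (seq_id, length, fraction_N, longest_G_or_C_run, flagged) per contig.

    A contig is flagged if it is shorter than min_len or contains a run of
    G (or C, i.e. poly-G on the other strand) longer than max_homopolymer.
    """
    report = []
    for seq_id, seq in read_fasta(lines):
        length = len(seq)
        frac_n = seq.count("N") / length if length else 0.0
        runs = [len(m.group(0)) for m in re.finditer(r"G+|C+", seq)]
        longest = max(runs, default=0)
        flagged = length < min_len or longest > max_homopolymer
        report.append((seq_id, length, frac_n, longest, flagged))
    return report
```

Flagged contigs could then be cross-checked against the FAILED list before deciding whether to raise `min_contig` or filter the assembly.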