Hi,
I am trying to run a second round of MAKER annotation using a SNAP HMM trained on the first-round results.
However, when I pass maker_gff=maker.all.gff.file.from.the.first.round.gff, I get a Non-unique top level ID error for every scaffold.
The first part of my maker_opts.ctl looks like this:
#-----Genome (these are always required)
genome=path/to/assembly.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
#-----Re-annotation Using MAKER Derived GFF3
maker_gff=path/to/maker/gff/file/from/the/first/round/maker.round.1.all.gff #MAKER derived GFF3 file
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=1 #passthrough anything else in maker_gff: 1 = yes, 0 = no
#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff=path/to/maker.round1.est2genome.gff #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format
#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=path/to/maker.round1.protein2genome.gff #aligned protein homology evidence from an external GFF3 file
#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff=path/to/maker.round1.repeats.gff #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
#-----Gene Prediction
snaphmm=path/to/snap/trained/file/from/first/round/snap.round1.hmm #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= my_species #Augustus gene prediction species model
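In the options above, the *_pass flags reuse evidence already stored in maker_gff, while est_gff, protein_gff and rm_gff point at files whose names suggest they were extracted from the same first-round output. The following is a minimal Python sketch, not part of MAKER, that parses a maker_opts.ctl and lists evidence types enabled through both routes. The helper names are hypothetical, and both the flag-to-option pairing and the idea that such overlaps can produce duplicate IDs are assumptions, not something confirmed in this thread.

```python
# Hypothetical pairs of (maker_gff pass flag, separate evidence GFF3 option)
# that may both deliver the same round-1 evidence; this pairing is an
# assumption based on the option names in maker_opts.ctl.
OVERLAPPING_OPTIONS = [
    ("est_pass", "est_gff"),
    ("altest_pass", "altest_gff"),
    ("protein_pass", "protein_gff"),
    ("rm_pass", "rm_gff"),
]


def parse_ctl(text):
    """Parse key=value lines of a MAKER .ctl file, ignoring '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if "=" not in line:
            continue  # section headers and blank lines
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def doubly_supplied(options):
    """List (flag, gff_option) pairs where evidence may be loaded twice."""
    if not options.get("maker_gff"):
        return []  # without maker_gff the *_pass flags have nothing to reuse
    return [
        (flag, gff)
        for flag, gff in OVERLAPPING_OPTIONS
        if options.get(flag) == "1" and options.get(gff)
    ]
```

Applied to the configuration above, doubly_supplied(parse_ctl(text)) would list est_pass/est_gff, protein_pass/protein_gff and rm_pass/rm_gff; altest_pass is skipped because altest_gff is empty.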
Part of the error file looks like this:
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Scaff_17
Length: 21773
#---------------------------------------------------------------------
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
doing repeat masking
doing repeat masking
ERROR: Non-unique top level ID for Scaff_17:hit:0:1.3.0.0
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs). MAKER will not handle these correctly.
--> rank=5, hostname=wbl008
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:Scaff_17
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:Scaff_17
examining contents of the fasta file and run log
ERROR: Non-unique top level ID for Scaff_1:hit:0:1.3.0.0
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs). MAKER will not handle these correctly.
--> rank=6, hostname=wbl008
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:Scaff_1
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:Scaff_1
ERROR: Non-unique top level ID for Scaff_16:hit:0:1.3.0.0
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs). MAKER will not handle these correctly.
--> rank=14, hostname=wbl008
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:Scaff_16
It fails for all scaffolds, not just a few: maker_round_1_master_datastore_index.log shows a failed status for every scaffold.
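Because a contig can appear on several lines of the datastore index log (for example STARTED and then FAILED), one way to count how many contigs actually ended in failure is to keep only each contig's last recorded status. This is a minimal sketch that assumes tab-separated lines of the form contig ID, datastore directory, status; the function names are hypothetical.

```python
from collections import Counter


def final_status(log_text):
    """Map each contig ID to the last status recorded for it.

    Assumes tab-separated lines: contig ID, datastore directory, status.
    """
    status = {}
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3:
            # Later lines overwrite earlier ones, so the last status wins.
            status[fields[0]] = fields[2].strip()
    return status


def summarize(log_text):
    """Count contigs by final status (e.g. how many FINISHED vs FAILED)."""
    return Counter(final_status(log_text).values())
```

For example, summarize(open("maker_round_1_master_datastore_index.log").read()) should contain only a FAILED count if every scaffold failed.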
I tried gff3_merge with and without the -l flag; both merged GFF3 files gave the same error.
I also tried gaas_maker_merge_outputs_from_datastore.pl and used the resulting maker_mix.gff file as maker_gff. It fails with the same error.
When I grep for one of the non-unique IDs, e.g.:
grep -n "Scaff_14:hit:0:1.3.0.0" maker_mix.gff
it shows two hits:
3383054:Scaff_14 repeatmasker match 7723 7762 14 + . ID=Scaff_14:hit:0:1.3.0.0;Name=species:%28ATATA%29n|genus:Simple_repeat;Target=species:%28ATATA%29n|genus:Simple_repeat 1 41 +
3383055:Scaff_14 repeatmasker match_part 7723 7762 14 + . ID=Scaff_14:hsp:0:1.3.0.0;Parent=Scaff_14:hit:0:1.3.0.0;Target=species:%2528ATATA%2529n|genus:Simple_repeat 1 41 +
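In the grep output above, the second line is the match_part child, which only refers to Scaff_14:hit:0:1.3.0.0 through Parent=, so within maker_mix.gff that ID appears to be defined only once. Since the failure happens during repeat masking and the configuration sets rm_pass=1 while also pointing rm_gff at maker.round1.repeats.gff, one untested possibility is that the same round-1 repeat features reach MAKER from two files. The sketch below, with hypothetical helper names, collects IDs of features that have no Parent (treated here as "top level", an assumption about MAKER's wording) across several GFF3 files and reports any defined more than once.

```python
from collections import defaultdict


def gff3_attributes(column9):
    """Split a GFF3 column-9 string into a {key: value} dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def top_level_ids(lines):
    """Yield (ID, line_number) for features that have an ID but no Parent."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # comments, directives, blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # e.g. sequence lines after a ##FASTA directive
        attrs = gff3_attributes(columns[8])
        if "ID" in attrs and "Parent" not in attrs:
            yield attrs["ID"], number


def duplicate_ids(named_files):
    """Return {ID: [(label, line_number), ...]} for IDs defined more than once.

    named_files maps a label such as "maker_gff" to an iterable of lines.
    """
    seen = defaultdict(list)
    for label, lines in named_files.items():
        for feature_id, number in top_level_ids(lines):
            seen[feature_id].append((label, number))
    return {fid: hits for fid, hits in seen.items() if len(hits) > 1}
```

For example, duplicate_ids({"maker_gff": open("maker_mix.gff"), "rm_gff": open("path/to/maker.round1.repeats.gff")}) would list every top-level ID that occurs in both files, with the line number in each.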
I have seen tutorials that do not pass maker_gff in the second round of MAKER, but when I leave it out, the number of gene models decreases.
Can someone help me, please?
Thank you,
Upendra
Is there any chance one of your input FASTA files contains multiple records with the same name? In any case, the best place to ask about MAKER is the mailing list.
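A quick way to test this suggestion is to count record names in each input FASTA file. This hypothetical helper treats the first whitespace-separated word after '>' as the record name, which is an assumption about how MAKER derives the SeqID.

```python
from collections import Counter


def duplicate_headers(lines):
    """Return FASTA record names that occur more than once, sorted.

    The record name is taken as the first word after '>' (an assumption
    about how MAKER identifies sequences).
    """
    names = Counter(
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].split()
    )
    return sorted(name for name, count in names.items() if count > 1)
```

For example, duplicate_headers(open("path/to/assembly.fasta")) returns an empty list when every record name is unique.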
Thank you for your reply. I had already renamed the sequence headers in the input FASTA to rule that out.
I will write to the mailing list. Thank you.
U