Why does MAKER-P output match_part?
4.3 years ago
eennadi ▴ 40

Hello, I am trying to annotate a genome using MAKER-P. I trained SNAP twice, but I keep getting match_part features in the output. Does anyone know why SNAP outputs match_part? How can SNAP be improved so that it gives actual gene predictions rather than match_part? This is affecting my downstream processing.

contig_844_pilon_pilon_pilon    maker   three_prime_UTR 78426   78543   .       +       .       ID=maker-contig_844_pilon_pilon_pilon-snap-gene-0.9-mRNA-1:three_prime_utr;Parent=maker-contig_844_pilon_pilon_pilon-snap-gene-0.9-mRNA-1
contig_844_pilon_pilon_pilon    snap_masked     match   3034    4761    -11.113 -       .       ID=contig_844_pilon_pilon_pilon:hit:2139958:4.5.0.0;Name=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.5-mRNA-1;target_length=137831
contig_844_pilon_pilon_pilon    snap_masked     match_part      4756    4761    4.752   -       .       ID=contig_844_pilon_pilon_pilon:hsp:2546570:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139958:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.5-mRNA-1 247 252 +;Gap=M6
contig_844_pilon_pilon_pilon    snap_masked     match_part      3438    3467    -4.056  -       .       ID=contig_844_pilon_pilon_pilon:hsp:2546571:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139958:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.5-mRNA-1 217 246 +;Gap=M30
contig_844_pilon_pilon_pilon    snap_masked     match_part      3034    3249    -11.809 -       .       ID=contig_844_pilon_pilon_pilon:hsp:2546572:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139958:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.5-mRNA-1 1 216 +;Gap=M216
contig_844_pilon_pilon_pilon    snap_masked     match   8277    70209   38.02   +       .       ID=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Name=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1;target_length=137831
contig_844_pilon_pilon_pilon    snap_masked     match_part      8277    8283    0.381   +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546573:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1 1 7 +;Gap=M7
contig_844_pilon_pilon_pilon    snap_masked     match_part      10022   10063   22.205  +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546574:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1 8 49 +;Gap=M42
contig_844_pilon_pilon_pilon    snap_masked     match_part      10376   10452   8.187   +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546575:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1 50 126 +;Gap=M77
contig_844_pilon_pilon_pilon    snap_masked     match_part      11500   12132   44.777  +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546576:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1 127 759 +;Gap=M633
contig_844_pilon_pilon_pilon    snap_masked     match_part      65704   65857   13.256  +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546577:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1 760 913 +;Gap=M154
contig_844_pilon_pilon_pilon    snap_masked     match_part      66366   66479   20.228  +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546578:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1 914 1027 +;Gap=M114
contig_844_pilon_pilon_pilon    snap_masked     match_part      67420   67468   1.301   +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546579:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1 1028 1076 +;Gap=M49
contig_844_pilon_pilon_pilon    snap_masked     match_part      68142   68984   -81.985 +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546580:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1 1077 1919 +;Gap=M843
contig_844_pilon_pilon_pilon    snap_masked     match_part      70197   70209   9.670   +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546581:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139959:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.0-mRNA-1 1920 1932 +;Gap=M13
contig_844_pilon_pilon_pilon    snap_masked     match   73499   74007   16.582  +       .       ID=contig_844_pilon_pilon_pilon:hit:2139960:4.5.0.0;Name=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1;target_length=137831
contig_844_pilon_pilon_pilon    snap_masked     match_part      73499   73503   5.483   +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546582:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139960:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1 5 +;Gap=M5
contig_844_pilon_pilon_pilon    snap_masked     match_part      73517   73898   4.058   +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546583:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139960:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 6 387 +;Gap=M382
contig_844_pilon_pilon_pilon    snap_masked     match_part      73993   74007   7.041   +       .       ID=contig_844_pilon_pilon_pilon:hsp:2546584:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2139960:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 388 402 +;Gap=M15
assembly software error • 1.9k views

Can you please reformat your MAKER output as a blockquote or code? It's impossible to read as it is now.


Thanks @juke34

Here are the MAKER control options I used:

#-----Genome (these are always required)
genome= ~/Flye_racon3_pilon3.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=~/maker/EST_Fabaceae.fasta,~/emmanuel/maker/GENV01.1_transcrptomedata.fa  #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=~/Makersupport/uniprot-fabaceae-filtered-reviewed_18_05_2020.fasta  #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib=~/maker/Fabacae_repeat.fa #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=  #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=~/maker/Mucuna_2020_05_18/annotation_combined_repeats/mucuna_snap_2/mucuna_snap2.hmm  #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species=  #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
snoscan_meth= #-O-methylation site file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
allow_overlap= #allowed gene overlap fraction (value from 0 to 1, blank for default)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

eennadi: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.


Check your output to see whether it contains any gene models:

awk '{if($3 == "gene") print $0}' maker_file.gff | wc -l
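
To see the whole picture rather than only the gene count, the same idea can be extended to tally every feature type in column 3. This is a minimal sketch, assuming a tab-separated GFF3 file; count_feature_types is just an illustrative helper name, not part of MAKER.

```shell
# Count how often each feature type (column 3) occurs in a GFF3 file.
# Prints "type<TAB>count", one type per line, sorted by type.
count_feature_types() {
    awk -F'\t' '
        /^##FASTA/ { exit }         # stop at an embedded sequence section (END still runs)
        /^#/ || NF < 9 { next }     # skip directives, comments and incomplete rows
        { n[$3]++ }
        END { for (t in n) print t "\t" n[t] }
    ' "$1" | sort
}

# Usage, with the same file name as in the command above:
# count_feature_types maker_file.gff
```

If gene, mRNA, exon and CDS rows show up next to match and match_part, MAKER did build gene models and the match rows are only the accompanying evidence.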

4.3 years ago
Juke34 8.9k

The match / match_part features from SNAP are the raw SNAP predictions, not yet processed by MAKER. MAKER builds gene models from them according to the parameters you have set. For example, if the keep_preds option is set to 0, MAKER keeps only the gene models that agree with the available extrinsic evidence (protein / transcript alignments). In that case, if you did not provide any extrinsic evidence, MAKER will not select or create any gene models.
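
The relevant settings can be read straight from the control file. This is a rough sketch, assuming the key=value #comment layout shown in the question and the default file name maker_opts.ctl; ctl_value and check_evidence_settings are hypothetical helper names, not MAKER tools.

```shell
# Print the value of one option from a MAKER control file.
# $1 = control file, $2 = option name (matched exactly, so "est" does not match "est_pass").
ctl_value() {
    awk -v key="$2" '
        /^#/ { next }                          # section headers and full-line comments
        {
            line = $0
            sub(/#.*/, "", line)               # drop the trailing comment
            eq = index(line, "=")
            if (eq == 0) next
            k = substr(line, 1, eq - 1)
            v = substr(line, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# List the evidence inputs and keep_preds so empty values are easy to spot.
check_evidence_settings() {
    ctl="$1"
    for opt in est protein est_gff protein_gff keep_preds; do
        printf '%s=%s\n' "$opt" "$(ctl_value "$ctl" "$opt")"
    done
}

# Usage:
# check_evidence_settings maker_opts.ctl
```

With keep_preds=0, empty values for est, protein, est_gff and protein_gff would mean MAKER has no evidence to support any SNAP prediction. In the control file posted above, est and protein are both set.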

I would suggest using maker_merge_outputs_from_datastore.pl from GAAS to collect the MAKER output. The MAKER output is explained here.

4.3 years ago
liorglic ★ 1.5k

There is nothing wrong with having match and match_part features in your output gff - these are simply the "evidence" from which MAKER synthesizes gene models. In many cases they are not needed, so to get a gff containing only gene models you can use: gff3_merge -d <data store index> -n -g.
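
If you already have a merged GFF that still contains the evidence, a post-hoc filter gives a similar result. This is only a sketch, assuming a tab-separated GFF3 file; strip_match_features is a hypothetical helper name. It removes only the match and match_part rows discussed here, so any other evidence feature types would have to be added to the condition, and it also drops an embedded ##FASTA section if one is present.

```shell
# Print a GFF3 file without match / match_part rows.
# Header and directive lines are kept; output stops before ##FASTA.
strip_match_features() {
    awk -F'\t' '
        /^##FASTA/ { exit }                    # leave out the embedded sequence
        /^#/ { print; next }                   # keep directives and comments
        $3 != "match" && $3 != "match_part"    # keep every other feature row
    ' "$1"
}

# Usage (both file names are placeholders):
# strip_match_features merged.gff > gene_models_only.gff
```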
