Hi all,
I am running genome annotation using MAKER. It's a human genome, so about 3 Gb, split across ~3,500 contigs.
I am running MAKER with 20 processes (MPICH). It has been running for 4 days on the same 19 contigs.
I cannot tell whether it is supposed to be this long or whether it is stuck on something.
I am running MAKER with a custom repeat library. I ran it with these options (maker_opts.ctl):
#-----Genome (these are always required)
genome=/Data/A673_pacbio/sequel_fasta/4-quiver/cns_output/fc_phase_pipeline/output/cns_p.phased.0.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
#-----EST Evidence (for best results provide a file for at least one)
est=/Data/A673_rnaseq/trinity_out_dir/Trinity.fasta
#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/database/uniprot_sprot_human.fasta #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file
#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib=/Data/A673_pacbio/sequel_fasta/4-quiver/cns_output/fc_phase_pipeline/output/cns_phased0_repeat/RM_23217.WedJul241739322019/consensi.fa.classified #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/opt/maker.4/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
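Because the options above follow a `key=value #comment` layout, a quick sanity check before a multi-day run is to confirm that every evidence file actually exists on the compute node. This is a minimal Python sketch, not part of MAKER; it assumes one option per line, that multiple files for one option are comma-separated, and that the option names in `PATH_KEYS` (taken from the excerpt above) are the ones worth checking. The script name `check_ctl.py` is hypothetical.

```python
import os
import sys

# Options whose values are file paths in the maker_opts.ctl excerpt above.
PATH_KEYS = ("genome", "est", "protein", "rmlib", "repeat_protein")


def parse_ctl(text):
    """Parse control-file text into {key: value}, dropping '#' comments.

    Assumes the `key=value #comment` layout shown above; lines that are
    blank, start with '#', or contain no '=' are skipped.
    """
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, rest = line.partition("=")
        # Strip any trailing inline comment, then surrounding whitespace.
        opts[key.strip()] = rest.split("#", 1)[0].strip()
    return opts


def missing_paths(opts, keys=PATH_KEYS):
    """Return {key: [paths]} for non-empty path options naming absent files.

    Assumption: multiple files for one option are comma-separated.
    """
    missing = {}
    for key in keys:
        paths = [p.strip() for p in opts.get(key, "").split(",") if p.strip()]
        absent = [p for p in paths if not os.path.isfile(p)]
        if absent:
            missing[key] = absent
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_ctl.py maker_opts.ctl
    with open(sys.argv[1]) as fh:
        print(missing_paths(parse_ctl(fh.read())))
```

An empty dictionary means every listed path resolved to a file; blank options such as `protein_gff=` are ignored rather than reported.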
These are the last lines of MAKER's stdout:
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /gpfs0/scratch/1869571/maker_UKXIzP; /opt/RepeatMasker/RepeatMasker /maker_mpich_run/cns_p.phased.0.maker.output/cns_p.phased.0_datastore/CD/B9/000010F_0//theVoid.000010F_0/0/000010F_0.0.consensi%2Efa%2Eclassified.specific -dir /maker_mpich_run/cns_p.phased.0.maker.output/cns_p.phased.0_datastore/CD/B9/000010F_0//theVoid.000010F_0/0 -pa 1 -lib /Data/A673_pacbio/sequel_fasta/4-quiver/cns_output/fc_phase_pipeline/output/cns_phased0_repeat/RM_23217.WedJul241739322019/consensi.fa.classified
#-------------------------------#
Looking at the node where this is running, only one process is working hard (~100% CPU); the other MAKER processes are at only ~1-2% CPU. If this runtime is expected, can someone suggest a way to speed up this process?
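One way to tell "still working" from "stuck" is to look at MAKER's master datastore index log over time. The sketch below assumes that log sits in the `.maker.output` directory (here something like `cns_p.phased.0.maker.output/cns_p.phased.0_master_datastore_index.log`) and that each line is tab-separated as contig ID, datastore directory, status (e.g. STARTED, FINISHED, FAILED). Please verify both assumptions against your own output before relying on it. It reports how many contigs are in each status and lists those whose latest status is still STARTED; the script name `check_index.py` is hypothetical.

```python
import sys
from collections import Counter


def contig_status(lines):
    """Return {contig_id: last status seen} from index-log lines.

    Assumes tab-separated columns: contig ID, datastore directory, status.
    Later lines overwrite earlier ones, so each contig keeps its latest status.
    """
    latest = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        latest[fields[0]] = fields[2].strip()
    return latest


def still_started(latest):
    """Contigs whose latest status is STARTED (running, or killed mid-run)."""
    return sorted(c for c, status in latest.items() if status == "STARTED")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python check_index.py <basename>_master_datastore_index.log
    with open(sys.argv[1]) as fh:
        latest = contig_status(fh)
    print(Counter(latest.values()))          # how many contigs per status
    print("\n".join(still_started(latest)))  # contigs not yet finished
```

Running it a few hours apart shows whether the STARTED set is shrinking or whether the same contigs remain stuck.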
Hi,
Thank you for taking the time to comment. I'm using MAKER on CentOS Linux. I'll try posting in the Google Group too; I'm waiting for them to approve my subscription.
Hmm, the website does say that approval is required after sending a message to maker-devel@yandell-lab.org.
However, in the meantime, you can see all the other responses, if that is helpful (and, hopefully, you will be approved soon).
Sadly, I'm still not approved. I've sent two requests.
I tried running their example_01_basic with the hsap_contig.fasta genome, and it has been running for 20+ hours. Have you tried running their example? How long did it take? I'm running this example using MPI with 4 processes.
This is the last output/log so far:
Do you happen to have your completed log? I'd like to see what it looks like if it ran to completion.
I am surprised that you weren't approved. Perhaps there is just a general delay in the response time? Maybe you can contact the corresponding author to see if there is an issue with the approval system?
It's been a little while (although some of that content is in their archive).
At first, I thought minutes was the right runtime (since the program initially either froze or stopped within minutes). In the end, though, I think hours (or maybe 1 hour) was the right runtime. However, that was for analyzing 1 contig at a time (and I think they were all shorter than 700,000 bp).
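Since that experience was with contigs under roughly 700,000 bp, it may help to check how many of the ~3,500 contigs are larger than that, and whether the 19 long-running ones are among the largest. A minimal sketch that reads the genome FASTA and lists contigs above a length threshold; the 700,000 bp default comes from the anecdote above, not from any MAKER limit, and the script name `contig_sizes.py` is hypothetical.

```python
import sys


def contig_lengths(lines):
    """Return {contig_id: length_bp} from FASTA lines.

    The ID is the first whitespace-delimited token of each '>' header.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence lines add to the current contig
    return lengths


def longer_than(lengths, threshold=700_000):
    """Contig IDs longer than threshold bp, longest first."""
    big = [name for name, n in lengths.items() if n > threshold]
    return sorted(big, key=lambda name: (-lengths[name], name))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python contig_sizes.py genome.fasta
    with open(sys.argv[1]) as fh:
        lengths = contig_lengths(fh)
    big = longer_than(lengths)
    print(f"{len(lengths)} contigs, {sum(lengths.values())} bp total")
    print(f"{len(big)} contigs longer than 700,000 bp")
    for name in big:
        print(name, lengths[name])
```

The parser streams the file line by line and keeps only one integer per contig, so memory stays small even for a ~3 Gb assembly.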
If you have a whole genome to annotate, I think you are going to have additional complications.