MAKER has been running for a long time
5.3 years ago
olechnwin ▴ 60

Hi all,

I am running genome annotation with MAKER. It's a human genome, so about 3 Gb, but split into ~3500 contigs.

I am running MAKER with 20 processes (MPICH). It has been running for 4 days on the same 19 contigs.

I cannot tell whether it is supposed to take this long or whether it is stuck on something.

I am running MAKER with a custom repeat library. I ran it with these options (maker_opts.ctl):

#-----Genome (these are always required)
genome=/Data/A673_pacbio/sequel_fasta/4-quiver/cns_output/fc_phase_pipeline/output/cns_p.phased.0.fasta
#genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est=/Data/A673_rnaseq/trinity_out_dir/Trinity.fasta

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/database/uniprot_sprot_human.fasta
#protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib=/Data/A673_pacbio/sequel_fasta/4-quiver/cns_output/fc_phase_pipeline/output/cns_phased0_repeat/RM_23217.WedJul241739322019/consensi.fa.classified
#provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/opt/maker.4/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner

These are the last lines of maker stdout:

running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /gpfs0/scratch/1869571/maker_UKXIzP; /opt/RepeatMasker/RepeatMasker /maker_mpich_run/cns_p.phased.0.maker.output/cns_p.phased.0_datastore/CD/B9/000010F_0//theVoid.000010F_0/0/000010F_0.0.consensi%2Efa%2Eclassified.specific -dir /maker_mpich_run/cns_p.phased.0.maker.output/cns_p.phased.0_datastore/CD/B9/000010F_0//theVoid.000010F_0/0 -pa 1 -lib /Data/A673_pacbio/sequel_fasta/4-quiver/cns_output/fc_phase_pipeline/output/cns_phased0_repeat/RM_23217.WedJul241739322019/consensi.fa.classified
#-------------------------------#

Looking at the node where this is running, only one process is working hard (~100% CPU); the other MAKER processes are at only ~1-2% CPU. If it is supposed to take this long, can someone suggest a way to speed up the process?
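One way to tell a stalled run from a slow one is to look at MAKER's per-contig bookkeeping rather than at CPU usage alone. The sketch below assumes the run's output directory contains a tab-separated `*_master_datastore_index.log` with a contig name, a datastore path, and a status such as `STARTED` or `FINISHED` on each line. The sample lines and the helper names `latest_status` and `unfinished` are made up for illustration, so check them against the actual file before relying on this.

```python
from collections import Counter


def latest_status(lines):
    """Map each contig name to the last status recorded for it.

    Assumes tab-separated lines: contig name, datastore path, status.
    Lines with fewer than three fields are skipped.
    """
    latest = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        name, status = fields[0], fields[2].strip()
        latest[name] = status  # later lines overwrite earlier ones
    return latest


def unfinished(latest):
    """Return contigs whose last recorded status is not FINISHED, sorted."""
    return sorted(name for name, status in latest.items() if status != "FINISHED")


# Illustrative lines only (hypothetical contig names and paths).
sample = [
    "contig_A\tdatastore/AA/01/contig_A/\tSTARTED",
    "contig_B\tdatastore/BB/02/contig_B/\tSTARTED",
    "contig_A\tdatastore/AA/01/contig_A/\tFINISHED",
]

statuses = latest_status(sample)
print(Counter(statuses.values()))
print("not finished:", unfinished(statuses))
```

For a real run, an open file handle can be passed in place of `sample`. If the same contigs stay in `STARTED` for days, comparing their lengths with those of the finished contigs may show whether they are simply much larger.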

annotation genome MAKER • 2.3k views
5.3 years ago

I recently ran into an issue after a macOS upgrade: MAKER ended up working better on a Linux VM than on my Mac, even though it had previously run fine there.

I don't know if that is the same issue for you, and unfortunately I am not certain what is causing this specific issue.

MAKER has a support contact, but note that those requests are made public in the Google Group (so I think you may just want to join that group and submit the question there directly):

https://groups.google.com/forum/#!forum/maker-devel


Hi,

Thank you for taking the time to comment. I'm using MAKER on CentOS Linux. I'll try to post in the Google Group too. I'm waiting for them to approve my subscription.


Hmm - the website does say approval is required after sending a message to maker-devel@yandell-lab.org.

However, in the meantime, you can see all the other responses, if that is helpful (and, hopefully, you will be approved soon).


Sadly, I'm still not approved. I've sent two requests.

I tried running their example_01_basic with the hsap_contig.fasta genome, and it has been running for 20+ hours. Have you tried running their example, and if so, how long did it take? I'm running this example using MPI with 4 processes.

This is the last output/log so far:

Widget::blastx:  
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.9     -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.9.repeatrunner
#-------------------------------#
deleted:5 hits
deleted:7 hits
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
...finished clustering.

Do you happen to have your completed log? I'd like to see what it looks like if it ran to completion.


I am surprised that you weren't approved - perhaps there is just a general delay in the response time? Maybe you can contact the corresponding author to see if there is some issue with the approval system?

It's been a little while since I ran it (although some of that content is in their archive).

At first, I thought a few minutes was the right run time (since the program initially either froze or stopped within minutes). However, I think hours (or maybe 1 hour) ended up being the right run time. That was for analysis of 1 contig at a time, though (and I think they were all less than 700,000 bp).

If you have a whole genome to annotate, I think you are going to have additional complications.
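As a rough back-of-envelope sketch of what that could mean at whole-genome scale: the figures below come from this thread (about 1 hour for a contig under 700,000 bp, a ~3 Gb genome, 20 processes). Linear scaling with sequence length and perfect load balancing across processes are both assumptions, and optimistic ones, so the result is best read as a floor rather than a prediction. The function name `estimate_wall_hours` is hypothetical.

```python
def estimate_wall_hours(genome_bp, bp_per_hour_per_process, n_processes):
    """Estimate wall-clock hours, assuming run time scales linearly with
    sequence length and work is spread evenly over all processes."""
    if bp_per_hour_per_process <= 0 or n_processes <= 0:
        raise ValueError("rate and process count must be positive")
    cpu_hours = genome_bp / bp_per_hour_per_process
    return cpu_hours / n_processes


genome_bp = 3_000_000_000   # ~3 Gb human genome (from the question)
rate = 700_000              # bp per hour per process (upper bound from the reply)
processes = 20              # MPI processes (from the question)

hours = estimate_wall_hours(genome_bp, rate, processes)
print(f"{hours:.0f} h (~{hours / 24:.1f} days)")  # prints "214 h (~8.9 days)"
```

Under these assumptions, 4 days of run time would not by itself indicate a hang, although a single busy process on the same contigs for days is a separate signal worth checking.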
