bwa is getting stuck
6.0 years ago
joomi ▴ 10

Hello,

I've been running bwa mem to do pairwise alignment to a reference. I have already indexed my reference, which produced 5 output files with the following extensions: .pac, .sa, .amb, .ann, .bwt. I have run bwa mem with 60G of memory multiple times now, and none of the jobs ever finish. They take over 20 hours and often time out at 24 hours. I have run it as:

bwa mem -t 24 ref read1.fq.gz read2.fq.gz > output.sam
bwa mem -1 -t 1 ref read1.fq.gz read2.fq.gz > output.sam
bwa mem ref read1.fq.gz read2.fq.gz > output.sam

Regardless of how I run it or with which parameters, the standard error files end with the following:

[M::mem_pestat] low and high boundaries for proper pairs: (1, 834)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 66668 reads in 251.152 CPU sec, 251.145 real sec

OR

[M::mem_process_seqs] Processed 66668 reads in 187.866 CPU sec, 187.859 real sec
[M::process] read 66668 sequences (10000200 bp)..

I'm confused about what this error is. Why doesn't the job ever finish? When I google it, no one else seems to have this issue.

Any constructive help is great!! I'm honestly fed up with bwa, it seems I'm not the only one that experiences this problem. It's funny how it is so popular when it doesn't work. Are there any other platforms that you would suggest?

bwa mem RNA-Seq alignment • 3.7k views

Hello joomi,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!


Hello,

I should have mentioned that I already redirect to a sam file! Sorry about that. My main issue is that all my scripts are timing out. I've been running them for 24 hours and they end up stuck at the point shown in the standard error files. I'm not sure why it is so slow. Is this normal? How many days should I let it run?

Thank you for your help!!


Is there any error at the end of the log/standard error?

PS: Please move this post to a comment on Fin's post.


Hello,

I don't have a log file. I have a standard output file and a standard error file. The standard output looks like this:

==========================================
SLURM_JOB_ID = 7116608
SLURM_NODELIST = c9-68
==========================================

and the standard error looks like:

[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 66668 reads in 201.141 CPU sec, 201.135 real sec
[M::process] read 66668 sequences (10000200 bp)...

Am I missing something? I'm sorry if this is a stupid question but I don't think I have a log file, just a .err and .out file.


Just to be sure: what version of bwa are you using?

You should test with only a small subset of your reads, for example 1000 reads. You can create such a subset with:

$ zcat read1.fq.gz | head -n 4000 | bgzip -c > subset1.fq.gz
$ zcat read2.fq.gz | head -n 4000 | bgzip -c > subset2.fq.gz

See what happens then.

fin swimmer
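
For reference, the subsetting step above can be wrapped in a small helper so that the same number of reads is taken from both mate files. This is only a minimal sketch and not part of the original reply: the function name `subset_fastq` and the output file names are made up for illustration, plain `gzip` stands in for `bgzip` (bwa reads gzip-compressed input either way), and 4-line FASTQ records are assumed.

```shell
# Hypothetical helper: keep the first N reads of a gzipped FASTQ file.
# Assumes exactly 4 lines per read (no wrapped sequence lines).
# $1 = input .fq.gz, $2 = output .fq.gz, $3 = number of reads to keep
subset_fastq() {
    gzip -dc "$1" | head -n "$(( $3 * 4 ))" | gzip -c > "$2"
}
```

Calling `subset_fastq read1.fq.gz subset1.fq.gz 1000` and then the same for `read2.fq.gz` would give a pair of 1000-read files to test bwa mem on.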


Thank you so much! I subsetted the two read files with zcat and ran bwa. It completed within minutes! My original files are around 7.4G in size and contain around 21959218 reads. The species is Zea mays, and in all of my runs I have used around 60G of memory, which is close to the 63-64G maximum that I have access to. I've run it with -t 24 as a batch job with 24 cores before and it still gets stuck. Is my best bet to split each read file into two equal halves, run subset1_1 with subset1_2 and subset2_1 with subset2_2, and then concatenate the SAM files? I can do that, but I have 108 of these to do.
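
If splitting does turn out to be necessary, both mate files have to be cut at the same read boundaries so that pairs stay in sync. The sketch below shows one way to do that under stated assumptions: 4-line FASTQ records, a POSIX `split`, and hypothetical names (`split_fastq`, `count_reads`, the chunk prefixes). Keep in mind that each SAM output carries its own header, so the results cannot simply be joined with `cat`; sorting each chunk to BAM and combining the chunks with `samtools merge` is the more usual route.

```shell
# Hypothetical helper: split a gzipped FASTQ into chunks of N reads each.
# Chunks are written uncompressed as <prefix>aa, <prefix>ab, ...
# $1 = input .fq.gz, $2 = output prefix, $3 = reads per chunk
split_fastq() {
    gzip -dc "$1" | split -l "$(( $3 * 4 ))" - "$2"
}

# Hypothetical helper: count reads in a gzipped FASTQ (line count / 4).
count_reads() {
    echo "$(( $(gzip -dc "$1" | wc -l) / 4 ))"
}
```

Running `split_fastq read1.fq.gz r1_ 11000000` and `split_fastq read2.fq.gz r2_ 11000000` on files of about 22 million reads would produce `r1_aa`/`r2_aa` and `r1_ab`/`r2_ab`, which can then be aligned pair by pair. Using the same chunk size for both files is what keeps the mates matched.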


Is this a reference-based assembly, joomi?


Please use Add Reply for comments or directly include them into the question by using edit. Otherwise the thread becomes messy pretty soon.

6.0 years ago

Hello joomi,

The messages you receive are purely informational and normal. What you are missing is that bwa prints its results to stdout, so you have to redirect them to a file like this:

$ bwa mem -t 24 ref read1.fq.gz read2.fq.gz > output.sam

Usually you will work with coordinate-sorted bam files in further analysis steps. I also recommend adding a read group containing at least the sample name to the bam file, because other tools rely on this information. So the final command line could look like this:

$ bwa mem -R '@RG\tID:SampleName\tSM:SampleName' -t 24 ref.fa read1.fq.gz read2.fq.gz | samtools sort -@ 24 -o output.bam -

fin swimmer
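
Since the job output in this thread comes from SLURM, one thing worth ruling out is a mismatch between the `-t` value and the CPUs the job was actually granted. What follows is a minimal sketch of a batch script, not a tested recipe: the resource values mirror the numbers mentioned in this thread, while the function name `align_sample` and the file names in the usage line are hypothetical.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=24
#SBATCH --mem=60G
#SBATCH --time=24:00:00

# Hypothetical wrapper around the bwa mem | samtools sort pipeline above.
# $1 = sample name, $2 = reference prefix, $3 and $4 = paired read files
align_sample() {
    # Use the CPU count SLURM granted; fall back to 1 outside of SLURM.
    threads="${SLURM_CPUS_PER_TASK:-1}"
    bwa mem -R "@RG\tID:$1\tSM:$1" -t "$threads" "$2" "$3" "$4" \
        | samtools sort -@ "$threads" -o "$1.bam" -
}
```

Inside the job, a call such as `align_sample SampleName ref.fa read1.fq.gz read2.fq.gz` would run the pipeline. Note that `samtools sort` buffers up to its `-m` amount per thread (768 MiB by default), so with limited memory it may be worth giving the sort fewer threads than bwa.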




Please give some details on the data. How many reads are there and what is the species? Given the log files, it seems that you used only one thread, not 24 (real time and CPU time are the same, so there was no multithreading). Also, sudden job death is, in my experience, usually a matter of low memory. You should keep track of the memory usage during the job. BWA tends to use quite a lot of memory when dealing with problematic reads like multimappers, especially if you run on a lot of threads.
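
The CPU-versus-real-time check described here can be done directly on the standard error file. The sketch below sums the `mem_process_seqs` lines and reports the CPU/real ratio and the throughput; the function name `bwa_log_summary` and the file name `job.err` are made up for illustration, and the parsing assumes the log format shown in this thread.

```shell
# Hypothetical helper: summarise bwa mem's per-batch timing lines.
# Prints total reads, CPU/real ratio, and reads per second of wall time.
# $1 = standard error file of a bwa mem run
bwa_log_summary() {
    awk '/mem_process_seqs/ {
             for (i = 2; i <= NF; i++) {
                 if ($i == "reads") reads += $(i - 1)
                 if ($i == "CPU")   cpu   += $(i - 1)
                 if ($i == "real")  wall  += $(i - 1)
             }
         }
         END {
             if (wall > 0)
                 printf "reads=%d cpu_per_real=%.2f reads_per_sec=%.1f\n",
                        reads, cpu / wall, reads / wall
         }' "$1"
}
```

Running `bwa_log_summary job.err` on the two timing lines quoted in the question gives a ratio of about 1.00, meaning one busy thread, and roughly 300 reads per second. At that rate, 21959218 reads would need around 20 hours (twice that if the figure counts read pairs), which fits jobs running into a 24-hour limit. For the memory side, `sacct -j <jobid> --format=JobID,MaxRSS,Elapsed,State` reports a job's peak memory on SLURM once it has ended.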
