I am running a script through our school's cluster to trim and align some fastq files. The script itself, though a little unwieldy, has worked as expected on some files.
sickle pe -f 20-03_1_sequence.txt.gz -r 20-03_2_sequence.txt.gz -t sanger -o 20-03_F.fq.gz -p 20-03_R.fq.gz -s 20-03.trimmed_singles.fastq -x -g -l 125;
bwa mem -M -t 16 $RefSeq 20-03_F.fq.gz 20-03_R.fq.gz | samtools view -Sbh - | samtools sort -> 20-03_initial.bam;
Here, I trimmed the input 150x2 pe files "20-03_1_sequence.txt.gz" and "20-03_2_sequence.txt.gz" and outputted them as "20-03_F/R.fq.gz", when you can see incorporated into the bwa mem line just below. I fastQC-ed the before and after files, and the following processing steps after the "20-03_initial.bam" is made proceed as expected.
However, when I try this exact same script with another set of files, bwa simply exists after an hour or so.
sickle pe -f 1306_MKcombined_pair1.fastq.gz -r 1306_MKcombined_pair2.fastq.gz -t sanger -o 1306_MK_trimmed_F.fq.gz -p 1306_MK_trimmed_R.fq.gz -s 1306_MK_trimmed_singles.fastq -x -g -l 30;
These files are 50x2 pe and similarly successfully underwent fastQC. However, these two input files "1306_MKcombined_pair1/2.fastq.gz" are actually concatenated fastq files from 4 different sequencing lanes, combined into the final two pairs.
Now using the same command
bwa mem -M -t 32 $RefSeq 1306_MK_trimmed_F.fq.gz 1306_MK_trimmed_R.fq.gz | samtools view -Sbh > 1306_initialA.bam;
bwa exits without giving me any error message in the error file (it simply just stops processing)
....
[M::mem_pestat] (25, 50, 75) percentile: (317, 1056, 3028)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8450)
[M::mem_pestat] mean and std.dev: (1793.22, 1875.27)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 11161)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
the script exits, after an hour or so (it seems random), I'm left with a .bam file about half the size it should be, and none of the processing steps after the bwa mem command occur. Breaking up the alignment step into two steps (generating a .sam file, then samtools view to generate the .bam file) halts the script halfway through generating the .sam file.
Running the script directly from the command line gives the same result, an incomplete .bam/.sam file.
Sorry for such a long description, but would anyone know why this is occurring and why bwa is not giving me any indication of what is going wrong with the alignment?
Hi,
I am having the same problem here. Anyone with a clue of what is happening? I am using the following command:
It has worked fine for all my samples but two. However, I do not see any differences in the read files which is the only think I could think of for why these samples would fail.
The log files look like this, an unterminated bwa mem job:
Thanks very much in advance for your help!
Agathe
EDIT: After some fiddling around, it seems that SLURM (to which I am submitting the command) was the culprit. There are different queues I can use to submit jobs and just changing that overcome the problem.
Agathe, did you try running it twice on the same queue? Some of the NBI nodes were behaving strangely recently and if the problem can be pin-pointed to a node, then CIS will be happy if you let them know which node it is.
Also, you may want to consider using
samtools view -buSh
, so it doesn't compress the .bam data before passing them to samtools sort.