I am running a Nextflow pipeline on our cluster (SLURM) using BBTools to trim adapters and map RNA-seq files. I receive the following error every time that my RNA-seq workflow gets to the 'mapping' stage and i'm not entirely sure if this is a BBMap problem, or a problem with our hardware and whether I should switch over to HISAT2 if this isn't an easy fix.
Any ideas? I've tried setting threads=1
in both the BBDuk and BBMap steps, but this doesn't seem to fix the issue. Could it be FASTA related?
ERROR ~ Error executing process > 'mapping (1)'
Caused by:
Process `mapping (1)` terminated with an error exit status (1)
Command executed:
bbmap.sh in=NT_Day1_Rep2_1_postTrimmed.fastq.gz in2=NT_Day1_Rep2_2_postTrimmed.fastq.gz outm=NT_Day1_Rep2.mapped.bam outu=NT_Day1_Rep2.unmapped.bam keepnames=t trd sam=1.3 intronlen=20 maxindel=200k ambig=random statsfile=NT_Day1_Rep2.alignmentReport.txt minid=0.76 threads=1
samtools sort -T NT_Day1_Rep2-@ 1 -o NT_Day1_Rep2.sorted.mapped.bam NT_Day1_Rep2.mapped.bam
samtools index -@ 1 NT_Day1_Rep2.sorted.mapped.bam
Command exit status:
1
Command output:
(empty)
Command error:
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_036686'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_036687'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_027000'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_027000'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003668'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_046343'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001282478'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001282479'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001037668'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001040705'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001040703'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_152250'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001040704'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_152251'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001040702'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_080389'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_058203'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_018661'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001081551'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001037804'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NM_001195257'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_024044'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_049875'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_049876'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_106988'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003594'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003594'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003594'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003594'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003594'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003594'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003594'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003594'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_003594'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_106988'
[W::sam_hdr_parse] duplicated sequence 'hg38_refGene_NR_031636'
Genome: 1
Key Length: 13
Max Indel: 200000
Minimum Score Ratio: 0.55888
Mapping Mode: normal
Exception in thread "main" java.lang.AssertionError:
The number of reads out does not add up to the number of reads in.
This may indicate that a mapping thread crashed.
If you submit a bug report, include the entire console output, not just this error message.
5+10+0+14+0 = 29 != 30
at align2.AbstractMapper.printOutput(AbstractMapper.java:1874)
at align2.BBMap.testSpeed(BBMap.java:490)
at align2.BBMap.main(BBMap.java:34)
Work dir:
/project/SCCC/DOrso_lab/s418665/work/e3/912b57b2e499791478a55a0b864356
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
Workflow completed on: Mon May 15 10:05:42 CDT 2017
Execution status: Failed
Workflow Duration: 13m 20s
I notice that you are not assigning memory (
-Xmx30g
) with yourbbmap.sh
mapping command. I am not sure how much memory your cluster allows by default but for human genome alignment you are going to need at least 30G (add-Xmx30g
). You would need to add corresponding--mem
option on SLURM side to match.Do you have duplicate sequence headers in your genome fasta?
The problem appears to be due to a corrupted FASTA file. Using a new reference genome file seemed to fix the problem.