I am trying to call variants on my DNA-seq data. However, I keep getting stuck at the same step, where I sort my BAM files using Picard. When I use "du -sh *" to check file sizes, the sorted.bam file that gets produced has nothing in it.
This is the code I am using on my bam files to create my sorted bam file.
#!/bin/bash
#SBATCH -J sc1_10bamSORT
#SBATCH -A gts-rro
#SBATCH -N 1 --ntasks-per-node=24
#SBATCH --mem-per-cpu=8G
#SBATCH -t 36:00:00
#SBATCH -o Report-%j.out
cd $SLURM_SUBMIT_DIR
ml picard/3.0.0
java -jar /usr/local/pace-apps/manual/packages/picard/3.0.0/build/libs/picard.jar SortSam -I /storage/RGBam/FR04_SC1_10.bam -O /storage/Sorted/FR04_SC1_10.sorted.bam --SORT_ORDER coordinate
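Since the SLURM header only sets -o, Picard's stderr (its progress messages and any exception) ends up in Report-%j.out. One way to make a failure easier to spot is to capture a per-sample log; a sketch of the same call with stderr redirected (the log path is my own choice, and the command is skipped if java or the jar is unavailable):

```shell
# Same SortSam call as above, with stderr captured to a per-sample log.
# PICARD path is copied from the script; LOG is a hypothetical location.
PICARD=/usr/local/pace-apps/manual/packages/picard/3.0.0/build/libs/picard.jar
LOG=/storage/Sorted/FR04_SC1_10.sort.log
if command -v java >/dev/null 2>&1 && [ -r "$PICARD" ]; then
  java -jar "$PICARD" SortSam \
    -I /storage/RGBam/FR04_SC1_10.bam \
    -O /storage/Sorted/FR04_SC1_10.sorted.bam \
    --SORT_ORDER coordinate \
    2> "$LOG"
fi
```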
In my pipeline:
1. I concatenate my fastq.gz files.
2. I trim them using Trimmomatic.
3. I create my index from the reference genome using Bwa-Mem2.
4. I map my reads using Bwa-Mem2 to produce my .sam files.
5. I use SAMtools to convert my .sam files to .bam files and add read groups.
6. LASTLY, my problem step: I sort my bam files using Picard to generate the sorted.bam files needed to mark duplicates using Picard.
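Steps 4-6 for one sample look roughly like this. This is a sketch, not my exact commands: the reference and fastq names are placeholders, and step 6 shows the samtools equivalent of the Picard sort. It is wrapped in a function so it can be reviewed before running.

```shell
# Hypothetical sketch of steps 4-6 for a single sample.
run_sample() {
  local ref=$1      # reference fasta, already indexed with: bwa-mem2 index "$ref"
  local sample=$2   # e.g. FR04_SC1_10
  # 4. map the trimmed, paired reads (read group added at mapping time)
  bwa-mem2 mem -t 24 -R "@RG\tID:${sample}\tSM:${sample}\tPL:ILLUMINA" \
    "$ref" "${sample}_R1.paired.fastq.gz" "${sample}_R2.paired.fastq.gz" \
    > "${sample}.sam"
  # 5. convert SAM to BAM
  samtools view -b -o "${sample}.bam" "${sample}.sam"
  # 6. coordinate-sort the BAM (samtools equivalent of Picard SortSam)
  samtools sort -@ 8 -o "${sample}.sorted.bam" "${sample}.bam"
}
```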
The first time I did this, 15 files had this problem. This time, 10 files do. They are the same files, but somehow 5 got resolved this round. How can I assure myself that the "resolved" files consist of high-quality data? And what could be causing problems when I sort my bam files?
Hi,
First, have you checked whether the unsorted bam files have data in them? There might be an error in a previous step.
There could be many reasons why the file is empty. Do you have any log files or error messages from your commands?
Since you are using a workload manager (Slurm), another thing to verify is that you are requesting enough time and resources for your analysis.
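For example, a few quick sanity checks on one of the problem files (the path is taken from your script; the checks are skipped if samtools is not on PATH):

```shell
# Basic integrity and content checks on an unsorted BAM.
BAM=/storage/RGBam/FR04_SC1_10.bam
if command -v samtools >/dev/null 2>&1; then
  # quickcheck verifies the BAM is intact (valid header, EOF block present)
  samtools quickcheck "$BAM" && echo "quickcheck: OK" || echo "quickcheck: FAILED"
  # flagstat reports total reads, how many are paired, and how many mapped
  samtools flagstat "$BAM"
  # the header should contain @SQ lines and the @RG line you added
  samtools view -H "$BAM" | head
fi
```

If flagstat reports reads but Picard still produces an empty output, the Picard log (and its temp directory, settable with --TMP_DIR, which can fill up on a cluster) would be the next place to look.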
Yes, the unsorted bam files have data in them, and they are all of comparable size. I went back a few steps and the same is true of my sam and fastq files, so everything seems consistent there.
In previous runs I also tried increasing my time and resources, but I ran into the same problem.
When I checked the logs, they said these files had 0 paired reads, which is untrue, because my paired.fastq files have data in them and are of comparable size to the other files.
What resolved this problem was using samtools to sort instead of Picard.
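For reference, the samtools sort that worked looked roughly like this (input/output paths are the ones from my script; the thread count and per-thread memory are values I picked for a 24-core node, so adjust as needed). Wrapped in a function so the paths can be swapped per sample:

```shell
# Coordinate-sort a BAM with samtools instead of Picard, then index it.
sort_bam() {
  local in=$1 out=$2
  # -@ 8: sorting threads; -m 2G: memory per thread (adjust to your node)
  samtools sort -@ 8 -m 2G -o "$out" "$in" \
    && samtools index "$out"
}
# usage:
# sort_bam /storage/RGBam/FR04_SC1_10.bam /storage/Sorted/FR04_SC1_10.sorted.bam
```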