I am trying to map reads onto a 270 Mb reference genome using bwa-mem. I took the ILLUMINA (2x150 bp) paired-end sequencing reads (~65 Gb .fastq files) and trimmed them using Trimmomatic 0.38 and then I mapped those trimmed sequences (~59 Gb .fastq files) onto the reference genome:
nohup bwa mem -M -t 60 -R "@RG\tID:3058874d-77f5-45a9-b511-69cf528d859e\tLB:lib1\tPL:ILLUMINA\tSM:KMM2\tPU:unit1\tPI:239" ~/Reference_genome.fasta KMM2_output_forward_paired_trimmed.fastq KMM2_output_reverse_paired_trimmed.fastq > KMM2_alignment.sam
This resulted in a 136 Gb sequence alignment .sam file. Then I marked duplicates using samblaster 0.1.24 resulting in a 136 Gb .sam file.
cat KMM2_alignment.sam | samblaster > KMM2_alignment_dupsmarked.sam
samblaster: Loaded 894 header sequence entries. samblaster: Marked 3370494 of 174296648 (1.93%) read ids as duplicates using 381692k memory in 7M42S(462.225S) CPU seconds and 6H8M31S(22111S) wall time.
Next I converted .sam to .bam resulting in a 30 Gb .bam file:
nohup samtools view -bS -h -@ 50 KMM2_alignment_dupsmarked.sam > KMM2.bam &
[samopen] SAM header is present: 894 sequences.
But when I went to sort the bam files:
samtools sort -@ 32 KMM2_09252018.bam KMM2_sorted.bam
I got this error:
[bam_header_read] invalid BAM binary header (this is not a BAM file). Segmentation fault (core dumped)
Please help. I read forums and tried to add the header in and read stuff that maybe my alignment .bam file is truncated. What did I do wrong?
if you absolutely have to use nohup, use it this way:
nohup sh -c 'bwa ......'
This would initiate a shell for the command enclosed.
The main error as another comment pointed out is that nohup is also redirecting stderr(?) to the sam file.
I tried to redo the bwa mem alignment without nohup and and ended up with the same size alignment .sam file (136 Gb)
and when I use check the alignment using samtools flagstat
File sizes are no indication of their correctness or similarity.
Versions of BWA and SAMtools used? What is the output of:
Oh thanks for the info about file sizes! /opt/bwa-0.7.12/bwa samtools I think is 0.1.17
samtools in installed globally on the server and I've asked the server peeps to update it. Could that be my problem?
Yes, this samtools is really old, and there is a bug that affects old versions:
https://github.com/lh3/samtools/pull/7
Your bwa is also quite old, having them both updated is advisable.
I updated bwa and samtools and ran everything in sbatch scripts and it's working. Thanks!!
(bio) [kmmahan@biota KMM1_analysis]$ cat KMM1_alignment_stats_09282018.txt 346964372 + 0 in total (QC-passed reads + QC-failed reads) 4419444 + 0 secondary 0 + 0 supplementary 28785855 + 0 duplicates 342043484 + 0 mapped (98.58% : N/A) 342544928 + 0 paired in sequencing 171272464 + 0 read1 171272464 + 0 read2 324671918 + 0 properly paired (94.78% : N/A) 336322728 + 0 with itself and mate mapped 1301312 + 0 singletons (0.38% : N/A) 11116498 + 0 with mate mapped to a different chr 8145437 + 0 with mate mapped to a different chr (mapQ>=5)
I redid the alignment without nohup and ended up with same issues.