Issues with mapping reads onto Reference genome using bwa mem
1
1
Entering edit mode
6.2 years ago

I am trying to map reads onto a 270 Mb reference genome using bwa-mem. I took the ILLUMINA (2x150 bp) paired-end sequencing reads (~65 Gb .fastq files) and trimmed them using Trimmomatic 0.38 and then I mapped those trimmed sequences (~59 Gb .fastq files) onto the reference genome:

nohup bwa mem -M -t 60 -R "@RG\tID:3058874d-77f5-45a9-b511-69cf528d859e\tLB:lib1\tPL:ILLUMINA\tSM:KMM2\tPU:unit1\tPI:239" ~/Reference_genome.fasta KMM2_output_forward_paired_trimmed.fastq KMM2_output_reverse_paired_trimmed.fastq > KMM2_alignment.sam

This resulted in a 136 Gb sequence alignment .sam file. Then I marked duplicates using samblaster 0.1.24 resulting in a 136 Gb .sam file.

cat KMM2_alignment.sam | samblaster > KMM2_alignment_dupsmarked.sam
samblaster: Loaded 894 header sequence entries.
samblaster: Marked 3370494 of 174296648 (1.93%) read ids as duplicates using 381692k memory in 7M42S(462.225S) CPU seconds and 6H8M31S(22111S) wall time.

Next I converted .sam to .bam resulting in a 30 Gb .bam file:

nohup samtools view -bS -h -@ 50 KMM2_alignment_dupsmarked.sam > KMM2.bam &
[samopen] SAM header is present: 894 sequences.

But when I went to sort the bam files:

samtools sort -@ 32 KMM2_09252018.bam KMM2_sorted.bam

I got this error:

[bam_header_read] invalid BAM binary header (this is not a BAM file).
Segmentation fault (core dumped)

Please help. I read forums and tried to add the header in and read stuff that maybe my alignment .bam file is truncated. What did I do wrong?

BWA-MEM alignment • 3.8k views
ADD COMMENT
0
Entering edit mode

if you absolutely have to use nohup, use it this way:

nohup sh -c 'bwa ......'

This would initiate a shell for the command enclosed.

The main error as another comment pointed out is that nohup is also redirecting stderr(?) to the sam file.

ADD REPLY
0
Entering edit mode

I tried to redo the bwa mem alignment without nohup and and ended up with the same size alignment .sam file (136 Gb)

bwa mem -M -t 100 -R "@RG\tID:3058874d-77f5-45a9-b511-69cf528d859e\tLB:lib1\tPL:ILLUMINA\tSM:KMM2\tPU:unit1\tPI:239" ~/Reference_genome.fasta KMM2_output_forward_paired_trimmed.fastq KMM2_output_reverse_paired_trimmed.fastq > KMM2_alignment_09272018.sam

and when I use check the alignment using samtools flagstat

samtools flagstat KMM2_alignment_09272018.sam
[bam_header_read] EOF marker is absent. The input is probably truncated. 
[bam_header_read] invalid BAM binary header (this is not a BAM file). 
[bam_flagstat_core] Truncated file? Continue anyway. 
0 + 0 in total (QC-passed reads + QC-failed reads) 
0 + 0 duplicates 
0 + 0 mapped (-nan%:-nan%) 
0 + 0 paired in sequencing 
0 + 0 read1 
0 + 0 read2 
0 + 0 properly paired (-nan%:-nan%) 
0 + 0 with itself and mate mapped 
0 + 0 singletons (-nan%:-nan%) 
0 + 0 with mate mapped to a different chr 
0 + 0 with mate mapped to a different chr (mapQ>=5)
  
ADD REPLY
0
Entering edit mode

I tried to redo the bwa mem alignment without nohup and and ended up with the same size alignment .sam file (136 Gb)

File sizes are no indication of their correctness or similarity.

Versions of BWA and SAMtools used? What is the output of:

head KMM2_alignment_09272018.sam
tail KMM2_alignment_09272018.sam
ADD REPLY
0
Entering edit mode

Oh thanks for the info about file sizes! /opt/bwa-0.7.12/bwa samtools I think is 0.1.17

samtools in installed globally on the server and I've asked the server peeps to update it. Could that be my problem?

ADD REPLY
0
Entering edit mode

Yes, this samtools is really old, and there is a bug that affects old versions:

https://github.com/lh3/samtools/pull/7

Your bwa is also quite old, having them both updated is advisable.

ADD REPLY
0
Entering edit mode

I updated bwa and samtools and ran everything in sbatch scripts and it's working. Thanks!!

(bio) [kmmahan@biota KMM1_analysis]$ cat KMM1_alignment_stats_09282018.txt 346964372 + 0 in total (QC-passed reads + QC-failed reads) 4419444 + 0 secondary 0 + 0 supplementary 28785855 + 0 duplicates 342043484 + 0 mapped (98.58% : N/A) 342544928 + 0 paired in sequencing 171272464 + 0 read1 171272464 + 0 read2 324671918 + 0 properly paired (94.78% : N/A) 336322728 + 0 with itself and mate mapped 1301312 + 0 singletons (0.38% : N/A) 11116498 + 0 with mate mapped to a different chr 8145437 + 0 with mate mapped to a different chr (mapQ>=5)

ADD REPLY
0
Entering edit mode

I redid the alignment without nohup and ended up with same issues.

ADD REPLY
0
Entering edit mode
6.2 years ago
h.mon 35k

I believe the problem is nohup is redirecting stderr to stdout, creating corrupted files. Look at these small tests, first without nohup:

bwa mem phiX R1.fastq R2.fastq > nonohup.sam
head -n2 nonohup.sam
@SQ   SN:NC_001422.1  LN:5386
@PG   ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem phiX R1.fastq R2.fastq
  

Now with nohup:

nohup bwa mem phiX R1.fastq R2.fastq > nohup.sam
head -n10 nohup.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 12 sequences (1269 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 12 reads in 0.002 CPU sec, 0.002 real sec
@SQ   SN:NC_001422.1  LN:5386
@PG   ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem phiX R1.fastq R2.fastq
  
ADD COMMENT
0
Entering edit mode

bwa mem process is being redirected to nohup.sam, which will again throw error. Try to run bwa command in .sh file inside nohup. This way bwa process will be automatically redirected in nohup.out file and .sam file wont get corrupted.

ADD REPLY

Login before adding your answer.

Traffic: 1633 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6