Question

Malformed BAM file produced by BWA

0

Entering edit mode

8.3 years ago

joreamayarom ▴ 140

I'm aligning reads to a genome to later do some variant calling. I have mapped some reads to a genome using bwa and created the corresponding dictionary

bwa index -p ref.fa ref.fa
bwa mem ref.fa forward.fastq.gz reverse.fastq.gz -t 16 > mapped_file.sam
samtools view -bS mapped_file.sam -o mapped_file.bam
java -jar picard.jar CreateSequenceDictionary \
R=ref.fa \
O=ref.dict

samtools faidx ref.fa

And then proceeded to sort and remove duplicates

java -jar picard.jar SortSam \
I=mapped_file.bam \
O=sorted_file.bam \
SORT_ORDER=coordinate

java -jar picard.jar MarkDuplicates \
REMOVE_DUPLICATES=true \
ASSUME_SORT_ORDER=coordinate \
I=sorted_file.bam \
O=clean_file.bam \
M=metric_file.txt

And finally tried to realign the sequences using GATK, but the programmed complained that my input bam file is malformed.

java -jar GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R ref.fa \
    -I clean_file.bam \
    -o clean_file.intervals

I ran ValidateSamFile, as GATK recommends, and got this error

## HISTOGRAM    java.lang.String 
Error Type  Count 
ERROR:MISSING_READ_GROUP    1
WARNING:RECORD_MISSING_READ_GROUP   324404

To solve this problem I tried running AddOrReplaceReadGroups and FixMateInformation on my clean_file.bam, as the GATK webpage recommends (http://gatkforums.broadinstitute.org/gatk/discussion/7571/errors-in-sam-bam-files-can-be-diagnosed-with-validatesamfile), to no avail. I also ran ValidateSamFile directly in my Bra output and got a similar error message. Is there an explanation for this problem? Has my BWA output been corrupted in any stage?

GATK Samtools Picard • 3.9k views

ADD COMMENT • link updated 8.3 years ago by Pierre Lindenbaum 166k • written 8.3 years ago by joreamayarom ▴ 140

0

Entering edit mode

The RG tag has to be added manually, so it's not really a "corrupt" BAM... you just haven't added RG metadata and GATK needs that. AddOrReplaceReadGroups is the answer. Why didn't it, er, "avail"? :)

ADD REPLY • link 8.3 years ago by John 13k

score 2 · Answer 1 · 2017-04-25

2

Entering edit mode

8.3 years ago

Pierre Lindenbaum 166k

1) one step (don't save as SAM but bam, add your read groups and sort)

bwa mem -R '@RG\tID:S1\tSM:SAMPLE1'  ref.fa forward.fastq.gz reverse.fastq.gz -t 16 |  samtools sort -O BAM -o out.bam -T tmp -

2) when using the picard tools use the option

VALIDATION_STRINGENCY=LENIENT

ADD COMMENT • link 8.3 years ago by Pierre Lindenbaum 166k

1

Entering edit mode

I'm checking GATK documentation right now. It is mentioned that the -M flag is essential for Picard compatibility. Is that missing from your answer or you have an specific reason to leave it out?

ADD REPLY • link 8.3 years ago by joreamayarom ▴ 140

0

Entering edit mode

GATK/-M option ? what are you talking about ? VALIDATION_STRINGENCY is a general option for picard when handling bam.

ADD REPLY • link 8.3 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

I'm talking about recommendation 1. In GATK webpage (http://gatkforums.broadinstitute.org/gatk/discussion/2799) they recommend the following command with an -M before -R.

bwa mem -M -R ’<read group info>’ -p reference.fa raw_reads.fq > aligned_reads.sam

ADD REPLY • link 8.3 years ago by joreamayarom ▴ 140

0

Entering edit mode

1) the option -M has nothing to do with your problem. Check the documentation

2) the order of the options doesn't matter.

ADD REPLY • link 8.3 years ago by Pierre Lindenbaum 166k