Hi all, I get a GATK error:
SAM/BAM file SAMFileReader{/merged_bam_markdup.bam} is malformed: Read HWI-ST303_0093:5:5:13416:34802#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.
when I ran:
java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fa -I merged_bam_markdup.bam -o read.intervals
My BAM file does have @RG lines in its header:
samtools view -h merged_bam_files_indexed_markduplicate.bam | grep ^@RG
@RG ID:test1 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan
@RG ID:test2 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan
Grepping the BAM for the read named in the error gives:
HWI-ST303_0093:5:5:13416:34802#0 99 1 1090 29 23S60M17S = 1150 160
TGTTTGGGTTGAAGATTGATACTGGAAGAAGATTAGAATTGTAGAAAGGGGAAAACGATGTTAGAAAGTTAATACGGCTTACTCCAGATCCTTGGATCTC
GGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGGGGDGFGFGGGGGFEDFGEGGGDGEG?FGGDDGFFDGGEDDFFFFEDG?E
MD:Z:60 PG:Z:MarkDuplicates RG:Z:test1 XG:i:0 AM:i:29 NM:i:0 SM:i:29 XM:i:0 XO:i:0 XT:A:M
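One way to narrow this down is to compare the RG:Z tags on the reads with the IDs actually parsed from the @RG header lines. Below is a minimal Python sketch (the script name check_rg.py is hypothetical) that reads SAM text, for example produced with samtools view -h merged_bam_markdup.bam > merged.sam. It splits header fields on TABs, as the SAM specification requires, so an @RG line whose fields were typed with spaces yields no ID here; a header like that is one possible, unconfirmed explanation for the error above.

```python
from __future__ import annotations

import sys
from typing import Iterable


def find_rg_problems(sam_lines: Iterable[str]) -> tuple[set[str], list[tuple[str, str]]]:
    """Return (read-group IDs parsed from @RG lines, [(read name, RG tag)] for bad reads).

    A read is reported if it has no RG:Z tag (tag shown as "") or if its tag is
    not among the header IDs. Header lines must precede alignments, as in SAM.
    """
    header_ids = set()
    problems = []
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@RG"):
                # SAM header fields are TAB-separated; a space-separated line gives no ID.
                for field in line.split("\t")[1:]:
                    if field.startswith("ID:"):
                        header_ids.add(field[3:])
            continue
        cols = line.split("\t")
        # Optional tags start at column 12 (index 11).
        rg = next((t[5:] for t in cols[11:] if t.startswith("RG:Z:")), "")
        if rg not in header_ids:
            problems.append((cols[0], rg))
    return header_ids, problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_rg.py merged.sam
    with open(sys.argv[1]) as fh:
        ids, bad = find_rg_problems(fh)
    print("header RG IDs:", sorted(ids))
    for name, rg in bad[:10]:
        print(f"{name}\tRG={rg or '<missing>'}")
    print(f"{len(bad)} read(s) with a missing or undefined read group")
```

If the printed header IDs are empty or look like one long string, the @RG lines are the likely culprit rather than the reads.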
Following the GATK-recommended solution, I ran Picard's AddOrReplaceReadGroups:
java -XX:MaxDirectMemorySize=4G -jar picard-tools-1.85/AddOrReplaceReadGroups.jar I=test.bam O=test.header.bam SORT_ORDER=coordinate RGID=test RGLB=test RGPL=Illumina RGSM=test RGPU=HWI-ST303 RGCN=japan CREATE_INDEX=True
I got this error:
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 12247781, Read name HWI-ST303_0093:5:26:10129:50409#0, MAPQ should be 0 for unmapped read.
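This Picard error means a record has the unmapped FLAG bit (0x4) set but a non-zero MAPQ, something commonly reported for BWA output. As far as I know, Picard's CleanSam tool is meant for exactly this case, and adding VALIDATION_STRINGENCY=LENIENT to AddOrReplaceReadGroups should let it continue past the check. For illustration only, a minimal Python sketch of the same repair on SAM text (the script name fix_mapq.py is hypothetical):

```python
from __future__ import annotations

import sys
from typing import Iterable, Iterator

FLAG_UNMAPPED = 0x4  # SAM FLAG bit 0x4: segment unmapped


def zero_unmapped_mapq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines unchanged, except that MAPQ (column 5) is set to 0
    for alignment records whose FLAG has the unmapped bit."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through untouched
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 11 and int(cols[1]) & FLAG_UNMAPPED and cols[4] != "0":
            cols[4] = "0"
            yield "\t".join(cols) + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fix_mapq.py in.sam > fixed.sam   (or pass /dev/stdin)
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(zero_unmapped_mapq(fh))
```

With the older samtools syntax used in this thread, that could look like: samtools view -h test.bam | python fix_mapq.py /dev/stdin | samtools view -bS - > test.fixed.bam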
I also tried:
java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa -I merged_bam_files_indexed_markduplicate.bam -o reads.intervals --validation_strictness LENIENT
but got the same read-group error as before.
Any suggestions would be great.
N.B.: I used to run the same (first) command with GATK version 1.2 and it did not give an error.
I also tried:
java -jar picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=LENIENT
then indexed with samtools, and got the following error:
Ignoring SAM validation error: ERROR: Record (number), Read name HWI-ST303_0093:5:5:13416:34802#0, RG ID on SAMRecord not found in header: test1
My pipeline:
bwa aln -q 20 ref.fa read1 > file1.sai
bwa aln -q 20 ref.fa read2 > file2.sai
bwa sampe ref.fa file1.sai file2.sai read1 read2 > test1.sam
samtools view -bS test1.sam | samtools sort - test1
samtools index test1.bam
samtools merge -rh RG.txt test.bam test1.bam test2.bam
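With samtools merge -r, each read gets an RG tag inferred from its input file name (as I understand samtools 0.1.x, test1.bam gives RG:Z:test1, which matches the read shown above), and -h RG.txt replaces the header text that would otherwise be copied from the first input. So the @RG IDs in RG.txt have to match those names exactly, with TAB-separated fields, and RG.txt may also need the @HD/@SQ lines. A minimal Python sketch (hypothetical script name make_rg.py) that writes such @RG lines, reusing the tag values from the header in the question:

```python
from __future__ import annotations

import os
import sys


def merge_rg_id(path: str) -> str:
    """RG ID that samtools merge -r is assumed to derive from an input path:
    the file name without its directory and without a trailing ".bam"."""
    name = os.path.basename(path)
    if len(name) > 4 and name.endswith(".bam"):
        name = name[:-4]
    return name


def rg_header_lines(bam_paths: list[str], **fields: str) -> list[str]:
    """Build one TAB-separated @RG line per input, in the given field order."""
    lines = []
    for path in bam_paths:
        parts = ["@RG", f"ID:{merge_rg_id(path)}"]
        parts += [f"{tag}:{value}" for tag, value in fields.items()]
        lines.append("\t".join(parts) + "\n")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python make_rg.py test1.bam test2.bam >> RG.txt
    # Tag values copied from the @RG lines in the question.
    sys.stdout.writelines(rg_header_lines(
        sys.argv[1:], PL="Illumina", PU="HWI-ST303", LB="test",
        PI="75", SM="test", CN="japan"))
```

For example, samtools view -H test1.bam | grep -v '^@RG' > RG.txt, then python make_rg.py test1.bam test2.bam >> RG.txt, then rerun the merge.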
Thanks ashutoshmits, will that fix the problem?
Use Picard for merging. Try to use Picard for all BAM operations: it does a better job than samtools with headers and metadata, and produces BAM files that are more compliant with GATK. Picard and GATK are both developed and maintained at the Broad Institute.
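Following that advice, a hedged sketch of a Picard-only route: give each input its own read group with AddOrReplaceReadGroups, then merge with MergeSamFiles, which should carry every input's @RG line into the merged header. The script only builds and runs the java command lines. The jar layout under picard-tools-1.85/, the *.rg.bam file names and using the file name as RGID are assumptions for illustration; LB/PL/PU/SM are the values from the header in the question.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Assumption: Picard 1.x per-tool jars, as in the AddOrReplaceReadGroups call above.
PICARD_DIR = "picard-tools-1.85"


def add_rg_cmd(bam: str, lb: str, pl: str, pu: str, sm: str) -> tuple[list[str], str]:
    """AddOrReplaceReadGroups command giving every read in `bam` one read group.
    RGID is taken from the file name (hypothetical naming choice)."""
    rg_id = os.path.splitext(os.path.basename(bam))[0]
    out = f"{rg_id}.rg.bam"  # hypothetical output name
    cmd = [
        "java", "-jar", os.path.join(PICARD_DIR, "AddOrReplaceReadGroups.jar"),
        f"I={bam}", f"O={out}", "SORT_ORDER=coordinate",
        f"RGID={rg_id}", f"RGLB={lb}", f"RGPL={pl}", f"RGPU={pu}", f"RGSM={sm}",
        # Tolerate non-zero MAPQ on unmapped reads instead of aborting.
        "VALIDATION_STRINGENCY=LENIENT",
    ]
    return cmd, out


def merge_cmd(inputs: list[str], output: str) -> list[str]:
    """MergeSamFiles command over the read-group-tagged inputs."""
    cmd = ["java", "-jar", os.path.join(PICARD_DIR, "MergeSamFiles.jar")]
    cmd += [f"I={path}" for path in inputs]
    cmd += [f"O={output}", "SORT_ORDER=coordinate", "CREATE_INDEX=true",
            "VALIDATION_STRINGENCY=LENIENT"]
    return cmd


def build_pipeline(bams: list[str], output: str, **rg: str) -> list[list[str]]:
    """One AddOrReplaceReadGroups call per input, then a single merge."""
    cmds, tagged = [], []
    for bam in bams:
        cmd, out = add_rg_cmd(bam, **rg)
        cmds.append(cmd)
        tagged.append(out)
    cmds.append(merge_cmd(tagged, output))
    return cmds


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python merge_with_picard.py merged.bam test1.bam test2.bam
    for cmd in build_pipeline(sys.argv[2:], sys.argv[1], lb="test",
                              pl="Illumina", pu="HWI-ST303", sm="test"):
        subprocess.run(cmd, check=True)
```

Giving each input a distinct RGID before merging keeps the read groups separate in the merged file, instead of relying on samtools merge -r and a hand-written header.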
I am not sure, but you can give it a try. As William mentioned, Picard does a better job than samtools at merging BAM files.