A Error About Gatk Samfilereader Malformed
1
1
Entering edit mode
12.3 years ago
huboqiang ▴ 10

I met some questions when using GATK -UnifiedGenoTyper to call SNPs. The code and Error message like this:

 java -jar /WPS/BP/huboqiang/software/GenomeAnalysisTK-1.6-13-g91f02df/GenomeAnalysisTK.jar -T UnifiedGenotyper -I s_6_HW.sorted.bam -R ../hg19.2.fa -o my.vcf --output_mode EMIT_ALL_SITES

##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/WPS/BP/huboqiang/data/BamResult/headerRG.sorted.bam} is malformed: Read HWUSI-EAS535_0025:6:10:13898:5547#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.  Please use http://www.broadinstitute.org/gsa/wiki/index.php/ReplaceReadGroups to fix this problem

However, this website has been closed so I do not know how to deal with that problem. The Reads mentioned above is:

HWUSI-EAS535_0025:6:1:15164:14102#0     99      chr1    12096   0       76M     =       12243   223     TAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGG    hhhfhhhhhhfhhghffhhgfhhhhhghhehhhhhghhhhgghhghhhhgahghhhhhhfhhghghhhhfehhQd]    XT:A:R  NM:i:0  SM:i:0  AM:i:0  X0:i:7  X1:i:1  XM:i:0  XO:i:0  XG:i:0 MD:Z:76
HWUSI-EAS535_0025:6:1:15164:14102#0     147     chr1    12243   0       76M     =       12096   -223    GCTCATCTCCTTGGCTGTGATACGTGGCCGGCCCTCGCTCCAGCAGCTGGACCCCTACCTGCCGTCTGCTGCCATC    Qa]aaW^X^\[Rb[_W]dcddb]deaadgggaf_a]da_Wffcc`fdfdf]dfdffefffcdcgfggagggegggg    XT:A:R  NM:i:0  SM:i:0  AM:i:0  X0:i:3  X1:i:5  XM:i:0  XO:i:0  XG:i:0 MD:Z:76

And this is a part of the head of the bam file:

@HD     VN:1.0
@SQ     SN:chr1 LN:249250621
@RG     ID:s_6.bam2     PL:illumina     PU:HWUSI-EAS535_0025    SM:s_6.bam
@PG     ID:bwa  PN:bwa  VN:0.6.1-r104

As it mentioned "Read HWUSI-EAS535_0025:6:10:13898:5547#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.", I do not know what's wrong with the read group @RG or with the BAM header.

Thanks.

gatk bam • 4.7k views
ADD COMMENT
3
Entering edit mode
12.3 years ago
Arun 2.4k

Your header seems to be right with @RG. However, I don't see the corresponding ID on the read.

For example: This is a header from a bam file I created.

@HD VN:1.0  GO:none SO:coordinate
@SQ SN:C1   LN:304671
@SQ SN:C2   LN:196989
@SQ SN:C3   LN:459830
@SQ SN:C4   LN:18056
@SQ SN:C5   LN:65502
@RG ID:1    PL:illumina  PU:TopHat  LB:S_6_1    SM:U_xy

And my reads are like this:

HWI-EAS431_0038:2:30:10474:19256#0  16  C1  6999    255 71M87N7M    *   0   0   CACCAAGAATCTAATCAAGTGCGCGATTAAGCATACGGCTATGCATCTGGTCATTGTTGATTCAGTCATCACTTTAAG  ECEDDF?GGFDGGDGGFGFCGEGBGFGFFGGGGDGGGGEGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG  MD:Z:78 RG:Z:1  XG:i:0  NH:i:1  NM:i:0  XM:i:0  XO:i:0  AS:i:-1 XS:A:-

Note the RG:Z:1 column, which you don't have. I think this is the problem. Do you use picard tools to add read groups? Or you just add them to your bam header?

ADD COMMENT
1
Entering edit mode

You could use Picard AddOrReplaceReadGroups as follows:

java -Xmx4g -jar AddOrReplaceReadGroups.jar -I s_6_HW.sorted.bam -O s_6_HW.sorted.rg.bam RGID=s_6 RGPL=illumina RGPU=HWUSI-EAS535_0025 RGSM=s_6
ADD REPLY
0
Entering edit mode

I'm sorry Matt I had a hard time logging in this website these days. Thanks, it really works!

ADD REPLY

Login before adding your answer.

Traffic: 1735 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6