Question

Sam/Bam File Samfilereader

1

Entering edit mode

11.3 years ago

mad.cichlids ▴ 140

Hi, there

I was trying to add read groups to my alignments and take advantage of GATK to call variants. I first did my cleaned reads and added read groups to the new alignment with (just one example file, I have multiple alignments for the whole dataset) :

bwa aln -n 0.04 -k 5 -I -t 4 contig ~/Desktop/Test_dataset2/1_cold_trimmed_clipped.fastq | bwa samse -r '@RG\tID:1_cold\tSM:1_cold\tPL:Illumina' contig - ~/Desktop/Test_dataset2/1_cold_trimmed_clipped.fastq > ~/Desktop/Test_dataset2/auto_data_31/aligned_sam/1_cold.sam

Then I converted my generated sam file and sorted them with :

samtools view -bS 1_cold.sam | samtools sort -m 300000000 - ./sam_bam/1_cold_sorted

Then I replace the duplicates:

java -Xmx2g -jar  ~/Desktop/apps/picard-tools-1.92/MarkDuplicates.jar INPUT=1_cold_sorted.bam OUTPUT=1_cold_dedup.bam METRICS_FILE=1_cold_metricsfile MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=250 ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT REMOVE_DUPLICATES=True

Then merged all of the files after reducing duplicate:

samtools merge -rh RG.txt merged.bam *dedup.bam

 samtools index merged.bam 

 java -jar ~/Desktop/apps/picard-tools-1.92/CreateSequenceDictionary.jar R= contigs.fa o=contigs.dict

 samtools faidx contigs.fa

Then I was ready to realign with GATK:

java -Xmx2g -jar ~/Desktop/apps/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R contigs.fa \
  -o merged_output.intervals \
  -I ~/Desktop/Test_dataset2/auto_data_31/aligned_sam/sam_bam/merged.bam \
  --minReadsAtLocus 3

There was an error message:

##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/bq/Desktop/Test_dataset2/auto_data_31/aligned_sam/sam_bam/merged.bam} is malformed: Read HWI-ST141_0363:2:1101:11445:6328#TAGCTT uses a read group (5_hot_dedup) that is not defined in the BAM header, which is not valid.  Please use http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem

And I went back to the original file to check header: They were both there.

samtools view -h 5_hot_dedup.bam | grep "^@RG"
@RG    ID:5_hot    PL:Illumina    SM:5_hot

grep "^@RG" 5_hot.sam 
@RG    ID:5_hot    SM:5_hot    PL:Illumina

And I rerun the whole analysis from the original sequence, the same error occurred again, Could you please shed some light on where I could possibly do wrong? I had worked on this for nearly weeks, but little progress has been made. There might be some other options like in picard addreadgroups, but i really want to know what did i do wrong with the analysis? Thank you very much for any suggestions and sorry for the long post.

snps • 5.3k views

ADD COMMENT • link updated 11.3 years ago by Biostar 20 • written 11.3 years ago by mad.cichlids ▴ 140

0

Entering edit mode

in samtools merge -rh RG.txt what's the content of RG.txt ?

ADD REPLY • link 11.3 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Here is the content in RG.txt for the complete samples

    @RG    ID:1_cold    SM:1_cold    PL:Illumina
    @RG    ID:2_cold    SM:2_cold    PL:Illumina
    @RG    ID:3_cold    SM:3_cold    PL:Illumina
    @RG    ID:4_cold    SM:4_cold    PL:Illumina
    @RG    ID:5_cold    SM:5_cold    PL:Illumina
    @RG    ID:6_cold    SM:6_cold    PL:Illumina
    @RG    ID:1_hot    SM:1_hot    PL:Illumina
    @RG    ID:2_hot    SM:2_hot    PL:Illumina
    @RG    ID:3_hot    SM:3_hot    PL:Illumina
    @RG    ID:4_hot    SM:4_hot    PL:Illumina
    @RG    ID:5_hot    SM:5_hot    PL:Illumina
    @RG    ID:6_hot    SM:6_hot    PL:Illumina

ADD REPLY • link 11.3 years ago by mad.cichlids ▴ 140

score 1 · Answer 1 · 2013-08-02

1

Entering edit mode

11.3 years ago

Ashutosh Pandey 12k

The error clearly says that the read id associated with this read HWI-ST141_0363:2:1101:11445:6328#TAGCTT is 5_hot_dedup and this RG ID is not present in your header. Your header has 5_hot instead of 5_hot_dedup. so change your header accordingly.

ADD COMMENT • link 11.3 years ago by Ashutosh Pandey 12k

0

Entering edit mode

ok, here is what i did, i changed

@RG    ID:5_hot    SM:5_hot    PL:Illumina

to

@RG    ID:5_hot_dedup    SM:5_hot_dedup    PL:Illumina

then the same error occurred again:

ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/bq/Desktop/Test_dataset2/auto_data_31/aligned_sam/sam_bam/merged.bam} is malformed: Read HWI-ST141_0363:2:1101:11445:6328#TAGCTT uses a read group (5_hot_dedup) that is not defined in the BAM header, which is not valid.  Please use http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem

I looked into this error:

samtools view -h merged.bam | grep "HWI-ST141_0363:2:1101:11445:6328#TAGCTT"
HWI-ST141_0363:2:1101:11445:6328#TAGCTT    16    NODE_1_length_332_cov_16.180723    1    37    50M    *    0    0    TCTCGGGTGGCCGGAACGTTTACTTTGAAAAAATTAGAGTGTTCAAAGCA    GHBIHGB@DGDJIGHEIIIIIHIJJIGJIIHGGIIGIHHHFHDFFFFCCC    X0:i:1    X1:i:0    MD:Z:50    PG:Z:MarkDuplicates    XG:i:0    NM:i:0    XM:i:0    XO:i:0    XT:A:U    RG:Z:5_hot_dedup

It seems the RG group is RG:Z:5_hot_dedup, but I never introduce this as my readgroup,

What I did for my bwa alignment and added the readgroup was

bwa aln -n 0.04 -k 5 -I -t 4 contig ~/Desktop/Test_dataset2/5_hot_trimmed_clipped.fastq | bwa samse -r '@RG\tID:5_hot\tSM:5_hot\tPL:Illumina' contig - ~/Desktop/Test_dataset2/5_hot_trimmed_clipped.fastq > ~/Desktop/Test_dataset2/auto_data_31/aligned_sam/5_hot.sam

so 5_hot_dedup was not not introduced as readgroup at the beginning, that is the part confused me. I really appreciate your time on my question, thanks a lot!

I also looked into my merged.bam file，the 5_dedup read group is not there either...

samtools view -h merged.bam | grep "^@RG"
@RG    ID:1_cold    SM:1_cold    PL:Illumina
@RG    ID:2_cold    SM:2_cold    PL:Illumina
@RG    ID:3_cold    SM:3_cold    PL:Illumina
@RG    ID:4_cold    SM:4_cold    PL:Illumina
@RG    ID:5_cold    SM:5_cold    PL:Illumina
@RG    ID:6_cold    SM:6_cold    PL:Illumina
@RG    ID:1_hot    SM:1_hot    PL:Illumina
@RG    ID:2_hot    SM:2_hot    PL:Illumina
@RG    ID:3_hot    SM:3_hot    PL:Illumina
@RG    ID:4_hot    SM:4_hot    PL:Illumina
@RG    ID:5_hot    SM:5_hot    PL:Illumina
@RG    ID:6_hot    SM:6_hot    PL:Illumina

ADD REPLY • link 11.3 years ago by mad.cichlids ▴ 140