Hi everyone,
I've been struggling with bioinformatics, and now I have a new problem. I checked the post "Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:None", but its solution didn't work for me.
First, I mapped the reads:
bwa mem -M -R '@RG\tID:Sample1' -t 6 genome.ref Sample1_R1_forward_paired.trim.fastq.gz Sample1_R2_forward_paired.trim.fastq.gz | samtools view -hSb -o Sample1.bam -
Then I sorted by coordinate:
samtools sort Sample1.bam -o Sample1_sorted.bam
I removed duplicates:
java -Xms4g -jar picard.jar MarkDuplicates INPUT=Sample1_sorted.bam OUTPUT=Sample1_sorted_rmdup.bam METRICS_FILE=Sample1_sorted_rmdup.txt2 REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=LENIENT
Then I filtered by mapping quality:
samtools view -hSbq 30 -o Sample1_sorted_rmdup_qfilter.bam Sample1_sorted_rmdup.bam
So, when I tried to build the BAM index with this command:
java -Xms4g -jar picard.jar BuildBamIndex INPUT=Sample1_sorted_rmdup_qfilter.bam
I got:
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:Sample1; File /path/to/BAM/Sample1.bam; Line number 1192
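The message says that Picard (through htsjdk) refuses any @RG header line that has no SM (sample) tag. One quick way to see which @RG lines lack SM before running Picard is to filter the header text. This is a minimal sketch that assumes the header comes from `samtools view -H`; the function name `rg_missing_sm` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SAM header text on stdin and print every @RG line
# that has no tab-separated SM: field.
# Return status: 0 = every @RG line has SM (or there is no @RG line), 1 = some lack it.
rg_missing_sm() {
  local tab missing
  tab=$(printf '\t')
  # Keep only @RG lines, then drop the ones that already contain "<TAB>SM:".
  missing=$(grep '^@RG' | grep -v "${tab}SM:" || true)
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}

# Usage (assumes samtools is on PATH):
#   samtools view -H Sample1_sorted_rmdup_qfilter.bam | rg_missing_sm
```

On a header whose @RG line is only `@RG ID:Sample1`, the function prints that line and returns 1.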
I checked the file, and it looks like this:
@HD VN:1.5 SO:coordinate
@SQ SN:Chr01 LN:56831624
@SQ SN:Chr02 LN:48577505
@SQ SN:Chr03 LN:45779781
@SQ SN:Chr04 LN:52389146
@SQ SN:Chr05 LN:42234498
@SQ SN:Chr06 LN:51416486
@SQ SN:Chr07 LN:44630646
@SQ SN:Chr08 LN:47837940
@SQ SN:Chr09 LN:50189764
@SQ SN:Chr10 LN:51566898
@SQ SN:Chr11 LN:34766867
@SQ SN:Chr12 LN:40091314
@SQ SN:Chr13 LN:45874162
@SQ SN:Chr14 LN:49042192
@SQ SN:Chr15 LN:51756343
@SQ SN:Chr16 LN:37887014
@SQ SN:Chr17 LN:41641366
@SQ SN:Chr18 LN:58018742
@SQ SN:Chr19 LN:50746916
@SQ SN:Chr20 LN:47904181
@SQ SN:scaffold_21 LN:3565126
@SQ SN:scaffold_22 LN:1240113
@SQ SN:scaffold_23 LN:809636
@SQ SN:scaffold_24 LN:735592
@SQ SN:scaffold_25 LN:750012
@SQ SN:scaffold_26 LN:719293
@SQ SN:scaffold_27 LN:425344
@SQ SN:scaffold_28 LN:367934
@SQ SN:scaffold_30 LN:374509
@SQ SN:scaffold_31 LN:306967
@SQ SN:scaffold_32 LN:273180
@SQ SN:scaffold_33 LN:367064
@SQ SN:scaffold_34 LN:312168
@SQ SN:scaffold_35 LN:412299
@SQ SN:scaffold_36 LN:357887
@SQ SN:scaffold_37 LN:303488
@SQ SN:scaffold_38 LN:280888
@SQ SN:scaffold_39 LN:308105
@SQ SN:scaffold_40 LN:266805
@SQ SN:scaffold_41 LN:255068
@SQ SN:scaffold_43 LN:313007
@SQ SN:scaffold_44 LN:177731
@SQ SN:scaffold_47 LN:277228
@SQ SN:scaffold_48 LN:336578
@SQ SN:scaffold_49 LN:240486
@SQ SN:scaffold_50 LN:189765
@SQ SN:scaffold_51 LN:202321
@SQ SN:scaffold_54 LN:193136
@SQ SN:scaffold_55 LN:182568
Then I ran:
samtools view -H Sample1_sorted_rmdup_qfilter.bam | sed 's,^@RG.*,@RG\tID:Sample1,g' | samtools reheader - Sample1_sorted_rmdup_qfilter.bam > Sample1_sorted_rmdup_qfilter_reheader.bam
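This sed call replaces the @RG line with one that again contains only ID, so the reheadered file still has no SM tag and Picard reports the same error. Below is a sketch of a replacement that writes both ID and SM. It assumes the sample name is also used as SM and that the names contain no commas (the comma is the sed delimiter, as in the command above); the function name `add_sm_to_rg` is hypothetical. Like the original sed, it replaces the whole @RG line, so any other fields on that line are dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite every @RG header line to "@RG<TAB>ID:<id><TAB>SM:<sample>".
# Reads SAM header text on stdin and writes the edited header to stdout.
# Assumes the ID and sample names contain no commas (comma is the sed delimiter).
add_sm_to_rg() {
  local id="$1" sample="$2" tab
  tab=$(printf '\t')   # literal tab, so this works with both GNU and BSD sed
  sed "s,^@RG.*,@RG${tab}ID:${id}${tab}SM:${sample},"
}

# Usage (assumes samtools is on PATH):
#   samtools view -H Sample1_sorted_rmdup_qfilter.bam \
#     | add_sm_to_rg Sample1 Sample1 \
#     | samtools reheader - Sample1_sorted_rmdup_qfilter.bam \
#     > Sample1_sorted_rmdup_qfilter_reheader.bam
```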
Then I tried to run:
java -Xms4g -jar picard.jar BuildBamIndex INPUT=Sample1_sorted_rmdup_qfilter_reheader.bam
and I got the same error:
Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:Sample1; File /path/to/BAM/Sample1_sorted_rmdup_qfilter_reheader.bam; Line number 1192
Could someone please help me again?
Thanks in advance.
Thank you, Pierre.
So, I am not sure if I did it right:
But I got an error while trying to run it:
What do you think I did wrong?
Oh, even Picard doesn't accept your @RG line. An ID alone is not enough; Picard also needs the SM tag, so you have to set the read group from the beginning using bwa...
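For new mappings, the SM tag can go straight into the read-group string passed to `bwa mem -R`. A minimal sketch, assuming the sample name is used for both ID and SM; `make_rg` is a hypothetical helper name. The string keeps literal `\t` escapes because bwa converts them to tabs itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a bwa-style read-group string with ID and SM.
# The output contains literal "\t" sequences (backslash + t), which
# bwa mem -R converts to TAB characters in the SAM header.
make_rg() {
  local sample="$1"
  printf '@RG\\tID:%s\\tSM:%s' "$sample" "$sample"
}

# Usage (same inputs as the original mapping command):
#   bwa mem -M -R "$(make_rg Sample1)" -t 6 genome.ref \
#     Sample1_R1_forward_paired.trim.fastq.gz Sample1_R2_forward_paired.trim.fastq.gz \
#     | samtools view -hSb -o Sample1.bam -
```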
Oh no! I've already mapped about 30 samples. :( If there is no other way, let's do it.
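If remapping every sample is too costly, the header-only fix can instead be looped over the BAM files that already exist. This is a sketch under several assumptions: files follow the `<sample>_sorted_rmdup_qfilter.bam` naming used in this thread, the file-name prefix is the sample name used for both ID and SM, names contain no commas, and samtools is on PATH. The function names `sample_from_bam` and `reheader_all` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical batch sketch: give every *_sorted_rmdup_qfilter.bam an @RG line
# with SM, without remapping. Writes <name>_reheader.bam next to each input.

# Derive the sample name from the file name prefix,
# e.g. /path/Sample1_sorted_rmdup_qfilter.bam -> Sample1
sample_from_bam() {
  local base
  base=$(basename "$1")
  printf '%s' "${base%_sorted_rmdup_qfilter.bam}"
}

reheader_all() {
  local tab bam sample out
  tab=$(printf '\t')
  for bam in *_sorted_rmdup_qfilter.bam; do
    [ -e "$bam" ] || continue   # no matching files: the glob stays literal, skip it
    sample=$(sample_from_bam "$bam")
    out="${bam%.bam}_reheader.bam"
    # Replace each @RG line with one carrying both ID and SM, then reheader.
    samtools view -H "$bam" \
      | sed "s,^@RG.*,@RG${tab}ID:${sample}${tab}SM:${sample}," \
      | samtools reheader - "$bam" > "$out"
  done
}

# Usage, from the directory holding the BAM files:
#   reheader_all
```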
Thank you very much for your help, Pierre!
Thinking about it, you could also use sed:
Thanks again, Pierre, but it still didn't work.
Oops, typo: not /^@PG/ but /^@RG/.