Hey,
I downloaded a BAM file for chr20 from NCBI (SRR1976036). This is an NA12878 sample.
I wanted to do some variant calling with freebayes and got the following error:
could not find SM: in @RG tag
After some investigation I found that my BAM file does not have a proper RG tag in its header:
@HD VN:1.2 SO:coordinate
@SQ SN:CM000663.1 LN:249250621
@SQ SN:CM000664.1 LN:243199373
@SQ SN:CM000665.1 LN:198022430
......
@SQ SN:GL000248.1 LN:39786
@SQ SN:GL000249.1 LN:38502
@RG ID:None
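A quick way to confirm this kind of problem is to list the @RG header lines that have no SM: field. The following is only a minimal sketch: rg_missing_sm is a hypothetical helper name, it assumes the header fields are tab-separated (as in a regular SAM header), and the demo input is a shortened stand-in for the header shown above.

```shell
# Hypothetical helper: print every @RG header line that lacks an SM: field.
# Reads SAM header text on stdin; fields are assumed to be tab-separated.
rg_missing_sm() {
    awk -F'\t' '/^@RG/ {
        has_sm = 0
        for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) has_sm = 1
        if (!has_sm) print
    }'
}

# Demo on a shortened header resembling the one above.
printf '@HD\tVN:1.2\tSO:coordinate\n@SQ\tSN:CM000663.1\tLN:249250621\n@RG\tID:None\n' \
    | rg_missing_sm

# With a real file (assumes samtools is on the PATH):
#   samtools view -H SRR1976036_chr20.bam | rg_missing_sm
```

Any line it prints is an @RG entry that freebayes or Picard would reject for lacking a sample name.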
I looked around the internet and I thought I had found an answer using the Picard tool AddOrReplaceReadGroups.
So I tried the following:
java -jar /home/user/Downloads/picard.jar AddOrReplaceReadGroups I=SRR1976036_chr20.bam O=036_RG.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=test
However, I got the following message:
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:None; File /home/user/NA12878/SRR1976036_chr20.bam; Line number 95
I looked around and couldn't find an answer.
This is the first time I am working with a BAM file directly. I have done variant calling from FASTQ to VCF files before and never had this problem.
Could someone tell me what I can do to add the information properly?
Kind regards Covux
How do I remove the @RG line?
You can remove it from the header if it is only there (and not in the reads).
Then run Picard again on your.new.bam.
Note: It will fail if you have a malformed RG in each read. In that case, post some of the initial reads.
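One way to drop the @RG line from the header could look like the sketch below. It is an illustration under assumptions, not a tested recipe for this file: drop_rg is a hypothetical helper name, and the commented samtools pipeline assumes samtools is on the PATH and that "samtools reheader" takes the new header from stdin via "-".

```shell
# Hypothetical helper: remove all @RG lines from SAM header text on stdin.
drop_rg() {
    grep -v '^@RG'
}

# Demo on a shortened, tab-separated header.
printf '@HD\tVN:1.2\tSO:coordinate\n@SQ\tSN:CM000663.1\tLN:249250621\n@RG\tID:None\n' \
    | drop_rg

# With a real file (assumes samtools is on the PATH):
#   samtools view -H your.bam | drop_rg | samtools reheader - your.bam > your.new.bam
```

This only changes the header. If the reads themselves still carry an RG:Z: tag, they will point to a read group that no longer exists, which is the failure case mentioned in the note.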
I just ran your command line and now I get a different error.
Here are some of the reads I have in my file:
Try this on the original BAM:
Then do the variant calling directly on your.new.bam (DON'T run Picard).
It worked!
Would you mind telling me what the command line does? :)
Your read groups were malformed. In the actual reads, the RG:Z:None tag means that each read belongs to the read group whose ID is "None". However, the @RG line in your header contains only the ID tag (@RG ID:None), while freebayes and Picard also need at least the SM (sample) tag. To reconcile the two, I added the sample info (SM) and some other default fields (LB = library, PL = platform = Illumina). You can see all the details of @RG here: https://software.broadinstitute.org/gatk/documentation/article.php?id=6472
The command line does three things:
samtools view -H your.bam => take the header of the BAM
sed 's,^@RG.*,@RG\tID:None\tSM:None\tLB:None\tPL:Illumina,g' => replace the line starting with @RG with @RG\tID:None\tSM:None\tLB:None\tPL:Illumina
samtools reheader - your.bam => make a new BAM with the changed header
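Put together, the three steps could be chained roughly as in the sketch below. This is a reconstruction from the pieces above and not necessarily the exact command that was posted: fix_rg_header and RG_LINE are hypothetical names, the replacement line is built with printf so that it contains real tab characters regardless of how the local sed treats \t, and the output name your.new.bam is taken from earlier in the thread.

```shell
# Replacement @RG line with real tab characters between the fields.
RG_LINE=$(printf '@RG\tID:None\tSM:None\tLB:None\tPL:Illumina')

# Hypothetical helper: rewrite every @RG line in SAM header text on stdin.
fix_rg_header() {
    sed "s,^@RG.*,${RG_LINE},"
}

# Demo on a shortened, tab-separated header.
printf '@HD\tVN:1.2\tSO:coordinate\n@SQ\tSN:CM000663.1\tLN:249250621\n@RG\tID:None\n' \
    | fix_rg_header

# With a real file (assumes samtools is on the PATH):
#   samtools view -H your.bam | fix_rg_header | samtools reheader - your.bam > your.new.bam
```

Because the ID stays "None", the RG:Z:None tags already present in the reads match the repaired header line, so the reads themselves do not need to be touched.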
Many thanks for your explanation!