Hello All,
I am trying to reanalyze a WGS dataset that was generated a few years ago. I have access to the old BAM files and was able to create paired FASTQ files for each sample. Since I'll be using Picard to mark duplicates, I would like to add the read group information when aligning my FASTQ files with BWA mem. From the old BAM file, I was able to extract the '@RG' header lines, but there seem to be multiple read group IDs in the BAM file (see below).

From the BWA documentation, it seems that the correct way to add read group info is bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1'. I believe both foo and bar here are unique for each sample. In my case, how should I pass multiple read groups to BWA? I hope the question is clear; I have very limited experience with WGS. I appreciate all your help and comments.
samtools view -H sampleA.bam | grep '^@RG'
@RG ID:AVKMG.3 SM:sampleA LB:0993462810_Illumina PL:ILLUMINA PU:AVKMGDSXX191015.3.GTCCACAG-CGCGAATA CN:BI DT:2016-10-15T04:00:00+0000 DS:KS-9108
@RG ID:AJJMK.4 SM:sampleA LB:0993462810_Illumina PL:ILLUMINA PU:AJJMKDSXX191014.4.GTCCACAG-CGCGAATA CN:BI DT:2016-10-14T04:00:00+0000 DS:KS-9108
@RG ID:AKKMD.4 SM:sampleA LB:0993462810_Illumina PL:ILLUMINA PU:AKKMDDSXX191014.4.GTCCACAG-CGCGAATA CN:BI DT:2016-10-14T04:00:00+0000 DS:KS-9108
@RG ID:UGGMD.4 SM:sampleA LB:0993462810_Illumina PL:ILLUMINA PU:UGGMDDSXX191014.4.GTCCACAG-CGCGAATA CN:BI DT:2016-10-14T04:00:00+0000 DS:KS-9108
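One common pattern is to align each read group separately, giving each run its own -R string. The sketch below only builds those strings from the header lines shown above; it does not run anything. It assumes samtools prints the @RG fields TAB-separated, as in the SAM header format, and that bwa mem expands the two-character sequence "\t" in -R into a TAB. It also assumes the reads were already split into one FASTQ pair per read group beforehand, for instance with samtools split followed by conversion of each part to paired FASTQ. The function names, the FASTQ/SAM file names and ref.fa are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build one bwa mem command per @RG header line (dry run, prints only).

# Re-join the TAB-separated @RG fields with a literal backslash-t,
# which is how bwa mem -R expects the read group line to be written.
rg_to_bwa_arg() {
  awk -F'\t' 'BEGIN { OFS = "\\t" } { $1 = $1; print }'
}

# Print the value of the ID: field of each @RG line.
rg_id() {
  awk -F'\t' '{ for (i = 2; i <= NF; i++) if ($i ~ /^ID:/) { print substr($i, 4); break } }'
}

# Read @RG lines on stdin and print a bwa mem command for each one.
# Hypothetical naming: sampleA.<ID>_R1.fastq.gz / sampleA.<ID>_R2.fastq.gz.
print_bwa_commands() {
  while IFS= read -r rg; do
    id=$(printf '%s\n' "$rg" | rg_id)
    arg=$(printf '%s\n' "$rg" | rg_to_bwa_arg)
    printf "bwa mem -R '%s' ref.fa sampleA.%s_R1.fastq.gz sampleA.%s_R2.fastq.gz > sampleA.%s.sam\n" \
      "$arg" "$id" "$id" "$id"
  done
}
```

Usage, reusing the command from the question:

samtools view -H sampleA.bam | grep '^@RG' | print_bwa_commands

The per-read-group alignments would then be sorted and merged (for example with samtools merge) into one BAM per sample before running Picard MarkDuplicates. Because all four read groups here share the same LB value, they would still be treated as a single library.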
Thank you very much. I do have a follow-up question and would appreciate an answer. I used Picard AddOrReplaceReadGroups to add a single read group (the first one in the example above) to the aligned BAM file. (All libraries, LB, are the same in the example above.) I was then able to run MarkDuplicates successfully. Do you think this is the correct approach? Does the read group 'ID' have an impact on the duplicate-marking process?
This seems sufficiently distinct from the original question for you to open a new post.