I want to work out the @RG (read group) information to add to my BAM files. I am familiar with the cases where I have multiple libraries of the same sample, or where the same library is run on multiple lanes. My question is about multiplexing.
Let us say I have 2 samples (S1 and S2). I prepared both samples at the same time for 2x100 paired-end sequencing with a 400 bp insert size, then indexed and pooled them. Should I give each sample a unique LB tag?
The pooled sample was then sequenced on a single lane. I think I should give the reads from each sample a different ID tag, but the PU tag should be the same for both samples, right?
These are the header lines of the 1st read in the R1 file of each sample:
@D3VG1JS1:214:C7RNWACXX:1:1101:1128:1956 1:N:0:ATTCCT
@D3VG1JS1:214:C7RNWACXX:1:1101:1088:1987 1:N:0:GATCAG
My proposed read group line for the 1st sample is:
@RG\tID:@D3VG1JS1.214.C7RNWACXX.1.sample1\tSM:S1\tPL:Illumina\tLB:Lib1\tPU:@D3VG1JS1.214.C7RNWACXX.1
My proposed read group line for the 2nd sample is:
@RG\tID:@D3VG1JS1.214.C7RNWACXX.1.sample2\tSM:S2\tPL:Illumina\tLB:Lib2\tPU:@D3VG1JS1.214.C7RNWACXX.1
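For reference, here is a minimal Python sketch of how the two lines above could be derived from the FASTQ headers. It only reproduces the scheme proposed in this question (it is not a statement of what the correct tags are), and it assumes the first four colon-separated header fields are instrument, run, flowcell, and lane. The function name `read_group_from_header` and its parameters are made up for illustration.

```python
def read_group_from_header(header, sample, library, id_suffix):
    """Build an @RG string from the first FASTQ header of a sample.

    Assumes the header looks like
    '@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index'.
    """
    # Keep only the read name, dropping the '1:N:0:ATTCCT' part.
    name = header.split()[0]
    # The leading '@' is kept on purpose, to match the proposal above.
    instrument, run, flowcell, lane = name.split(":")[:4]
    # PU as proposed: instrument.run.flowcell.lane (shared by pooled samples).
    pu = ".".join([instrument, run, flowcell, lane])
    # ID as proposed: PU plus a per-sample suffix.
    rg_id = pu + "." + id_suffix
    fields = ["@RG", "ID:" + rg_id, "SM:" + sample,
              "PL:Illumina", "LB:" + library, "PU:" + pu]
    # Join with a literal backslash-t, as written in the lines above.
    return "\\t".join(fields)


rg1 = read_group_from_header(
    "@D3VG1JS1:214:C7RNWACXX:1:1101:1128:1956 1:N:0:ATTCCT",
    sample="S1", library="Lib1", id_suffix="sample1")
rg2 = read_group_from_header(
    "@D3VG1JS1:214:C7RNWACXX:1:1101:1088:1987 1:N:0:GATCAG",
    sample="S2", library="Lib2", id_suffix="sample2")
print(rg1)
print(rg2)
```

The fields are joined with a literal backslash followed by `t` rather than a real tab, so the printed strings have the same form as the two lines I wrote out by hand.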