Hi,
I'd like confirmation that what I'm doing is correct.
1) I found several definitions of ID and PU, and I'm now going to use these:
An example: @A00155:140:HHTKFDSXX:1:1101:3423:1000 1:N:0:CAGTGACT+CGAGGCGT
PU1=A00155 ### instrument
PU2=140 ### run
FL=HHTKFDSXX ### flowcell
LN=1 ### lane
LB= ### library ID (not present in the read header)
ID=${PU1}.${PU2}
PU=${FL}.${LN}.${PU2}
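The field mapping above can be sketched in Python. This is a minimal sketch that assumes the standard Illumina (CASAVA 1.8+) header layout `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`. The function names, the `PL:ILLUMINA` value, and the `SM`/`LB` arguments are hypothetical and are supplied by the caller, because the header carries no sample or library name.

```python
def parse_illumina_header(header: str) -> dict:
    """Split an Illumina (CASAVA 1.8+) read header into its named fields."""
    name, _, comment = header.lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, filtered, control, index = comment.split(":")
    return {
        "instrument": instrument, "run": run, "flowcell": flowcell,
        "lane": lane, "tile": tile, "x": x, "y": y,
        "read": read, "filtered": filtered, "control": control, "index": index,
    }


def rg_fields(header: str, sample: str, library: str) -> dict:
    """Build read-group fields using the ID/PU scheme from the post."""
    f = parse_illumina_header(header)
    return {
        "ID": f"{f['instrument']}.{f['run']}",               # ID=${PU1}.${PU2}
        "PU": f"{f['flowcell']}.{f['lane']}.{f['run']}",     # PU=${FL}.${LN}.${PU2}
        "SM": sample,       # hypothetical: taken from the sample sheet
        "LB": library,      # hypothetical: taken from the sample sheet
        "PL": "ILLUMINA",
    }


def rg_line(fields: dict) -> str:
    """Render the fields as a tab-separated SAM @RG header line."""
    return "@RG\t" + "\t".join(f"{k}:{v}" for k, v in fields.items())


if __name__ == "__main__":
    hdr = "@A00155:140:HHTKFDSXX:1:1101:3423:1000 1:N:0:CAGTGACT+CGAGGCGT"
    fields = rg_fields(hdr, sample="sample1", library="lib1")
    print(fields["ID"])  # A00155.140
    print(fields["PU"])  # HHTKFDSXX.1.140
```

With this scheme, reads from lane 1 and lane 2 of the same run receive the same ID, and only PU differs between them.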
2) With multi-lane data, I understand that LB matters for the MarkDuplicates step and ID/PU matter for BQSR.
For MarkDuplicates, I have to give as input all BAM files from the same library (LB) of the same sample, correct?
When I have two libraries for the same sample, I run MarkDuplicates on each library separately and then give both output files (the outputs of MarkDuplicates) as input to BQSR, correct?
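The grouping described in question 2 can be sketched as plain bookkeeping. This is a minimal sketch of the workflow as the question describes it: one MarkDuplicates job per (sample, library) pair, followed by one BQSR input list per sample. It does not call any GATK or Picard tool, and the file names and the `<sample>.<library>.markdup.bam` output naming are hypothetical.

```python
from collections import defaultdict


def markdup_groups(bams):
    """Group (path, sample, library) records: one MarkDuplicates job per (sample, library)."""
    groups = defaultdict(list)
    for path, sample, library in bams:
        groups[(sample, library)].append(path)
    return dict(groups)


def bqsr_inputs(groups):
    """Collect, per sample, the deduplicated output of every library (hypothetical naming)."""
    per_sample = defaultdict(list)
    for (sample, library) in sorted(groups):
        per_sample[sample].append(f"{sample}.{library}.markdup.bam")
    return dict(per_sample)


if __name__ == "__main__":
    bams = [
        ("s1_lane1_libA.bam", "s1", "libA"),
        ("s1_lane2_libA.bam", "s1", "libA"),
        ("s1_lane1_libB.bam", "s1", "libB"),
    ]
    groups = markdup_groups(bams)
    for key, paths in groups.items():
        print(key, paths)
    print(bqsr_inputs(groups))
```

Both lanes of `libA` end up in one MarkDuplicates job, `libB` gets its own job, and sample `s1` then passes both deduplicated files to BQSR.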
Many thanks for your time!