I have 16 libraries that were multiplexed and sequenced on the same lane. When using AddOrReplaceReadGroups.jar, should I create a unique RGLB label for each library (e.g., lib1, lib2, lib3, etc.)? Apologies if this sounds like a silly question, but I have read that some folks use the same name (e.g., lib1) for the RGLB flag when libraries were sequenced on the same lane.
Oh well, I decided to give each library a unique ID. I did, however, keep the sample and read group IDs (RGSM and RGID) the same within a given library, but obviously different between libraries. Is that acceptable?
Giving each one a unique ID is perfectly fine :) As long as each sample has its own RGSM and RGID (assuming you only sequenced each sample once, which is likely) then you're good to go.
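For reference, AddOrReplaceReadGroups records the read group as an @RG line in the BAM header, and the SAM specification requires each @RG ID within a header to be unique. Here is a small Python sketch of the one-RGLB-per-library scheme for all 16 libraries. It assumes hypothetical names (sampleN for RGSM/RGID, LibN for RGLB), so substitute your real sample and library names:

```python
def rg_header_line(rg_id, sample, library):
    """Format one SAM @RG header line: tab-separated TAG:value fields."""
    return f"@RG\tID:{rg_id}\tSM:{sample}\tLB:{library}"


def read_groups(n_libraries):
    """One read group per library; names are hypothetical (sampleN / LibN)."""
    return [rg_header_line(f"sample{i}", f"sample{i}", f"Lib{i}")
            for i in range(1, n_libraries + 1)]


header = read_groups(16)
ids = [line.split("\t")[1] for line in header]   # "ID:sample1", "ID:sample2", ...
assert len(ids) == len(set(ids)), "every @RG ID must be unique"
print(header[0])
```

Each library gets its own LB, and because each library here corresponds to one sample, SM and ID differ between libraries too.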
This is what a piece of my batch file looks like (paired and unpaired):
Are we on the same page?
We're now on neighbouring pages, but in the same chapter. Are the "-singles" (A) just orphans from trimming, or did you (B) run the same libraries single-end and then paired-end, or (C) make different libraries, one for a single-end run and the other for a paired-end run?
The proper method for each:
A. These should be in the same BAM file, with the same ID/SM/etc.
B. They should have the same RGSM and RGLB, but a different RGID.
C. They should have different RGLB and RGID, but the same RGSM.
In any case, your usage of RGLB is fine in the cases you showed.
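The A/B/C rules above can be encoded as a quick consistency check. This is a Python sketch, not part of Picard; the Case names and the check_pair helper are hypothetical. For case A it takes a same_file flag, because orphans share the RGID with their pairs only when both go into the same BAM:

```python
from enum import Enum


class Case(Enum):
    A = "orphans from trimming (same library and sample)"
    B = "same library sequenced single-end and then paired-end"
    C = "separate libraries from the same sample"


FIELDS = ("RGID", "RGLB", "RGSM")


def expected_shared(case, same_file=True):
    """Read-group fields that should be identical between the two sets of reads."""
    if case is Case.A:
        # Orphans belong with their pairs: one BAM, one read group.
        # If they are kept in a separate file, the RGID must differ.
        return {"RGID", "RGLB", "RGSM"} if same_file else {"RGLB", "RGSM"}
    if case is Case.B:
        return {"RGLB", "RGSM"}   # same library and sample, different run: new RGID
    return {"RGSM"}               # Case.C: same sample only


def check_pair(case, rg1, rg2, same_file=True):
    """Return a list of rule violations (an empty list means the pair is consistent)."""
    shared = expected_shared(case, same_file)
    problems = []
    for field in FIELDS:
        identical = rg1[field] == rg2[field]
        if field in shared and not identical:
            problems.append(f"{field} should be identical")
        elif field not in shared and identical:
            problems.append(f"{field} should differ")
    return problems


paired = {"RGID": "sample1", "RGLB": "Lib1", "RGSM": "sample1"}
singles = {"RGID": "sample1-singles", "RGLB": "Lib1", "RGSM": "sample1-singles"}
print(check_pair(Case.A, paired, singles, same_file=False))  # -> ['RGSM should be identical']
```

The example values mirror the sample1 / sample1-singles flags discussed in this thread; with RGSM=sample1 on both files the check returns an empty list.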
Oh, these are just orphans from trimming (A). For clarity, this is how I am doing things (please correct me if I am wrong):
Looks good, happy variant calling!
Ah, now I am confused. Basically I need help with this part of the command:
I=sample1.bam \
O=sample1-RG.bam \
RGLB=Lib1 \
RGID=sample1 \
RGSM=sample1 \

I=sample1-singles.bam \
O=sample1-singles-RG.bam \
RGLB=Lib1 \
RGID=sample1-singles \
RGSM=sample1-singles \
What should be different or the same? As mentioned, “singles” refers to orphans (reads that lost their mate). Whatever I do to this particular library, I will obviously carry out with the other 15 libraries.
For paired-end reads and orphans from the same sample and library, RGLB and RGSM should be identical. I would also make RGID the same, but only because I'd put them in the same file. Obviously, if they're in different files, then they should have different RGIDs.
Awesome, thanks for clarifying. For paired-end reads and orphans from the same sample and library (like the example above), I will make RGLB and RGSM the same in both files (i.e., RGSM=sample1 and RGLB=Lib1), but keep RGID different (i.e., RGID=sample1 for the pairs and RGID=sample1-singles for the orphans).
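Since the same scheme gets applied to all 16 libraries, the argument lists can be generated rather than typed by hand. This is a Python sketch assuming the hypothetical file naming sampleN.bam / sampleN-singles.bam and library names LibN from the example. It keeps RGLB and RGSM shared between the paired and orphan files and gives each file its own RGID. Picard's documentation also lists RGPL and RGPU as required arguments, so placeholder values are included here; check this against your Picard version and replace the placeholders with your real platform and flowcell/lane/barcode. Prefix each printed line with however you invoke the tool (e.g. java -jar AddOrReplaceReadGroups.jar):

```python
def rg_args(in_bam, out_bam, rg_id, library, sample,
            platform="ILLUMINA", platform_unit="FLOWCELL.LANE"):
    """Return Picard-style KEY=value arguments for one AddOrReplaceReadGroups run."""
    return [
        f"I={in_bam}",
        f"O={out_bam}",
        f"RGID={rg_id}",
        f"RGLB={library}",
        f"RGPL={platform}",        # placeholder platform
        f"RGPU={platform_unit}",   # placeholder platform unit
        f"RGSM={sample}",
    ]


def batch(n_libraries):
    """Two runs per library: the paired-end BAM and the trimming-orphan ("singles") BAM."""
    runs = []
    for i in range(1, n_libraries + 1):
        sample, library = f"sample{i}", f"Lib{i}"   # hypothetical names
        pu = f"FLOWCELL.LANE.{library}"             # placeholder PU, one per library barcode
        # Paired-end reads: RGID named after the sample.
        runs.append(rg_args(f"{sample}.bam", f"{sample}-RG.bam",
                            sample, library, sample, platform_unit=pu))
        # Orphans: same RGLB and RGSM, but a distinct RGID because they sit in a separate file.
        runs.append(rg_args(f"{sample}-singles.bam", f"{sample}-singles-RG.bam",
                            f"{sample}-singles", library, sample, platform_unit=pu))
    return runs


if __name__ == "__main__":
    for args in batch(16):
        print(" ".join(args))
```

If you later decide to merge each library's pairs and orphans into one BAM instead, the orphan run would simply reuse the paired RGID.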