Do read group names have an effect on marking duplicates using GATK?

4.5 years ago

halo22 ▴ 300

I've been trying to analyze an older WGS dataset of FASTQ files. I aligned them to the hg38 reference genome with BWA-MEM, but I didn't add read group information during alignment. When I started marking duplicates with GATK, I realized that a read group is essential for the process. I then looked at my FASTQ files and counted the unique instrument names (sequence identifiers), e.g. "@SXX191512", and there is more than one in each FASTQ file. I believe that "@SXX191512" is required for creating a read group and adding it to the BAM files. Since there are multiple sequence identifiers, can I use any one of them to create a read group? Does this choice have any impact on duplicate marking?
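For reference, this is roughly how the unique identifiers can be counted. It is only a minimal sketch: it assumes plain 4-line FASTQ records and that the instrument name is the first colon-delimited field of each header line. The function name `count_instrument_names` and the example reads (including the second identifier, "SXX191513") are made up for illustration and are not taken from the real data.

```python
import io
from collections import Counter
from itertools import islice


def count_instrument_names(handle):
    """Count the first colon-delimited field of every FASTQ header line.

    Assumes 4-line records (header, sequence, '+', qualities) with no
    wrapped sequence lines.
    """
    counts = Counter()
    # Every 4th line, starting with the first, is a header line.
    for header in islice(handle, 0, None, 4):
        header = header.strip()
        if not header:
            continue
        # Drop the leading '@' and keep the text before the first ':'.
        name = header.lstrip("@").split(":", 1)[0]
        counts[name] += 1
    return counts


# Hypothetical reads, for illustration only.
example_fastq = (
    "@SXX191512:1:FC1:1:1101:1000:2000 1:N:0:1\nACGT\n+\nIIII\n"
    "@SXX191512:1:FC1:2:1101:1001:2001 1:N:0:1\nTTGA\n+\nIIII\n"
    "@SXX191513:7:FC2:1:1101:1002:2002 1:N:0:1\nGGCA\n+\nIIII\n"
)

print(count_instrument_names(io.StringIO(example_fastq)).most_common())
```

For a real file, the same function can be given an open file handle (or a `gzip.open(path, "rt")` handle for compressed FASTQ) in place of the `io.StringIO` object.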

Thanks.

Picard NGS WGS • 775 views

If you get no answer here, you may consider asking on the GATK Support Forum, which, I assume, also covers Picard.

4.5 years ago by Kevin Blighe 89k
