Do read group names have an effect on marking duplicates with GATK?
3.8 years ago
halo22 ▴ 300

I've been trying to analyze an older WGS dataset of FASTQ files. I aligned the FASTQs to the hg38 reference genome with BWA-MEM, but I didn't add read group information during alignment. When I moved on to marking duplicates with GATK, I realized that a read group is essential for that step. I then looked at my FASTQs and counted the unique instrument names in the sequence identifiers (e.g. "@SXX191512"), and there is more than one in each FASTQ. I believe that the "@SXX191512" part is required to create a read group and add it to the BAM files. Since there are multiple sequence identifiers, can I use any one of them to create the read group? Does this choice have any impact on duplicate marking?
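For reference, the counting step can be sketched roughly as below. This is only a minimal sketch, assuming Illumina-style headers where the instrument name is the first colon-separated field after the "@"; the function name `count_instruments`, the second instrument name, and the example reads are made up for illustration.

```python
import io
from collections import Counter
from itertools import islice


def count_instruments(handle):
    """Count the first colon-separated field of every FASTQ header line."""
    counts = Counter()
    # A FASTQ record spans four lines; the header is the first line of each record.
    for header in islice(handle, 0, None, 4):
        header = header.strip()
        if not header:
            continue  # tolerate a trailing blank line
        if not header.startswith("@"):
            raise ValueError(f"unexpected header line: {header!r}")
        # Assumption: header looks like "@instrument:run:flowcell:lane:..."
        instrument = header[1:].split(":", 1)[0]
        counts[instrument] += 1
    return counts


# Hypothetical reads, only to show the shape of the input.
example = io.StringIO(
    "@SXX191512:1:FC1:1:1101:1000:2000 1:N:0:1\nACGT\n+\nIIII\n"
    "@SXX191512:1:FC1:1:1101:1001:2001 1:N:0:1\nTTGA\n+\nIIII\n"
    "@SXX191513:2:FC2:3:1101:1002:2002 1:N:0:1\nGGCC\n+\nIIII\n"
)
instrument_counts = count_instruments(example)
for name, n in instrument_counts.most_common():
    print(name, n)
```

For a real file, the same function should work on a text handle such as `gzip.open(path, "rt")`, where `path` points to one of the compressed FASTQs.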

Thanks.

Picard NGS WGS

If you don't get an answer here, consider asking on the GATK Support Forum, which, I assume, also covers Picard.
