Build RG information for BWA
8 weeks ago
hpapoli ▴ 150

Hello,

There are many posts about building read group information, but I haven't found a complete one yet. For a sample called Sample_A, with a FASTQ header from sample_A.R1.fastq.gz as follows:

@A00181:639:HNTFMDSX5:2:1101:1018:1000 1:N:0:ACACTAAG+TTATGGAT

Is this read group correct?

@RG\\tID:HNTFMDSX5.2\\tPL:ILLUMINA\\tLB:Sample_A\\tSM:Sample_A\\tPU:HNTFMDSX5.2.SampleA

HNTFMDSX5: Flowcell ID

2: Flowcell lane

I am mostly uncertain about the value for LB and about the third field of PU (PU = {FLOWCELL_BARCODE}.{LANE}.{SAMPLE_BARCODE}).

Thanks so much for your help!

bwa • 427 views
8 weeks ago

The PL value must be uppercase (PL:ILLUMINA, not PL:Illumina) to be compatible with GATK tools. From the SAM specification:

https://samtools.github.io/hts-specs/SAMv1.pdf

Platform/technology used to produce the reads. Valid values: CAPILLARY, DNBSEQ (MGI/BGI), ELEMENT, HELICOS, ILLUMINA, IONTORRENT, LS454, ONT (Oxford Nanopore), PACBIO (Pacific Biosciences), SINGULAR, SOLID, and ULTIMA. This field should be omitted when the technology is not in this list (though the PM field may still be present in this case) or is unknown.
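
If you want to check the PL value before aligning, something like the following might help. It is only a sketch: the set is copied from the list quoted above, and `normalize_platform` is a hypothetical helper name, not part of BWA, samtools, or GATK.

```python
# Valid PL values, copied from the SAM specification excerpt quoted above.
VALID_PLATFORMS = {
    "CAPILLARY", "DNBSEQ", "ELEMENT", "HELICOS", "ILLUMINA", "IONTORRENT",
    "LS454", "ONT", "PACBIO", "SINGULAR", "SOLID", "ULTIMA",
}


def normalize_platform(value):
    """Return the uppercase PL value, or None if it is not in the list.

    Hypothetical helper for illustration only.
    """
    candidate = value.strip().upper()
    return candidate if candidate in VALID_PLATFORMS else None


print(normalize_platform("Illumina"))  # ILLUMINA
print(normalize_platform("nanopore"))  # None (omit PL in this case)
```

Per the excerpt, a value that is not in the list means the PL field should be left out rather than guessed.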

Thanks very much! What about the ID and PU fields? Is the PU field correct as it is written now? Thanks again!


https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups should explain how to use read groups. Read groups are a GATK requirement and are probably not needed by most other software.
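
For what it's worth, here is a rough sketch of how the fields could be derived from the header in the question. It assumes the usual Illumina header layout (instrument:run:flowcell:lane:tile:x:y, then read:filtered:control:index) and the PU pattern {FLOWCELL_BARCODE}.{LANE}.{SAMPLE_BARCODE} quoted in the question. Using the index sequence as the sample barcode, and falling back to the sample name for LB when no library name is given, are my assumptions; LB should really be your library prep identifier if you have one. The function names are hypothetical.

```python
HEADER = "@A00181:639:HNTFMDSX5:2:1101:1018:1000 1:N:0:ACACTAAG+TTATGGAT"


def parse_illumina_header(header):
    """Extract flowcell, lane, and index from an Illumina FASTQ header.

    Assumes the layout instrument:run:flowcell:lane:tile:x:y followed by
    a space and read:filtered:control:index.
    """
    name, _, comment = header.strip().lstrip("@").partition(" ")
    fields = name.split(":")
    if len(fields) != 7:
        raise ValueError(f"unexpected header layout: {header!r}")
    flowcell, lane = fields[2], fields[3]
    comment_fields = comment.split(":")
    index = comment_fields[3] if len(comment_fields) >= 4 else ""
    return flowcell, lane, index


def build_read_group(header, sample, library=None, platform="ILLUMINA"):
    """Build an @RG string with literal backslash-t separators."""
    flowcell, lane, index = parse_illumina_header(header)
    library = library or sample   # assumption: no separate library name
    barcode = index or sample     # assumption: index serves as sample barcode
    tags = [
        "@RG",
        f"ID:{flowcell}.{lane}",
        f"PL:{platform}",
        f"LB:{library}",
        f"SM:{sample}",
        f"PU:{flowcell}.{lane}.{barcode}",
    ]
    # Join with a literal backslash followed by "t", not a real tab.
    return "\\t".join(tags)


print(build_read_group(HEADER, "Sample_A"))
```

The result keeps `\t` as two literal characters, which is the form you would put in single quotes after `bwa mem -R`. If the same library was sequenced on several lanes, each lane would get its own ID and PU while LB and SM stay the same.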
