What's the meaning of @RG in SAM?
13.4 years ago

Hi all,

Can you tell me what @RG (read group) means in the SAM format? Can I consider it the very same thing as a sample name?

Is there a case where one could find more than one @RG line in a BAM file?

Can mpileup (or other software) use the @RG lines to put more than one sample column in a VCF file?

If yes, what does that mean for the QUAL column?

If no, under which conditions can we find more than one sample column in a VCF?

Thanks,

Pierre

sam bam next-gen sequencing format • 17k views
13.4 years ago
lh3 33k

For Illumina reads, a read group (RG) typically groups the reads from one lane. GATK requires or assumes this usage in several components. At the very least, one should not put reads from multiple libraries in the same RG.

When reading the input BAMs, samtools tries to group reads by sample, based on the SM tag of their @RG line. If there are no @RG lines, it implicitly inserts the following line into the header:

@RG ID:filename SM:filename
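The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not samtools' actual code: the function name `read_groups_by_sample` is made up, the header is assumed to be available as plain text with tab-separated fields, and skipping @RG lines that lack an SM tag is a simplification of this sketch.

```python
from collections import defaultdict

def read_groups_by_sample(header_text, filename):
    """Map each sample name (SM) to the read-group IDs that carry it."""
    samples = defaultdict(list)
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # After the record type, header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        if "SM" not in fields:
            # Simplification for this sketch: ignore read groups without SM.
            continue
        samples[fields["SM"]].append(fields["ID"])
    if not samples:
        # No usable @RG lines: mimic the implicit "@RG ID:filename SM:filename".
        samples[filename].append(filename)
    return dict(samples)

# Made-up header: two lanes of one sample plus one lane of another.
header = (
    "@HD\tVN:1.6\n"
    "@RG\tID:lane1\tSM:sampleA\tLB:lib1\n"
    "@RG\tID:lane2\tSM:sampleA\tLB:lib2\n"
    "@RG\tID:lane3\tSM:sampleB\tLB:lib3\n"
)
print(read_groups_by_sample(header, "merged.bam"))
```

Each key of the returned dictionary corresponds to one sample, so three read groups here collapse into two samples.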

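The VCF spec is very clear about the definition of QUAL: it reflects, on a Phred scale, the probability of there being no variant in any of the samples. It is therefore one score for the site, not one score per sample.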
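Since the VCF spec stores QUAL as a Phred-scaled value (-10 * log10 of the probability that the ALT call is wrong), a small conversion helper can make the numbers easier to read. The function names below are hypothetical and the sketch assumes a probability strictly between 0 and 1.

```python
import math

def phred_to_prob(qual):
    """Convert a Phred-scaled QUAL into the probability that the call is wrong."""
    return 10 ** (-qual / 10)

def prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) into a Phred-scaled score."""
    return -10 * math.log10(p)

# QUAL 30 corresponds to roughly a 1-in-1000 chance that no sample carries a variant.
print(phred_to_prob(30))
print(prob_to_phred(0.01))
```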

13.4 years ago

You can't always consider @RG the same as the sample name. For example, when we do high-coverage genomes, we often have to create several libraries to get enough DNA, and each of those libraries may be sequenced across multiple runs. At the end of the process, we want to lump all of these reads together to make variant calls. The @RG tag allows us to combine all of these reads (from the same sample) into a single BAM, while still retaining information on the source of each read.

As drio says, there's an SM tag within the @RG line that corresponds to the sample name.

In multi-sample VCFs, you shouldn't be using the QUAL field for the quality of the SNP calls in individual samples. You should define a FORMAT field to store it; I've often seen VAQ for "variant quality". That way, each sample has an independent score at each site.
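To make the per-sample idea concrete, here is a rough sketch of pulling one FORMAT key out of every sample column of a VCF data line. The helper name `per_sample_values` and the example record are invented for illustration; VAQ is the key mentioned above, not a key reserved by the VCF spec, so a real file would need to declare it in a ##FORMAT header line.

```python
def per_sample_values(vcf_line, sample_names, key):
    """Return {sample: value} for one FORMAT key of a tab-separated VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")      # column 9 lists the per-sample keys
    idx = format_keys.index(key)
    values = {}
    for name, col in zip(sample_names, cols[9:]):
        parts = col.split(":")
        # Trailing fields may be omitted in a sample column; report "." (missing).
        values[name] = parts[idx] if idx < len(parts) else "."
    return values

# Made-up record with two sample columns; QUAL (50) is shared by the whole site.
line = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:VAQ\t0/1:45\t0/0:3"
print(per_sample_values(line, ["sampleA", "sampleB"], "VAQ"))
```

The single QUAL value applies to the site as a whole, while each sample column carries its own VAQ.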

13.4 years ago
Drio ▴ 920

Hi, here are my answers to questions 1 and 2:

  1. @RG allows you to group together a set of reads within a SAM or BAM file. Within the @RG line, the SM tag specifies the sample.

  2. Yes. Imagine you are working on a project and need a lot of sequence coverage. You typically want to create more than one library to avoid PCR duplication issues. The LB tag in @RG would be different for each library. Once you merge your alignments, you'll see multiple @RG entries in the header.
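As a small illustration of what such a merged header might contain, the sketch below builds one @RG line per library and lane for a single sample. The function name and the `sample.library.lane` ID scheme are assumptions made for this example; any scheme that keeps IDs unique would do.

```python
def make_rg_lines(sample, libraries):
    """Build @RG header lines; `libraries` maps a library name to its lane labels."""
    lines = []
    for lib, lanes in libraries.items():
        for lane in lanes:
            rg_id = f"{sample}.{lib}.{lane}"   # hypothetical ID scheme
            lines.append("\t".join(["@RG", f"ID:{rg_id}", f"SM:{sample}", f"LB:{lib}"]))
    return lines

# One sample, two libraries, three lanes in total.
for rg in make_rg_lines("sampleA", {"lib1": ["lane1", "lane2"], "lib2": ["lane3"]}):
    print(rg)
```

All three lines share the same SM value, so a caller that groups by SM still sees one sample, while ID and LB keep track of where each read came from.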
