Read groups for multi-lane sample


2.9 years ago

shivangi.agarwal800 ▴ 120

Hi Guys,

I have 20 samples, each sequenced on 4 lanes (L1, L2, L3, L4) as paired-end reads (R1 and R2). First, I concatenated the four lanes for R1 and did the same for R2. Then I aligned R1.fastq and R2.fastq to the reference genome and added read groups with samtools:

> samtools addreplacerg -r '@RG\tID:2021_501\tLB:2021_501\tPL:ILLUMINA\tSM:2021_501' -m overwrite_all -o aln_2021-501_trimmed-RG.bam aln_2021-501_trimmed.bam

and checked the result with:

> samtools view -H aln_2021-501_trimmed-RG.bam | grep '^@RG'

**@RG ID:2021_501 LB:2021_501 PL:ILLUMINA SM:2021_501**

Now, when I run GATK with:

> java -jar $GenomeAnalysisTK  HaplotypeCaller -R GRCh37_latest_genomic.fa -I sorted_trimmed.bam -O sorted_trimmed.g.vcf -ERC GVCF

I get this error: A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.

GATK seems to see multiple samples in the BAM, but there is only one sample, sequenced on 4 lanes in paired-end mode. Should I change how I add the read groups? I am not sure how to add read groups for a multi-lane sample in a single BAM file.

Please respond. I will really appreciate your guidance.



What happens if you add --sample-name 2021_501_S8?

Reply • 2.9 years ago by GenoMax 147k
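For illustration, here is the original HaplotypeCaller command with `--sample-name` appended, as the error message suggests. This is only a sketch: the helper name `haplotypecaller_cmd` is hypothetical, and it prints the command rather than running it. The flag will presumably only help if the BAM header already contains a read group whose SM matches the given name.

```shell
# Print (do not run) a HaplotypeCaller command restricted to one sample.
# Arguments: <reference.fa> <input.bam> <output.g.vcf> <sample name>
# $GenomeAnalysisTK is left unexpanded on purpose, as in the original post.
haplotypecaller_cmd() {
    printf 'java -jar $GenomeAnalysisTK HaplotypeCaller -R %s -I %s -O %s -ERC GVCF --sample-name %s\n' \
        "$1" "$2" "$3" "$4"
}
```

Usage: `haplotypecaller_cmd GRCh37_latest_genomic.fa sorted_trimmed.bam sorted_trimmed.g.vcf 2021_501_S8`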


What is the output of:

samtools view -H sorted_trimmed.bam | grep '@RG' | tr "\t" "\n" | grep "^SM" | cut -d ':' -f 2 | sort | uniq

Reply • 2.9 years ago by Pierre Lindenbaum 164k
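The same check can be wrapped in a small pre-flight test before calling GATK with `-ERC GVCF`. This is a minimal sketch: the function names `distinct_samples` and `is_single_sample` are hypothetical, and it only looks at header lines that start with `@RG`, so an `@PG` line that merely mentions `@RG` in its command line is ignored.

```shell
# Read SAM header text on stdin and print each distinct SM value once.
distinct_samples() {
    awk -F '\t' '/^@RG/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) print substr($i, 4)   # drop the "SM:" prefix
    }' | sort -u
}

# Succeed only if the header on stdin names exactly one sample.
# A header without any @RG line yields zero samples and therefore fails.
is_single_sample() {
    distinct_samples | awk 'END { exit (NR == 1 ? 0 : 1) }'
}
```

Usage: `samtools view -H sorted_trimmed.bam | is_single_sample || echo "need exactly one SM value"`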


Hi,

I found that read groups had not been added to the input file, sorted_trimmed.bam. I then added them with:

samtools addreplacerg -r '@RG\tID:2021_501_S8\tLB:2021_501_S8\tPL:ILLUMINA\tSM:2021_501_S8' -m overwrite_all -o 501-sorted_trimmed-RG.bam 501-sorted_trimmed.bam

samtools view -H 501-sorted_trimmed-RG.bam | grep '@RG'

@PG ID:samtools.1   PN:samtools PP:samtools VN:1.11 CL:samtools addreplacerg -r @RG\tID:2021_501_S8\tLB:2021_501_S8\tPL:ILLUMINA\tSM:2021_501_S8 -m overwrite_all -o 501-sorted_trimmed-RG.bam 501-sorted_trimmed.bam

@RG ID:2021_501_S8  LB:2021_501_S8  PL:ILLUMINA SM:2021_501_S8

samtools view -H 501-sorted_trimmed-RG.bam | grep '@RG' | tr "\t" "\n" | grep "^SM" | cut -d ':' -f 2 | sort | uniq

2021_501_S8

Is this the right way to add a read group to a BAM file? All three fields, ID, LB and SM, are set to 2021_501_S8 (ID:2021_501_S8, LB:2021_501_S8, SM:2021_501_S8). Should I instead set ID to flowcell-ID.flowcell-lane, and LB to the run ID or something else?

Also, should I include other fields as well, such as PU?

Will this affect downstream analysis?

Reply • updated 2.9 years ago by GenoMax 147k • written 2.9 years ago by shivangi.agarwal800 ▴ 120
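On the ID/PU question, one commonly used convention (described in the GATK read-group documentation) is ID = flowcell.lane and PU = flowcell.lane.barcode, with the same SM on every lane of a sample. Below is a minimal sketch that derives such a line from the first header of a lane's FASTQ. It assumes the Illumina CASAVA 1.8+ header layout; the function name `rg_from_header` and the header in the usage line are made up for illustration.

```shell
# Build an @RG line from one Illumina FASTQ header line.
# Arguments: <fastq header line> <sample name> <library name>
# Assumed layout: @instrument:run:flowcell:lane:tile:x:y read:filter:control:index
rg_from_header() {
    printf '%s\n' "$1" | awk -v sm="$2" -v lb="$3" '{
        split($1, a, ":")      # a[3] = flowcell ID, a[4] = lane number
        split($2, b, ":")      # b[4] = index (barcode) sequence
        id = a[3] "." a[4]
        pu = id "." b[4]
        # Emit a literal backslash-t between fields, the form shown in the
        # samtools addreplacerg -r commands earlier in this thread.
        printf "@RG\\tID:%s\\tPU:%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA\n", id, pu, sm, lb
    }'
}
```

Usage: `rg_from_header '@A00123:45:HABCDEFXX:2:1101:1000:2000 1:N:0:ACGTACGT' 2021_501_S8 2021_501_S8`

With this approach each lane would be aligned separately with its own read group and the per-lane BAMs merged afterwards, rather than concatenating the FASTQ files first. Since all lanes share one SM value, HaplotypeCaller should still see a single sample. Separate IDs matter downstream because base quality recalibration models errors per read group, and duplicate marking uses LB.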