Hi Guys,
I have 20 samples sequenced on 4 lanes (L1, L2, L3, L4) as paired-end reads (R1 and R2). First, I concatenated all four lanes for R1, and did the same for R2. Then I aligned R1.fastq and R2.fastq to the reference genome and added read groups using samtools:
> samtools addreplacerg -r '@RG\tID:2021_501\tLB:2021_501\tPL:ILLUMINA\tSM:2021_501' -m overwrite_all -o aln_2021-501_trimmed-RG.bam aln_2021-501_trimmed.bam
and checked the result with:
> samtools view -H aln_2021-501_trimmed-RG.bam | grep '^@RG'
**@RG ID:2021_501 LB:2021_501 PL:ILLUMINA SM:2021_501**
Now, when I run GATK with:
> java -jar $GenomeAnalysisTK HaplotypeCaller -R GRCh37_latest_genomic.fa -I sorted_trimmed.bam -O sorted_trimmed.g.vcf -ERC GVCF
I get this error: A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.
Here, GATK reports multiple samples in the BAM, but there is only one sample, sequenced on 4 lanes in paired-end mode. In that case, should I change how I add the read groups? I am not sure how to add read groups for a multi-lane sample in a single BAM file.
Please respond. I would really appreciate your guidance.
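Since HaplotypeCaller takes its sample list from the SM tags in the @RG header lines, one quick check is to list the distinct SM values in the exact BAM passed to -I. Below is a minimal sketch; the function name list_rg_samples is made up for illustration, and it only assumes the header text is piped in on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct SM (sample) values found in the
# @RG header lines read from stdin, one per line.
list_rg_samples() {
  grep '^@RG' \
    | tr '\t' '\n' \
    | grep '^SM:' \
    | cut -d: -f2- \
    | sort -u
}
```

It could be used as: samtools view -H sorted_trimmed.bam | list_rg_samples. More than one line of output means the BAM carries several sample names; no output at all means the file has no SM tags, which is worth ruling out before changing anything else.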
What happens if you add
--sample-name 2021_501_S8
? What is the output of
Hi,
I noticed that read groups had not been added to the input file, sorted_trimmed.bam. Then I added read groups using this command:
Is this the right way to add a read group to a BAM file? All three fields, ID, LB and SM, are set to 2021_501_S8 (ID:2021_501_S8, LB:2021_501_S8, SM:2021_501_S8). Should I instead use ID:flowcell-ID.flowcell-lane and LB: run_ID (or something else)?
Also, should I include other fields as well, such as PU?
Will this affect downstream analysis?
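For what it's worth, a commonly used convention is one read group per lane, all sharing the same SM, with the flowcell and lane taken from the FASTQ read header. Here is a minimal sketch of building such an @RG string. It assumes Illumina headers of the form @instrument:run:flowcell:lane:..., the function name rg_line_from_header is hypothetical, and the exact ID/PU layout is one reasonable choice rather than a requirement. It also assumes each lane is aligned (or tagged) separately before the BAMs are merged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a per-lane @RG string from one FASTQ header line.
# Arguments: 1 = FASTQ header, 2 = sample name (SM), 3 = library name (LB).
# Assumes a header like "@instrument:run:flowcell:lane:tile:x:y ...".
rg_line_from_header() {
  local header="$1" sample="$2" library="$3"
  local flowcell lane
  flowcell=$(printf '%s\n' "$header" | cut -d: -f3)  # third colon-separated field
  lane=$(printf '%s\n' "$header" | cut -d: -f4)      # fourth colon-separated field
  # Emit literal "\t" separators, the same form used in the
  # samtools addreplacerg -r string earlier in this thread.
  printf '@RG\\tID:%s.%s.%s\\tPU:%s.%s.%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA\n' \
    "$sample" "$flowcell" "$lane" \
    "$flowcell" "$lane" "$sample" \
    "$sample" "$library"
}
```

With the first header of a lane's R1 file, e.g. rg_line_from_header "$(head -n 1 lane1_R1.fastq)" 2021_501_S8 2021_501_S8 (file name made up), the result can be passed to samtools addreplacerg -r for that lane's BAM. Keeping SM identical across the four lanes is what keeps the merged BAM a single sample.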