GATK multi sample variant calling
5.3 years ago
cetin.m ▴ 50

I am trying to call SNPs from two BAM files simultaneously. I want the calls written to the same VCF, with one column for each sample.

I run the following command:

./GenomeAnalysisTK.jar -T UnifiedGenotyper -I sampleA.bam -I sampleB.bam -R /mnt/NEOGENE1/share/ref/genomes/hsa/hs37d5.fa -L /mnt/NAS/projects/2018_MCetin_Selection/imputation/1000G_chr22.bed --output_mode EMIT_ALL_SITES --genotyping_mode GENOTYPE_GIVEN_ALLELES --alleles /mnt/NEOGENE1/share/dna/hsa/genotypes/1000G/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf -o output.vcf

The program runs without error, but the output file contains only one sample column, named sample1. (It possibly corresponds to sampleB.bam in the input; sampleA.bam can be called individually without problems.)

What am I doing wrong?

Thank you for reading!

gatk SNP variant calling • 1.3k views
5.3 years ago

UnifiedGenotyper is deprecated; use HaplotypeCaller instead.

"it only contains one sample column, named sample1."

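That's because both BAMs are tagged with the same sample name: the SM attribute in their '@RG' (read group) header lines is 'sample1' in both files, and GATK uses that attribute to decide which sample a read belongs to. See https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups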
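To check this, the sample names recorded in each BAM header can be listed. The following is a minimal sketch; `rg_samples` is a hypothetical helper name, and it assumes samtools is installed for the usage lines shown in the comments.

```shell
# Hypothetical helper: read SAM header text on stdin and print the unique
# SM (sample) values found in its @RG lines.
rg_samples() {
    awk -F'\t' '/^@RG/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) print substr($i, 4)
    }' | sort -u
}

# Usage with the BAMs from the question (requires samtools on PATH):
#   samtools view -H sampleA.bam | rg_samples
#   samtools view -H sampleB.bam | rg_samples
```

If both commands print the same name, the two files will end up in a single sample column.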

One way to fix this is to give each BAM its own sample name using Picard AddOrReplaceReadGroups: https://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups
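As a rough sketch of what that could look like for the two files in the question: the helper name `picard_rg_cmd`, the `PICARD_JAR` variable, the `.rg.bam` output names, and the RGID/RGLB/RGPL/RGPU values are all placeholders chosen for illustration and should be adapted to your data. Only RGSM matters for the sample column.

```shell
# Hypothetical helper: print the Picard command that rewrites the read group
# of one BAM so that its SM tag becomes the given sample name.
# PICARD_JAR and all RG values other than RGSM are placeholder assumptions.
picard_rg_cmd() {
    in_bam=$1
    sample=$2
    echo java -jar "${PICARD_JAR:-picard.jar}" AddOrReplaceReadGroups \
        "I=$in_bam" "O=${in_bam%.bam}.rg.bam" \
        "RGID=$sample" "RGLB=lib1" "RGPL=ILLUMINA" "RGPU=unit1" "RGSM=$sample"
}

# Print the commands for review; pipe the output to `sh` to actually run them.
picard_rg_cmd sampleA.bam sampleA
picard_rg_cmd sampleB.bam sampleB
```

The rewritten BAMs need to be indexed again (for example with `samtools index`) before they are passed to GATK with `-I`.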


Makes a lot of sense!
