GATK HaplotypeCaller combines info from two BAMs into one column in the VCF (not split into per-sample columns)
21 months ago
kamanovae ▴ 100

Hi, I ran GATK HaplotypeCaller on two BAM files and expected a VCF with a separate column for each sample.

My BAM list file (wgs_test.bam.list) looks like this:

input_bam/SRR8859080.bam
input_bam/ENCFF477JTA_new.bam
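
HaplotypeCaller takes sample names from the SM field of the @RG lines in each BAM header, so it may be worth checking what the two files listed above carry there. A minimal sketch, assuming samtools is installed; the helper name rg_samples and the demo header line are made up for illustration and are not taken from my real files:

```shell
# Print the distinct SM (sample name) values found in @RG header lines on stdin.
rg_samples() {
  grep '^@RG' | tr '\t' '\n' | sed -n 's/^SM://p' | sort -u
}

# Demo on a hypothetical header (not from the real BAMs):
printf '@HD\tVN:1.6\n@RG\tID:rg1\tSM:TUMOR\tPL:ILLUMINA\n' | rg_samples

# On the real files (requires samtools):
# samtools view -H input_bam/SRR8859080.bam | rg_samples
# samtools view -H input_bam/ENCFF477JTA_new.bam | rg_samples
```

If both files print the same name, their reads would be treated as one sample; giving each BAM a distinct SM (for example with Picard's AddOrReplaceReadGroups) would be the thing to try.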

This is my GATK command:

allele_chunk_file=rs_coord.vcf
gatk_run_line="../bin/gatk-4.1.2.0/gatk"
outfile=wgs_test_out.genotypes.vcf
bam_file=wgs_test.bam.list
genome_seq="../hg38.fa"
intervals=wgs_test.bed

"$gatk_run_line" \
 HaplotypeCaller \
 --reference "$genome_seq" \
 --input "$bam_file" \
 --genotyping-mode GENOTYPE_GIVEN_ALLELES \
 --alleles "$allele_chunk_file" \
 --intervals "$intervals" \
 --output "$outfile"

As a result I get a VCF file like this (only the first three positions are shown):

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TUMOR
chr5    33987450        .       N       C       0       LowQual AC=0;AF=0.00;AN=2;DP=38;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.693 GT:AD:DP:GQ:PL  0/0:38,0:38:99:0,114,1404
chr5    33994716        .       N       C       0       LowQual AC=0;AF=0.00;AN=2;DP=40;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.287 GT:AD:DP:GQ:PL    0/0:39,0:39:99:0,117,1348
chr6    341321  .       C       T       0       LowQual AC=0;AF=0.00;AN=2;DP=40;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.269 GT:AD:DP:GQ:PL  0/0:40,0:40:99:0,120,1873

I get a single column, TUMOR, for the two samples. But when I run HaplotypeCaller separately on each file, I get the following.

SRR8859080.bam

chr5    33987450        .       N       C       0       LowQual AC=0;AF=0.00;AN=2;DP=13;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.368 GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,323
chr5    33994716        .       N       C       0       LowQual AC=0;AF=0.00;AN=2;DP=17;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.495 GT:AD:DP:GQ:PL  0/0:16,0:16:48:0,48,456
chr6    341321  .       C       T       0       LowQual AC=0;AF=0.00;AN=2;DP=5;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.027  GT:AD:DP:GQ:PL  0/0:5,0:5:15:0,15,220




ENCFF477JTA_new.bam


chr5    33987450        .       N       C       0       LowQual AC=0;AF=0.00;AN=2;DP=25;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.495 GT:AD:DP:GQ:PL  0/0:25,0:25:75:0,75,1081
chr5    33994716        .       N       C       0       LowQual AC=0;AF=0.00;AN=2;DP=23;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.095 GT:AD:DP:GQ:PL  0/0:23,0:23:69:0,69,892
chr6    341321  .       C       T       0       LowQual AC=0;AF=0.00;AN=2;DP=35;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.382 GT:AD:DP:GQ:PL  0/0:35,0:35:99:0,105,1653

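The AD, DP, and PL values in the last column of the combined VCF are the sums of the corresponding values from the two single-file runs (for example, DP 13 + 25 = 38 at chr5:33987450), so the reads from both BAMs seem to be pooled into one sample. But I want a VCF with a separate column for each sample. I would be grateful for any hint!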
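
The pooling can be checked directly on the first position (chr5:33987450), using the sample fields copied from the three VCF excerpts above. This is only a sketch; the helper name format_dp is made up for illustration, and it assumes the FORMAT order GT:AD:DP:GQ:PL shown in the records:

```shell
# Extract the FORMAT-level DP (third colon-separated field of GT:AD:DP:GQ:PL).
format_dp() {
  printf '%s\n' "$1" | cut -d: -f3
}

a=$(format_dp '0/0:13,0:13:39:0,39,323')    # SRR8859080.bam alone
b=$(format_dp '0/0:25,0:25:75:0,75,1081')   # ENCFF477JTA_new.bam alone
m=$(format_dp '0/0:38,0:38:99:0,114,1404')  # both BAMs in one run

echo "$a + $b = $((a + b)), combined DP = $m"
# prints: 13 + 25 = 38, combined DP = 38
```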

HaplotypeCaller GATK VCF • 550 views

Please acknowledge people's answers: kamanovae?active=posts
