why CombineGCVFs in gatk not showing all the samples name?
0
0
Entering edit mode
19 months ago
rj.rezwan ▴ 10

Hi, I combined the 64 .vcf files using the CombineGVCFs in gatk. The command was completed successfully but its showing the output with only one column and why not rest of the 64 samples? The output is here

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample1
chr01   9883    .   A   C   1105.60 .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.00;DP=2201;ExcessHet=3.0103;FS=1.813;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=0.00;QD=33.50;ReadPosRankSum=-3.460e-01;SOR=1.302 GT:AD:DP:GQ:PGT:PID:PL:PS   0|1:6,27:33:99:0|1:9847_G_A:1113,0,1180:9847
chr01   9903    .   C   T   567.60  .   AC=1;AF=0.500;AN=2;BaseQRankSum=-1.910e-01;DP=2438;ExcessHet=3.0103;FS=10.281;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=16.22;ReadPosRankSum=0.047;SOR=0.544   GT:AD:DP:GQ:PL  0/1:19,16:35:99:575,0,722
chr01   10056   .   C   T   1646.60 .   AC=1;AF=0.500;AN=2;BaseQRankSum=-5.220e-01;DP=2908;ExcessHet=3.0103;FS=2.244;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=26.56;ReadPosRankSum=-1.900e-01;SOR=0.890   GT:AD:DP:GQ:PGT:PID:PL:PS   0|1:21,41:62:99:0|1:10056_C_T:1654,0,755:10056
chr01   10114   .   A   G   1185.60 .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.00;DP=3067;ExcessHet=3.0103;FS=4.175;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=22.80;ReadPosRankSum=0.207;SOR=1.496  GT:AD:DP:GQ:PGT:PID:PL:PS   0|1:22,30:52:99:0|1:10113_C_T:1193,0,1095:10113
chr01   10115   .   T   TGC 138.64  .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.635;DP=3045;ExcessHet=3.0103;FS=6.896;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=2.95;ReadPosRankSum=-1.438e+00;SOR=1.141 GT:AD:DP:GQ:PL  0/1:41,6:47:99:146,0,1628
chr01   10177   .   G   A   328.60  .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.595;DP=3217;ExcessHet=3.0103;FS=1.707;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=0.00;QD=14.94;ReadPosRankSum=0.057;SOR=1.179 GT:AD:DP:GQ:PGT:PID:PL:PS   0|1:14,8:22:99:0|1:10162_G_A:336,0,561:10162
chr01   10181   .   C   T   1144.60 .   AC=1;AF=0.500;AN=2;BaseQRankSum=-2.800e-02;DP=3238;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=0.00;QD=31.79;ReadPosRankSum=0.134;SOR=0.997    GT:AD:DP:GQ:PL  0/1:8,28:36:99:1152,0,252
haplotypecalling CombineGVCFs GATK • 1.3k views
ADD COMMENT
0
Entering edit mode

what was the syntax of the CombineGVCFs command you ran?

ADD REPLY
0
Entering edit mode

here is the command

#!/bin/bash
#
#SBATCH --job-name=combine_files
#SBATCH --output=combine_files.%j.out
#SBATCH --err=combine_files.%j.err
#SBATCH --partition=batch
#SBATCH --cpus-per-task=20
#SBATCH --time=120:00:00
#SBATCH --mem=800G

module load gatk/4.1.2.0

ref_dir=(~/path/PitayaGenomic.fa)

gatk --java-options "-Xmx800g" CombineGVCFs -R ${ref_dir} --variant vcfs.list -O joint_files_update.g.vcf.gz

here is the name of a few vcf files belongs to the vcfs.list

American-Beauty_b010a1e6-dec2-4991-b662-a563e8b72621_[Guanhuabai]_marked.bam-RG.bam.g.vcf
American-Beauty_c6adacb45-fe6c-4f01-8bfd-320e2440a394_[Guanhuabai]_marked.bam-RG.bam.g.vcf
American-Beauty_e364df6f-28bd-47b0-b369-d7eb04811de5_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Asante-3_9c89e099-6112-429a-b250-a4be1452d917_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Ax_e6a90065-9be1-4e00-8b4b-df5b0bcedc7c_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Bloody-Mary_a0cd1d1c-54d4-464b-b242-882c7e7ca303_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Bloody-Mary_cc309f1f-fef6-40ad-ae58-b74813a2a641_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Cebra_831c7e57-8ac1-423a-96d3-f62cc2755ca7_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Condor_f6a54f12-5b0d-4dcb-941e-99f211e23a0d_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Cosmic-Charlie_40978c8f-37a9-4a76-bd78-94604ff70c5b_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Cosmic-Charlie_83852650-9d2c-41db-b29b-4f568bf1c3a3_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Cosmic-Charlie_a6b3a45a-0be2-48ae-8210-2cd91130585f_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Costa-Rican-Sunset_d7f595aa-bffd-4ea5-b7c0-75bfdc19f17f_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Country-Roads_f3a30626-2d19-427a-9e17-767630b74ef6_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Dark-Star_0f7e0c03-5bd3-42ad-b611-72605f4bdff4_[Guanhuabai]_marked.bam-RG.bam.g.vcf
ADD REPLY
0
Entering edit mode

Are all those files GVCFs? Can you show the output to:

xargs -a vcfs.list -I v_f bcftools query -l v_f
ADD REPLY
0
Entering edit mode

showing this output

sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
sample1
ADD REPLY
1
Entering edit mode

Now you know what's happening. All those VCF files have the same sample name so CombineGVCFs merely overwrites them. Use bcftools reheader -s to rename the sample in each VCF file and then run CombineGVCFs

ADD REPLY
0
Entering edit mode

One more help, I want the header name same as the file name, e.g., the name of multiple file name is following:

American-Beauty_b010a1e6-dec2-4991-b662-a563e8b72621_[Guanhuabai]_marked.bam-RG.bam.g.vcf
American-Beauty_c6adacb45-fe6c-4f01-8bfd-320e2440a394_[Guanhuabai]_marked.bam-RG.bam.g.vcf
American-Beauty_e364df6f-28bd-47b0-b369-d7eb04811de5_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Asante-3_9c89e099-6112-429a-b250-a4be1452d917_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Ax_e6a90065-9be1-4e00-8b4b-df5b0bcedc7c_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Bloody-Mary_a0cd1d1c-54d4-464b-b242-882c7e7ca303_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Bloody-Mary_cc309f1f-fef6-40ad-ae58-b74813a2a641_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Cebra_831c7e57-8ac1-423a-96d3-f62cc2755ca7_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Condor_f6a54f12-5b0d-4dcb-941e-99f211e23a0d_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Cosmic-Charlie_40978c8f-37a9-4a76-bd78-94604ff70c5b_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Cosmic-Charlie_83852650-9d2c-41db-b29b-4f568bf1c3a3_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Cosmic-Charlie_a6b3a45a-0be2-48ae-8210-2cd91130585f_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Costa-Rican-Sunset_d7f595aa-bffd-4ea5-b7c0-75bfdc19f17f_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Country-Roads_f3a30626-2d19-427a-9e17-767630b74ef6_[Guanhuabai]_marked.bam-RG.bam.g.vcf
Dark-Star_0f7e0c03-5bd3-42ad-b611-72605f4bdff4_[Guanhuabai]_marked.bam-RG.bam.g.vcf

So what should be the bcftools reheader -s command?

ADD REPLY
1
Entering edit mode

You should sanitize the names - [ is not a good choice in a file name. Why not use ..._Guanhuabai_... instead of ..._[Guanhuabai]_...? Also, using the full file name is probably not a good choice as sample names should be as short as possible. I'd recommend a compromise: use the part until before the _[.

As for the exact command, I think it'd be a good learning exercise for you to figure it out. First, create a file with the old and new names as specified in the manual - do this by hand if required, but a sed (or even a cut) should help you automate it. Once the file is ready, use bcftools reheader -s your_sample_name_mapping.file input.vcf | bcftools view -h | grep "^#CHROM" to see if it worked. Keep tweaking your_sample_name_mapping.file until it works and once it does, you should be able to use reheader's -o option to output to a file.

Another tip would be to rename the current files (from .vcf to say, .old.vcf) and output the new files to existing file names so you don't have to change vcfs.list.

ADD REPLY

Login before adding your answer.

Traffic: 2543 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6