19 months ago
rj.rezwan
Hi, I combined 64 .vcf files using CombineGVCFs in gatk. The command completed successfully, but the output shows only one sample column. Why are the rest of the 64 samples not there? The output is here:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
chr01 9883 . A C 1105.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.00;DP=2201;ExcessHet=3.0103;FS=1.813;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=0.00;QD=33.50;ReadPosRankSum=-3.460e-01;SOR=1.302 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:6,27:33:99:0|1:9847_G_A:1113,0,1180:9847
chr01 9903 . C T 567.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.910e-01;DP=2438;ExcessHet=3.0103;FS=10.281;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=16.22;ReadPosRankSum=0.047;SOR=0.544 GT:AD:DP:GQ:PL 0/1:19,16:35:99:575,0,722
chr01 10056 . C T 1646.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-5.220e-01;DP=2908;ExcessHet=3.0103;FS=2.244;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=26.56;ReadPosRankSum=-1.900e-01;SOR=0.890 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:21,41:62:99:0|1:10056_C_T:1654,0,755:10056
chr01 10114 . A G 1185.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.00;DP=3067;ExcessHet=3.0103;FS=4.175;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=22.80;ReadPosRankSum=0.207;SOR=1.496 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:22,30:52:99:0|1:10113_C_T:1193,0,1095:10113
chr01 10115 . T TGC 138.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.635;DP=3045;ExcessHet=3.0103;FS=6.896;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=2.95;ReadPosRankSum=-1.438e+00;SOR=1.141 GT:AD:DP:GQ:PL 0/1:41,6:47:99:146,0,1628
chr01 10177 . G A 328.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.595;DP=3217;ExcessHet=3.0103;FS=1.707;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=0.00;QD=14.94;ReadPosRankSum=0.057;SOR=1.179 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:14,8:22:99:0|1:10162_G_A:336,0,561:10162
chr01 10181 . C T 1144.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-2.800e-02;DP=3238;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=0.00;QD=31.79;ReadPosRankSum=0.134;SOR=0.997 GT:AD:DP:GQ:PL 0/1:8,28:36:99:1152,0,252
What was the syntax of the CombineGVCFs command you ran?
Here is the command:
Here are the names of a few of the VCF files listed in vcfs.list:
Are all those files GVCFs? Can you show the output of:
It shows this output:
Now you know what's happening. All those VCF files have the same sample name, so CombineGVCFs merely overwrites them. Use bcftools reheader -s to rename the sample in each VCF file and then run CombineGVCFs.
One more request for help: I want the sample name in the header to be the same as the file name. For example, some of the file names are as follows:
So what should the bcftools reheader -s command be?
You should sanitize the names: [ is not a good choice in a file name. Why not use ..._Guanhuabai_... instead of ..._[Guanhuabai]_...? Also, using the full file name is probably not a good choice, as sample names should be as short as possible. I'd recommend a compromise: use the part of the file name before the _[.
As for the exact command, I think it'd be a good learning exercise for you to figure it out. First, create a file with the old and new names as specified in the manual. Do this by hand if required, but a sed (or even a cut) should help you automate it. Once the file is ready, use bcftools reheader -s your_sample_name_mapping.file input.vcf | bcftools view -h | grep "^#CHROM" to see if it worked. Keep tweaking your_sample_name_mapping.file until it works, and once it does, you should be able to use reheader's -o option to output to a file.
Another tip would be to rename the current files (from .vcf to, say, .old.vcf) and output the new files to the existing file names so you don't have to change vcfs.list.
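For anyone who finds this thread later, the steps above can be sketched roughly as follows. This is only one possible way to do it, not the poster's actual solution. The helper names derive_sample_name and rename_sample are hypothetical, and the sketch assumes that each file holds exactly one sample, ends in .vcf, and has a name where the part before the first _[ is unique. It uses shell parameter expansion in place of the sed or cut suggested above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the file-name pattern are assumptions,
# not taken from the thread.

# Derive a short sample name from a VCF path: drop the directory and the
# .vcf extension, then keep everything before the first "_[".
derive_sample_name() {
    local base="${1##*/}"
    base="${base%.vcf}"
    printf '%s\n' "${base%%_\[*}"
}

# Rename the single sample in one VCF. The original is kept as *.old.vcf
# and the renamed file takes over the original name, so vcfs.list stays valid.
rename_sample() {
    local vcf="$1"
    local old="${vcf%.vcf}.old.vcf"
    local map
    map="$(mktemp)"
    mv "$vcf" "$old"
    # Mapping file: one "old_name new_name" pair per line.
    printf '%s %s\n' "$(bcftools query -l "$old")" "$(derive_sample_name "$vcf")" > "$map"
    bcftools reheader -s "$map" -o "$vcf" "$old"
    rm -f "$map"
}

# Process every file in vcfs.list, but only if bcftools and the list exist.
if command -v bcftools >/dev/null 2>&1 && [ -f vcfs.list ]; then
    while IFS= read -r vcf; do
        rename_sample "$vcf"
    done < vcfs.list
fi
```

Try it on a copy of one file first and check the result with bcftools query -l before running it on all 64. If two file names share the same prefix before _[, the derived sample names will collide again, so make sure they are unique.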