Is there an easy method to extract the genotype information from the genotype field in a GATK-generated VCF file?
I am looking for output from a VCF file in the following format:
Chr Coordinate Allele1 Allele2 #Allele1 #Allele2
I have tried vcftools --vcf sample.vcf --extract-FORMAT-info <columnName> --out sample
This just extracts the column data, similar to:
vcf2bed < sample.vcf > sample_table.bed
cat sample_table.bed | awk '{print $10,$11}'| cut -d':' -f5 |awk '{print $2}' > GT.txt
Extracting AD-->
cat sample_table.bed | awk '{print $10,$11}'| cut -d':' -f6 > AD.txt
Extracting DP-->
cat sample_table.bed | awk '{print $10,$11}'| cut -d':' -f7 > DP.txt
Extracting GQ-->
cat sample_table.bed | awk '{print $10,$11}'| cut -d':' -f8 > GQ.txt
Extracting PL-->
cat sample_table.bed | awk '{print $10,$11}'| cut -d':' -f9 > PL.txt
Also, I have looked into vcflib and was unable to find a suitable function to do the same.
It isn't clear to me exactly what output you expect. Here are some ideas:
vcf-to-tab would do
cut -f1-4,10 file.vcf | sed 's/:/\t/g' would do
I want to split each GT column field under its respective header to extract the fields
Ref Count (AD) Alt Count (AD) Second Alt (AD) Coverage (DP) QUAL Confidence (GQ) Homozygous Reference (PL) Heterozygous Reference (PL) Homozygous alternate (PL)
in separate columns. The script mentioned in the comment simply splits the GT column field into its sub-fields GT:AD:DP:GQ:PL, giving 5 columns. It does not give separate columns for the counts of the reference allele, alternate allele, etc. Hence, I was looking for a tool which can do the same as vcf-to-tab for the GT column field.
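For what it's worth, the split described above can be sketched in plain awk without vcf2bed. This is only a rough sketch under some assumptions: the VCF is tab-delimited with a single sample in column 10, the FORMAT keys are read from column 9 rather than from fixed positions, and the first three PL values are taken as hom-ref, het and hom-alt for the first alternate allele. The function name vcf_genotype_table and the output column names are made up for illustration; missing values are printed as ".".

```shell
# Hypothetical helper (name and column headers are illustrative only).
# Reads a single-sample VCF on stdin, writes a tab-separated table on stdout.
vcf_genotype_table() {
  awk '
    # Return a[i] if present and non-empty, otherwise "."
    function get(a, i) { return ((i in a) && a[i] != "") ? a[i] : "." }

    BEGIN {
      FS = "\t"; OFS = "\t"
      print "Chr", "Coordinate", "Ref", "Alt", "GT", "RefCount", "AltCount", \
            "SecondAltCount", "DP", "GQ", "PL_HomRef", "PL_Het", "PL_HomAlt"
    }

    /^#/ { next }   # skip meta-information and header lines

    {
      # Pair each FORMAT key (column 9) with its value in the sample (column 10)
      split("", f)
      nk = split($9, key, ":")
      split($10, val, ":")
      for (i = 1; i <= nk; i++) f[key[i]] = val[i]

      # AD and PL are comma-separated lists; split them into separate columns
      split(get(f, "AD"), ad, ",")
      split(get(f, "PL"), pl, ",")

      print $1, $2, $4, $5, get(f, "GT"), \
            get(ad, 1), get(ad, 2), get(ad, 3), \
            get(f, "DP"), get(f, "GQ"), \
            get(pl, 1), get(pl, 2), get(pl, 3)
    }
  '
}
```

It would be used as vcf_genotype_table < sample.vcf > sample_genotypes.tsv. Looking the keys up by name instead of using cut -f5 through -f9 means the columns stay correct when a record carries a different FORMAT string.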