Hi, I would like to take a VCF file and output a tab-delimited file with one line per individual per site, containing the individual, the site (chromosome and position), GT, the reference and alternate alleles, DP, and each PL value. The order of the sites in the output doesn't matter.
Any suggestions on how to do this most efficiently and/or any links to code that does something like this?
Example input:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT samp1 samp2
chr1 100 . C T 3106.72 SnpCluster . GT:AD:DP:GQ:PL 0/0:1,0:1:3:0,3,42 0/0:3,0:3:9:0,9,132
chr1 120 . C G 3106.72 SnpCluster . GT:AD:DP:GQ:PL 0/1:3,1:4:30:30,0,123 1/1:0,1:1:3:45,3,0
Example output:
samp1 chr1 100 0/0 C T 1 0 3 42
samp2 chr1 100 0/0 C T 3 0 9 132
samp1 chr1 120 0/1 C G 4 30 0 123
samp2 chr1 120 1/1 C G 1 45 3 0
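In the example input, each sample column is a colon-separated list whose values line up, in order, with the keys in the FORMAT column, and PL is itself comma-separated with one value per possible genotype. A minimal sketch of splitting one sample column in plain Python, assuming the VCF line has already been split on tabs (`parse_sample` is a hypothetical helper name):

```python
def parse_sample(format_col, sample_col):
    """Map FORMAT keys (e.g. GT:AD:DP:GQ:PL) to one sample's values.

    zip() stops at the shorter list, so a sample column with trailing
    fields dropped simply lacks those keys.
    """
    fields = dict(zip(format_col.split(":"), sample_col.split(":")))
    if "PL" in fields:
        # PL holds one Phred-scaled likelihood per possible genotype.
        fields["PL"] = fields["PL"].split(",")
    return fields


# samp1 at chr1:120 in the example input
info = parse_sample("GT:AD:DP:GQ:PL", "0/1:3,1:4:30:30,0,123")
print(info["GT"], info["DP"], info["PL"])  # 0/1 4 ['30', '0', '123']
```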
If you're into Python, PyVCF makes this sort of task quite easy.
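PyVCF handles this parsing for you. If you would rather not add a dependency, here is a minimal plain-Python sketch (standard library only, not PyVCF) that streams the file once. It assumes a well-formed, tab-delimited VCF with a `#CHROM` header line (the example in the question is shown with spaces, but real VCF columns are tabs); the function names `vcf_to_rows` and `convert` are hypothetical.

```python
import gzip
import sys


def vcf_to_rows(lines):
    """Yield one tab-separated row per (sample, site) from VCF text lines.

    Columns: sample, CHROM, POS, GT, REF, ALT, DP, then each PL value.
    Missing FORMAT fields are written as ".".
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = cols[:5]
        keys = cols[8].split(":")  # e.g. GT:AD:DP:GQ:PL
        for name, sample_col in zip(samples, cols[9:]):
            fields = dict(zip(keys, sample_col.split(":")))
            pl = fields.get("PL", ".").split(",")
            yield "\t".join(
                [name, chrom, pos, fields.get("GT", "."), ref, alt,
                 fields.get("DP", ".")] + pl
            )


def convert(path, out=sys.stdout):
    """Stream a plain or gzip-compressed VCF at `path` to `out` as TSV."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for row in vcf_to_rows(handle):
            out.write(row + "\n")
```

Calling `convert("input.vcf")` writes the rows to standard output. Because it reads one line at a time, memory use stays flat regardless of file size. ALT is written as-is, so a multi-allelic site such as `T,A` stays in one column while PL simply carries more values.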