VCF to genotype file of specific format
8.8 years ago
David.shaw ▴ 10

I have a VCF per individual.

I'd like to take it, and convert it to the following format so that I have the alleles of the individual in the last two columns:

rsid chr pos ref alt

rs102 chr1 34 A G

Is there a package that already does this?

vcf awk bash genotypes • 4.1k views
8.8 years ago

For REF and ALT in the last two columns:

bcftools query -f '%ID %CHROM %POS %REF %ALT\n' input.vcf

For the actual alleles of the individual in the last two columns:

bcftools query -f '%ID %CHROM %POS[ %TGT]\n' input.vcf | tr "/" " "

This doesn't quite work for the actual alleles when some genotypes are phased and others aren't (because they are rare).

This worked:

bcftools query -f '%ID %CHROM %POS[ %TGT]\n' ../131.vcf | tr "|" " " | tr "/" " " | less -S


Then replace the tr command at the end with

tr "/|" " "
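
To illustrate the single tr call, here is a minimal sketch run on two made-up lines shaped like the bcftools query output above, one unphased and one phased. The function name split_alleles and the sample records are hypothetical. The replacement set is written with two spaces so that each separator maps one-for-one and the result does not rely on tr padding a shorter second set, which GNU tr does but which is not guaranteed everywhere.

```shell
# Hypothetical helper: turn genotype separators into spaces.
split_alleles() {
    # '/' marks an unphased genotype, '|' a phased one; each maps to one space.
    tr '/|' '  '
}

# Made-up records standing in for the output of bcftools query.
printf 'rs102 chr1 34 A/G\nrs205 chr1 78 C|T\n' | split_alleles
# prints:
# rs102 chr1 34 A G
# rs205 chr1 78 C T
```

In the real pipeline the function would simply replace the two chained tr calls at the end of the command.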
8.8 years ago
Vivek ★ 2.7k
awk -F '\t' -v OFS='\t' '!/^#/ {print $3, $1, $2, $4, $5}' file.vcf

No, this doesn't handle anything beyond biallelic SNPs.
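
If bcftools is not available, the awk approach can be extended to decode the genotype itself, which also covers multi-allelic sites. The following is only a sketch under stated assumptions: the VCF holds a single sample in column 10, GT is the first FORMAT field (as the VCF specification requires when GT is present), and the function name vcf_to_alleles and the test records are made up for illustration.

```shell
# Hypothetical helper: print ID, CHROM, POS and the called alleles
# for a single-sample VCF read from stdin.
vcf_to_alleles() {
    awk -F '\t' '
        /^#/ { next }    # skip meta-information and header lines
        {
            # Index 0 is REF; indexes 1..n are the comma-separated ALT alleles.
            n = split($5, alts, ",")
            allele[0] = $4
            for (i = 1; i <= n; i++) allele[i] = alts[i]

            # Assumption: one sample in column 10, GT as its first subfield.
            split($10, fields, ":")
            m = split(fields[1], gt, "[/|]")   # phased or unphased separator

            out = $3 " " $1 " " $2
            for (i = 1; i <= m; i++)
                out = out " " ((gt[i] == ".") ? "." : allele[gt[i]])
            print out
        }'
}

# Made-up records: one biallelic unphased site, one multi-allelic phased site.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\nchr1\t34\trs102\tA\tG\t.\tPASS\t.\tGT\t0/1\nchr1\t78\trs205\tC\tT,G\t.\tPASS\t.\tGT:DP\t1|2\n' | vcf_to_alleles
# prints:
# rs102 chr1 34 A G
# rs205 chr1 78 T G
```

On a real file this would be called as vcf_to_alleles < file.vcf. Missing calls are printed as a dot rather than dropped, so every row keeps the same number of columns.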

8.8 years ago

GATK VariantsToTable https://www.broadinstitute.org/gatk/blog?id=7089

or my tool BioAlcidae; see "Taking genotypes out of a vcf file".
