How do you convert a text file from Ancestry.com to vcf format?
I understand that I could convert from 23andMe to vcf with something like:
bcftools convert -c ID,CHROM,POS,AA -s SampleFile -f reference/Homo_sapiens.GRCh37.dna.primary_assembly.fa --tsv2vcf Data/SampleFile/AncestryDNA.txt -Oz -o Data/SampleFile.vcf.gz
However, Ancestry.com's files are slightly different from 23andMe files. They have five tab-delimited columns instead of the four that 23andMe uses, because the two alleles are reported in separate columns:
rsid chromosome position allele1 allele2
rs3131972 1 752721 A G
rs114525117 1 759036 G G
rs12124819 1 776546 A A
I also tried a direct conversion, but something is wrong because it's not working:
cat SampleFile.zip|grep -v '#'|grep -v 'rsid'|awk -F'\t' '{ print $1"\t"$2"\t"$3"\t"$4$5; }'|sed s/\\t23\\t/\\tX\\t\/g |sed s/\\t24\\t/\\tY\\t\/g| grep -P -v '\t25\t' >> SampleFile.txt
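One thing to note about the attempt above: `cat` on a zip archive emits the compressed bytes, so the filters never see the text. A minimal sketch of the same reshaping as a function that reads the already-extracted text from stdin is shown here. The function name `ancestry_to_tsv` is hypothetical, the mapping of codes 23 and 24 to X and Y and the dropping of code 25 are taken from the attempt above, and the `tr` step assumes the file may have Windows line endings.

```shell
# Sketch: reshape AncestryDNA rows (read from stdin) into the four-column
# layout used by 23andMe: rsid, chromosome, position, genotype.
# Assumptions: codes 23 and 24 stand for X and Y, and rows coded 25 are
# dropped, exactly as in the original attempt.
ancestry_to_tsv() {
    tr -d '\r' |                        # strip carriage returns, if any
    awk -F'\t' -v OFS='\t' '
        /^#/ || $1 == "rsid" { next }   # skip comment lines and the header row
        $2 == "25"           { next }   # drop rows coded 25
        $2 == "23"           { $2 = "X" }
        $2 == "24"           { $2 = "Y" }
        { print $1, $2, $3, $4 $5 }     # join allele1 and allele2 into one genotype
    '
}
```

A possible invocation, assuming the archive member is named AncestryDNA.txt: `unzip -p SampleFile.zip AncestryDNA.txt | ancestry_to_tsv > SampleFile.txt`. Here `unzip -p` writes the extracted member to stdout.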
With Ancestry.com, the text file inside the zip always has the same generic name, so I would need to use the basename of the zip (the name I saved it as) for the converted file name. For example:
SampleFile1.zip/AncestryDNA.txt > SampleFile1.txt
SampleFile2.zip/AncestryDNA.txt > SampleFile2.txt
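The naming scheme above could be sketched as a small loop. Both function names are hypothetical, and the sketch assumes every archive contains a member named exactly AncestryDNA.txt.

```shell
# Hypothetical helper: derive the output name from the zip's basename,
# e.g. Data/SampleFile1.zip -> SampleFile1.txt
zip_to_txt_name() {
    printf '%s.txt\n' "$(basename "$1" .zip)"
}

# Extract the generic AncestryDNA.txt member of each zip passed as an
# argument and write it to <basename>.txt in the current directory.
# Assumes each archive has a member named exactly AncestryDNA.txt.
extract_all() {
    for zip in "$@"; do
        unzip -p "$zip" AncestryDNA.txt > "$(zip_to_txt_name "$zip")"
    done
}
```

Usage would look like `extract_all SampleFile1.zip SampleFile2.zip`, which should leave SampleFile1.txt and SampleFile2.txt next to each other.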
I'm using these files with Beagle 5.1, which makes an exception to the VCF format for male X-chromosome genotypes:
Beagle uses Variant Call Format (VCF) 4.3 for input and output genotype data, except that Beagle requires male non-pseudoautosomal X-chromosome genotypes to be coded as homozygous diploid genotypes.
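If a converted VCF ends up with haploid calls on X for a male sample, the requirement quoted above could be met with a filter along these lines. This is only a sketch: the function name `diploidize_x` is hypothetical, and it assumes an uncompressed single-sample VCF on stdin, GT as the first FORMAT field, and a chromosome named "X". It rewrites only calls that are already haploid and does not check whether a position is pseudoautosomal.

```shell
# Sketch: rewrite haploid GT calls on chromosome X as homozygous diploid.
# Assumptions: uncompressed single-sample VCF on stdin, GT is the first
# FORMAT field, and the X chromosome is named "X".
diploidize_x() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { print; next }                      # pass header lines through
        $1 == "X" {
            n = split($10, f, ":")                # f[1] is the GT subfield
            # no "/" or "|" separator means the call is haploid
            if (index(f[1], "/") == 0 && index(f[1], "|") == 0) {
                s = f[1] "/" f[1]                 # e.g. 1 -> 1/1
                for (i = 2; i <= n; i++) s = s ":" f[i]
                $10 = s
            }
        }
        { print }
    '
}
```

For a bgzipped file, something like `zcat SampleFile.vcf.gz | diploidize_x | bgzip > SampleFile.diploidX.vcf.gz` would apply it; the output name is only an example.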
I'm using Ubuntu 18.04.3 LTS.
Hi! Thank you for the nice and clear explanation! I have tried to reproduce your example using the same reference genome. However, I get LOTS of ALT values that are '.', and they do not agree with the example you produced. For instance, you have:
and I get:
Do you have any clue what I am missing?
Thanks a lot! Mariana
Hi, Mariana. I'm not sure.