bcf convert 23andme to vcf
2
0
5.7 years ago
stuartkkim • 0

Hi,

I need to convert a 23andme file to VCF using bcftools. The command is:

bcftools convert --tsv2vcf input.tab.gz -f ref.fa -s SampleName -Ob -o sample.bcf

I have a 23andme.txt file.

What do I use for "input.tab.gz"; can I use the 23andme.txt file or do I need to convert it first?

What do I use for "ref.fa"? Where can I get a ref.fa file for build 37?

Is "SampleName" just the name of the individual in the 23andme file?

I used plink to read the 23andme file and exported it with --recode vcf. The problem is that there is no ALT allele if the genotype is homozygous. Is there a way to insert the ALT allele? If not, then the plink solution does not help.

Thanks

SNP bcftools • 6.0k views
2

Note that ALL of the solutions here have the same limitation as plink. It's impossible to report an ALT allele if it simply isn't in the data; you need an additional SNP database file. Only REF alleles can be reliably filled in without one (with plink, you'd use plink2's --ref-from-fa flag).

0

I thought the reply below does not have this problem, but it does. Where can we find the ALT alleles for 23andMe data?

I guess it doesn't really matter, as you only DON'T know the ALT allele when the genotype is REF/REF in the first place.

If you have several samples, you can guess the ref more reliably.
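
To make the REF/REF point concrete, here is a minimal sketch of how an ALT allele can be derived from a REF base and a two-letter genotype. The helper name alt_from_genotype is made up for illustration, and it only handles single-base SNP calls (not indel codes or no-calls). It is not how bcftools or plink implement this internally.

```shell
# Hypothetical helper: print the ALT allele(s) implied by a REF base ($1)
# and a two-letter genotype ($2). Prints "." when both alleles equal REF,
# because the data then carries no information about ALT.
alt_from_genotype() {
  awk -v ref="$1" -v gt="$2" 'BEGIN {
    a = substr(gt, 1, 1); b = substr(gt, 2, 1)
    alt = ""
    if (a != ref) alt = a
    if (b != ref && b != a) alt = (alt == "" ? b : (alt "," b))
    print (alt == "" ? "." : alt)
  }'
}

alt_from_genotype A AA   # prints .  (homozygous REF: ALT unknown)
alt_from_genotype A AG   # prints G
alt_from_genotype A GG   # prints G
```

Only the first case loses information, which is why an external SNP database is needed to fill in ALT for homozygous-reference sites.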

0

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

0

New issue arose:

bcftools convert --tsv2vcf input.tab.gz -f ref.fa -s SampleName -Ob -o sample.bcf

input.tab.gz is a 23andme.txt file that is version 2 and build 36.

ref.fa is Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz.

So the reference FASTA is build 37 and the 23andme file is build 36. Where can I get a reference FASTA file for build 36? I cannot find one archived at Ensembl.

thanks. Stuart

1

You can find hg16 human genome build here: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.10/

3
5.7 years ago
Emily 24k

input.tab.gz is your input from 23andMe. The example uses a compressed file, so you may wish to compress yours (for example with gzip or bgzip). Just check that your file is in the format shown on the bcftools page, e.g.:
rs6139074 20 63244 AA
rs1418258 20 63799 CC
rs6086616 20 68749 TT
rs6039403 20 69094 AG
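
A quick way to check a file against that four-column layout is sketched below. The function name count_malformed_rows and the demo file contents are made up for illustration. It assumes tab-separated fields and treats lines starting with '#' as comments.

```shell
# Hypothetical helper: count data rows that do not have exactly four
# tab-separated fields (rsid, chromosome, position, genotype).
# Lines starting with '#' are skipped as comments.
count_malformed_rows() {
  awk -F'\t' '!/^#/ && NF != 4 { bad++ } END { print bad + 0 }' "$1"
}

# Tiny made-up example file (not real 23andMe data).
demo=$(mktemp)
printf '# rsid\tchromosome\tposition\tgenotype\n' > "$demo"
printf 'rs6139074\t20\t63244\tAA\n' >> "$demo"
printf 'rs1418258\t20\t63799\tCC\n' >> "$demo"
printf 'rs_bad\t20\t68749\n' >> "$demo"

count_malformed_rows "$demo"   # prints 1 (the row missing its genotype)
```

A result of 0 on your own file suggests the layout matches. Anything else is worth inspecting before running the conversion.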

You can get a reference FASTA for GRCh37 from Ensembl.

The Sample Name is whatever you want to call it. That's what's going to appear in the genotype header in the VCF, so it's up to you.
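
Putting the pieces of this answer together, the whole conversion might be wrapped roughly as follows. This is a sketch under assumptions: the function name convert_23andme and all file names are placeholders, bcftools and gunzip are assumed to be on PATH, and the reference is assumed to be the gzip-compressed Ensembl download, which is decompressed once so that it can be indexed. If your genotype file needs compressing, do that beforehand.

```shell
# Sketch only: wraps the command from this thread in one function.
# All arguments are placeholders chosen for illustration.
convert_23andme() {
  genotypes="$1"   # 23andMe raw data file, e.g. 23andme.txt
  ref_gz="$2"      # e.g. Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz
  sample="$3"      # any label; it becomes the sample column name in the VCF
  out_bcf="$4"     # e.g. sample.bcf

  ref_fa="${ref_gz%.gz}"
  # Decompress the reference once, keeping the original .gz in place.
  [ -f "$ref_fa" ] || gunzip -c "$ref_gz" > "$ref_fa"

  bcftools convert --tsv2vcf "$genotypes" -f "$ref_fa" -s "$sample" -Ob -o "$out_bcf"
}

# Example call (commented out; the file names are placeholders):
# convert_23andme 23andme.txt Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz MySample sample.bcf
```

The reference build has to match the build of the 23andMe file, otherwise the REF alleles looked up from the FASTA will not correspond to the genotyped positions.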

0

Thanks Emily!!! That worked.

0

I was getting a result like this:

Rows total:     963047
Rows skipped:   963047
Missing GTs:    0
Hom RR:     0
Het RA:     0
Hom AA:     0
Het AA:     0

because I was using the fasta file from NCBI (GCF_000001405.25_GRCh37.p13_genomic.fna.gz). The chromosome names in the fasta file don't match those in the 23andMe file, hence every row is skipped. Using the file from Ensembl gave the expected result:

Rows total:     963047
Rows skipped:   1037
Missing GTs:    3937
Hom RR:     495570
Het RA:     278121
Hom AA:     184382
Het AA:     0
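
If every row is skipped, a naming mismatch like the one described above can be checked before running the conversion. The sketch below lists chromosome names from the genotype file that never appear as sequence names in the FASTA headers. The function name chroms_missing_from_fasta and the demo files are made up for illustration, and the FASTA is assumed to be uncompressed.

```shell
# Hypothetical helper: list chromosome names used in the genotype file
# (column 2) that do not appear as sequence names in the FASTA headers.
chroms_missing_from_fasta() {
  fasta="$1"; tsv="$2"
  awk '
    # First file (FASTA): remember the first word of each ">" header line.
    FNR == NR { if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 } next }
    # Second file (genotypes): report each unknown chromosome name once.
    !/^#/ && NF >= 2 && !($2 in seen) && !reported[$2]++ { print $2 }
  ' "$fasta" "$tsv"
}

# Tiny made-up files that mimic the two naming styles.
ncbi=$(mktemp); ensembl=$(mktemp); geno=$(mktemp)
printf '>NC_000020.10 Homo sapiens chromosome 20\nACGT\n' > "$ncbi"
printf '>20 dna:chromosome\nACGT\n' > "$ensembl"
printf '# rsid\tchromosome\tposition\tgenotype\nrs6139074\t20\t63244\tAA\n' > "$geno"

chroms_missing_from_fasta "$ncbi" "$geno"      # prints 20
chroms_missing_from_fasta "$ensembl" "$geno"   # prints nothing
```

On a full genome it is much faster to run the same comparison against the first column of the .fai index than to scan the FASTA itself.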

If you don't know, here is how to get a VCF from a BCF (binary VCF):

bcftools view sample.bcf > sample.vcf

0
5.2 years ago
Gabriel R. ★ 2.9k

It is one line in glactools; this is the example from the test/ folder:

glactools 23andme2acf --epo epochr1.gz  --fai human_g1k_v37.fasta.fai smallPublic23andMeData.gz anon  |  glactools glac2vcf -

For real data, you should probably replace epochr1.gz with all.epo.gz.
