Question

How do I convert 23andMe Raw Genome to GenBank or FASTA?

8

11.0 years ago

someashole ▴ 90

I downloaded my raw genome data from 23andMe. I have it in a .txt file, but that format can't be used with real bio programs. I want to make my own library for further analysis. Does anyone know how I can convert the .txt file to FASTA, GenBank, or any other usable file type?

fasta genbank 23andme • 19k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by someashole ▴ 90

0

Can you provide an example of what your data looks like? Then if you could provide an example of what you want the output to look like that would also help.

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by Jason ▴ 940

1

rsid         chromosome  position  genotype
rs4477212    1           82154     AA
rs3094315    1           752566    AG
rs3131972    1           752721    AG
rs12124819   1           776546    AA

There are about 960k lines.
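For anyone who wants to read rows like these in a script, here is a minimal Python sketch. It assumes four whitespace-separated columns (ID, chromosome, position, genotype) and that comment lines start with `#`; the names `Call` and `parse_23andme` are made up for illustration and are not part of any 23andMe tool.

```python
from typing import Iterable, Iterator, NamedTuple


class Call(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    genotype: str


def parse_23andme(lines: Iterable[str]) -> Iterator[Call]:
    """Yield one Call per data row; skip comments, headers, and malformed rows."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (assumed '#' prefix)
        fields = line.split()
        # A data row has exactly four fields and a numeric position.
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        rsid, chrom, pos, genotype = fields
        yield Call(rsid, chrom, int(pos), genotype)


sample = """\
snp/rs id     chrm #  position    genotype
rs4477212    1         82154      AA
rs3094315    1         752566    AG
"""
calls = list(parse_23andme(sample.splitlines()))
print(calls)
```

For a real file you would pass an open file handle instead of `sample.splitlines()`, so the ~960k rows are streamed rather than loaded at once.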

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by someashole ▴ 90

0

I'm not sure how the data should look for a usable format

ADD REPLY • link 11.0 years ago by someashole ▴ 90

2

The output that you get above is already the most compact form that you can get your data in. It records your genotype only at the positions the array assays, not a full sequence.

You could, for example, transform this into two haploid genome sequences in FASTA format, but do you realize that your files would then be gigantic, many gigabytes in size, and that these files would not show you what the changes were?

The right way to go about this is to formalize what you want to do next with your data. Then, depending on that aim, people here can advise what to transform it to.

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by Istvan Albert 102k

1

Could you convert this to FASTA though? Are the genotype alleles listed so that one chromosome copy (homolog) is always first and the other is always second? Or is it random? If it's random, there's no way to construct FASTA, because we don't know whether, for example, at 752566 and 752721 we have haplotypes A-A and G-G or A-G and G-A.

If I was going to do anything with it, I think I'd want a VCF file.
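To make the ambiguity concrete, here is a small Python sketch that enumerates every pair of haplotypes consistent with a list of unphased genotypes. The function name is hypothetical, and the brute-force enumeration is only meant for a handful of sites, since the number of orientations doubles with each one.

```python
from itertools import product
from typing import List, Sequence, Tuple


def possible_haplotype_pairs(genotypes: Sequence[str]) -> List[Tuple[str, str]]:
    """Return the distinct unordered haplotype pairs that fit unphased genotypes."""
    pairs = set()
    # At each site, either allele may sit on the "first" chromosome copy.
    for orientation in product((0, 1), repeat=len(genotypes)):
        hap_a = "".join(g[o] for g, o in zip(genotypes, orientation))
        hap_b = "".join(g[1 - o] for g, o in zip(genotypes, orientation))
        # Sort so that (A, B) and (B, A) count as the same pair.
        pairs.add(tuple(sorted((hap_a, hap_b))))
    return sorted(pairs)


# The two heterozygous AG calls at 752566 and 752721 from the sample rows.
print(possible_haplotype_pairs(["AG", "AG"]))
```

With two heterozygous sites there are two possible answers, and the file alone gives no way to choose between them.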

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by Emily 24k

Ram · Answer 1 · 2014-06-01

8

11.0 years ago

chrchang523 11k

The commenters are correct re: first figuring out something you want to do with the data, and checking what input formats work with that.

With that said, here are two conversions that might come in handy.

23andMe to VCF: this is now supported by PLINK.
```
plink --23file [name of your file] --snps-only no-DI --recode vcf
```
What's --snps-only no-DI, you ask? Well, 23andMe files contain mostly SNP calls, but there are a few indel calls as well. Unfortunately, the actual bases involved in the indels are NOT saved; instead, there's just 'D' for deletion and 'I' for insertion, and you'd need an indel database to determine a valid VCF representation of the call. So we just punt here and filter out all markers with 'D' or 'I' allele codes.
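If you want to see which rows such a filter would affect before running PLINK, the idea can be sketched in a few lines of Python. This is not PLINK's implementation: the function name is made up, the row IDs marked below are invented for illustration, and unlike PLINK this sketch also drops no-call rows (assumed here to be written as `--`).

```python
def is_acgt_snp_call(genotype: str) -> bool:
    """True if the genotype is one or two alleles, each of A, C, G, or T."""
    return len(genotype) in (1, 2) and all(allele in "ACGT" for allele in genotype)


rows = [
    ("rs4477212", "1", 82154, "AA"),   # from the sample in the question
    ("i0000001", "1", 100000, "DI"),   # hypothetical indel row
    ("rs0000002", "1", 100500, "--"),  # hypothetical no-call row
    ("rs0000003", "X", 200000, "G"),   # hypothetical single-allele call
]

kept = [row for row in rows if is_acgt_snp_call(row[3])]
print([row[0] for row in kept])
```

One caveat, stated as an assumption rather than a fact: I believe later PLINK 1.9 builds renamed the `no-DI` modifier to `just-acgt`, so if your build rejects `no-DI`, check its `--snps-only` documentation.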

VCF to FASTA:

This can be done with a combination of VCFTools and a Perl script. See here for details.
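To illustrate what a VCF-to-FASTA step does conceptually, here is a toy Python sketch that substitutes SNP alleles into a reference sequence and prints the result as FASTA. It is not the VCFtools/Perl approach mentioned above; the function names, the ten-base reference, and the two SNPs are all invented, and it assumes homozygous alternate calls with 1-based positions.

```python
from typing import Iterable, Tuple


def apply_snps(reference: str, snps: Iterable[Tuple[int, str, str]]) -> str:
    """Substitute (pos, ref, alt) SNPs into a reference; pos is 1-based."""
    seq = list(reference)
    for pos, ref, alt in snps:
        idx = pos - 1
        # Guard against coordinates from a different reference build.
        if seq[idx] != ref:
            raise ValueError(
                f"reference mismatch at {pos}: expected {ref}, found {seq[idx]}"
            )
        seq[idx] = alt
    return "".join(seq)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


reference = "ACGTACGTAC"               # toy reference, not real genome data
snps = [(2, "C", "T"), (9, "A", "G")]  # hypothetical homozygous SNPs
consensus = apply_snps(reference, snps)
print(to_fasta("sample", consensus), end="")
```

Heterozygous calls are exactly where this breaks down: without phasing you would have to pick an allele arbitrarily or write an IUPAC ambiguity code.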

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by chrchang523 11k

0

Once you have the VCF, check the answers in this discussion: New Fasta Sequence From Reference Fasta And Variant Calls File?

ADD REPLY • link 11.0 years ago by Giovanni M Dall'Olio 28k

0

And my blog post on conversion to VCF for use with the Ensembl Variant Effect Predictor.

ADD REPLY • link 11.0 years ago by Neilfws 49k

score 0 · Answer 2 · 2016-03-25

0

9.1 years ago

jeffreyice1105 • 0

The issue I have with --23file is that I get "invalid chromosome code 85280 on line 585542 of .bim file".

(Use --allow-extra-chr to force it to be accepted.)

But when I add that flag, it tells me that --allow-extra-chr cannot currently be used with --23file.

Anybody know how to fix this?

Jeff

ADD COMMENT • link 9.1 years ago by jeffreyice1105 • 0

score 0 · Answer 3 · 2017-03-13

0

8.2 years ago

missstrawchewwer • 0

Jeff, try this:

plink --23file 23andmefile.txt Surname Firstname Sex --snps-only no-DI --make-bed --out plink_genome

ADD COMMENT • link 8.2 years ago by missstrawchewwer • 0