How do I convert 23andMe Raw Genome to GenBank or FASTA?
10.6 years ago
someashole ▴ 90

I used 23andMe to download my raw genome. I have it in a .txt file but you can't use the format for real bio programs. I want to make my own library for further analysis. Does anyone know how I can convert .TXT to FASTA, GenBank, or any other usable file type?

fasta genbank 23andme • 19k views

Can you provide an example of what your data looks like? Then if you could provide an example of what you want the output to look like that would also help.

snp/rs id     chrm #  position    genotype
rs4477212    1         82154      AA
rs3094315    1         752566    AG
rs3131972    1         752721    AG
rs12124819  1         776546    AA

there are about 960k lines


I'm not sure how the data should look for a usable format


The output that you get above is already the most compact form that you can get your data in. It represents the differences relative to the reference genome.

You could, for example, transform this into two diploid genomes in FASTA format. But do you realize that the files would then be gigantic, many gigabytes each, and would not show you what the changes were?
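To put a rough number on "many gigabytes", here is a back-of-the-envelope sketch. The haploid genome length (about 3.1 billion bases) and the 60-bases-per-line FASTA layout are assumptions for illustration, and header lines are ignored.

```python
# Rough size of a diploid human genome stored as plain, uncompressed FASTA.
HAPLOID_BASES = 3_100_000_000  # approximate haploid human genome length (assumption)
COPIES = 2                     # one sequence per homologous chromosome set
LINE_WIDTH = 60                # assumed bases per FASTA line


def fasta_bytes(bases: int, line_width: int = LINE_WIDTH) -> int:
    """One byte per base plus one newline byte per sequence line (headers ignored)."""
    full_lines, remainder = divmod(bases, line_width)
    newlines = full_lines + (1 if remainder else 0)
    return bases + newlines


total_bytes = COPIES * fasta_bytes(HAPLOID_BASES)
print(f"about {total_bytes / 1e9:.1f} GB uncompressed")
```

By comparison, a raw 23andMe file with under a million lines is only a few tens of megabytes.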

The right way to go about this is to decide what you want to do next with your data. Depending on that aim, people here can advise what to transform it to.


Could you convert this to FASTA though? Are the genotype alleles listed so that one homologous chromosome is always first and the other is always second? Or is it random? If it's random, there's no way to construct FASTA, because we don't know whether, for example, at 752566 and 752721 we have A-A and G-G or A-G and G-A.

If I was going to do anything with it, I think I'd want a VCF file.
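To make the phasing problem concrete, here is a small sketch that enumerates every pair of haplotypes consistent with a list of unphased genotypes. The function name is made up for illustration; the input is the two heterozygous calls from the sample above.

```python
from itertools import product


def possible_haplotype_pairs(genotypes):
    """Return the distinct unordered haplotype pairs consistent with unphased genotypes."""
    # At each site the two alleles can sit on the two chromosomes in either order.
    per_site = [{(g[0], g[1]), (g[1], g[0])} for g in genotypes]
    pairs = set()
    for assignment in product(*per_site):
        hap1 = "".join(first for first, _ in assignment)
        hap2 = "".join(second for _, second in assignment)
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Genotypes at positions 752566 and 752721 from the sample lines.
print(possible_haplotype_pairs(["AG", "AG"]))
# → [('AA', 'GG'), ('AG', 'GA')]
```

With n heterozygous sites there are 2^(n-1) such pairs, so without phase information the two FASTA sequences cannot be pinned down.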

10.6 years ago

The commenters are correct re: first figuring out something you want to do with the data, and checking what input formats work with that.

With that said, here are two conversions that might come in handy.

  1. 23andMe to VCF: this is now supported by PLINK.

    plink --23file [name of your file] --snps-only no-DI --recode vcf
    

    What's --snps-only no-DI, you ask? Well, 23andMe files contain mostly SNP calls, but there are a few indel calls as well. Unfortunately, the actual bases involved in the indels are NOT saved; instead, there's just 'D' for deletion and 'I' for insertion, and you'd need an indel database to determine a valid VCF representation of the call. So we just punt here and filter out all markers with 'D' or 'I' allele codes.

  2. VCF to FASTA:

    This can be done with a combination of VCFTools and a Perl script. See here for details.
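If you want to see what the D/I filtering in step 1 amounts to, here is a minimal Python sketch of the same idea. It is not PLINK's implementation. It assumes four whitespace-separated columns as in the sample posted above, `#` comment lines, and `--` for no-calls (which it also drops). The last two sample rows are made-up IDs for illustration.

```python
def read_snp_calls(lines):
    """Yield (rsid, chrom, pos, genotype) for calls made up only of A/C/G/T."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # header row or malformed line
        rsid, chrom, pos, genotype = fields
        # Drops indel codes ('D', 'I') and no-calls ('--') in one test.
        if set(genotype) <= set("ACGT"):
            yield rsid, chrom, int(pos), genotype


sample = """\
snp/rs id     chrm #  position    genotype
rs4477212    1         82154      AA
rs3094315    1         752566    AG
i0000001     1         800000     DI
rs0000002    1         900000     --
""".splitlines()

calls = list(read_snp_calls(sample))
for call in calls:
    print(*call, sep="\t")
```

Only the first two data rows survive. The indel and no-call rows are skipped because their genotype codes are not plain bases.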


Once you have the VCF, check the answers in this discussion: New Fasta Sequence From Reference Fasta And Variant Calls File?


And my blog post on conversion to VCF for use with the Ensembl Variant Effect Predictor.

8.8 years ago

The issue I have with --23file is that I get "invalid chromosome code 85280 on line 585542 of .bim file".

(Use --allow-extra-chr to force it to be accepted.)

But when I add that, it tells me that --allow-extra-chr cannot currently be used with --23file.

Does anybody know how to fix this?

Jeff

7.8 years ago

Jeff, try this:

    plink --23file 23andmefile.txt Surname Firstname Sex --snps-only no-DI --make-bed --out plink_genome
