Question

Formatting genotype matrix for imputation

Posted 9.3 years ago by Alexander Skates

I have genotype data as a set of birdseed files (one per sample), each of the form:

Composite Element REF    Call    Confidence
SNP_A-2131660    2    0.0060
SNP_A-1967418    2    0.0281
SNP_A-1969580    2    0.0074
SNP_A-4263484    2    0.0104
SNP_A-1978185    0    0.0034

I've loaded these into a matrix in R to perform QC on them and to annotate them with rs IDs instead of probe IDs, resulting in something like this:

> geno[1:4,1:4]
           TCGA.A7.A0D9 TCGA.A7.A0DB TCGA.A7.A13G TCGA.AC.A2FB
rs2887286             2            2            2            2
rs1496555             2            2            2            1
rs3890745             2            2            1            2
rs10489588            1            0            0            0
>
> dim(geno)
[1] 693085     94
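The question does this loading step in R. For readers working in Python, a minimal sketch of combining per-sample birdseed files into one SNP-by-sample matrix could look like the following. It assumes whitespace-separated columns with a single header line, as in the excerpt above, and a hypothetical `probe_to_rs` mapping taken from the array annotation. Storing calls that are absent for a sample as -1 is also an assumption of this sketch.

```python
def parse_birdseed(lines):
    """Parse one per-sample birdseed file into {probe_id: call}.

    `lines` is any iterable of text lines (a list or an open file handle).
    The first line is assumed to be the header
    ("Composite Element REF  Call  Confidence"); columns split on whitespace.
    """
    calls = {}
    rows = iter(lines)
    next(rows, None)  # skip the header line
    for line in rows:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or truncated lines
        calls[fields[0]] = int(fields[1])
    return calls


def build_matrix(sample_lines, probe_to_rs):
    """Combine per-sample birdseed files into an SNP-by-sample matrix.

    sample_lines: {sample_name: iterable of lines}, one entry per file.
    probe_to_rs: hypothetical {probe_id: rs_id} annotation; probes missing
    from it are dropped. Calls missing for a sample are stored as -1.
    Returns (rs_ids, sample_names, rows), rows[i][j] = call of SNP i in sample j.
    """
    sample_names = list(sample_lines)
    parsed = [parse_birdseed(sample_lines[name]) for name in sample_names]
    # Keep every annotated probe seen in at least one sample, in sorted order.
    probes = sorted({p for calls in parsed for p in calls if p in probe_to_rs})
    rs_ids = [probe_to_rs[p] for p in probes]
    rows = [[calls.get(p, -1) for calls in parsed] for p in probes]
    return rs_ids, sample_names, rows
```

Open file handles can be passed in place of lists of lines, since both are iterables of strings.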

I want to use 1000 Genomes data to impute genome-wide SNP data; however, I'm not sure how to get from this format to the format required by IMPUTE2. I've been reading through this wiki, and it seems like it might be easiest to convert this to PED format and then use GTOOL to convert to IMPUTE format.

Are there any existing tools that can do this reformatting directly? Or will I have to write a script for the matrix-to-PED step and then use GTOOL for PED to IMPUTE?
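If you do go the matrix-to-PED route, the reformatting itself is short. The sketch below assumes the matrix is held as `rs_ids`, `sample_names` and `rows` (rows[i][j] is the call for SNP i in sample j), that calls use the birdseed coding 0 = AA, 1 = AB, 2 = BB and -1 = no call, and that a hypothetical `snp_info` lookup supplies chromosome, base-pair position and the two alleles for each SNP. Family and individual IDs are both set to the sample name, and sex and phenotype are written as unknown (0 and -9).

```python
def genotype_to_alleles(call, allele_a, allele_b):
    """Translate one birdseed call into a pair of PED alleles.

    Assumed coding: 0 = AA, 1 = AB, 2 = BB, anything else (e.g. -1) = no call.
    """
    if call == 0:
        return (allele_a, allele_a)
    if call == 1:
        return (allele_a, allele_b)
    if call == 2:
        return (allele_b, allele_b)
    return ("0", "0")  # "0" is PLINK's missing-allele code


def to_ped_map(rs_ids, sample_names, rows, snp_info):
    """Return (ped_text, map_text) for an SNP-by-sample call matrix.

    snp_info: hypothetical {rs_id: (chrom, bp_position, allele_a, allele_b)}.
    """
    map_lines = []
    for rs in rs_ids:
        chrom, pos, _, _ = snp_info[rs]
        # MAP columns: chromosome, SNP ID, genetic distance (0 = unknown), bp position
        map_lines.append(f"{chrom}\t{rs}\t0\t{pos}")

    ped_lines = []
    for j, sample in enumerate(sample_names):
        # First six PED columns: family ID, individual ID, father, mother,
        # sex (0 = unknown), phenotype (-9 = missing)
        fields = [sample, sample, "0", "0", "0", "-9"]
        for i, rs in enumerate(rs_ids):
            _, _, allele_a, allele_b = snp_info[rs]
            fields.extend(genotype_to_alleles(rows[i][j], allele_a, allele_b))
        ped_lines.append(" ".join(fields))

    return "\n".join(ped_lines) + "\n", "\n".join(map_lines) + "\n"
```

Before imputing against 1000 Genomes, the alleles in `snp_info` need to be actual nucleotides on the same strand as the reference panel; Affymetrix A/B labels alone are not enough.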

Tags: genotype-matrix, IMPUTE2, PLINK

Written 9.3 years ago by Alexander Skates; last updated 2.1 years ago by Ram.

Accepted answer · 2015-07-30

Answered 9.3 years ago by Endre Bakken Stovner

If you use SHAPEIT2 to prephase the chromosomes (best practice), you can give it PLINK BED files directly with the --input-bed flag. SHAPEIT2 outputs files that IMPUTE2 can read.
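As a sketch of the prephasing route, the helpers below only build the argument lists for one chromosome; nothing is executed. The flags (`--input-bed`, `--input-map`, `--output-max` for SHAPEIT2; `-use_prephased_g`, `-known_haps_g`, `-h`, `-l`, `-m`, `-int`, `-Ne`, `-o` for IMPUTE2) should match the tools' documented usage, but check them against your installed versions. The file names, the `Ne` default of 20000 and the chunk boundaries are assumptions to adapt to your reference panel.

```python
def shapeit_cmd(bed_prefix, genetic_map, out_prefix):
    """SHAPEIT2 prephasing of one chromosome from a PLINK binary fileset.

    Assumes bed_prefix.bed/.bim/.fam holds a single chromosome and that
    genetic_map is the matching genetic map file (hypothetical names).
    """
    return [
        "shapeit",
        "--input-bed", f"{bed_prefix}.bed", f"{bed_prefix}.bim", f"{bed_prefix}.fam",
        "--input-map", genetic_map,
        "--output-max", f"{out_prefix}.haps", f"{out_prefix}.sample",
    ]


def impute2_cmd(haps, ref_hap, ref_legend, genetic_map, start, end, out, ne=20000):
    """IMPUTE2 run on one chunk [start, end] using prephased haplotypes."""
    return [
        "impute2",
        "-use_prephased_g",
        "-known_haps_g", haps,    # .haps file written by SHAPEIT2
        "-h", ref_hap,            # reference panel haplotypes
        "-l", ref_legend,         # reference panel legend
        "-m", genetic_map,
        "-int", str(start), str(end),
        "-Ne", str(ne),
        "-o", out,
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. IMPUTE2 is normally run in chunks of a few megabases, so `impute2_cmd` would be called once per interval.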

IMPUTE2 uses the Oxford file format (.gen/.sample), and PLINK 1.9 can convert most file types to Oxford format. I've never heard of the birdseed format, but here is a script that can convert birdseed to BED: https://www.broadinstitute.org/mpg/birdsuite/howto.html
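For the PLINK 1.9 conversions, a hedged sketch: `--file ... --make-bed` turns a PED/MAP pair into binary BED/BIM/FAM files, and `--recode oxford` writes the `.gen`/`.sample` pair that IMPUTE2 reads. The executable name `plink` and the file prefixes are assumptions; the helpers only build argument lists.

```python
def plink_ped_to_bed(prefix, plink="plink"):
    """Read prefix.ped/prefix.map and write prefix.bed/.bim/.fam."""
    return [plink, "--file", prefix, "--make-bed", "--out", prefix]


def plink_bed_to_oxford(prefix, out_prefix, chrom=None, plink="plink"):
    """Write out_prefix.gen and out_prefix.sample (Oxford format).

    If chrom is given, only that chromosome is exported, which suits
    per-chromosome imputation.
    """
    cmd = [plink, "--bfile", prefix]
    if chrom is not None:
        cmd += ["--chr", str(chrom)]
    cmd += ["--recode", "oxford", "--out", out_prefix]
    return cmd
```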

Written 9.3 years ago by Endre Bakken Stovner; last updated 2.1 years ago by Ram.

Sorry for the slightly incoherent reply; I wrote it on my teensy iPhone screen.

Reply by Endre Bakken Stovner, 9.3 years ago

It's fine! I had seen the tool, but I was hoping to find an easier way. In the end, I wrote a script to convert all the files using the tools in question and then merged them all together using PLINK.

Reply by Alexander Skates, 9.3 years ago
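The conversion script itself was not posted. As a hypothetical sketch of the final merge step only, the helpers below write a PLINK merge list (one `prefix.bed prefix.bim prefix.fam` line per fileset) and build a `--merge-list` command that combines all listed filesets into one binary fileset. This assumes each sample has already been converted to its own BED/BIM/FAM fileset and that PLINK 1.9 is used, which accepts `--merge-list` without a separate `--bfile`; the prefixes and file names are hypothetical.

```python
def merge_list_text(prefixes):
    """Contents of a PLINK merge list: one binary fileset per line."""
    return "".join(f"{p}.bed {p}.bim {p}.fam\n" for p in prefixes)


def plink_merge_cmd(list_path, out_prefix, plink="plink"):
    """Merge every fileset named in list_path into out_prefix.bed/.bim/.fam."""
    return [plink, "--merge-list", list_path, "--make-bed", "--out", out_prefix]
```

The merge-list text would be written to a file (for example `merge_list.txt`) before running the command.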

Hi, do you mind sharing your script or giving me an outline of what you did? I am stuck in a similar situation with TCGA data!

Reply by halffedelf, 8.6 years ago