Using SNPs To Tag Deletions In 1000 Genomes
11.5 years ago
Ryan D ★ 3.4k

I'm sure I knew how to do this at one time.

I have a CNV / CNP / CNA / SV, or whatever people call polymorphic structural variants nowadays. It is called esv2666691, and it is in the PLINK bed, bim, and fam files located here: https://www.dropbox.com/sh/xk2rfeixwqityol/LeGEoFSGX1

There is one SNP in high LD with this deletion: rs12628403. Unfortunately, this SNP is not genotyped or imputed on any of the four platforms we've used. So my question is this: given a list of SNPs that have been imputed successfully, what is the best way to find a SNP haplotype that carries the deletion across populations? The one SNP has an R-squared ranging from 0.90 in Asians to 0.94 in Europeans. I suspect, or hope, that some SNP haplotype that does not necessarily include that SNP might refine it further.
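For reference, a minimal sketch of the single-SNP version of this search: rank every available SNP by its r-squared with the deletion. It assumes genotypes have already been converted to 0/1/2 dosages per sample (for example, copies of the deleted allele), and it uses the squared genotype correlation as an approximation of haplotype r-squared. The function names and the toy data are hypothetical, not taken from the files linked above.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors (0/1/2 per sample).

    Samples with a missing call (None) in either vector are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: correlation is undefined, report 0
    return sxy * sxy / (sxx * syy)


def rank_tags(deletion, snps):
    """Return (snp_id, r2) pairs sorted from best to worst tag."""
    scores = [(snp_id, r_squared(deletion, dosages)) for snp_id, dosages in snps.items()]
    return sorted(scores, key=lambda t: t[1], reverse=True)


# Toy data (hypothetical): deletion dosage for 8 samples and four candidate SNPs.
deletion = [0, 1, 2, 0, 1, 0, 0, 2]
snps = {
    "snp_perfect": [0, 1, 2, 0, 1, 0, 0, 2],
    "snp_flipped": [2, 1, 0, 2, 1, 2, 2, 0],  # same information, other allele counted
    "snp_noisy": [0, 1, 2, 0, 0, 0, 1, 2],
    "snp_mono": [0, 0, 0, 0, 0, 0, 0, 0],
}

for snp_id, r2 in rank_tags(deletion, snps):
    print(snp_id, round(r2, 3))
```

Running this within each population separately would show whether the best single tag differs between, say, Asian and European samples.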

Any help?

Thanks, Rx

1000genomes cnv • 3.7k views

The folder '/Public/1kg' does not exist.


Thanks, the problem should be fixed now.

11.5 years ago
Tky ★ 1.0k

Hi, Ryan. It is an interesting question. HapMap 3 has CNV data; see http://hapmap.ncbi.nlm.nih.gov/downloads/cnv_data/?N=D

I think you can merge this dataset with the SNP genotyping data and then look for a proxy SNP for your target CNV (you will need to check whether your CNV is included in the HapMap data).


Thanks for the suggestion, Tky. It seems that the variant is not in HapMap's calls. We know this SNP is in LD, but it cannot be imputed. So we wonder if a longer haplotype of SNPs that have been genotyped/imputed may be able to call this CNV.
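To make the "longer haplotype" idea concrete, here is a rough sketch of a brute-force search. It assumes phased haplotypes are available, restricted to SNPs that were genotyped or imputed, plus a 0/1 flag per haplotype for the deletion. For every small subset of SNPs and every allele pattern seen at that subset, it computes r-squared between "haplotype carries this pattern" and "haplotype carries the deletion". All names and the toy data are hypothetical, and the exhaustive search is only practical for a handful of SNPs near the deletion.

```python
from itertools import combinations


def binary_r2(a, b):
    """r^2 between two 0/1 indicator vectors measured over haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # one of the indicators does not vary
    pab = sum(1 for x, y in zip(a, b) if x and y) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def best_haplotype_tag(haps, deletion, snp_ids, max_snps=2):
    """Search SNP subsets of size <= max_snps for the best-tagging allele pattern.

    haps:     list of allele tuples, one tuple per phased haplotype
    deletion: list of 0/1 flags, one per haplotype (1 = deletion present)
    Returns (r2, snp_id_tuple, allele_pattern).
    """
    best = (0.0, None, None)
    for k in range(1, max_snps + 1):
        for cols in combinations(range(len(snp_ids)), k):
            observed = [tuple(h[c] for c in cols) for h in haps]
            for pattern in set(observed):
                indicator = [int(o == pattern) for o in observed]
                r2 = binary_r2(indicator, deletion)
                if r2 > best[0]:
                    best = (r2, tuple(snp_ids[c] for c in cols), pattern)
    return best


# Toy data (hypothetical): the deletion sits only on haplotypes with snpA=1 and snpB=0.
haps = [(1, 0), (1, 0), (1, 1), (0, 0), (0, 1), (1, 1), (0, 0), (1, 0)]
deletion = [1, 1, 0, 0, 0, 0, 0, 1]
print(best_haplotype_tag(haps, deletion, ["snpA", "snpB"]))
```

In the toy data neither SNP alone tags the deletion well (r-squared of 0.36 each), while the two-SNP pattern tags it perfectly, which is the situation hoped for in the question.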


Hi, Ryan. I am wondering how you code the SV in PLINK ped format? (I think the CNV could be represented as 1/1 for 2 copies, 1/2 for 3 copies, 0/1 for a 1-copy deletion, and so on.)

However, if you process the data with VCFtools and output it to PLINK format, what does a 3-copy duplication look like in the ped file? Is it possible to calculate the LD between this SV and a nearby SNP?


I am honestly not sure how more complex CNVs are coded in PLINK. The CNVs I've worked with have all tended to be deletions so there is usually just coding like A DEL or G ACTGTGTGTATATATATAGTTT (a long string of nucleotides--sometimes 10s of thousands of characters) for the two alleles. It would be better to try to do this with one from 1kG that you know about to see how it is represented. Good question.


I will try to make some fake data in this format and see whether the CNV can be tagged or not. PLINK has a CNV format, which is very different from the PED format.


Thanks, Tky. I'm also happy to give you the code to pull the real raw data above from 1000 Genomes VCFs.


I tried making some artificial data in PLINK ped format, and the deletion can be tagged well by a neighboring SNP. It does not seem to be a format problem.
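A sketch of what such a test could look like, for anyone who wants to repeat it. This is not the exact data I used; it assumes the deletion is written as an ordinary biallelic variant with the made-up allele labels A and DEL, next to one tag SNP with alleles C and T. The variant IDs, positions, and parameter names are all hypothetical. Each PED line has the six standard leading columns (family ID, individual ID, father, mother, sex, phenotype) followed by two alleles per variant.

```python
import random


def simulate_ped(n_samples=50, del_freq=0.3, switch=0.05, seed=1):
    """Build PED lines for a fake tag SNP (C/T) and a fake deletion (A/DEL)."""
    rng = random.Random(seed)
    lines = []
    for i in range(n_samples):
        snp, sv = [], []
        for _ in range(2):  # two haplotypes per sample
            has_del = rng.random() < del_freq
            # Allele T travels with the deletion, except for rare mismatches.
            tagged = has_del if rng.random() >= switch else not has_del
            sv.append("DEL" if has_del else "A")
            snp.append("T" if tagged else "C")
        fields = [f"FAM{i}", f"IND{i}", "0", "0", "0", "-9",
                  snp[0], snp[1], sv[0], sv[1]]
        lines.append(" ".join(fields))
    return lines


# Matching MAP lines: chromosome, variant ID, genetic distance, base-pair position.
MAP_LINES = ["1 rs_fake_tag 0 1000", "1 esv_fake_del 0 1500"]


def dosage(ped_lines, variant_index, allele):
    """Count copies of `allele` per sample for the variant at `variant_index`."""
    counts = []
    for line in ped_lines:
        f = line.split()
        a1, a2 = f[6 + 2 * variant_index], f[7 + 2 * variant_index]
        counts.append((a1 == allele) + (a2 == allele))
    return counts


ped = simulate_ped()
print(ped[0])
print(dosage(ped, 0, "T")[:10])
print(dosage(ped, 1, "DEL")[:10])
```

Writing the lines to a .ped file and MAP_LINES to the matching .map file would let an LD tool read the pair as two ordinary variants; whether a given tool accepts a multi-character allele such as DEL should be checked.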
