Question

SNPs in Linkage Disequilibrium for NCBI Build 37 (hg19)

2

11.5 years ago

cruzpedro ▴ 100

Hi all

I'd like to know if anyone has, or knows where to get, a list of SNPs in high LD for the latest human genome build (GRCh37). I'm currently working with the Brazilian population. I'm doing PCA for population structure analysis and therefore need to remove high-LD regions from my dataset to get less biased PC axes.

Thanks a lot in advance for the attention!

Best regards, Pedro.

PS: The --indep-pairwise option in PLINK alone won't solve my problem.

ld snp gwas plink • 5.8k views

ADD COMMENT • link updated 10.1 years ago by Biostar 20 • written 11.5 years ago by cruzpedro ▴ 100

2

11.5 years ago

Tky ★ 1.0k

Well, I guess what is true for other populations might not hold for the Brazilian population, so you may need to calculate the pairwise LD in your sample set and construct LD blocks (and scan consecutive windows to locate the high-LD regions). Please refer to this paper for more information: A. Price et al., Long-Range LD Can Confound Genome Scans in Admixed Populations, AJHG 2008 (https://www.sciencedirect.com/science/article/pii/S0002929708003534)
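As a rough illustration of the window scan described above, here is a minimal Python sketch. It is not the procedure from Price et al.: it uses the squared correlation of allele counts (0/1/2) as a stand-in for LD when phase is unknown, and the function names, window size, and threshold are assumptions chosen for the example.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two allele-count vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as no LD
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def high_ld_windows(positions, genotypes, window_bp, threshold):
    """Scan consecutive, non-overlapping windows along one chromosome.

    positions: sorted base-pair positions, one per SNP.
    genotypes: one allele-count vector per SNP (same sample order in each).
    Returns (start, end, mean_r2) for windows whose mean pairwise r2
    is at least `threshold`; windows with fewer than two SNPs are skipped.
    """
    hits = []
    if not positions:
        return hits
    start, last = positions[0], positions[-1]
    while start <= last:
        end = start + window_bp
        idx = [i for i, p in enumerate(positions) if start <= p < end]
        pairs = [
            r_squared(genotypes[i], genotypes[j])
            for k, i in enumerate(idx)
            for j in idx[k + 1:]
        ]
        if pairs and mean(pairs) >= threshold:
            hits.append((start, end, mean(pairs)))
        start = end
    return hits
```

Calling `high_ld_windows(positions, genotypes, window_bp=1_000_000, threshold=0.8)` on one chromosome at a time would return candidate regions to inspect. With only 60 individuals the r2 estimates will be noisy, so the flagged windows are best treated as a starting point.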

ADD COMMENT • link 11.5 years ago by Tky ★ 1.0k

0

Hey! Thanks a lot for your answer! I was planning to do this later as well, since I was first getting a "test" set for running EIGENSTRAT. I think I'll use the "--ld-window-kb" option in PLINK for my sample: http://pngu.mgh.harvard.edu/~purcell/plink/ld.shtml#ld2. Do you think this might suffice for controlling LD in a dataset of 60 individuals? I have data on another 50 individuals that I can include to infer patterns of LD in my population.

Best regards!

ADD REPLY • link 11.5 years ago by cruzpedro ▴ 100

0

Yeah, you should check the EIGENSTRAT plot first (against other HapMap populations). And more samples give more accurate LD estimates.

ADD REPLY • link 11.5 years ago by Tky ★ 1.0k

Ram · Accepted Answer · 2013-05-27

2

11.5 years ago

Pierre Lindenbaum 164k

The LD data for HapMap / hg18 are available here: http://hapmap.ncbi.nlm.nih.gov/downloads/ld_data/?N=D

You could run liftOver to map those positions to hg19.
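One detail to watch when lifting SNP positions: liftOver works on BED intervals, which are 0-based and half-open, while SNP positions are usually given as 1-based coordinates. A minimal Python sketch of the conversion in both directions follows. The helper names, file names, and SNP ID are made up for illustration, and the liftOver call in the comment assumes the UCSC hg18ToHg19.over.chain.gz chain file has already been downloaded.

```python
def to_bed_line(chrom, pos, snp_id):
    """Turn a 1-based SNP position into a 0-based, half-open BED line."""
    return f"{chrom}\t{pos - 1}\t{pos}\t{snp_id}"


def from_bed_line(line):
    """Read a lifted BED line back into (chrom, 1-based position, snp_id)."""
    chrom, _start, end, snp_id = line.rstrip("\n").split("\t")[:4]
    return chrom, int(end), snp_id


# Hypothetical workflow (file names are placeholders):
#   1. write to_bed_line(...) for every hg18 SNP into snps_hg18.bed
#   2. liftOver snps_hg18.bed hg18ToHg19.over.chain.gz snps_hg19.bed unmapped.bed
#   3. read snps_hg19.bed back with from_bed_line(...)
# SNPs that fail to map end up in unmapped.bed and should be dropped.
```

Keeping the SNP ID in the fourth BED column makes it easy to join the lifted positions back to the LD table afterwards.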

ADD COMMENT • link 11.5 years ago by Pierre Lindenbaum 164k

0

Thanks a lot for your answer, it really shed some light on my problem!

ADD REPLY • link 11.5 years ago by cruzpedro ▴ 100

0

Do you know if there is the same data from the newest release of the 1000G project?

ADD REPLY • link 10.1 years ago by Jimbou ▴ 960

0

LD data for phase 3 on GRCh37 is available in the 1000 Genomes browser, powered by Ensembl. More details on that view can be found here.

ADD REPLY • link 10.1 years ago by Denise CS ★ 5.2k

0

Thank you very much. But is there also an FTP or whole-genome download site?

ADD REPLY • link 10.1 years ago by Jimbou ▴ 960

0

I had a chat with my colleague Laura Clarke from the 1000 Genomes Project here at EMBL-EBI, and this is what she said:

There are no bulk downloads for LD information. Doing pairwise comparisons of 80 million sites across 2,500 individuals is impossible. The recommended approach is to convert the files to PLINK format using our VCF to PED tool or vcftools:

http://www.1000genomes.org/faq/can-i-convert-vcf-files-plinkped-format

http://vcftools.sourceforge.net/documentation.html#plink

Then look at the region of interest in Haploview or an equivalent tool. There is no feasible way to get this in bulk for any large quantity of 1000 Genomes sites or individuals.
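For a quick look at a small region, the genotypes can also be pulled straight from the VCF text. The Python sketch below is a swapped-in alternative, not the vcftools/PLINK route recommended in the quote above: it extracts alternate-allele counts for biallelic sites in one region. The function name, the inline example VCF, and its SNP IDs are hypothetical.

```python
import re

# Tiny made-up VCF used only to demonstrate the function.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "6\t100\tsnpA\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1\n"
    "6\t150\tsnpB\tC\tT\t.\tPASS\t.\tGT\t0/1\t./.\t1/1\n"
    "6\t900\tsnpC\tG\tA,T\t.\tPASS\t.\tGT\t0|1\t0|2\t1|1\n"
    "7\t120\tsnpD\tT\tC\t.\tPASS\t.\tGT\t0|0\t0|0\t0|1\n"
)


def region_allele_counts(vcf_lines, chrom, start, end):
    """Return [(pos, snp_id, counts)] for biallelic sites in chrom:start-end.

    Coordinates are 1-based and inclusive, as in the VCF POS column.
    counts holds the number of ALT alleles per sample (0, 1 or 2),
    or None where the genotype is missing.
    """
    out = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 10 or not f[8].startswith("GT"):
            continue  # no per-sample genotypes on this line
        if f[0] != chrom or not (start <= int(f[1]) <= end):
            continue  # outside the region of interest
        if "," in f[4]:
            continue  # multiallelic site, skipped in this sketch
        counts = []
        for sample in f[9:]:
            alleles = re.split(r"[/|]", sample.split(":")[0])
            counts.append(None if "." in alleles else sum(a == "1" for a in alleles))
        out.append((int(f[1]), f[2], counts))
    return out
```

For example, `region_allele_counts(EXAMPLE_VCF.splitlines(), "6", 50, 1000)` keeps snpA and snpB and skips the multiallelic snpC. For a real 1000 Genomes file the lines would come from an opened (possibly gzipped) VCF, and anything beyond a small region is better handled with the conversion tools mentioned above.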

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 10.1 years ago by Denise CS ★ 5.2k