Ancestry estimation using 1000 genomes as reference
12 weeks ago
kasgel • 0

Hi,

I'm trying to perform ancestry estimation for a study sample of size one using 1000 genomes (g1k) as reference. That is, ancestry estimation for one individual. I have performed variant calling against GRCh38.

However, all the guides I can find for doing this with PLINK and g1k assume that the study data contains more than one individual. An example of such a guide is here. For instance, it's not possible to prune variants in high LD if the data contains only one individual (too few founders).

I'm not sure how to approach this. Should I be merging the g1k data with the study sample?

Any guidance would be appreciated! Thanks :)

ancestry plink 1000genomes • 458 views

I can't imagine that LD calculations derived from the query data would be helpful until you had several hundred individuals. Most people use LD measurements from reference data, which is what --indep-pairwise will do anyway if you run it on the reference (or merged) dataset, since it computes LD from whatever dataset it is given.

11 weeks ago
DBScan ▴ 450

If you only have a single sample, I would suggest using somalier.

11 weeks ago
  • Partway through, the guide explicitly tells you to merge your study sample with the g1k data.
  • You correctly suspect that the guide's LD pruning recommendation is not applicable to your use case, because it prunes on the study data alone. It is reasonable to LD-prune the merged dataset (or just the g1k dataset) instead.
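
As a rough illustration of the second option (prune on g1k only, then merge), here is a minimal sketch that just assembles the PLINK 1.9 command lines and prints them. The file prefixes (`g1k`, `study`, `merged`), the pruning parameters `50 5 0.2`, and the choice of 10 principal components are assumptions for illustration, not values taken from the guide. It also assumes both datasets are already in PLINK binary format on the same build with matching variant IDs.

```python
import subprocess


def build_plink_commands(ref="g1k", sample="study", out="merged", plink="plink"):
    """Return PLINK 1.9 command lines for: prune on reference -> extract -> merge -> PCA.

    All prefixes and numeric parameters are hypothetical placeholders.
    """
    prune = f"{ref}_prune"
    keep = f"{prune}.prune.in"  # list of retained variants written by --indep-pairwise
    return [
        # 1. Compute LD on the reference panel only (window 50 variants, step 5, r^2 0.2).
        [plink, "--bfile", ref, "--indep-pairwise", "50", "5", "0.2", "--out", prune],
        # 2. Keep only the pruned-in variants in the reference.
        [plink, "--bfile", ref, "--extract", keep, "--make-bed", "--out", f"{ref}_pruned"],
        # 3. Keep the same variants in the single study sample.
        [plink, "--bfile", sample, "--extract", keep, "--make-bed", "--out", f"{sample}_pruned"],
        # 4. Merge the study sample into the reference.
        [plink, "--bfile", f"{ref}_pruned", "--bmerge", f"{sample}_pruned",
         "--make-bed", "--out", out],
        # 5. PCA on the merged data; the study sample is projected alongside g1k.
        [plink, "--bfile", out, "--pca", "10", "--out", f"{out}_pca"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_plink_commands(), dry_run=True)
```

In practice the merge step often needs extra care (strand flips, mismatched variant IDs, and sites missing from a single-sample call set), so treat this as an outline of the order of operations rather than a complete pipeline.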
