Parallel GWAS?
17 months ago

Hello,

First off, I need to say that it's completely bonkers that I can't find this answer by Googling. Perhaps the question is too dumb?

Question: Can I do GWAS per-chromosome and get the same coefficients/ORs as if I had run the entire genome at the same time?

I have millions of SNPs, meaning millions of variables, and if I do a glm() call in R with millions of variables, uhm, well... I'd really like to split the problem into 22 sanely sized autosomal chunks. But I suspect I'll skew the numbers if I do.

Any help is greatly appreciated!

Joel

GWAS Parallel Chromosome • 1.0k views

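Yes, you can run GWAS per chromosome or per region chunk. A standard GWAS fits a separate model for each SNP rather than one glm() with all SNPs as predictors, so the result for each SNP does not depend on which other SNPs are in the same chunk (provided you use the same samples, phenotype, and covariates every time). I suggest using dedicated software such as PLINK (https://www.cog-genomics.org/plink/) instead of R.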
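To make the "per SNP" point concrete, here is a minimal sketch in plain Python, not what PLINK does internally. It assumes a quantitative phenotype and a simple least-squares slope per SNP with no covariates; a real case/control GWAS would typically use logistic regression with covariates such as PCs. The data, the SNP IDs and the helper names (`snp_slope`, `run_gwas`) are all made up for illustration.

```python
import random

def snp_slope(genotypes, phenotype):
    """Least-squares slope of phenotype on one SNP's allele counts (0/1/2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        return None  # monomorphic SNP: no effect can be estimated
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    return sxy / sxx

def run_gwas(snps, phenotype):
    """Fit one model per SNP; snps maps snp_id -> (chromosome, genotypes)."""
    return {snp_id: snp_slope(geno, phenotype) for snp_id, (_, geno) in snps.items()}

# Toy data (hypothetical): 200 individuals, 3 chromosomes, 5 SNPs each.
rng = random.Random(1)
n_individuals = 200
chromosomes = [1, 2, 3]
snps = {}
for chrom in chromosomes:
    for j in range(5):
        geno = [rng.choice([0, 1, 2]) for _ in range(n_individuals)]
        snps[f"chr{chrom}_snp{j}"] = (chrom, geno)

# One SNP is given a true effect of 0.5 on the phenotype.
causal = snps["chr2_snp3"][1]
phenotype = [0.5 * g + rng.gauss(0, 1) for g in causal]

# Run everything at once ...
whole_genome = run_gwas(snps, phenotype)

# ... and one chromosome at a time, then combine the results.
per_chromosome = {}
for chrom in chromosomes:
    chunk = {k: v for k, v in snps.items() if v[0] == chrom}
    per_chromosome.update(run_gwas(chunk, phenotype))

print(whole_genome == per_chromosome)  # prints True
```

The estimates match exactly because each SNP's model only ever sees that SNP's genotypes and the phenotype. The same holds with covariates, as long as every chunk uses the identical covariate values.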


Huh... wow. But, you would need to calculate PCs for every "chunk", right?


To calculate PCs, remove correlated SNPs per chromosome in parallel, then combine the pruned chromosomes; the combined data will be small enough to compute the PCs.
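A rough sketch of that idea in plain Python, under stated assumptions: greedy pruning on pairwise r² within a window of already-kept SNPs, done separately per chromosome, with the survivors pooled into one set for a single PCA. The threshold of 0.2, the window of 50 kept SNPs, the toy data and the function names are all illustrative choices, not anyone's recommended settings. PLINK has its own window-based pruning (for example its `--indep-pairwise` option), which is what you would use on real data.

```python
import random

def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # a monomorphic SNP is treated as uncorrelated with everything
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return sab * sab / (saa * sbb)

def prune_chromosome(snps, threshold=0.2, window=50):
    """Greedy pruning of one chromosome.

    snps is a list of (snp_id, genotypes) in positional order. A SNP is kept
    only if its r^2 with each of the last `window` kept SNPs is below threshold.
    """
    kept = []
    for snp_id, geno in snps:
        if all(r_squared(geno, kept_geno) < threshold for _, kept_geno in kept[-window:]):
            kept.append((snp_id, geno))
    return kept

# Toy data (hypothetical): every SNP is followed by an exact copy of itself,
# i.e. a neighbour in perfect LD that pruning should drop.
rng = random.Random(7)
n_individuals = 200
genome = {}
for chrom in (1, 2, 3):
    snps = []
    for j in range(4):
        geno = [rng.choice([0, 1, 2]) for _ in range(n_individuals)]
        snps.append((f"chr{chrom}_snp{j}", geno))
        snps.append((f"chr{chrom}_snp{j}_dup", list(geno)))
    genome[chrom] = snps

# Each chromosome is pruned independently (this loop could run in parallel),
# and the surviving SNPs are pooled into one set for a single PCA.
pooled = []
for chrom, snps in genome.items():
    pooled.extend(prune_chromosome(snps))

total = sum(len(s) for s in genome.values())
print(len(pooled), "of", total, "SNPs kept for PCA")
```

Only the pruning is done per chromosome here. The PCA itself would still be run once on the pooled set, so every individual gets one set of PCs that is then reused as covariates in all chunks.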


No. You would definitely want to calculate PC loadings on the entirety of the dataset.

In the event of genetic admixture, in particular recent genetic admixture, you may get unstable ancestry estimates if you calculate PC loadings one chromosome at a time. This could mean that confounding is controlled well for some SNVs but poorly for others, depending on the specifics of that person's local ancestry.

Put more simply, you will get the most stable estimates of each person's PC loadings by running the PCA on all chromosomes together.


Big thanks for replying!

Do those risks still apply if all included individuals are from the same population, and potential outliers have been excluded?

And, suppose you have 10,000 individuals and 5 million markers - if as you say, PCs must be calculated on everything at once, how on Earth do people handle this with single-core computation?


We need to prune the SNPs first, i.e. remove correlated SNPs (those in high linkage disequilibrium); this reduces the size of the data enough that PCA becomes manageable. Read some tutorials about GWAS:
