Hello,
I have merged two datasets using plink. One of the datasets has far fewer SNPs (dataset 1: 470,000 SNPs per chromosome; dataset 2: 170,000 SNPs per chromosome), so I ran the analysis using only the SNPs common to both datasets.
The results after merging differ from those before merging. Some SNPs that were statistically significant are no longer significant, and some others became significant. I think this is because we added some patients when merging, and also because I analyzed only the SNPs common to both datasets.
So can I just say that the results after merging are fine? Or should I check whether I am missing some important SNPs that were statistically significant before merging but no longer are? If so, what should I do?
Thank you in advance!
You haven't told us anything about the number of individuals, the purpose of your analysis, the statistical test you performed and the technology used to obtain the data, which are potentially important parameters. So please elaborate.
Thank you for the reply.
The number of individuals: Dataset 1 has 2000 (cases and controls); Dataset 2 has 5 (cases only).
The purpose of your analysis: case-control analysis.
The statistical test you performed: Fisher's exact test.
The technology used to obtain the data: Plink.
Thank you!
Well, then you are removing an awful lot of variants to just add 5 more individuals. I would probably just test in Dataset 1, and then see if those variants were called in Dataset 2.
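A minimal sketch of that check in Python, assuming the Dataset 1 results are a whitespace-delimited table with `SNP` and `P` columns (as PLINK association output has) and that Dataset 2 comes with a PLINK `.bim` file, where the variant ID is the second column. The function names, the toy data and the 5e-8 threshold are illustrative assumptions, not part of your pipeline:

```python
import io


def read_bim_ids(lines):
    """Return the set of variant IDs (2nd column) from PLINK .bim lines."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids


def significant_ids(lines, alpha):
    """Return variant IDs whose P value is below alpha.

    Assumes a header row that contains 'SNP' and 'P' columns.
    """
    rows = iter(lines)
    header = next(rows).split()
    snp_col, p_col = header.index("SNP"), header.index("P")
    hits = set()
    for line in rows:
        fields = line.split()
        if len(fields) <= max(snp_col, p_col):
            continue  # skip blank or truncated rows
        try:
            p_value = float(fields[p_col])
        except ValueError:
            continue  # skip non-numeric entries such as 'NA'
        if p_value < alpha:
            hits.add(fields[snp_col])
    return hits


def split_by_presence(hits, other_ids):
    """Split significant IDs into those present in / absent from the other dataset."""
    return sorted(hits & other_ids), sorted(hits - other_ids)


# Toy stand-ins for the real files (hypothetical content).
assoc_dataset1 = io.StringIO(
    " CHR SNP BP A1 F_A F_U A2 P OR\n"
    " 1 rs1 1000 A 0.30 0.20 G 1e-9 1.7\n"
    " 1 rs2 2000 C 0.10 0.10 T 0.8 1.0\n"
    " 1 rs3 3000 G 0.25 0.15 A 3e-8 1.9\n"
    " 1 rs4 4000 T 0.05 0.05 C NA NA\n"
)
bim_dataset2 = io.StringIO(
    "1 rs1 0 1000 A G\n"
    "1 rs2 0 2000 C T\n"
)

hits = significant_ids(assoc_dataset1, alpha=5e-8)  # threshold is an assumption
present, missing = split_by_presence(hits, read_bim_ids(bim_dataset2))
print("significant in Dataset 1 and called in Dataset 2:", present)
print("significant in Dataset 1 but absent from Dataset 2:", missing)
```

With real data you would pass open file handles instead of the `io.StringIO` objects. Matching on variant ID only works if both datasets name variants the same way; otherwise you would match on chromosome and position instead.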
Thank you - I will compare the two results! But actually I need to add the five cases...
If we have to add those 5 cases, then run your tests with PCs for the merged data.
Thank you so much. You mean to test with population controls?
I meant principal components. Here is some good info: "In genome wide association studies, what are principal components?"
Thank you so much for your reply. Actually, all cases and controls in the datasets have been confirmed to be from the same population. So if the 300,000 SNPs lost in the merge were reported to be very rare in the population, maybe we would not need to care, but they actually include more than just rare variants.