I obtained the following Manhattan plot:
What could explain the visible horizontal lines of SNPs (indicated by red arrows)? The study was performed on ~ 500 individuals.
What I would do is extract all the p-values between 10^-12 and 10^-13 and see what's happening, and do the same with those around 10^-4. There are many possible reasons: the pattern of the genotypic data, the number of iterations (if the p-value is computed via permutations), the rules for stopping permutations, and so on.
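A minimal sketch of that extraction step, assuming the association results are available as rows with the hypothetical fields `snp`, `p`, `freq` and `se` (the real column names depend on the tool that produced them). It pulls out the SNPs whose p-values fall in a band and counts how many distinct (allele frequency, standard error) pairs occur among them; the values are made up for illustration.

```python
from collections import Counter

def snps_in_band(rows, p_low, p_high):
    """Return the rows whose p-value lies in [p_low, p_high]."""
    return [r for r in rows if p_low <= r["p"] <= p_high]

def summarize_band(rows, ndigits=6):
    """Count how often each (allele frequency, standard error) pair occurs."""
    return Counter(
        (round(r["freq"], ndigits), round(r["se"], ndigits)) for r in rows
    )

# Toy results (hypothetical field names and values).
results = [
    {"snp": "rs1", "p": 3e-13, "freq": 0.001, "se": 0.52},
    {"snp": "rs2", "p": 3e-13, "freq": 0.001, "se": 0.52},
    {"snp": "rs3", "p": 2e-4, "freq": 0.002, "se": 0.40},
    {"snp": "rs4", "p": 0.37, "freq": 0.250, "se": 0.05},
]

# SNPs sitting on the upper horizontal line of the plot.
top_line = snps_in_band(results, 1e-13, 1e-12)
print([r["snp"] for r in top_line])
print(summarize_band(top_line))
```

If a whole line collapses onto a single (frequency, standard error) pair, the SNPs on it probably share the same genotype pattern.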
Thank you @FabioMarroni for your answer. When I extract those p-values, I see that they all have the same frequency and standard error. Could this have happened because I did not filter for minor allele frequency, so that the phenotype is present in only one individual?
Yes, that's possible. The most important question regards those with p around 10^-12. Those look like the most significant of the whole experiment, but it is likely that they are false positives. But maybe some of them are real. You should try to see if you find some pattern that tells you if they are real or not. Do they all have the same allele frequency? Are they singletons from an outlier? Questions like this. You can then apply the same reasoning to the lower line, but I would not be worried about that too much.
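One way to check for singletons from an outlier is sketched below, assuming the genotypes of the SNPs on the line are coded as minor-allele counts (0, 1, 2, or `None` for missing) per individual. The names `carriers` and `singleton_owners` and the toy data are hypothetical.

```python
from collections import Counter

def carriers(genotypes, sample_ids):
    """Map each SNP to the IDs of individuals carrying at least one minor allele."""
    return {
        snp: [sid for sid, g in zip(sample_ids, calls) if g is not None and g > 0]
        for snp, calls in genotypes.items()
    }

def singleton_owners(carrier_map):
    """Count, per individual, the SNPs for which that individual is the only carrier."""
    return Counter(ids[0] for ids in carrier_map.values() if len(ids) == 1)

# Toy data (hypothetical): rs1 and rs2 are singletons carried by ind3.
sample_ids = ["ind1", "ind2", "ind3", "ind4"]
genotypes = {
    "rs1": [0, 0, 1, 0],
    "rs2": [0, 0, 1, 0],
    "rs3": [0, 1, 1, 0],
}

owners = singleton_owners(carriers(genotypes, sample_ids))
print(owners)
```

If one individual owns most of the singletons on a line, comparing that individual's phenotype value with the rest of the sample is a reasonable next step.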
I am not experienced with that approach (are you using GCTA?), but I suspect that you might get these very similar P-values all over the genome when you have a relatively low number of samples (and possibly also because of the related individuals).
It might also be that individuals from the same family share a variant specific to that family, so that you are regressing your phenotype on only these ~3 individuals. A MAF filter might help here.
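A minimal sketch of such a MAF filter, assuming genotypes coded as minor-allele counts (0, 1, 2, or `None` for missing) and an illustrative threshold of 0.01; the threshold and the function names are assumptions, not something fixed by this thread.

```python
def minor_allele_frequency(calls):
    """MAF from 0/1/2 allele counts; None marks a missing genotype."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    # Fold the frequency so the rarer allele is always the one reported.
    return min(freq, 1 - freq)

def maf_filter(genotypes, threshold=0.01):
    """Keep only the SNPs whose MAF is at least `threshold`."""
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if minor_allele_frequency(calls) >= threshold
    }

# Toy data for 500 individuals (hypothetical).
n = 500
genotypes = {
    "family_specific": [1, 1, 1] + [0] * (n - 3),  # 3 carriers, MAF = 0.003
    "common": [1] * 100 + [0] * (n - 100),         # 100 carriers, MAF = 0.1
}

kept = maf_filter(genotypes)
print(list(kept))
```

With ~500 individuals, a variant carried by three relatives has a MAF of 3/1000, so even a modest threshold removes it before the regression.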
I am not sure exactly for your data, but in principle, for genotype to quantitative phenotype relationships, you could also try FastQTL, I think: http://fastqtl.sourceforge.net/ My experience with eQTL variants was somewhat similar, though: many similar P-values came out of the linear regression.
In addition, I haven't seen many papers using Manhattan plots to display results from linear regression. Also see here; it seems that someone doing linear regression had a very similar problem: Problem with Manhattan plot
How to add images to a Biostars post
Thank you! I have modified my post.