Question

Mysterious horizontal lines on Manhattan plot

6.7 years ago

rednalf ▴ 90

I obtained the following Manhattan plot:

What could explain the visible horizontal lines of SNPs (indicated by red arrows)? The study was performed on ~500 individuals.

Tags: SNP, genome

How to add images to a Biostars post

Reply by WouterDeCoster 47k, 6.7 years ago

Thank you! I have modified my post.

Reply by rednalf ▴ 90, 6.7 years ago

score 0 · Answer 1 · 2018-04-19

6.7 years ago

Fabio Marroni ★ 3.0k

What I would do is extract all the p-values between 10^-13 and 10^-12 and see what's happening, and do the same for those around 10^-4. There are many possible reasons: the pattern of the genotype data, the number of iterations (if the p-value is computed via permutations), the rules for stopping permutations, and so on.
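A minimal sketch of that first step, assuming the association output has already been parsed into tuples of (SNP ID, chromosome, position, p-value, allele frequency, standard error). The tuple layout, the band limits used for "around 10^-4", and the toy values are all hypothetical. It pulls out the SNPs whose p-value falls in a band and counts how many share the same allele frequency and standard error, which is what a horizontal line made of near-identical tests would look like.

```python
from collections import Counter

# Hypothetical per-SNP rows: (snp_id, chrom, pos, p, af, se).
# Layout and values are illustrative only, not real results.
results = [
    ("rs_a", 1, 1_200_000, 3.1e-13, 0.001, 2.45),
    ("rs_b", 4, 8_500_000, 3.1e-13, 0.001, 2.45),
    ("rs_c", 9, 2_300_000, 3.1e-13, 0.001, 2.45),
    ("rs_d", 2, 5_000_000, 1.2e-4, 0.002, 1.10),
    ("rs_e", 7, 6_100_000, 1.2e-4, 0.002, 1.10),
    ("rs_f", 3, 7_700_000, 0.42, 0.31, 0.08),
]

def summarize_band(rows, lo, hi):
    """Return rows with lo <= p <= hi, plus a count of their (af, se) pairs."""
    band = [r for r in rows if lo <= r[3] <= hi]
    shared = Counter((r[4], r[5]) for r in band)
    return band, shared

# Top line (10^-13 to 10^-12) and a hypothetical window around 10^-4.
top_band, top_shared = summarize_band(results, 1e-13, 1e-12)
low_band, low_shared = summarize_band(results, 5e-5, 5e-4)
print(len(top_band), "SNPs near 1e-12; (AF, SE) counts:", dict(top_shared))
print(len(low_band), "SNPs near 1e-4; (AF, SE) counts:", dict(low_shared))
```

If one (AF, SE) pair accounts for almost every SNP in a band, the line is most likely made of SNPs that are statistically the same test repeated.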

Thank you @FabioMarroni for your answer. When I extract those p-values, I see that they all have the same allele frequency and standard error. Could this have happened because I did not filter on minor allele frequency, so that the minor allele is present in only one individual?
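To make this hypothesis concrete: two SNPs whose minor allele is carried by the same single individual have identical genotype vectors, so any per-SNP test fitted to the same phenotype returns the same estimate, standard error and p-value, and such SNPs stack up at one height on the plot. The sketch below swaps in plain ordinary least squares instead of the linear mixed model used here, and the phenotype values are made up. Under LOCO the null model differs between chromosomes, so matching SNPs on different chromosomes might give very close rather than exactly equal values.

```python
import math

def ols_stats(genotypes, phenotype):
    """Slope, standard error and t-statistic for phenotype ~ intercept + genotype.

    Plain ordinary least squares, used only for illustration; this is NOT
    the linear mixed model (with LOCO) from the original analysis.
    """
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    y_mean = sum(phenotype) / n
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    sxy = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    alpha = y_mean - beta * g_mean
    rss = sum((y - alpha - beta * g) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se

# Made-up phenotype (e.g. times in ms) for 8 hypothetical individuals;
# individual 8 is an outlier.
phenotype = [410.0, 395.0, 402.0, 388.0, 420.0, 399.0, 405.0, 620.0]

# Two different SNPs whose minor allele is carried only by individual 8,
# and a third singleton carried by individual 1 instead.
snp_x = [0, 0, 0, 0, 0, 0, 0, 1]
snp_y = [0, 0, 0, 0, 0, 0, 0, 1]
snp_z = [1, 0, 0, 0, 0, 0, 0, 0]

for name, snp in [("snp_x", snp_x), ("snp_y", snp_y), ("snp_z", snp_z)]:
    beta, se, t = ols_stats(snp, phenotype)
    print(f"{name}: beta={beta:.2f} se={se:.2f} t={t:.2f}")
```

snp_x and snp_y print identical lines (and would get identical p-values, since the p-value depends only on t and the degrees of freedom), and their large t comes entirely from the single outlier; snp_z, carried by a typical individual, does not stand out.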

Reply by rednalf ▴ 90, 6.7 years ago

Yes, that's possible. The most important question concerns the SNPs with p around 10^-12. They look like the most significant hits of the whole experiment, but they are likely false positives, although some of them may be real. Look for a pattern that tells you whether they are real or not: do they all have the same allele frequency? Are they singletons carried by an outlier individual? You can then apply the same reasoning to the lower line, but I would not worry too much about that one.
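One way to answer "same allele frequency? singletons from an outlier?" is to group the SNPs from a line by exactly who carries the minor allele. A minimal sketch, assuming genotypes coded 0/1/2 per individual; the sample and SNP IDs are hypothetical. SNPs that end up in the same group carry exactly the same information about the phenotype.

```python
from collections import defaultdict

def minor_allele_carriers(genotypes, samples):
    """Return (minor allele count, frozenset of minor-allele carriers).

    Genotypes are assumed coded 0/1/2 as copies of one allele; if that
    allele is the major one, the coding is flipped so we count the minor one.
    """
    if sum(genotypes) > len(genotypes):  # allele frequency above 0.5
        genotypes = [2 - g for g in genotypes]
    mac = sum(genotypes)
    carriers = frozenset(s for s, g in zip(samples, genotypes) if g > 0)
    return mac, carriers

def group_by_carriers(snp_genotypes, samples):
    """Group SNP IDs that share the same minor allele count and carriers."""
    groups = defaultdict(list)
    for snp, genos in snp_genotypes.items():
        groups[minor_allele_carriers(genos, samples)].append(snp)
    return dict(groups)

samples = ["S1", "S2", "S3", "S4", "S5", "S6"]
# Hypothetical genotypes for SNPs taken from the top horizontal line.
band_snps = {
    "rs_a": [0, 0, 0, 0, 0, 1],
    "rs_b": [0, 0, 0, 0, 0, 1],
    "rs_c": [2, 2, 2, 2, 2, 1],  # same singleton, alleles coded the other way
    "rs_d": [0, 1, 0, 1, 0, 0],
}
groups = group_by_carriers(band_snps, samples)
for (mac, carriers), snps in groups.items():
    label = "singleton" if mac == 1 else f"MAC={mac}"
    print(label, sorted(carriers), snps)
```

A single large group of singletons all carried by one sample would point to an outlier individual rather than a real association.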

Reply by Fabio Marroni ★ 3.0k, 6.7 years ago

Perfect, I will do that! Thank you so much for your nice help!

Reply by rednalf ▴ 90, 6.7 years ago

Hi svalf, that looks very interesting. Could you give more information about the phenotype you're looking at? Is it a logistic or a linear regression? And are these ~500 samples all unrelated?

Reply by maegsul ▴ 170, 6.7 years ago

Hello. The phenotype is a time measurement (in ms), and yes, some individuals in the cohort are related (around half of them), with 2-3 individuals per family. I fitted a linear mixed model using the leave-one-chromosome-out (LOCO) approach.
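For readers unfamiliar with LOCO: when testing SNPs on chromosome c, the genetic relationship matrix (GRM) in the mixed model is built from the SNPs on all other chromosomes, so the tested SNP is not also absorbed by the random effect. A minimal sketch of just the GRM step, assuming a simple standardized-genotype GRM (tools such as GCTA use a slightly different formula for the diagonal) and hypothetical toy genotypes; fitting the mixed model itself is not shown.

```python
import math

def standardized_grm(genotype_matrix):
    """GRM from standardized genotypes (simplified, not GCTA's exact formula).

    genotype_matrix: list of SNP rows, each a list of 0/1/2 dosages per
    individual. A[j][k] = mean over SNPs of z_ij * z_ik, where
    z_ij = (x_ij - 2p_i) / sqrt(2p_i(1 - p_i)).
    """
    n = len(genotype_matrix[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for row in genotype_matrix:
        p = sum(row) / (2 * n)
        if p == 0 or p == 1:
            continue  # monomorphic SNP: no variance, cannot standardize
        sd = math.sqrt(2 * p * (1 - p))
        z = [(x - 2 * p) / sd for x in row]
        for j in range(n):
            for k in range(n):
                grm[j][k] += z[j] * z[k]
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs to build a GRM from")
    return [[v / used for v in r] for r in grm]

def loco_grms(snps):
    """Map each chromosome to a GRM built from all *other* chromosomes.

    snps: list of (chromosome, genotypes) pairs.
    """
    chroms = sorted({chrom for chrom, _ in snps})
    return {
        c: standardized_grm([g for chrom, g in snps if chrom != c])
        for c in chroms
    }

# Hypothetical toy data: 4 individuals, SNPs on three chromosomes.
snps = [
    (1, [0, 1, 2, 1]),
    (1, [1, 1, 0, 0]),
    (2, [2, 1, 0, 1]),
    (2, [0, 0, 1, 2]),
    (3, [1, 1, 1, 0]),
]
grms = loco_grms(snps)
# Tests for SNPs on chromosome 1 would use grms[1], and so on.
```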

Reply by rednalf ▴ 90, 6.7 years ago

I am not experienced with that approach (are you using GCTA?), but I suspect you might get these very similar p-values all over the genome when the number of samples is relatively low, and possibly also because of the related individuals.

It might also be that individuals from the same family share a variant found only in that family, so the association for that variant is effectively estimated from those ~3 individuals alone. A MAF filter might help here.
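A minimal sketch of both checks, assuming 0/1/2 genotype coding and a per-individual family ID; all IDs and the `min_maf` threshold are hypothetical. It applies a MAF filter and separately flags variants whose minor-allele carriers all come from one family.

```python
def maf(genotypes):
    """Minor allele frequency from 0/1/2 dosages."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)

def family_private(genotypes, families):
    """True if every minor-allele carrier belongs to the same family."""
    if sum(genotypes) > len(genotypes):  # flip so we look at the minor allele
        genotypes = [2 - g for g in genotypes]
    carrier_fams = {f for f, g in zip(families, genotypes) if g > 0}
    return len(carrier_fams) == 1

def filter_snps(snp_genotypes, families, min_maf=0.01):
    """Split SNPs by MAF threshold and list family-private variants."""
    kept, dropped, private = [], [], []
    for snp, genos in snp_genotypes.items():
        if family_private(genos, families):
            private.append(snp)
        (kept if maf(genos) >= min_maf else dropped).append(snp)
    return kept, dropped, private

# Hypothetical toy cohort of 8 individuals in 4 families.
families = ["F1", "F1", "F1", "F2", "F2", "F3", "F3", "F4"]
snp_genotypes = {
    "rs_fam": [1, 1, 0, 0, 0, 0, 0, 0],     # only carried within family F1
    "rs_common": [1, 0, 1, 0, 2, 1, 0, 1],
    "rs_single": [0, 0, 0, 0, 0, 0, 0, 1],
}
# A high threshold only because the toy cohort is tiny.
kept, dropped, private = filter_snps(snp_genotypes, families, min_maf=0.1)
print("kept:", kept, "dropped:", dropped, "family-private:", private)
```

With ~500 individuals (about 1,000 alleles), a variant carried heterozygously by the three members of one family has MAF = 3/1000 = 0.003, so a 1% threshold would already remove it; the family flag is a complementary check for variants that pass the threshold, like `rs_fam` in the toy data.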

I am not sure about your data specifically, but in principle you could also try FastQTL (http://fastqtl.sourceforge.net/) for associations between genotypes and a quantitative phenotype. My experience with eQTL variants was somewhat similar, though: the linear regression produced many similar p-values.

In addition, I haven't seen many papers using Manhattan plots to display results from linear regression. See also this thread, where someone doing linear regression seems to have had a very similar problem: Problem with Manhattan plot
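A Manhattan plot itself only needs a position and a p-value per SNP, whatever test produced the p-value. A minimal sketch of the plot coordinates, assuming (chromosome, position, p) rows and approximating each chromosome's length by its largest observed position; `horizontal_lines` is a hypothetical helper that reports y-values shared by many SNPs, which is what the lines in the question are.

```python
import math
from collections import Counter

def manhattan_points(rows):
    """Convert (chrom, pos, p) rows into (x, y) Manhattan-plot coordinates.

    x is the position offset by the summed lengths of the preceding
    chromosomes (length approximated by the largest position seen);
    y is -log10(p).
    """
    max_pos = {}
    for chrom, pos, _ in rows:
        max_pos[chrom] = max(max_pos.get(chrom, 0), pos)
    offset, start = 0, {}
    for chrom in sorted(max_pos):
        start[chrom] = offset
        offset += max_pos[chrom]
    return [(start[c] + pos, -math.log10(p)) for c, pos, p in rows]

def horizontal_lines(points, min_snps=3, decimals=6):
    """Rounded y-values shared by at least min_snps points."""
    counts = Counter(round(y, decimals) for _, y in points)
    return {y: n for y, n in counts.items() if n >= min_snps}

# Hypothetical toy rows: the same p-value recurs on three chromosomes.
rows = [
    (1, 100, 3.1e-13),
    (1, 500, 0.2),
    (2, 50, 3.1e-13),
    (2, 300, 0.04),
    (3, 70, 3.1e-13),
]
points = manhattan_points(rows)
print(horizontal_lines(points))
```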

Reply by maegsul ▴ 170, 6.7 years ago