Question

Manhattan Plot

0

16 months ago

Emily • 0

Hello,

I am trying to figure out why my Manhattan plot has many horizontal lines. My data includes 36 individuals from 3 populations, with a total of ~80,000 SNPs. To create the plot I used these commands in RStudio:

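library(pcadapt)
# Input paths (meta.path is defined but not used below)
vcf.path <- "merge_filtered chrom.vcf"
meta.path <- "meta4.csv"
# Read genotypes from the VCF and run pcadapt with K = 2 principal components
genos <- read.pcadapt(vcf.path, type = "vcf")
x <- pcadapt(input = genos, K = 2)
# Manhattan plot of the -log10 p-values
plot(x, option = "manhattan", plt.pkg = "ggplot", snp.info = TRUE)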

I also made another Manhattan plot using qqman, and the same horizontal banding occurred.

The only explanation I can think of is that the individuals are too genetically similar.

I've seen other plots that have a somewhat similar pattern, but not as severe as mine.

qqman pcadapt manhattan GWAS • 1.5k views

ADD COMMENT • link updated 16 months ago by dthorbur ★ 3.0k • written 16 months ago by Emily • 0

1

I am new to Manhattan plots too, so someone more capable should answer this, but the x-axis typically represents genomic position (chromosomes 1-22).

ADD REPLY • link 16 months ago by Shane ▴ 20

0

Moving this to a comment since it does not answer the question asked in the original post, which is about the "banding" seen along the x-axis.

ADD REPLY • link 16 months ago by GenoMax 151k

1

Something seems a little odd here. What is the distribution of your p-values? -log10(0.05) is 1.3 and -log10(0.001) is 3. From the picture it looks like the majority of your data is significant, though it's impossible to tell at this scale.
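
One way to check the distribution is sketched below. It is only an illustration: the p-values are simulated stand-ins (uniform, as expected under the null), and the variable names are made up. With a pcadapt result you would start from `x$pvalues` instead.

```r
# Stand-in p-values; with pcadapt you would use pvals <- x$pvalues instead.
set.seed(1)
pvals <- runif(80000)

# Fraction of SNPs below common thresholds; under the null roughly
# 5% should fall below 0.05 and 0.1% below 0.001.
frac_05  <- mean(pvals < 0.05,  na.rm = TRUE)
frac_001 <- mean(pvals < 0.001, na.rm = TRUE)

# The same thresholds on the -log10 scale used by Manhattan plots
thr <- -log10(c(0.05, 0.001))

# A histogram makes a pile-up of very small p-values easy to spot
hist(pvals, breaks = 50, main = "p-value distribution", xlab = "p-value")
```

If the fractions are far above 5% and 0.1%, most of the data really is being called significant, which would point to a problem upstream of the plot.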

I also don't understand what your x-axis is meant to represent. Did you number your SNPs with MAF>0.05 from 1 to ~80,000? How are they ordered? I agree with Shane's point, even though it is off topic: Manhattan plots typically have genomic position on the x-axis.
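
For reference, a minimal base-R sketch of putting genomic position on the x-axis. The SNP table and its column names (`CHR`, `BP`, `P`) are hypothetical, and each chromosome's length is approximated by its largest SNP position, which is an assumption made only for this example.

```r
# Hypothetical SNP table: chromosome, base-pair position, p-value
set.seed(1)
snps <- data.frame(
  CHR = rep(1:3, each = 4),
  BP  = rep(c(100, 250, 400, 900), times = 3),
  P   = runif(12)
)

# Order by chromosome, then by position within chromosome
snps <- snps[order(snps$CHR, snps$BP), ]

# Offset of each chromosome = summed lengths of the chromosomes before it
# (length approximated by the largest position seen on that chromosome)
chr_len <- tapply(snps$BP, snps$CHR, max)
offset  <- c(0, cumsum(chr_len)[-length(chr_len)])
names(offset) <- names(chr_len)

# Cumulative genomic position for the x-axis
snps$pos <- snps$BP + offset[as.character(snps$CHR)]

plot(snps$pos, -log10(snps$P), col = snps$CHR, pch = 20,
     xlab = "Genomic position", ylab = "-log10(p)")
```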

My guess is that the banding has something to do with rounding at such small numbers, or with how the p-values are calculated (e.g., a p-value of 10^-60 may be attainable given the number of data points, whereas a p-value of 10^-61 is not). I've seen similar structure in lots of genomic data in Manhattan-style plots.
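
A quick way to test this guess is to count how many distinct p-values there are and which ones repeat most often. The sketch below uses made-up p-values restricted to five values purely for illustration. With pcadapt you would use `x$pvalues` in place of the simulated vector.

```r
# Stand-in p-values that take only a handful of distinct values;
# with pcadapt you would use pvals <- x$pvalues instead.
set.seed(1)
pvals <- sample(c(1e-60, 1e-45, 1e-30, 1e-10, 0.2),
                size = 80000, replace = TRUE)

# Number of distinct values relative to the number of SNPs
n_distinct <- length(unique(pvals))
n_distinct / length(pvals)

# Most frequent values on the -log10 scale: each one is a horizontal band
bands <- sort(table(round(-log10(pvals), 3)), decreasing = TRUE)
head(bands)
```

A ratio close to 1 would argue against this explanation. A ratio near 0, with a few heavily repeated values, would match the bands in the plot.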

ADD REPLY • link 16 months ago by dthorbur ★ 3.0k

0

Those horizontal lines mean that your p-values take only a few distinct values. With a small sample size and a discrete (e.g., binary) outcome, the test statistic can only take a limited number of values, so many SNPs end up sharing exactly the same p-value.
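
A small simulation illustrates the effect. It is a sketch under assumptions that are not from the original post: 36 individuals split into two groups of 18, carrier status of the minor allele as a binary variable, and Fisher's exact test per SNP. This is not the statistic pcadapt computes.

```r
set.seed(1)
n_snps <- 2000
n_per_group <- 18   # 36 individuals split into two groups (assumption)

# Number of minor-allele carriers in each group, per SNP
a <- sample(1:17, n_snps, replace = TRUE)
b <- sample(1:17, n_snps, replace = TRUE)

# Fisher's exact test on the 2x2 carrier-by-group table of each SNP
pvals <- mapply(function(ai, bi) {
  tab <- matrix(c(ai, n_per_group - ai, bi, n_per_group - bi), nrow = 2)
  fisher.test(tab)$p.value
}, a, b)

# Only 17 * 17 = 289 different tables are possible here, so the 2000 SNPs
# can share at most 289 distinct p-values
n_distinct <- length(unique(signif(pvals, 10)))
n_distinct
```

Plotting `-log10(pvals)` against SNP index for this simulation gives horizontal bands, because every SNP with the same table lands at the same height.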

There might be other issues in your data, like inflation.

ADD REPLY • link 16 months ago by Zhenyu Zhang ★ 1.3k