Emily, 10 months ago
Hello,
I am trying to figure out why my Manhattan plot has many horizontal lines. My data include 36 individuals from 3 populations, with a total of ~80,000 SNPs. To create the plot, I used these commands in RStudio:
library(pcadapt)
vcf.path <- "merge_filtered chrom.vcf"
meta.path <- "meta4.csv"  # not used by the commands below
genos <- read.pcadapt(vcf.path, type = "vcf")
x <- pcadapt(input = genos, K = 2)
plot(x, option = "manhattan", plt.pkg = "ggplot", snp.info = TRUE)
I also made a Manhattan plot using qqman, and the same horizontal banding still occurred.
The only explanation I can think of is too much genetic similarity among the individuals.
I am also new to Manhattan plots, so someone more experienced may be better placed to answer, but the x-axis typically represents genomic position (e.g. chromosomes 1-22 in humans).
Moving this to a comment, since it does not answer the question asked in the original post, which is about the "banding" seen along the x-axis.
Something seems a little odd here. What is the distribution of your p-values? -log10(0.05) is about 1.3 and -log10(0.001) is 3, so from the picture it looks as if the majority of your SNPs are significant, though it is impossible to tell on this scale.
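A quick way to answer the distribution question is to look at the p-values directly. The sketch below assumes they sit in a numeric vector `pvals`; with a pcadapt fit that would presumably be `x$pvalues`, as used in the pcadapt vignette. Here uniform p-values are simulated as a stand-in, so the numbers are illustrative only.

```r
# Stand-in for your real p-values (e.g. x$pvalues from pcadapt).
# Simulated uniform values = what well-calibrated null tests look like.
set.seed(1)
pvals <- runif(80000)

# The -log10 values of the two thresholds discussed above (about 1.3 and 3)
thresholds <- c(0.05, 0.001)
round(-log10(thresholds), 2)

# Fraction of SNPs falling below each threshold
frac_below <- sapply(thresholds, function(t) mean(pvals < t, na.rm = TRUE))
names(frac_below) <- thresholds
print(frac_below)

# Under the null the histogram should be roughly flat; a large spike near 0
# means most SNPs look "significant", which is worth investigating.
hist(pvals, breaks = 50, main = "P-value distribution", xlab = "p-value")
```

If far more than 5% of real SNPs fall below 0.05, that points to a calibration problem rather than a plotting one.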
I also don't understand what your x-axis is meant to represent. Did you number your SNPs with MAF > 0.05 from 1 to ~80,000? How are they ordered? I agree with Shane: although it is off topic, Manhattan plots typically have genomic position on the x-axis.
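If the SNPs carry chromosome and position information (the CHROM and POS columns of the VCF), a genomic-position x-axis can be built by offsetting each chromosome by the cumulative length of the chromosomes before it. The sketch below uses a small made-up table (`snps`, with hypothetical columns `chr`, `pos`, `p`) and approximates each chromosome's length by its largest observed position; true chromosome lengths from the reference or VCF header would be more accurate.

```r
# Hypothetical SNP table: chromosome, base-pair position and p-value per SNP
snps <- data.frame(
  chr = c(1, 1, 1, 2, 2, 3),
  pos = c(100, 5000, 20000, 300, 9000, 50),
  p   = c(0.2, 1e-5, 0.03, 0.5, 1e-8, 0.01)
)

# Order SNPs by chromosome, then by position within chromosome
snps <- snps[order(snps$chr, snps$pos), ]

# Approximate chromosome length by the largest observed position, then
# offset each chromosome by the summed lengths of the ones before it
chr_len <- tapply(snps$pos, snps$chr, max)
offset  <- c(0, cumsum(as.numeric(chr_len))[-length(chr_len)])
names(offset) <- names(chr_len)
snps$cum_pos <- snps$pos + offset[as.character(snps$chr)]

# Manhattan-style plot with genomic position on the x-axis
plot(snps$cum_pos, -log10(snps$p), col = snps$chr, pch = 20,
     xlab = "Genomic position (cumulative bp)", ylab = "-log10(p)")
```

qqman's `manhattan()` does a similar offsetting itself when given a data frame with `CHR`, `BP` and `P` columns; if `BP` holds a running SNP index rather than base-pair positions, its x-axis will reflect that index instead.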
My guess is that the banding comes from rounding at such small numbers, or from how the p-values are calculated: for example, a p-value of 10^-60 may be attainable given the number of data points, whereas 10^-61 may not be. I have seen similar structure in Manhattan-style plots of lots of genomic data.
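One concrete way that very small numbers cause trouble: double-precision values cannot go below about 5e-324, so sufficiently extreme test statistics give p-values that underflow to exactly 0, and -log10(p) becomes Inf. The sketch below uses upper-tail chi-squared p-values with df = 2, matching `K = 2` in the original call on the assumption that the statistic is compared to a chi-squared distribution with K degrees of freedom; the statistic values are made up. This does not by itself show that rounding causes the bands, only where precision runs out and how `log.p = TRUE` avoids it.

```r
# Made-up, increasingly extreme chi-squared statistics
stat <- c(50, 500, 1500, 5000)

# Direct upper-tail p-values: the last two underflow to exactly 0
p_direct <- pchisq(stat, df = 2, lower.tail = FALSE)

# Same p-values computed on the log scale, converted to log10; no underflow
log10_p <- pchisq(stat, df = 2, lower.tail = FALSE, log.p = TRUE) / log(10)

# -log10(p_direct) would be Inf for the underflowed entries
data.frame(stat, p_direct, minus_log10_p = -log10_p)
```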
Those horizontal lines basically mean that your significant p-values take only a few distinct values. With a small sample size or a binary outcome, the test statistic can take only a limited number of values, so this is to be expected.
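This discreteness is easy to reproduce. The sketch below swaps in Fisher's exact test of a hypothetical binary carrier status against a hypothetical binary outcome (this is not the test pcadapt uses) for 36 individuals, and counts how many distinct p-values come out of 2,000 simulated SNPs. With 18 individuals per group, the 2x2 table can take at most 19 x 19 = 361 configurations, so the p-values collapse onto a small set of values, which appear as horizontal bands when plotted.

```r
set.seed(42)
n_ind <- 36                                 # sample size from the original post
n_snp <- 2000                               # hypothetical number of SNPs
group <- rep(c(0, 1), each = n_ind / 2)     # hypothetical binary outcome

# One Fisher's exact test per simulated SNP
pvals <- replicate(n_snp, {
  carrier <- rbinom(n_ind, 1, 0.3)          # hypothetical binary genotype
  tab <- table(factor(carrier, levels = 0:1), group)
  fisher.test(tab)$p.value
})

# Far fewer distinct p-values than SNPs
n_distinct <- length(unique(signif(pvals, 10)))
n_distinct

# Plotted by SNP index, the repeated values form horizontal bands
plot(-log10(pvals), pch = 20, xlab = "SNP index", ylab = "-log10(p)")
```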
There might also be other issues with your data, such as genomic inflation.
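Inflation is commonly summarized by the genomic inflation factor (lambda): the median of the observed chi-squared statistics divided by the median expected under the null. The sketch below recovers the statistics from p-values with `qchisq()`; the helper name `genomic_inflation` is hypothetical, and `df` should match the test that produced the p-values (1 for a typical single-SNP test, presumably K for pcadapt). The pcadapt result also appears to store its own inflation factor (`x$gif`), which is worth confirming with `names(x)`.

```r
# Hypothetical helper: lambda = median observed chi-squared / null median
genomic_inflation <- function(pvals, df = 1) {
  pvals <- pvals[!is.na(pvals)]
  chisq <- qchisq(pvals, df = df, lower.tail = FALSE)
  median(chisq) / qchisq(0.5, df = df)
}

set.seed(7)
# Well-calibrated p-values: lambda should be close to 1
null_p <- runif(80000)
# Statistics inflated by a factor of 1.5: lambda should be close to 1.5
inflated_p <- pchisq(1.5 * rchisq(80000, df = 1), df = 1, lower.tail = FALSE)

genomic_inflation(null_p)
genomic_inflation(inflated_p)
```

A lambda well above 1 suggests inflated test statistics, for example from uncorrected population structure or relatedness.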