A Manhattan plot is used in GWAS to visualize SNP positions together with their logarithmic p-values. Could somebody please show, with a simple numerical example for 2 or 3 chromosomes, how this plot is made?
Update:
Actually, I am confused about how the data are processed so that each SNP has a different p-value. For the sake of argument, take the following scenario:
Chromosomes=23
Controls=500
Subjects=800
SNPs=20000 per chromosome (that is, each chromosome has 20000 SNPs)
Now my confusion is how those dots are made. The horizontal axis represents a particular SNP on a specific chromosome, so how can there be multiple dots going up or down at that one position? That is my confusion.
regards
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.
Have a look here
Please see my updated question. Thanks for the reply.
See the R package qqman.
Please see my updated question. Thanks for the reply.
Every association test is performed per SNP; nothing is done 'per chromosome'.
Can you explain how those different dots per SNP are calculated? I'm actually having a hard time understanding the dataset. Could you point me to some links where I could see these kinds of datasets with detailed descriptions?
For each SNP, you can run a linear regression of your phenotype on that SNP's genotype.
If a SNP is associated with the phenotype, the slope of your regression line is the Beta term in your Y = XB + e model. This Beta is the effect of the SNP; if the SNP has no effect on the phenotype, Beta should be roughly 0. From this Beta (also called the effect size of the SNP) and its standard error, you can calculate a p-value for that Beta. Another way to see it is to run a simple t-test comparing two means (for example, the mean genotype at that SNP in cases vs. in controls). This simple statistical test also gives you a p-value. The lower the p-value, the stronger the evidence that the SNP is associated with variation in your phenotype.
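A minimal sketch of the per-SNP regression described above, in plain Python. Assumptions not taken from the thread: genotypes are coded as allele counts (0, 1, 2), cases are coded 1 and controls 0, the allele frequencies are made up, and the p-value uses a normal approximation to the t distribution, which is reasonable for ~1300 samples. Real analyses usually rely on a dedicated tool such as PLINK rather than hand-written code like this.

```python
import math
import random

def simulate_genotypes(n, allele_freq, rng):
    """Hypothetical genotype dosages (0, 1 or 2 copies of the tested allele)."""
    return [sum(rng.random() < allele_freq for _ in range(2)) for _ in range(n)]

def snp_pvalue(genotypes, phenotype):
    """Fit Y = XB + e for one SNP and return (beta, se, p).

    The p-value uses a normal approximation to the t distribution (large n).
    """
    n = len(genotypes)
    x_mean = sum(genotypes) / n
    y_mean = sum(phenotype) / n
    sxx = sum((x - x_mean) ** 2 for x in genotypes)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(genotypes, phenotype))
    beta = sxy / sxx                          # slope = effect size of the SNP
    intercept = y_mean - beta * x_mean
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)       # standard error of beta
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value
    return beta, se, p

rng = random.Random(1)
n_cases, n_controls = 800, 500
phenotype = [1] * n_cases + [0] * n_controls  # 1 = case, 0 = control

# Hypothetical allele frequencies (cases, controls) for three SNPs.
snps = {"rs_null_1": (0.3, 0.3), "rs_null_2": (0.5, 0.5), "rs_assoc": (0.6, 0.2)}

results = {}
for snp_id, (f_case, f_ctrl) in snps.items():
    geno = (simulate_genotypes(n_cases, f_case, rng)
            + simulate_genotypes(n_controls, f_ctrl, rng))
    results[snp_id] = snp_pvalue(geno, phenotype)  # exactly one p-value per SNP

for snp_id, (beta, se, p) in results.items():
    print(f"{snp_id}: beta={beta:.3f} se={se:.3f} p={p:.3g}")
```

Each SNP ends up with exactly one (beta, se, p) triple, and that single p-value becomes that SNP's single dot.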
The Manhattan plot is just a way to plot all of these p-values, with a -log10() transformation for clarity's sake, so that the smallest p-values show up as the tallest dots.
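A toy numerical example for three chromosomes, as asked for in the question. All SNP names, positions and p-values are hypothetical. The sketch offsets each chromosome by the positions of the chromosomes before it (one common way to lay chromosomes side by side on a single axis) and converts each p-value to -log10(p).

```python
import math

# Hypothetical toy results: (SNP, chromosome, base-pair position, p-value).
# These numbers are made up purely for illustration.
results = [
    ("rs1", 1, 1_000, 0.50),
    ("rs2", 1, 5_000, 0.001),
    ("rs3", 1, 9_000, 0.20),
    ("rs4", 2, 2_000, 0.04),
    ("rs5", 2, 6_000, 1e-8),
    ("rs6", 3, 3_000, 0.75),
    ("rs7", 3, 7_000, 0.01),
]

# Offset each chromosome by the largest positions of the chromosomes before it,
# so the chromosomes sit side by side on one horizontal axis.
offsets, running = {}, 0
for chrom in sorted({c for _, c, _, _ in results}):
    offsets[chrom] = running
    running += max(bp for _, c, bp, _ in results if c == chrom)

points = []
for snp, chrom, bp, p in results:
    x = offsets[chrom] + bp   # horizontal: position along the genome
    y = -math.log10(p)        # vertical: smaller p-value -> taller dot
    points.append((snp, chrom, x, y))

for snp, chrom, x, y in points:
    print(f"{snp} chr{chrom} x={x} y={y:.2f}")
```

Passing the (x, y) pairs to any scatter-plot function, coloured by chromosome, gives the familiar Manhattan plot: seven SNPs, seven dots.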
In addition, take a look at my calculations here, which will give you an idea of how to perform your own simple association test: A: SNP dataset and Z Score
My confusion is about what those dots represent. As I understand it, there should be a one-to-many relationship between one SNP and many p-values; if that is the case, how is the data organized?
If you could share actual sample data (all the columns but only a few rows) of the kind one uses for a Manhattan plot, things would be much easier for me.
regards
Those dots represent the SNPs on each chromosome, each plotted against its p-value. Each chromosome can carry several SNPs, hence you have several dots per chromosome. For example, suppose chromosome 1 has SNPs rs1 to rs6, each with its own p-value for association with the phenotype, and chromosome 22 has another 6 SNPs, each with its own p-value. When the Manhattan plot is generated, you will have 6 dots for chromosome 1 and another 6 for chromosome 22.
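A sketch of what such an input table typically looks like: one row per SNP with its chromosome, base-pair position and p-value. The column names SNP, CHR, BP and P match the defaults of the qqman manhattan() function; the rows mirror the example above, but the IDs rs7 to rs12 and all positions and p-values are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical association results, one row per SNP.
# Positions and p-values are invented for illustration only.
data = """SNP,CHR,BP,P
rs1,1,10000,0.52
rs2,1,25000,0.0031
rs3,1,40000,0.78
rs4,1,55000,1.2e-6
rs5,1,70000,0.09
rs6,1,85000,0.44
rs7,22,12000,0.61
rs8,22,30000,0.017
rs9,22,48000,0.35
rs10,22,66000,0.0008
rs11,22,84000,0.92
rs12,22,99000,0.27
"""

rows = list(csv.DictReader(io.StringIO(data)))

# One row = one SNP = one p-value = one dot on the plot.
dots_per_chromosome = Counter(row["CHR"] for row in rows)
print(dots_per_chromosome)
```

In R, the same table read into a data frame could be passed straight to qqman's manhattan().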
Read this article on GettingGeneticsDone: it explains in very simple terms, with an example data set, how the plots can be created step by step.
Thanks for the reply. If those dots represent multiple SNPs on one chromosome, what determines the width of a chromosome, given that each chromosome occupies a specific width on the horizontal axis?
The width of each chromosome is not fixed. If the SNPs are placed one after another along the horizontal axis, it depends on the number of SNPs present on each chromosome. For example, if in Study A chromosome 1 has 1000 SNPs and chromosome 22 has 100 SNPs, chr1 will be wider than chr22. If in Study B chromosome 22 has 500 SNPs but chromosome 1 has 250 SNPs, chr22 will be wider than chr1.
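A small sketch contrasting two ways chromosome width can arise, using the Study B numbers above (250 SNPs on chromosome 1, 500 on chromosome 22). The positions are hypothetical and evenly spaced. If SNPs are plotted by index, chromosome 22 comes out wider; if they are plotted by base-pair position (as qqman's manhattan() does with the BP column), chromosome 1 stays wider because its SNPs span a longer stretch of DNA.

```python
from collections import Counter

# Hypothetical (chromosome, base-pair position) pairs for one study:
# chromosome 1 has 250 SNPs spread over ~247 Mb, chromosome 22 has 500 SNPs
# spread over ~50 Mb (positions invented and evenly spaced for simplicity).
snps = ([(1, 1 + i * 990_000) for i in range(250)]
        + [(22, 1 + i * 100_000) for i in range(500)])

# Index-based axis: each SNP takes one slot, so width = number of SNPs.
width_by_count = Counter(chrom for chrom, _ in snps)

# Position-based axis: width = span of base-pair positions on that chromosome.
width_by_position = {}
for chrom in width_by_count:
    positions = [bp for c, bp in snps if c == chrom]
    width_by_position[chrom] = max(positions) - min(positions)

print(dict(width_by_count), width_by_position)
```

Either way, the width says nothing about how many p-values a single SNP has; it only reflects how the SNPs are spread along the axis.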
Yes, that is my confusion. So the width represents the number of SNPs, which is not constant, and the height represents the negative log of the p-value, -log(p). So take chromosome 1 and SNP 1: how can one have multiple dots at that particular horizontal position?
There is a one-to-many relationship between chromosome and SNP, since we know a chromosome has multiple SNPs. But how can there be a one-to-many relationship between SNP and p-values, i.e. how can a particular SNP have multiple p-values (that is, different dots)?
One never has multiple p-values per position; it just looks that way because you're cramming a lot of dots into a small image.
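Some quick arithmetic for the scenario in the question (23 chromosomes with 20000 SNPs each) shows why dots appear stacked. The plot width of 1000 pixels and the SNP IDs are assumptions for illustration, and the SNPs are simply placed one after another by index.

```python
from collections import defaultdict

# Scenario from the question: 23 chromosomes x 20000 SNPs each.
n_chromosomes, snps_per_chromosome = 23, 20_000
total_snps = n_chromosomes * snps_per_chromosome  # one dot per SNP

plot_width_px = 1_000  # hypothetical width of the plot area in pixels

# Place SNPs one after another along the x-axis and record which
# pixel column each one lands in.
columns = defaultdict(list)
for snp_index in range(total_snps):
    column = snp_index * plot_width_px // total_snps
    columns[column].append(f"snp{snp_index}")  # hypothetical SNP IDs

print(total_snps, len(columns), len(columns[0]))
```

Every pixel column ends up holding 460 distinct SNPs, each with its own single p-value, so what looks like a vertical stack of dots at one position is really hundreds of neighbouring SNPs.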
A single SNP can only have one p-value since you're only testing its association with a single phenotype.
Then how can there be multiple dots corresponding to one SNP on the horizontal axis?
Multiple dots are multiple SNPs; they do not correspond to a single SNP. Can you upload an image of the plots you are looking at? That would help us explain things to you a bit better.
See 'How to add images to a Biostars post'. I've done it for you this time.
It seems I haven't understood SNP data correctly. Can you show me, for example, how two dots per SNP are calculated?