Question

Manhattan Plot in GWAS (How P values are calculated using SNP dataset)

2

6.4 years ago

nkhan.mscs15seecs ▴ 80

A Manhattan plot is created in GWAS studies to visualize SNP positions and their logarithmic p-values. Can somebody please show, with the help of a simple numerical example for 2 or 3 chromosomes, how this plot is made?

Update:

Actually, I am confused about how the data is processed so that each SNP has a different p-value. Take, for the sake of argument, the following scenario:

Chromosomes=23
Controls=500
Subjects=800
SNPs=20000 per chromosome (that is, each chromosome has 20000 SNPs)

My confusion is about how those dots are made. If a position on the horizontal axis represents a particular SNP on a specific chromosome, how can there be multiple dots going up or down at that position?

regards

GWAS plot • 16k views

ADD COMMENT • link updated 6.4 years ago by zx8754 12k • written 6.4 years ago by nkhan.mscs15seecs ▴ 80

1

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLY • link 6.4 years ago by WouterDeCoster 47k

0

Have a look here

ADD REPLY • link 6.4 years ago by NB ▴ 960

0

Please see my updated question. Thanks for reply

ADD REPLY • link 6.4 years ago by nkhan.mscs15seecs ▴ 80

0

See R qqman package.

ADD REPLY • link 6.4 years ago by zx8754 12k

0

Please see my updated question. Thanks for the reply.

ADD REPLY • link 6.4 years ago by nkhan.mscs15seecs ▴ 80

0

Every association test is performed per SNP, there is nothing done 'per chromosome'.

ADD REPLY • link 6.4 years ago by WouterDeCoster 47k

0

Can you explain how those different dots per SNP are calculated? I am having a hard time understanding the dataset. Could you point me to some links where I could see these types of datasets with detailed descriptions?

ADD REPLY • link 6.4 years ago by nkhan.mscs15seecs ▴ 80

4

For each SNP, you can do a linear regression of your phenotype on the SNP genotype.

If a SNP is associated with a phenotype, your regression line will have a nonzero slope, equal to the Beta term in your Y = XB + e model. This Beta is the effect of the SNP. If the SNP doesn't have any effect on the phenotype, Beta should be roughly equal to 0. With this Beta term (also called the effect size of the SNP) and its standard error, you can calculate a p-value for the Beta term. Another way to see it is to do a simple t-test comparing two means (for example, the mean genotype at the SNP in cases vs. in controls). This simple statistical test also gives you a p-value. The lower the p-value, the stronger the evidence that the SNP is associated with variation in your phenotype.

The Manhattan plot is just a way to plot all these p-values, with a -log10() transformation for clarity's sake.
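
To make the per-SNP regression concrete, here is a minimal sketch in plain Python. The function name, the toy genotypes and the toy phenotype values are all made up for illustration, and the p-value uses a normal approximation to the t distribution, which is an assumption that only holds reasonably well for large samples. Real GWAS software does more (covariates, exact distributions, quality control).

```python
import math

def snp_regression_pvalue(genotypes, phenotype):
    """Regress phenotype on genotype (0/1/2 copies of an allele).

    Returns (beta, standard_error, p_value). The p-value comes from a
    normal approximation to the t distribution (an assumption, fine only
    for large samples).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotype))
    beta = sxy / sxx                      # slope = effect size of the SNP
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return beta, se, p

# Made-up toy data for 12 individuals (not from any real study).
phenotype = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.8, 2.2, 3.1, 2.9, 3.0, 3.2]
snp_a = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]  # genotype tracks the phenotype
snp_b = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]  # genotype mostly unrelated

for name, genotypes in (("snp_a", snp_a), ("snp_b", snp_b)):
    beta, se, p = snp_regression_pvalue(genotypes, phenotype)
    print(f"{name}: beta={beta:.3f} se={se:.3f} p={p:.3g} "
          f"-log10(p)={-math.log10(p):.2f}")
```

Each SNP goes through the function once, so each SNP ends up with exactly one p-value, and therefore one height on the plot.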

ADD REPLY • link 6.4 years ago by wpierrick ▴ 90

1

In addition, take a look at my calculations here, which will give you an idea of how to perform your own simple association test: A: SNP dataset and Z Score

ADD REPLY • link 6.4 years ago by Kevin Blighe 88k

0

My confusion is about what those dots represent. As I understand it, there should be a one-to-many relationship between one SNP and many p-values. If that is the case, how is the data organized?

Chromosome    SNP
1              ?
2              ?
3              ?
4
.....
...
..
23

If you could share actual sample data (all columns, but only a few rows) of the kind one uses for a Manhattan plot, things would be much easier for me.

regards

ADD REPLY • link 6.4 years ago by nkhan.mscs15seecs ▴ 80

2

Each dot represents one SNP on a chromosome, plotted against its p-value. Each chromosome can contain several SNPs, hence you have several dots per chromosome. In the example below, chromosome 1 has SNPs rs1 to rs6, each with its own p-value for association with the phenotype, and chromosome 22 has another 6 SNPs, each with its own p-value. So when the Manhattan plot is generated, you will have 6 dots for chromosome 1 and another 6 for chromosome 22.

If you read this article on GettingGeneticsDone, it explains in very simple terms, with an example data set, how the plots can be created easily in a stepwise manner.

SNP      CHR  Pos  P
rs1      1    1    0.9148
rs2      1    2    0.9371
rs3      1    3    0.2861
rs4      1    4    0.8304
rs5      1    5    0.6417
rs6      1    6    0.5191
rs16465  22   530  0.5644
rs16466  22   531  0.1383
rs16467  22   532  0.3937
rs16468  22   533  0.1779
rs16469  22   534  0.2393
rs16470  22   535  0.2630
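
As a rough sketch of how these rows become dots, the Python snippet below turns the table above into one (x, y) pair per SNP. The function name is hypothetical, and the x coordinate is simply the SNP's rank after sorting by chromosome and position, which is a simplifying assumption; plotting packages typically use a cumulative base-pair position instead.

```python
import math

# Rows copied from the table above: (SNP, CHR, Pos, P)
rows = [
    ("rs1", 1, 1, 0.9148), ("rs2", 1, 2, 0.9371), ("rs3", 1, 3, 0.2861),
    ("rs4", 1, 4, 0.8304), ("rs5", 1, 5, 0.6417), ("rs6", 1, 6, 0.5191),
    ("rs16465", 22, 530, 0.5644), ("rs16466", 22, 531, 0.1383),
    ("rs16467", 22, 532, 0.3937), ("rs16468", 22, 533, 0.1779),
    ("rs16469", 22, 534, 0.2393), ("rs16470", 22, 535, 0.2630),
]

def manhattan_coordinates(rows):
    """Return one (snp, x, y) tuple per SNP.

    x is the SNP's rank along the genome (chromosomes laid end to end),
    y is -log10(p), so small p-values become tall dots.
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    return [(snp, x, -math.log10(p))
            for x, (snp, chrom, pos, p) in enumerate(ordered, start=1)]

coords = manhattan_coordinates(rows)
for snp, x, y in coords:
    print(f"{snp:8s} x={x:2d}  -log10(p)={y:.3f}")
```

Twelve rows in, twelve dots out: the tallest dot is the SNP with the smallest p-value (rs16466 here).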

ADD REPLY • link 6.4 years ago by NB ▴ 960

0

Thanks for the reply. If those dots represent multiple SNPs on one chromosome, then what determines the width of a chromosome, given that each chromosome has a specific width on the horizontal axis?

ADD REPLY • link 6.4 years ago by nkhan.mscs15seecs ▴ 80

0

The width of each chromosome is not fixed. It depends on the number of SNPs present on that chromosome. For example, if in Study A chromosome 1 has 1000 SNPs and chromosome 22 has 100 SNPs, the width of chr1 will be larger than that of chr22. If in Study B chromosome 22 has 500 SNPs but chromosome 1 has 250 SNPs, then the width of chr22 will be larger than that of chr1.
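
A small Python sketch of that idea, using the Study A and Study B counts from this reply. The function name is hypothetical, and it assumes SNPs are laid out by rank, so width equals SNP count; if the x-axis uses base-pair positions instead, the width follows the genomic span covered by the SNPs.

```python
def chromosome_spans(snp_counts):
    """Map chromosome -> (first_x, last_x) when SNPs are laid out by rank."""
    spans = {}
    start = 1
    for chrom in sorted(snp_counts):
        n = snp_counts[chrom]
        spans[chrom] = (start, start + n - 1)
        start += n  # the next chromosome begins where this one ends
    return spans

study_a = {1: 1000, 22: 100}  # SNPs per chromosome in Study A
study_b = {1: 250, 22: 500}   # SNPs per chromosome in Study B

spans_a = chromosome_spans(study_a)
spans_b = chromosome_spans(study_b)
print("Study A:", spans_a)
print("Study B:", spans_b)
```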

ADD REPLY • link 6.4 years ago by NB ▴ 960

0

Yes, that is my confusion. So the width represents the number of SNPs, which is not constant, and the height represents the negative log of the p-value, that is, -log(p). Take chromosome 1 and SNP 1: how can there be multiple dots at that particular horizontal position?

There is a one-to-many relationship between chromosome and SNP, since we know a chromosome has multiple SNPs. But how can there be a one-to-many relationship between a SNP and p-values? That is, how can a particular SNP have multiple p-values (different dots)?

ADD REPLY • link 6.4 years ago by nkhan.mscs15seecs ▴ 80

1

One never has multiple p-values per position, it just looks that way because you're cramming a lot of dots into a small image.

ADD REPLY • link 6.4 years ago by Devon Ryan 104k

1

A single SNP can only have one p-value since you're only testing its association with a single phenotype.

ADD REPLY • link updated 6.4 years ago by zx8754 12k • written 6.4 years ago by Devon Ryan 104k

0

Then how are there multiple dots corresponding to one SNP on the horizontal axis?

ADD REPLY • link 6.4 years ago by nkhan.mscs15seecs ▴ 80

1

Multiple dots are multiple SNPs; they do not correspond to a single SNP. Can you upload an image of the plots you are looking at? That would help us explain things to you a bit better.

ADD REPLY • link 6.4 years ago by NB ▴ 960

0

[Image: Manhattan plot uploaded by the poster]

ADD REPLY • link updated 6.4 years ago by WouterDeCoster 47k • written 6.4 years ago by nkhan.mscs15seecs ▴ 80

0

See "How to add images to a Biostars post". I've done it for you this time.

ADD REPLY • link 6.4 years ago by WouterDeCoster 47k

0

That means I have not understood the SNP data correctly. Can you show me how, for example, two dots per SNP are calculated?

ADD REPLY • link 6.4 years ago by nkhan.mscs15seecs ▴ 80

score 3 · Accepted Answer · 2018-07-06

3

6.4 years ago

WouterDeCoster 47k

You seem to have some issues reading what is said.

Each chromosome has multiple SNPs
For each SNP you perform one statistical test
Each SNP has one position and one p-value
Each SNP has only one dot in a Manhattan plot
It looks like there are lots of dots stacked on each other, but that's just because you are testing many, many SNPs.
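
A quick back-of-the-envelope check in Python, using the 23 chromosomes and 20000 SNPs per chromosome from the question. The 1000-pixel plot width is an assumed figure, purely for illustration.

```python
from collections import Counter

chromosomes = 23
snps_per_chromosome = 20000  # figures from the question
plot_width_px = 1000         # assumed image width, purely illustrative

total_snps = chromosomes * snps_per_chromosome
total_dots = total_snps      # one test per SNP -> one p-value -> one dot

# Assign each SNP (by rank along the genome) to a pixel column of the image.
columns = Counter((i * plot_width_px) // total_snps for i in range(total_snps))

print("dots in the plot:", total_dots)
print("pixel columns used:", len(columns))
print("dots sharing one pixel column:", max(columns.values()))
```

Under these assumptions, hundreds of different SNPs land in the same pixel column, each at its own height, which is why a column of dots can look like it belongs to a single SNP.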

ADD COMMENT • link 6.4 years ago by WouterDeCoster 47k

0

So if we say that a particular set of dots looks significant because they are above a certain threshold, it means that multiple SNPs are significant, not just one, right?

ADD REPLY • link 6.4 years ago by nkhan.mscs15seecs ▴ 80

0

That's right. Note that due to linkage disequilibrium, multiple SNPs may return a significant p-value while they are actually on the same haplotype, and there may be only one or a few functional variants, which may or may not have been on the SNP chip.
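
For a feel of where such a threshold line can sit, here is a minimal Python sketch. It assumes a Bonferroni correction at alpha = 0.05 over the 23 x 20000 tests from the question, which is one common choice and not something fixed by this thread. The four SNP names and p-values are invented to mimic neighbouring SNPs in linkage disequilibrium.

```python
import math

n_tests = 23 * 20000         # one test per SNP, figures from the question
alpha = 0.05                 # assumed family-wise error rate
threshold = alpha / n_tests  # Bonferroni-corrected p-value cutoff
threshold_height = -math.log10(threshold)

# Hypothetical p-values for four neighbouring SNPs (invented for illustration).
neighbours = {"rsA": 3e-9, "rsB": 8e-9, "rsC": 2e-8, "rsD": 0.4}
significant = [snp for snp, p in neighbours.items() if p < threshold]

print(f"threshold p={threshold:.3g}, line drawn at height {threshold_height:.2f}")
print("above the line:", significant)
```

Three separate dots clear the line in this made-up case, even though they could all be tagging the same underlying variant.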

ADD REPLY • link 6.4 years ago by WouterDeCoster 47k

0

Thanks for your patient and precise replies to my stupid questions; now I have got this. I really appreciate gentle people like you. Could you also refer me to some basic text that I should read as a beginner to understand these basic concepts in detail?