I am interested in finding the SNPs that lie close to each peak in a Manhattan plot, not only the genome-wide significant ones. For each peak, I would like to detect 2 or 3 annotated SNPs on both sides (+/- 3 SNP positions). I have also been looking at "Find The Extent Of A Peak In A Manhattan Plot", which is partially what I am trying to achieve.
I have a list of 200,000 SNPs (defined by chromosome and position) and their associated P-values, which I have used to generate the Manhattan plots. The data come from a chip experiment with DNA from cancer patients and healthy controls. I have detected the significant variants which might be associated with cancer, and I am also trying to see whether there are other variants with a known association to cancer in the vicinity of those SNPs. I would like to do this for the detected genome-wide significant SNPs and for the other observed peaks.
Please add some context. What biological problem are you working on? Manhattan plots can represent many different types of data.
Also, how do you generate the Manhattan plots? Are you talking about Manhattan plots that you have generated by yourself, or plots generated by somebody else?
I have edited my question. I hope this explains more clearly what I'd like to do.
By peaks, do you mean the SNPs for which you get a significant p-value? And by both sides, do you mean +/- 3 SNP positions? And what do you mean by detecting 2-3 annotated SNPs? If you get a peak (and you know its location), then you can already calculate +/- 3 positions, right?
@Arun - yes, I can detect the ones located near the genome-wide significant ones because there are generally only a few of them. I don't know what to do with the other observed peaks. Some of the SNPs might be novel, which is why I would like to validate them somehow by using other, already known ones located +/- 3 positions from the "interesting" SNP. I am also not sure if this is the way to do it...
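To make the "+/- 3 positions" step concrete, here is a minimal sketch, assuming the SNP positions for one chromosome are available as a sorted list of integers. The function name `flanking_snps` and the example coordinates are hypothetical, not taken from the question.

```python
from bisect import bisect_left

def flanking_snps(positions, lead_pos, k=3):
    """Return up to k SNP positions on each side of lead_pos.

    positions: sorted positions of all SNPs on ONE chromosome (assumed input).
    lead_pos:  position of the peak (lead) SNP.
    """
    i = bisect_left(positions, lead_pos)
    left = positions[max(0, i - k):i]
    # Skip the lead SNP itself if it is part of the list.
    j = i + 1 if i < len(positions) and positions[i] == lead_pos else i
    right = positions[j:j + k]
    return left, right

# Made-up coordinates, for illustration only.
chr1_positions = [100, 250, 400, 900, 1500, 1600, 2100, 3000, 4200]
left, right = flanking_snps(chr1_positions, 1500)
print(left, right)
```

The neighbours returned this way are neighbours on the chip, so their physical distance from the peak depends on marker density in that region.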
So, basically you have a list of positions on the genome (corresponding to the SNPs that are significant in your study), and you want to know if there are other SNPs in the proximity of these positions. Is this correct?
Yes, for the genome-wide significant ones. I don't know how to detect the peaks with a lower significance (still lower than 0.001), or how to distinguish them from the non-relevant ones. I don't think setting a threshold is a solution, because it would return too many SNPs.
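One possible way to keep a threshold from returning too many hits is to collapse nearby SNPs into a single peak. The sketch below is a simple distance-based clumping, not something proposed in this thread: it keeps the most significant SNP and drops every other SNP within a fixed window around it. The function name `find_peaks`, the 250 kb window, and the example data are all assumptions chosen for illustration; LD-based clumping would be the more rigorous alternative.

```python
def find_peaks(snps, p_max=1e-3, window=250_000):
    """Greedy distance-based clumping.

    snps:   iterable of (chrom, pos, p_value) tuples (assumed input layout).
    p_max:  only SNPs with p_value < p_max are considered.
    window: assumed clump radius in base pairs; an arbitrary choice.
    Returns one lead SNP per peak, sorted by chromosome and position.
    """
    # Visit candidates from most to least significant.
    candidates = sorted((s for s in snps if s[2] < p_max), key=lambda s: s[2])
    leads = []
    for chrom, pos, p in candidates:
        # Keep the SNP only if no stronger lead lies within the window.
        if all(c != chrom or abs(pos - q) > window for c, q, _ in leads):
            leads.append((chrom, pos, p))
    return sorted(leads)

# Made-up data, for illustration only.
example = [
    ("1", 1_000_000, 1e-9),
    ("1", 1_050_000, 1e-5),
    ("1", 1_100_000, 2e-4),
    ("1", 5_000_000, 5e-4),
    ("2", 1_020_000, 3e-4),
    ("2", 3_000_000, 0.2),
]
for lead in find_peaks(example):
    print(lead)
```

With this kind of grouping, the number of reported peaks depends on the window size as much as on the P-value cutoff, so both would need tuning against the actual plot.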
LD, haplotype analysis?
I will do that once I am able to identify my "peaks".