Hypergeometric Distribution Of SNPs In Pathways
13.5 years ago
Andrea_Bio ★ 2.8k

Hello

I'm really embarrassed asking this question since I haven't done statistics since I was at school (and I'm a female so I'm not saying how long ago that was).

I have SNPs for 2 species of cow which have different phenotypes. The SNPs were obtained from a pooled data set of 10 individuals from each species. My theory is that the difference in phenotype arises from a 'gain of function' in one of the species, and that the potentially functional SNPs (pfSNPs) are those that exist in one species and not the other, or those that are homozygous for one allele in one species and homozygous for the other allele in the other species.
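One way to turn the definition of potentially functional SNPs above into a concrete filter, sketched under stated assumptions: each pool gives an estimated alternate-allele frequency per site (for example, alt reads / total reads), and because pooled frequencies are noisy, a hypothetical tolerance `eps` decides what counts as "absent" or "fixed". With pooled DNA, "homozygous" can really only mean "fixed in the pool", so that is what the code checks. The name `classify_snp` and the 0.05 default are illustrative, not an established method.

```python
from typing import Optional


def classify_snp(alt_freq_a: float, alt_freq_b: float, eps: float = 0.05) -> Optional[str]:
    """Label a site as a candidate pfSNP from pooled alternate-allele frequencies.

    eps is a hypothetical tolerance: a pooled frequency <= eps is treated as
    'absent' and >= 1 - eps as 'fixed' (the pool-level stand-in for homozygous).
    """
    absent_a, absent_b = alt_freq_a <= eps, alt_freq_b <= eps
    fixed_a, fixed_b = alt_freq_a >= 1 - eps, alt_freq_b >= 1 - eps
    # Fixed for different alleles in the two pools (checked first: the most extreme case)
    if (fixed_a and absent_b) or (fixed_b and absent_a):
        return "fixed_difference"
    # Alternate allele seen in exactly one of the two pools
    if absent_a != absent_b:
        return "private"
    return None
```

For example, `classify_snp(0.98, 0.01)` returns `"fixed_difference"`, `classify_snp(0.30, 0.0)` returns `"private"`, and a site segregating in both pools returns `None`.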

So, I have a set of 'pfSNPs', I have narrowed it down to those SNPs in genes, and I intend to do some pathway analysis. I am hoping to see that pathways that feature in the phenotype response will have more pfSNPs than other pathways. My question, after all this preamble, is: is it possible to show that the enrichment of a pathway for pfSNPs is statistically significant and not random?

I am thinking I would need some sort of chi-squared test, but I don't know how this works at the pathway level. Would I compare the number of SNPs in my enriched pathway to the number of SNPs in all of the other pathways, to show that a higher number of SNPs in this pathway has a very low probability of happening by chance? My confusion is that not all pathways are created equal: some have more genes than others and are therefore more likely to have SNPs. Can you factor in the number of genes in the pathway? Or is that not necessary, because by the same logic you could say that genes with longer exons/introns are more likely to have SNPs as they have more 'DNA coverage'?
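Pathway size can be factored in directly. A common choice (an assumption here, not something the thread specifies) is the hypergeometric test, which is the exact, one-sided counterpart of a 2x2 chi-squared comparison of "genes in this pathway" against "all other genes": given N background genes, K of which carry at least one pfSNP, it asks how likely it is that a pathway of n genes contains k or more of them by chance. A minimal standard-library sketch:

```python
from math import comb


def hypergeom_upper_tail(k: int, n_pathway: int, k_total: int, n_total: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_total, k_total, n_pathway).

    n_total:   genes in the background (e.g. all annotated genes that were assayed)
    k_total:   background genes carrying at least one pfSNP
    n_pathway: genes in the pathway (this is where pathway size enters)
    k:         pathway genes carrying at least one pfSNP
    """
    upper = min(n_pathway, k_total)
    # Sum the probabilities of every outcome at least as extreme as the observed one;
    # exact integer arithmetic, divided once at the end
    tail = sum(
        comb(k_total, i) * comb(n_total - k_total, n_pathway - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_pathway)
```

With purely hypothetical numbers (15000 genes, 300 of them with a pfSNP, a 50-gene pathway), the expected count is 50 * 300 / 15000 = 1, so `hypergeom_upper_tail(6, 50, 300, 15000)` measures how surprising 6 hit genes would be. Note that this test counts genes rather than SNPs and treats every gene as equally likely to be hit, so it does not correct for gene length.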

I'm also aware that I don't have a large number of individuals in the initial samples.

Thanks a lot for your help

snp pathway statistics • 3.8k views
13.5 years ago

You don't have to invent this analysis yourself (although it never hurts to try). Several groups have looked at using genotypes to do pathway analysis; see the Nature Reviews Genetics paper by Wang et al. (Analysing biological pathways in genome-wide association studies) for a review. The most common approach seems to be a variation on Gene Set Enrichment Analysis (GSEA) for SNPs: you convert your SNPs to candidate genes and then do a GSEA. There are a lot of details to consider: how do you assign a gene to a SNP (or do you avoid that entirely), do you work from raw genotypes or P values, what test do you use to identify enrichment, do you threshold the P values for your initial analysis, etc.

Unfortunately I suspect the data you describe are underpowered for this analysis, but that's usually the case.
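The first of those details, assigning SNPs to genes, can be as simple as a positional window. A minimal sketch, assuming gene coordinates are available as (chromosome, start, end) tuples and using a hypothetical 5 kb flank to catch SNPs just outside a gene; a real analysis would take coordinates from an annotation file and use an interval index rather than this linear scan.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def snps_per_gene(snps: Iterable[Tuple[str, int]],
                  genes: Dict[str, Tuple[str, int, int]],
                  flank: int = 5000) -> Counter:
    """Count SNPs falling inside each gene or within `flank` bp of either end.

    snps:  (chromosome, position) pairs
    genes: gene name -> (chromosome, start, end); flank is a hypothetical window
    """
    counts: Counter = Counter()
    for chrom, pos in snps:
        # Linear scan over genes: fine for a sketch, too slow genome-wide
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start - flank <= pos <= end + flank:
                counts[name] += 1
    return counts
```

A SNP that lies within the flank of two neighbouring genes is counted for both, which is exactly the kind of choice that has to be made explicitly.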


Thanks for your answer. I am not doing a GWAS, so I don't have any P values, and I don't have a problem assigning a SNP to a gene (each SNP is either in the gene or very close to it). Is the GSEA technique still relevant, given that it doesn't take into account the fact that a gene with multiple SNPs enriches a pathway more than a gene with one SNP?
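If genes with several pfSNPs should weigh more, one option (a swapped-in permutation approach, not GSEA itself) is to score each pathway by its total pfSNP count and compare that with random gene sets of the same size drawn from all assayed genes. The sketch assumes a hypothetical `snp_counts` mapping that includes genes with zero pfSNPs; matching the set size controls for pathway size, but not for gene length.

```python
import random
from typing import Dict, List


def pathway_snp_permutation_p(pathway: List[str],
                              snp_counts: Dict[str, int],
                              n_perm: int = 10000,
                              seed: int = 0) -> float:
    """Empirical one-sided p-value for a pathway's total pfSNP count.

    snp_counts must cover every assayed gene (genes with no pfSNPs map to 0),
    since it doubles as the background the random gene sets are drawn from.
    """
    rng = random.Random(seed)
    universe = list(snp_counts)
    members = [g for g in set(pathway) if g in snp_counts]
    observed = sum(snp_counts[g] for g in members)  # genes with many pfSNPs weigh more
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Random gene set of the same size as the pathway, so size is controlled for
        sample = rng.sample(universe, len(members))
        if sum(snp_counts[g] for g in sample) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

The `+ 1` in the numerator and denominator keeps the empirical p-value from ever being exactly zero with a finite number of permutations.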


Thanks for your answer. I'm not doing GWAS so I don't have any p values. In brief, is this approach suggesting that you:

a) get all pathways and get all of the genes in each pathway so that each pathway is a gene set;
b) find the number of SNPs in each gene set;
c) use a test to see which gene sets (pathways) are statistically enriched for SNPs?
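Those three steps can be sketched end to end. This is one possible variant, not the method from the review: it counts pfSNPs (not genes) per pathway, takes the expected share of pfSNPs to be the pathway's share of total gene length (assuming pfSNPs land uniformly per base pair, which ignores linkage between nearby SNPs), uses a one-sided binomial test, and adjusts across pathways with Benjamini-Hochberg because many pathways are tested at once. `pathways`, `snp_counts` and `gene_lengths` are hypothetical inputs.

```python
from math import exp, lgamma, log, log1p
from typing import Dict, List, Tuple


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    logs = [
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(logs)
    return min(1.0, exp(top) * sum(exp(x - top) for x in logs))


def benjamini_hochberg(pvals: Dict[str, float]) -> Dict[str, float]:
    """BH-adjusted p-values (q-values), computed from the largest p downwards."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


def pathway_enrichment(pathways: Dict[str, List[str]],
                       snp_counts: Dict[str, int],
                       gene_lengths: Dict[str, int]) -> Dict[str, Tuple[int, float, float, float]]:
    """Return {pathway: (observed pfSNPs, expected pfSNPs, p-value, BH q-value)}."""
    total_snps = sum(snp_counts.get(g, 0) for g in gene_lengths)  # background total
    total_len = sum(gene_lengths.values())
    raw: Dict[str, Tuple[int, float, float]] = {}
    for name, genes in pathways.items():                          # a) each pathway is a gene set
        members = [g for g in set(genes) if g in gene_lengths]
        observed = sum(snp_counts.get(g, 0) for g in members)     # b) pfSNPs in the gene set
        share = sum(gene_lengths[g] for g in members) / total_len
        p = binom_upper_tail(observed, total_snps, share)         # c) one-sided test
        raw[name] = (observed, total_snps * share, p)
    q = benjamini_hochberg({name: r[2] for name, r in raw.items()})
    return {name: (obs, exp_n, p, q[name]) for name, (obs, exp_n, p) in raw.items()}
```

Weighting by length rather than by gene count is a design choice that speaks to the 'DNA coverage' point in the question; swapping `gene_lengths` for a mapping of every gene to 1 would weight by gene count instead.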


Forgot to add: presumably you have a '# SNPs in gene set' figure for each individual in the case/control populations. With a pooled DNA sample I will only have one 'individual' in each of the case and control populations, and even that won't be truly representative of any one individual.

Also, I am only looking at pathways in one 'population', not comparing a case population to a control population. In other words, I'm asking: in this population, which has a specific phenotype, are any of the pathways enriched for pfSNPs compared to other pathways in the same population? I'm not asking 'are any pathways enriched in a case population with respect to a control population?' And with pooled DNA I only have one 'individual' in the population, and that isn't even representative of a true individual.


Thanks for your reply. As I suspected and you confirmed, I don't have the data for this. But I could look at the distribution of the number of pfSNPs per gene, or of the number of pfSNPs per pathway. Could this distribution be an indicator of places to look more closely?
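Looking at the distribution first is a reasonable, assumption-light way to find places to look more closely. A minimal sketch, assuming hypothetical per-gene mappings `snp_counts` (gene to number of pfSNPs) and `gene_lengths` (gene to length in bp): it tabulates how many genes carry 0, 1, 2, ... pfSNPs and ranks genes by pfSNPs per kb, so that long genes do not rise to the top purely because of their size. The same idea works at the pathway level by summing counts and lengths over each pathway's genes.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_histogram(snp_counts: Dict[str, int]) -> Dict[int, int]:
    """Number of genes carrying 0, 1, 2, ... pfSNPs, in ascending order of count."""
    return dict(sorted(Counter(snp_counts.values()).items()))


def top_by_density(snp_counts: Dict[str, int],
                   gene_lengths: Dict[str, int],
                   top: int = 10) -> List[Tuple[str, float]]:
    """Genes ranked by pfSNPs per kb, so long genes don't lead purely by size."""
    density = {
        gene: 1000 * snp_counts.get(gene, 0) / length
        for gene, length in gene_lengths.items()
    }
    return sorted(density.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

A long tail in the histogram, or a handful of genes with an unusually high density, would be candidates for a closer look rather than evidence of enrichment on their own.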
