Computer scientist exploring ways to determine the degree of association between a disease and SNPs: what tools/methods should I look into?
10.0 years ago

Hi,

I'm a Computer Science postgraduate new to bioinformatics. I have been given a dataset of a couple of SNPs for 600 subjects, along with a quantitative value measuring disease severity.

What tools should I use to see if there is a correlation?

I'd really appreciate your suggestions.

SNP R • 5.1k views
ADD COMMENT
0
Entering edit mode

Ummm, what do you mean when you say you have a quantitative value? And what data do you have on the SNPs exactly? The alleles involved? The frequencies at which they are seen?

ADD REPLY
0
Entering edit mode

Thanks for getting back to me.

Let's say I have a value between 0 and 200 that represents the severity of the disease, and I have this value for each of the 600 subjects.

Then for the SNPs, I've the alleles for each subject.

ADD REPLY
0
Entering edit mode

Alright, is the reference allele given to you or is it just this data (Subject, Allele, Disease-Score) from which you're supposed to infer that?

ADD REPLY
0
Entering edit mode

Can you describe what you mean by reference allele?

ADD REPLY
0
Entering edit mode

It's the allele found in the reference genome at the exact same position as your Single Nucleotide Variant.
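With a reference allele in hand, one common way to make genotypes usable in a statistical test is to encode each subject's genotype as the number of non-reference alleles (0, 1 or 2). Here is a minimal sketch of that encoding; the row layout, the subject IDs, the scores and the helper name `alt_allele_count` are all made up for illustration and are not from the dataset discussed here.

```python
def alt_allele_count(genotype: str, ref: str) -> int:
    """Count the alleles in a two-letter genotype (e.g. "AG") that differ from the reference."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype.upper() if allele != ref.upper())

# Hypothetical rows: (subject id, genotype at one SNP, severity score on the 0-200 scale)
rows = [("S1", "AA", 35.0), ("S2", "AG", 80.0), ("S3", "GG", 142.0)]
ref_allele = "A"  # assumed reference allele for this SNP

# Replace each genotype string with its non-reference allele count
encoded = [(sid, alt_allele_count(gt, ref_allele), score) for sid, gt, score in rows]
print(encoded)
```

The 0/1/2 coding assumes a biallelic SNP; a site with more than two alleles would need a different encoding.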

ADD REPLY
0
Entering edit mode

Yes, I believe so.

ADD REPLY
0
Entering edit mode
10.0 years ago
Ram 44k

So ideally you can compare the number of people with each allele and tabulate statistics on the distribution of disease scores for each allele, then state which allele you conclude is the disease-causing allele and what kind of mutation (transition/transversion) it is.

Also, to spice stuff up, maybe add some ANOVA into the mix? :)
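To make that concrete, here is a rough sketch that tabulates scores per genotype and computes a one-way ANOVA F statistic. It uses only the Python standard library, so the p-value comes from a permutation test rather than the F distribution (that is a substitution on my part, not the classical ANOVA p-value). The scores and the function names are invented for illustration.

```python
import random
from statistics import mean, stdev

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_p_value(groups, n_perm=2000, seed=0):
    """Shuffle scores across groups and count how often F is at least the observed F."""
    rng = random.Random(seed)
    observed = anova_f(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if anova_f(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)

# Made-up severity scores grouped by genotype at a single SNP
scores_by_genotype = {
    "AA": [30, 42, 55, 38, 47],
    "AG": [60, 75, 58, 81, 66],
    "GG": [110, 95, 130, 120, 105],
}
for genotype, scores in scores_by_genotype.items():
    print(genotype, len(scores), round(mean(scores), 1), round(stdev(scores), 1))

f_stat, p_value = permutation_p_value(list(scores_by_genotype.values()))
print(f"F = {f_stat:.2f}, permutation p = {p_value:.4f}")
```

With several SNPs you would repeat this per SNP and keep multiple testing in mind when reading the p-values.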

ADD COMMENT
1
Entering edit mode

I'm moving this to an answer: an ANOVA (or a linear model more generally) is the simplest way to handle data like this.

@davidswordster: Keep in mind that we're assuming the units of the 0-200 severity scale represent relatively evenly spaced increases in severity. While this is very likely the case, you might want to double check with the people who generated the data to make sure (if the scale were more like 0-7, you'd need to use ordered logistic regression instead).
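As a sketch of the linear-model route, one option is an additive coding: regress severity on the number of copies of one allele (0, 1 or 2) and look at the slope. The additive coding is an assumption of this example, as are the data and the function name `fit_line`. It only reports the fit; a p-value for the slope would come from a statistics package (for example `lm` in R or `scipy.stats.linregress` in Python).

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up data: allele count per subject (0, 1 or 2) and severity on the 0-200 scale
allele_count = [0, 0, 0, 1, 1, 1, 2, 2, 2]
severity = [35, 48, 41, 70, 82, 64, 118, 101, 127]

slope, intercept, r_squared = fit_line(allele_count, severity)
print(f"severity ~ {intercept:.1f} + {slope:.1f} * allele_count (R^2 = {r_squared:.2f})")
```

The slope here reads as the estimated change in severity per extra copy of the allele, which only makes sense if the scale really is evenly spaced as discussed above.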

ADD REPLY
0
Entering edit mode

Thank you so much for your answers. I'm looking into these as we speak.

Just one other thing, which was mentioned to me but not elaborated on. I've studied Hardy-Weinberg and it seems pretty straightforward, but something about the suggestion has confused me. I was asked to generate p-values for each SNP, which is no problem, but the way it was suggested implied that these would give some indication of which SNPs are interesting. Is there a way to determine SNPs of interest based on the dominant allele frequency?

ADD REPLY
0
Entering edit mode

I'm not sure how that would work given how your dataset is structured. In general, a population with a Mendelian disorder caused by a given variant should show a violation of Hardy-Weinberg at that variant position. Presumably that's what was being implied; however, applying that principle to a highly graded disorder is probably not a very fruitful avenue to pursue.
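For reference, the per-SNP Hardy-Weinberg p-value is usually a chi-square goodness-of-fit test on the three genotype counts. A minimal sketch, assuming a biallelic SNP and using the fact that a chi-square tail probability with 1 degree of freedom can be written with `math.erfc`; the function name and the counts are invented for illustration.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions from genotype counts (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic in this sample; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Made-up genotype counts for one SNP across 600 subjects
chi2, p_value = hwe_chi_square(310, 240, 50)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

The chi-square approximation gets unreliable when an expected count is small, in which case an exact test is the usual alternative.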

ADD REPLY
0
Entering edit mode

Well, I was hoping to perform a non-parametric trend test on these. Any thoughts?

Sorry for all the questions; it is out of term time, and I would have to wait until Monday/Tuesday to get a response from my post-doc or supervisor.
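One candidate for a non-parametric trend test on ordered genotype groups is the Jonckheere-Terpstra test, which asks whether scores tend to increase from one group to the next. This is only a sketch of that idea, not something recommended elsewhere in this thread: the p-value is a one-sided permutation estimate, and the data, group order and function names are assumptions for illustration.

```python
import random

def jt_statistic(groups):
    """Jonckheere-Terpstra statistic: over ordered pairs of groups, count value pairs
    that increase from the earlier group to the later one (ties count 0.5)."""
    total = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for a in groups[i]:
                for b in groups[j]:
                    total += 1.0 if a < b else 0.5 if a == b else 0.0
    return total

def jt_permutation_p(groups, n_perm=1000, seed=0):
    """One-sided permutation p-value for an increasing trend across the ordered groups."""
    rng = random.Random(seed)
    observed = jt_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if jt_statistic(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Made-up severity scores, groups ordered by allele count 0, 1, 2
groups = [[30, 42, 55, 38, 47], [60, 75, 58, 81, 66], [110, 95, 130, 120, 105]]
jt, p_value = jt_permutation_p(groups)
print(f"JT = {jt:.1f}, one-sided permutation p = {p_value:.4f}")
```

The pairwise loop is quadratic in group size, so for 600 subjects this pure-Python version will be slow with many permutations.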

ADD REPLY
0
Entering edit mode

Nothing comes to mind, sorry.

ADD REPLY
