I'm not a bioinformatician by profession, but this is a field I'd like to learn in my own time and, who knows, maybe eventually make some useful findings.
I currently have SNP data for a full genome on about 150 individuals, along with a quantitative description of a phenotype.
How would I go about predicting the phenotype for a completely new set of individuals (for whom I have complete SNP data), given the previous information?
I downloaded PLINK and I'm currently formatting my data with Python to match its input format. How can I use this (or another) tool to accomplish my goal? If the phenotype were, say, height, I'd want to know which of my new individuals would end up being tall, which would be short, etc. Ideally I'd want to rank them from tallest to shortest.
Links to explicit directions would be highly appreciated.
EDIT: Height was just an example of a quantitative/continuous phenotype. I'm not looking for height specifically.
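For the formatting step mentioned in the question, here is a minimal sketch of building rows for PLINK's text input files in Python. It assumes the PLINK 1.x `.ped`/`.map` layout (six leading columns, then one allele pair per SNP; `-9` for a missing phenotype; `0` for a missing allele). The helper names `ped_line` and `map_line` and the sample IDs are made up for illustration.

```python
MISSING_PHENO = "-9"  # PLINK 1.x default code for an unknown phenotype

def ped_line(fid, iid, phenotype, genotypes, sex=0):
    """Build one .ped row.

    Columns: family ID, individual ID, father ID, mother ID, sex,
    phenotype, then two alleles per SNP. Parents are set to "0" (unknown).
    """
    fields = [str(fid), str(iid), "0", "0", str(sex), str(phenotype)]
    for allele1, allele2 in genotypes:
        fields += [allele1, allele2]
    return " ".join(fields)

def map_line(chrom, snp_id, bp_pos, genetic_dist=0):
    """Build one .map row: chromosome, SNP ID, genetic distance, bp position."""
    return f"{chrom} {snp_id} {genetic_dist} {bp_pos}"

# Hypothetical example: one phenotyped individual and one "new" individual
# whose phenotype is unknown, each typed at two SNPs.
rows = [
    ped_line("F1", "I1", 7.25, [("A", "G"), ("C", "C")]),
    ped_line("F2", "I2", MISSING_PHENO, [("A", "A"), ("0", "0")]),
]
markers = [map_line(1, "rs_example1", 55000), map_line(1, "rs_example2", 61000)]

for row in rows:
    print(row)
for marker in markers:
    print(marker)
```

Writing `rows` to a `.ped` file and `markers` to a `.map` file with the same base name should give a pair that PLINK can read with `--file`. The SNP order in the `.map` file has to match the order of the allele pairs in each `.ped` row.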
Good luck! :-)
Are you looking at a GWAS experiment? Is 150 the total number of cases, of controls, or both? What do you mean by 'new individual'?
Yes, this would be GWAS data. The original set of 150 are not cases or controls as I understand the terms. Cases and controls apply to binary phenotypes, i.e. you either have the disease or you don't.
The phenotype I have is continuous and could be any real number from, say, 0 to 10.
By "new individuals", I mean a new set of persons for whom we have full SNP data but no information on the phenotype.
Can I use information gained from the original set to predict the phenotype in the "new individuals"?
Is this clear?
Sounds like an exciting project :)! AFAIK, you should have cases and controls (not only in the disease sense). For example, for height you could define a set of cases (height x) and a set of controls (height y) to derive a p-value for the genotype-phenotype association.
Sounds like an exciting project :)! I think your question has two parts: 1) you have genotype and phenotype information, and 2) you need to analyze the association and use the information from it for prediction. Please let me know whether I got that right.
Yes, your interpretation is correct. Any suggestions on how to tackle part 1 or 2 in your comment? This is my first crack at any bioinformatics.
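To make the two parts discussed above concrete, here is a rough sketch in Python with NumPy: estimate a per-SNP effect on the phenotyped set, then sum the effect-weighted genotypes for the new individuals and rank them. Everything in it is an assumption for illustration: genotypes are coded as 0/1/2 allele counts, the data are simulated rather than real, and the function names `marginal_effects` and `additive_score` are made up. It is one simple approach, not a recommended pipeline, and with only about 150 phenotyped individuals the predictions may well be noisy.

```python
import numpy as np

def marginal_effects(G, y):
    """Slope of a simple least-squares regression of y on each SNP column of G."""
    Gc = G - G.mean(axis=0)        # center each SNP
    yc = y - y.mean()              # center the phenotype
    ss = (Gc ** 2).sum(axis=0)     # per-SNP sum of squares
    ss[ss == 0] = np.inf           # monomorphic SNPs get an effect of zero
    return (Gc.T @ yc) / ss

def additive_score(G_new, beta, train_means):
    """Effect-weighted genotype sum, centered on the training allele means."""
    return (G_new - train_means) @ beta

# Simulated stand-in data (not real genotypes): 150 phenotyped individuals,
# 20 new individuals, 500 SNPs, of which the first 5 affect the phenotype.
rng = np.random.default_rng(0)
n_train, n_new, n_snps = 150, 20, 500
G_train = rng.integers(0, 3, size=(n_train, n_snps)).astype(float)
G_new = rng.integers(0, 3, size=(n_new, n_snps)).astype(float)
true_beta = np.zeros(n_snps)
true_beta[:5] = 1.0
y_train = G_train @ true_beta + rng.normal(0.0, 1.0, size=n_train)

# Part 1: association (per-SNP effect estimates from the phenotyped set).
beta = marginal_effects(G_train, y_train)

# Part 2: prediction (score the new individuals and rank them).
scores = additive_score(G_new, beta, G_train.mean(axis=0))
ranking = np.argsort(-scores)      # indices of new individuals, highest score first
print(ranking)
```

The scores only order the new individuals relative to one another; they are not on the phenotype's scale. As far as I know, PLINK provides `--linear` for quantitative-trait association and `--score` for applying per-SNP weights to a genotype file, which cover the same two steps without custom code.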