Analyze a set of sequences to obtain a genotype matrix (SNPs)
10.1 years ago

Hello, I have a set of 10 genomic sequences, each corresponding to one sample. My final goal is a genome-wide association analysis between a phenotype, which I have for all the samples, and the genotype, which would be obtained from SNP data. I have the phenotypic trait value for each sample, but I still need the genotypic data (SNPs) for those samples. In other words, I need a SNP matrix of size 10 × (number of loci or sites). I'm not at all familiar with identifying or analyzing SNPs from genomic sequences, so I would really appreciate guidance on how to convert a set of 10 genomic sequences into this genotype matrix. Is there a good R package that could help with this?

Update:

I also have a file of this form:

4 7291472 C G
4 7292641 A C
4 7302012 C T
4 7302344 A T
4 7315419 G C
4 7319414 C T
4 7344281 A C
..
..

Not sure how to use this file when calling SNPs.
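For illustration, a file of this form could be read into a list of variant records as sketched below. This assumes the four whitespace-separated columns are chromosome, position, reference allele and alternate allele; the post does not confirm that meaning, and the names `Variant` and `parse_variants` are made up for this example.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str  # assumed: reference allele (not confirmed by the post)
    alt: str  # assumed: alternate allele (not confirmed by the post)


def parse_variants(lines):
    """Parse 'chrom pos ref alt' lines, skipping anything else (blank lines, '..')."""
    variants = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[1].isdigit():
            continue  # not a variant record
        chrom, pos, ref, alt = fields
        variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


example = """4 7291472 C G
4 7292641 A C
..
"""
variants = parse_variants(example.splitlines())
```

To read a real file, the same function can be given an open file handle, since it only iterates over lines.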

SNP • 2.6k views

It looks like SNPs have already been called and your file is the result of that calling? As an aside, 10 samples will not give any meaningful results for a "GWAS", as your power is at or near zero. Do you have more samples?


Yes, I have more samples. What confuses me is how to compute the SNP scores. Ultimately, I want to go through each locus and do an association analysis with respect to the phenotype of interest that I have...
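As a rough sketch of what "going through each locus" could look like once a genotype matrix exists: the code below computes a Pearson correlation between each locus and the phenotype. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with samples as rows, which is a common convention but not something stated in this thread; `pearson_r` and `scan_loci` are hypothetical helper names. A real GWAS would also need p-values, multiple-testing correction and covariates, which dedicated tools provide.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # monomorphic locus or constant phenotype
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def scan_loci(genotypes, phenotype):
    """genotypes[i][j] = allele count of sample i at locus j; returns one r per locus."""
    n_loci = len(genotypes[0])
    return [
        pearson_r([row[j] for row in genotypes], phenotype)
        for j in range(n_loci)
    ]


# Toy data: 3 samples, 2 loci (the second locus does not vary)
toy_genotypes = [[0, 0], [1, 0], [2, 0]]
toy_phenotype = [1.0, 2.0, 3.0]
toy_scores = scan_loci(toy_genotypes, toy_phenotype)
```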


By "SNP scores" do you mean their quality or do you simply mean compiling all of the samples into a single matrix?


Putting all samples into a single matrix...
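A minimal sketch of that compilation step, assuming each sample already has its own list of called variant sites (for example, one file per sample in the format shown in the question). The function name and the 0/1 coding are illustrative assumptions. Note that coding "not listed" as 0 treats uncalled sites as reference, which is only safe if every site was actually covered in every sample.

```python
def build_genotype_matrix(sample_variants):
    """Combine per-sample variant sites into a samples x sites 0/1 matrix.

    sample_variants: dict mapping sample name -> set of (chrom, pos) tuples
    at which that sample carries the alternate allele (assumed input shape).
    """
    # Union of all sites seen in any sample, in a stable sorted order
    sites = sorted(set().union(*sample_variants.values()))
    samples = sorted(sample_variants)
    matrix = [
        [1 if site in sample_variants[sample] else 0 for site in sites]
        for sample in samples
    ]
    return samples, sites, matrix


# Toy example with three hypothetical samples
calls = {
    "s1": {("4", 7291472), ("4", 7302012)},
    "s2": {("4", 7292641)},
    "s3": set(),
}
samples, sites, matrix = build_genotype_matrix(calls)
```

Each row of `matrix` is one sample and each column one site, matching the (samples × loci) layout asked for in the question.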

10.1 years ago
Fotis T ▴ 30

I would advise you to use plink for your study. It is really well-documented and has everything you need. It is not an R package though.

http://pngu.mgh.harvard.edu/~purcell/plink/

Here is a guide on how you need to format the data for the program to accept it (ped and map format):

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml
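To make the format concrete, here is a small sketch that writes PED and MAP lines from data held in memory. The column layout follows the PLINK documentation linked above (PED: family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then two alleles per SNP; MAP: chromosome, SNP identifier, genetic distance, base-pair position). The function name, the `chrom_pos` SNP identifiers and the use of the sample name as family ID are assumptions made for this example.

```python
def to_ped_map(samples, sites, alleles, phenotypes):
    """Build PED and MAP lines as lists of strings.

    samples:    list of sample names
    sites:      list of (chrom, pos) tuples, one per SNP
    alleles:    alleles[i][j] = (a1, a2) for sample i at site j;
                use ("0", "0") for a missing genotype
    phenotypes: one value per sample
    """
    # MAP: chromosome, SNP id (made up here as chrom_pos), genetic distance 0, position
    map_lines = [f"{chrom} {chrom}_{pos} 0 {pos}" for chrom, pos in sites]

    ped_lines = []
    for sample, row, pheno in zip(samples, alleles, phenotypes):
        genotype_fields = " ".join(f"{a1} {a2}" for a1, a2 in row)
        # Parents and sex are set to 0 (unknown) in this sketch
        ped_lines.append(f"{sample} {sample} 0 0 0 {pheno} {genotype_fields}")
    return ped_lines, map_lines


ped_lines, map_lines = to_ped_map(
    samples=["s1", "s2"],
    sites=[("4", 7291472), ("4", 7292641)],
    alleles=[[("C", "G"), ("A", "A")], [("C", "C"), ("0", "0")]],
    phenotypes=[1.5, 2.25],
)
```

Writing each list to a file, one line per entry, gives a `.ped`/`.map` pair. Because sex is coded as unknown here, it is worth checking the PLINK documentation on how individuals with unknown sex are handled (I believe the relevant option is `--allow-no-sex`).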

I can help you through it, if you need anything more.
