Advice for SNP genotyping data: small dataset (n = 30), analysis ideas
12 months ago
sativus

Hi Biostars! I was recently handed a small Illumina SNP genotyping array dataset consisting of 30 cancer samples, which are in turn grouped into subpopulations. Specifically, I have access to both the raw and the processed data.

For the raw data, besides the signal-intensity .idat files, I also have the .ped and .map files and the cluster files that the facility used when running PLINK against HapMap/1000 Genomes. For the processed data, I have the genotype call at each probe position for every sample on the array (roughly 730,000 probes) as a tab-separated .txt file (i.e., allele calls only, with no statistics), as well as summary statistics for each probe on the array.
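
A minimal first look at the .ped/.map pair can be sketched in plain Python, assuming the standard PLINK 1.x text format: .map lines are chromosome, SNP ID, genetic distance, base-pair position; .ped lines start with six columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype), followed by two allele columns per SNP, with `0` as the default missing-allele code. The function names `parse_map`, `parse_ped` and `call_rates` are hypothetical helpers for illustration only. On the full dataset, PLINK's own `--missing` report computes per-sample and per-SNP missingness more efficiently.

```python
from __future__ import annotations

from typing import Iterable

MISSING = "0"  # assumed PLINK .ped default: allele code 0 marks a missing call


def parse_map(lines: Iterable[str]) -> list[str]:
    """Return SNP IDs (column 2) in file order from .map lines."""
    return [line.split()[1] for line in lines if line.strip()]


def parse_ped(lines: Iterable[str]) -> dict[str, list[tuple[str, str]]]:
    """Map each individual ID (column 2) to its list of allele pairs.

    Columns 1-6 are FID, IID, paternal ID, maternal ID, sex, phenotype;
    every following pair of columns holds one SNP's two alleles.
    """
    samples: dict[str, list[tuple[str, str]]] = {}
    for line in lines:
        fields = line.split()  # handles both tab- and space-separated files
        if not fields:
            continue
        alleles = fields[6:]
        if len(alleles) % 2:
            raise ValueError(f"odd number of allele columns for {fields[1]}")
        samples[fields[1]] = list(zip(alleles[0::2], alleles[1::2]))
    return samples


def call_rates(samples: dict[str, list[tuple[str, str]]]) -> dict[str, float]:
    """Fraction of genotypes per sample where neither allele is missing."""
    rates = {}
    for iid, genos in samples.items():
        called = sum(1 for a, b in genos if a != MISSING and b != MISSING)
        rates[iid] = called / len(genos) if genos else 0.0
    return rates


# Tiny made-up example: 3 SNPs, 2 samples (S1 has one missing genotype).
map_lines = ["1 rs1 0 1000", "1 rs2 0 2000", "2 rs3 0 500"]
ped_lines = ["F1 S1 0 0 1 2 A G C C 0 0", "F2 S2 0 0 2 2 A A C T G G"]

snps = parse_map(map_lines)
samples = parse_ped(ped_lines)
# Every sample should have exactly one allele pair per SNP in the .map file.
assert all(len(g) == len(snps) for g in samples.values())
print(call_rates(samples))
```

Checking that each sample has one genotype per .map entry before computing anything else catches mismatched file pairs early.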

However, I am at a loss as to how to proceed. The sample size of each subpopulation within the cohort is extremely low (n > 5), so with roughly 730,000 probes essentially no association will remain significant after p-value adjustment for multiple testing. From what I have read, a GWAS typically needs hundreds, if not thousands, of samples to detect associations. Have any of you worked with SNP data at such small sample sizes? Can anything meaningful be found in a cohort like this, and if so, how would you suggest I proceed?
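
The multiple-testing problem can be made concrete with a short calculation. This is a sketch under stated assumptions: one Fisher exact test per probe on a 2x2 carrier/non-carrier table (subgroup vs. the rest of the cohort) and a Bonferroni correction over about 730,000 probes. Other tests (allelic or trend tests) or other corrections would give different bounds. With the margins fixed, the most extreme possible table is perfect separation, where exactly the k subgroup members carry the variant. Its probability, 1 / C(n, k), is a lower bound on the p-value for that split. The helper names below are hypothetical.

```python
from math import comb

N_SAMPLES = 30      # total cohort size from the post
N_PROBES = 730_000  # approximate number of array probes from the post
ALPHA = 0.05        # family-wise error rate (assumed)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def min_fisher_p(group_size: int, n_total: int) -> float:
    """Lower bound on the two-sided Fisher exact p-value for a k-vs-(n-k) split.

    Perfect separation (carriers are exactly the group members) has
    hypergeometric probability 1 / C(n, k). For k < n / 2 it is the only
    table that extreme, so it is also the two-sided p-value.
    """
    if not 0 < group_size < n_total / 2:
        raise ValueError("group_size must satisfy 0 < k < n_total / 2")
    return 1 / comb(n_total, group_size)


def smallest_detectable_group(alpha: float, n_tests: int, n_total: int):
    """Smallest subgroup whose best-case p-value beats the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    for k in range(1, (n_total + 1) // 2):  # only k < n_total / 2
        if min_fisher_p(k, n_total) < threshold:
            return k
    return None  # no subgroup size can ever reach significance


threshold = bonferroni_threshold(ALPHA, N_PROBES)
print(f"Bonferroni threshold: {threshold:.2e}")
print(f"Best possible p for 6 vs 24: {min_fisher_p(6, N_SAMPLES):.2e}")
print(f"Smallest subgroup that could reach it: "
      f"{smallest_detectable_group(ALPHA, N_PROBES, N_SAMPLES)}")
```

Under these assumptions, a 6-sample subgroup cannot do better than about 1.7e-6, even with perfect separation. That is well above the corrected threshold of about 6.8e-8, and a subgroup would need at least 10 of the 30 samples before significance becomes achievable at all.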

I should note that I am new to SNP genotyping, but I have taught myself to use PLINK and have a good understanding of R and Python. I have looked for inspiration in tutorials and papers on the topic, but so far I have found nothing useful.

Tags: plink, SNP genotyping