Question

Advice for SNP Genotyping data - small dataset n=30 - analysis ideas


12 months ago

sativus ▴ 20

Hi Biostars! I was recently handed a small dataset of Illumina SNP genotyping array data for a total of 30 cancer samples, which in turn fall into sub-populations. To be more specific, I have access to both the raw and the processed data.

For the raw data, aside from the signal intensity .idat files, I also have the .ped, .map and cluster files that the facility used when they ran PLINK against HapMap/1000 Genomes. For the processed data, I have the genotype call at each probe position for each sample on the array (somewhere around 730,000 probes) as a tab-separated .txt file (i.e. only the alleles, with no statistics), along with the summary statistics for each probe on the array.
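For reference, this is roughly how I understand the .ped/.map layout: a minimal sketch, assuming the standard 4-column .map (chromosome, variant ID, genetic distance, base-pair position) and a .ped with six leading columns followed by two alleles per variant, with `0` as the missing-allele code. The function names and the toy records are hypothetical, not taken from my actual files.

```python
import io


def read_map(handle):
    """Parse PLINK .map lines into (chromosome, variant_id, bp_position) tuples."""
    variants = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, variant_id, _genetic_distance, position = fields[:4]
        variants.append((chrom, variant_id, int(position)))
    return variants


def read_ped(handle, n_variants):
    """Parse PLINK .ped lines into {sample_id: [(allele1, allele2), ...]}."""
    samples = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        sample_id = fields[1]   # columns: FID, IID, father, mother, sex, phenotype
        alleles = fields[6:]    # then two alleles per variant, in .map order
        if len(alleles) != 2 * n_variants:
            raise ValueError(f"{sample_id}: expected {2 * n_variants} alleles, got {len(alleles)}")
        samples[sample_id] = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return samples


def call_rate(genotypes, missing="0"):
    """Fraction of variants where both alleles were called."""
    if not genotypes:
        return 0.0
    called = sum(1 for a, b in genotypes if a != missing and b != missing)
    return called / len(genotypes)


# Toy records (made up for illustration only).
map_text = "1 rs1 0 1000\n1 rs2 0 2000\n1 rs3 0 3000\n"
ped_text = "F1 S1 0 0 1 -9 A A A G 0 0\nF2 S2 0 0 2 -9 A G G G C C\n"

variants = read_map(io.StringIO(map_text))
samples = read_ped(io.StringIO(ped_text), n_variants=len(variants))
rates = {sample_id: call_rate(gts) for sample_id, gts in samples.items()}
```

With real files, the `io.StringIO` objects would be replaced by `open(...)` handles; the per-sample call rate is just one simple quality check that can be run before anything else.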

However, I am at a loss as to how to proceed: the extremely low sample size of each sub-population within the samples (n > 5) makes any kind of statistical analysis useless once the p-values are adjusted for multiple testing. From what I have read, a GWAS typically uses hundreds of samples to detect associations. Have any of you worked with such small sample sizes when dealing with SNP data before? Can anything meaningful even be found in such a cohort, and if so, how would you suggest I proceed?
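To make the multiple-testing concern concrete, here is a rough back-of-the-envelope sketch. It assumes a Bonferroni correction at alpha = 0.05 over about 730,000 probes, and a comparison of two equal groups of 5 samples with Fisher's exact test; the group size, alpha and the helper name `min_two_sided_fisher_p` are my own illustrative assumptions. It computes the smallest p-value such a test could ever return, i.e. when the genotype perfectly separates the groups.

```python
from math import comb

N_PROBES = 730_000   # approximate number of probes on the array
ALPHA = 0.05         # assumed family-wise error rate

# Per-test significance threshold after Bonferroni correction.
bonferroni_threshold = ALPHA / N_PROBES


def min_two_sided_fisher_p(n_per_group):
    """Smallest attainable two-sided Fisher exact p-value for two equal groups.

    With perfect separation (every unit in group A carries the allele, none in
    group B), the observed 2x2 table has probability 1 / C(2n, n). For equal
    group sizes the mirror-image table is equally extreme, so the two-sided
    p-value is twice that, capped at 1.
    """
    return min(1.0, 2 / comb(2 * n_per_group, n_per_group))


# Carrier-level test: 5 samples vs 5 samples.
p_carrier = min_two_sided_fisher_p(5)

# Allele-level test: 10 alleles vs 10 alleles (treats the two alleles of a
# sample as independent observations, which is itself an assumption).
p_allelic = min_two_sided_fisher_p(10)

print(f"Bonferroni threshold:     {bonferroni_threshold:.2e}")
print(f"Best case, 5 vs 5:        {p_carrier:.2e}")
print(f"Best case, 10 vs 10 alleles: {p_allelic:.2e}")
```

Under these assumptions, even a perfectly separating SNP gives 2/252 (about 0.008) at the sample level and 2/184756 (about 1.1e-5) at the allele level, both well above the corrected threshold of roughly 6.8e-8, which is why a genome-wide association test on groups this small cannot reach significance.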

It should be noted that I am new to SNP genotyping, but I have taught myself how to use PLINK and have a good understanding of R/Python. I have looked for inspiration in tutorials and papers on the topic, but so far have found nothing of use.

plink SNP Genotyping • 336 views

updated 12 months ago by zx8754 12k • written 12 months ago by sativus ▴ 20