How to design an exome sequencing study?
9.0 years ago

I am planning to analyse exome sequencing data. I am expecting data from the platform Illumina Infinium Human Exome-12 BeadChip.

I am a statistician by training with an interest in bioinformatics. I am designing a case-control study with 100 diabetes cases and 100 controls (no diabetes).

I have the following questions:

  1. Would 100 cases and 100 controls be sufficient to identify the variants?
  2. Which software or Linux pipeline would be suitable for analysing this type of data?

I would really appreciate your help in this regard. Does this study design make sense for the given data?

Is there any good reference for this type of analysis?

Sorry for the many questions.

UPDATE: I recently found out that the data are exome sequencing (mean 40x coverage, Agilent v4, HiSeq). What would you suggest regarding my questions above about sample size and software?

sequencing genome • 2.7k views

That's not sequencing data, that's array-based genotyping data. Your sample size is really, really small. I suggest you get acquainted with the GWAS literature in general (google it), and then specifically for diabetes (type 1 or type 2?).
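To put a rough number on "really small", here is a minimal sketch of the approximate power of a single-variant allelic test (two-proportion z-test on allele counts, normal approximation). The control allele frequency (0.2) and odds ratio (1.3) are illustrative assumptions, not values from this study; 5e-8 is the conventional genome-wide significance threshold.

```python
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_maf, odds_ratio, alpha):
    """Approximate power of a two-sided allelic z-test (normal approximation).

    Only the tail in the direction of the true effect is counted, which is
    the usual simplification for power calculations.
    """
    p0 = control_maf
    # Allele frequency in cases implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    # Each diploid individual contributes two alleles.
    alleles_cases, alleles_controls = 2 * n_cases, 2 * n_controls
    p_bar = (alleles_cases * p1 + alleles_controls * p0) / (
        alleles_cases + alleles_controls
    )
    se_null = (p_bar * (1 - p_bar) * (1 / alleles_cases + 1 / alleles_controls)) ** 0.5
    se_alt = (p1 * (1 - p1) / alleles_cases + p0 * (1 - p0) / alleles_controls) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


# Assumed effect size and frequency, chosen only for illustration.
for n in (100, 1000, 10000):
    power = allelic_test_power(n, n, control_maf=0.2, odds_ratio=1.3, alpha=5e-8)
    print(f"{n} cases / {n} controls: power = {power:.3g}")
```

Under these assumptions the power with 100 cases and 100 controls is close to zero, and it only becomes substantial with thousands of samples per group.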


Adding to the above comment, you might want to get acquainted with PLINK to analyse this data.


Thanks for your comments. More specifically, I am interested in type 2 diabetes. I know the PLINK software, but I have no idea what array-based genotyping data look like. Are they compatible with the PLINK format?


The company will provide you with IDAT files, or hopefully a PLINK-friendly format (I'd explicitly ask for it if I were you): fam, bed, and bim files. Generally you'd use GenomeStudio's genotyping module to call genotypes from the chips and export a format you can work with.
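As a quick sanity check once the files arrive, the .fam file is plain whitespace-delimited text with six columns (family ID, individual ID, father ID, mother ID, sex, phenotype), where PLINK's default phenotype coding is 1 = control and 2 = case. A minimal sketch for counting cases and controls follows; the sample lines and the function name `count_phenotypes` are made up for illustration.

```python
from collections import Counter


def count_phenotypes(fam_lines):
    """Count cases/controls from the lines of a PLINK .fam file.

    Column 6 holds the phenotype: 1 = control, 2 = case; anything else
    (typically 0 or -9) is treated as missing.
    """
    labels = {"1": "control", "2": "case"}
    counts = Counter()
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        counts[labels.get(fields[5], "missing")] += 1
    return counts


# Hypothetical .fam content; with a real file, pass an open file handle instead.
example = [
    "FAM1 S1 0 0 1 2",
    "FAM2 S2 0 0 2 1",
    "FAM3 S3 0 0 2 -9",
]
print(count_phenotypes(example))
```

The .bed file is binary, so it is best left to PLINK itself or to a dedicated reader rather than parsed by hand.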


Today I learned from them that it is exome sequencing (mean 40x, Agilent v4, HiSeq). They will give me *.bam and *.vcf files. In this situation, would a sample size of 100 cases and 100 controls be sufficient? Which software would be useful for this purpose?


It is still a relatively small sample size, but you can still give it a try. Just follow the GATK Best Practices for variant calling, and then perform the statistical analysis using something like RVTests and SKAT.
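To illustrate the idea behind gene-based rare-variant tests, here is a minimal sketch of a simple collapsing (burden) test: each sample is flagged as a carrier if it has at least one qualifying rare variant in a gene, and carrier counts are compared between cases and controls with Fisher's exact test. This is not the SKAT method and not the RVTests interface, just a much simpler stand-in; the function names and the carrier counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in row 1 given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def burden_test(carrier_flags, is_case):
    """Collapsing test for one gene: carrier status versus case status."""
    pairs = list(zip(carrier_flags, is_case))
    a = sum(1 for carrier, case in pairs if carrier and case)
    b = sum(1 for carrier, case in pairs if not carrier and case)
    c = sum(1 for carrier, case in pairs if carrier and not case)
    d = sum(1 for carrier, case in pairs if not carrier and not case)
    return fisher_exact_two_sided(a, b, c, d)


# Made-up example: 12 of 100 cases and 3 of 100 controls carry a rare variant.
carriers = [True] * 12 + [False] * 88 + [True] * 3 + [False] * 97
status = [True] * 100 + [False] * 100
print(f"burden test p-value: {burden_test(carriers, status):.4f}")
```

With roughly 20,000 genes tested, a Bonferroni-style exome-wide threshold is around 2.5e-6, so a single-gene result like the one in this toy example would be nowhere near significant at this sample size.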
