Question

Gene expression study


6.8 years ago

c.chakraborty ▴ 180

Dear all, I am studying gene expression in the forebrain. I would like to know whether specific groups of genes, such as RBPs, TFs, cell adhesion molecules, neurotransmitters, etc., are specifically expressed in this region. To that end, I am collecting data from different parts of the forebrain and studying differential gene expression. Since I am gathering data mostly from already published sequencing and profiling studies, I need to understand how many samples I should study. Can I compare data sets from two different studies? Since I will be comparing only the control (WT) samples, I would expect data generated on the same platform (either Illumina RNA-Seq or microarray) to show a similar pattern. My fear is that if I take two arbitrary RNA-Seq studies generated on the Illumina platform, there will still be a disparity in gene expression. On the other hand, if I study only one data set, the outcome might be biased and not representative. How do you think I should approach this problem?

RNA-Seq next-gen statistical significance • 1.5k views

ADD COMMENT • link updated 6.8 years ago by noorpratap.singh ▴ 330 • written 6.8 years ago by c.chakraborty ▴ 180


Dear c.chakraborty, if you want to do a gene expression meta-analysis, e.g. between case and control, you need to select studies that each have cases and paired control samples. You cannot compare cases from one study with controls from a different study, because the noise and batch-to-batch variation would bias the result. I recommend that you search for differentially expressed genes (DEGs) in each dataset separately and then combine the per-gene p-values across datasets with Fisher's method.
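As a rough illustration of the Fisher's method step mentioned above, here is a minimal Python sketch. It assumes each study has already produced one p-value per gene and that the studies are independent. The function name `fisher_combined_pvalue` and the example values are hypothetical, and only the standard library is used. For k p-values the statistic X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom the tail probability has a closed form.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values for one gene with Fisher's method.

    Hypothetical helper for illustration; assumes the studies are independent.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Half of Fisher's statistic X = -2 * sum(ln p_i).
    half_stat = -sum(math.log(p) for p in pvalues)
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    # Made-up p-values for one gene in two separate studies.
    print(fisher_combined_pvalue([0.05, 0.05]))
```

The combined p-values would still need multiple-testing correction across genes, and the method ignores the direction of change, so a gene that goes up in one study and down in another can still come out as significant.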

ADD REPLY • link 6.8 years ago by Shamim Sarhadi ▴ 220


Dear Shamim, I am not going to use controls from one study and experimental samples from another. That is not the goal at all! :P The studies done so far on brain sub-parts include control data sets (i.e., samples that are supposed to be homeostatic: no drug, no gene knock-out). I am interested in using only the control data sets, say from two hypothalamic gene expression studies and two cortex studies, and comparing their gene expression. The question is: can I use two control data sets obtained on the Illumina RNA-Seq platform by two different groups, or should I use only one control data set from one group? If I use both, I have to assume that things like biases from RNA degradation during the actual experiment, the efficiency of poly(A) purification (if used), read alignment, and the coverage obtained were similar across the studies. And that is a roll of the dice.
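One hedged way to gauge how risky that assumption is, before pooling anything, is to check how well two control data sets from the same region agree with each other. The sketch below is only an illustration, not a method proposed in this thread: it assumes each study has been reduced to a mapping from gene identifier to one expression value (for example a mean over control samples), and it computes a Spearman rank correlation over the shared genes. The names `average_ranks` and `spearman_between_studies` and all values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_between_studies(study_a, study_b):
    """Spearman correlation of expression over genes present in both studies."""
    shared = sorted(set(study_a) & set(study_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    ra = average_ranks([study_a[g] for g in shared])
    rb = average_ranks([study_b[g] for g in shared])
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("expression is constant in one study")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Made-up expression values for placeholder gene identifiers.
    cortex_study_1 = {"geneA": 12.0, "geneB": 3.5, "geneC": 40.1}
    cortex_study_2 = {"geneA": 9.8, "geneB": 2.0, "geneC": 55.0, "geneD": 1.0}
    print(spearman_between_studies(cortex_study_1, cortex_study_2))
```

Ranks are used because they are insensitive to differences in scale between studies. A high correlation between two studies of the same region does not rule out batch effects, but a low one is a clear warning against treating the data sets as interchangeable.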

ADD REPLY • link 6.7 years ago by c.chakraborty ▴ 180

score 0 · Answer 1 · 2018-02-24


6.8 years ago

noorpratap.singh ▴ 330

Directly comparing datasets from two studies can distort the interpretation, because each dataset carries its own biases. For this situation there are packages that enable meta-analysis, i.e. combining data across studies or platforms. metaSeq is one such package. I would urge you to explore along these lines.

ADD COMMENT • link 6.8 years ago by noorpratap.singh ▴ 330


Hey Noorpratap, thanks for the idea. I will go through it, read some more, and check whether I can use this and what parameters I need to define. Thanks

ADD REPLY • link 6.7 years ago by c.chakraborty ▴ 180