Merging of RNA-seq datasets from different studies

3.2 years ago

BioQueen ▴ 30

Hi! I'm going to merge several high-throughput RNA-seq datasets. The problem is that the datasets contain different numbers of genes: for example, one has about 28,000 genes and another has about 35,000. What is the best way to merge them? Should the merged dataset contain only the genes common to both datasets, or is it better to also include genes that appear in only one of them?

I'm going to use the merged data for differential gene expression analysis, pathway enrichment analysis, and finding subgroups.
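To make the two options concrete, here is a minimal sketch of merging on the shared genes only versus keeping every gene. The gene IDs and counts are invented, and `merge_counts` is a hypothetical helper written for this illustration, not part of GREIN or any library. It only shows the bookkeeping; it does not say which option suits a given analysis.

```python
# Toy gene-level count tables: gene ID -> counts per sample.
# Gene IDs and values are invented for illustration only.
study_a = {
    "GENE1": [10, 12],
    "GENE2": [0, 3],
    "GENE3": [55, 60],
}
study_b = {
    "GENE2": [4, 1, 2],
    "GENE3": [70, 65, 80],
    "GENE4": [9, 11, 8],
}


def merge_counts(a, b, how="inner"):
    """Merge two gene -> counts tables sample-wise (hypothetical helper).

    how="inner" keeps only genes present in both tables.
    how="outer" keeps every gene and fills the gaps with None.
    """
    # Number of samples in each study, taken from any one gene.
    n_a = len(next(iter(a.values())))
    n_b = len(next(iter(b.values())))

    if how == "inner":
        genes = a.keys() & b.keys()
    elif how == "outer":
        genes = a.keys() | b.keys()
    else:
        raise ValueError("how must be 'inner' or 'outer'")

    merged = {}
    for gene in sorted(genes):
        # A gene absent from one study has no measurement for its samples.
        merged[gene] = a.get(gene, [None] * n_a) + b.get(gene, [None] * n_b)
    return merged


common = merge_counts(study_a, study_b, how="inner")
union = merge_counts(study_a, study_b, how="outer")

print(len(common), "genes shared by both studies")
print(len(union), "genes in at least one study")
```

With the union, genes found in only one study carry missing values for all samples of the other study, so those values would have to be handled somehow before any downstream step that needs a complete matrix.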

RNA-seq merging • 1.7k views

updated 3.2 years ago by Jeremy Leipzig 22k • written 3.2 years ago by BioQueen ▴ 30

It sounds like you don't have raw data but transcript-level count data. Can you clarify what your datasets look like?

3.2 years ago by Jeremy Leipzig 22k

I have uniformly normalised gene-level count data from GREIN (GEO RNA-seq Experiment Interactive Navigator), so the data is originally from GEO but processed by GREIN. They provide both raw and normalised count data. I use the normalised data for PCA and heat maps.

In addition to gene-level count data, they also provide transcript-level count data. Which should I choose for my analysis?

3.2 years ago by BioQueen ▴ 30

I am seeing exactly 214,837 transcripts when I download any run there. Not sure why the web display is being weird.

3.2 years ago by Jeremy Leipzig 22k
