3.2 years ago
BioQueen
Hi! I'm going to merge different high-throughput RNA-seq datasets. The problem is that the datasets contain different numbers of genes: for example, one has circa 28,000 genes and another has circa 35,000. How do I best merge them? Do I keep only the genes that the two datasets have in common, or is it better to also include the genes that only one of the datasets contains?
I'm going to use it for differential gene expression analysis and for pathway enrichment analysis, and also to find subgroups.
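To make the two options in the question concrete, here is a minimal sketch using pandas. The two toy matrices, the gene IDs, and the sample names are hypothetical and are not taken from GREIN; the only assumption is that each dataset is a genes-by-samples table indexed by gene ID.

```python
import pandas as pd

# Hypothetical toy count matrices: rows are genes, columns are samples.
a = pd.DataFrame(
    {"A_s1": [10, 0, 5], "A_s2": [12, 1, 7]},
    index=["GENE1", "GENE2", "GENE3"],
)
b = pd.DataFrame(
    {"B_s1": [8, 3, 20], "B_s2": [9, 4, 25]},
    index=["GENE2", "GENE3", "GENE4"],
)

# Option 1: keep only genes present in both datasets (intersection).
common = pd.concat([a, b], axis=1, join="inner")

# Option 2: keep every gene from either dataset (union).
# Genes absent from one dataset get NaN for that dataset's samples.
union = pd.concat([a, b], axis=1, join="outer")

print(common.shape)  # genes shared by both datasets x all samples
print(union.shape)   # all genes x all samples
print(int(union.isna().sum().sum()), "missing values in the union")
```

In the union, the missing entries are NaN rather than 0. Replacing them with 0 would claim the gene was measured and not expressed, which the data does not show, so any downstream step that cannot handle missing values would need those genes dropped or handled explicitly.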
It sounds like you don't have raw data but transcript-level count data? Can you clarify what your datasets look like?
I have uniformly normalised gene-level count data from GREIN (GEO RNA-seq Experiment Interactive Navigator), so the data is basically from GEO but processed by GREIN. They provide both raw and normalised count data. I use the normalised data for PCA and heat maps.
In addition to gene-level count data they also have transcript-level count data. Which should I choose for my analysis?
I am seeing exactly 214,837 transcripts when I download any run there. Not sure why the web display is being weird.