Combining the RNAseq datasets
7.6 years ago
prp291 ▴ 70

I have RNA-seq data (50 bp reads) from two tissues. I want to combine it with previously published RNA-seq data (100 bp reads) to analyze gene expression across different tissues. Can I combine the two datasets directly, or should I apply some kind of normalization first? Any insight would be helpful. Thanks.

RNA-Seq next-gen • 5.5k views

Thanks for the links. I think the situation is a little different in my case: both datasets were generated with the same technique (RNA-seq), but the read lengths differ. I was wondering whether the difference in read length will have any impact on the final result?


The impact of read length generally won't be huge, and you can just trim the 100-base reads down to 50 anyway. The bigger issue is that you have a batch effect from sample preparation and library prep (quite possibly including the type of kits used). You're not going to normalize that away without a good bit of background information.
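To make the trimming suggestion concrete, here is a minimal sketch in plain Python that cuts every read in an uncompressed, 4-line-per-record FASTQ stream down to 50 bases. The function name `trim_fastq` and the demo read are made up for illustration; in practice a dedicated trimming tool would usually be the better choice, and this only shows the idea.

```python
import io


def trim_fastq(in_handle, out_handle, length=50):
    """Trim each read (and its quality string) to at most `length` bases.

    Assumes a plain 4-line-per-record FASTQ: header, sequence, '+', quality.
    Returns the number of records written.
    """
    n_records = 0
    while True:
        header = in_handle.readline()
        if not header:            # end of file
            break
        if not header.strip():    # skip stray blank lines
            continue
        seq = in_handle.readline().rstrip("\n")
        plus = in_handle.readline().rstrip("\n")
        qual = in_handle.readline().rstrip("\n")
        # Sequence and quality must be cut at the same position.
        out_handle.write(header.rstrip("\n") + "\n")
        out_handle.write(seq[:length] + "\n")
        out_handle.write(plus + "\n")
        out_handle.write(qual[:length] + "\n")
        n_records += 1
    return n_records


if __name__ == "__main__":
    # Hypothetical 100-base read, used only to demonstrate the function.
    demo = io.StringIO("@read1\n" + "ACGT" * 25 + "\n+\n" + "I" * 100 + "\n")
    trimmed = io.StringIO()
    trim_fastq(demo, trimmed, length=50)
    print(trimmed.getvalue())
```

Trimming only equalizes read length between the two datasets; it does nothing about the batch effect described above.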


The 100 bp set may have more "power" than the 50 bp set, but this also depends on the quality of the reference genome and of the alignments. Did you see quality differences between the two that would justify this? Did you use a high-quality draft genome? I assume you're using human tissue?


Thanks for the comments. I am using a plant genome. Both the 50 bp and 100 bp datasets showed alignment rates above 70%.


How did you proceed with this? I may be doing something similar and would like to know your experience.

7.4 years ago
theobroma22 ★ 1.2k

I would recommend running SPIA on both sets of genes; if you get the Entrez IDs, you should be able to find homologous pathways between the two. One way to do this, if you have BLAST+ installed locally, is to format your output, parse the NCBI Gene ID, and then merge this file with the reference sequence file to get the Entrez ID; that file is obtainable from the NCBI RefSeq website. It could be that the pathways common to both sets are relevant to the biology being studied.
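As a rough sketch of the parse-and-merge step, assuming the BLAST+ results were written in the default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore): the snippet keeps the best hit per query by bit score and looks up a Gene ID for it. The accession-to-Gene-ID dictionary, the function names, and all IDs in the demo are hypothetical; the real mapping would have to be built from whatever file you download.

```python
import csv
import io


def best_hits(blast_tsv):
    """Return {query: (subject, bitscore)} keeping the highest-bitscore hit.

    Expects an open handle on BLAST+ tabular output with the default
    12 columns, where column 2 is the subject ID and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def attach_gene_ids(best, accession_to_gene):
    """Map each query to the Gene ID of its best hit (None if unmapped)."""
    return {q: accession_to_gene.get(subj) for q, (subj, _) in best.items()}


if __name__ == "__main__":
    # Made-up BLAST rows and a made-up accession -> Gene ID table.
    blast = io.StringIO(
        "gene1\tXP_0001\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\n"
        "gene1\tXP_0002\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t120\n"
    )
    mapping = {"XP_0001": "1111", "XP_0002": "2222"}
    print(attach_gene_ids(best_hits(blast), mapping))
```

Queries whose best hit has no entry in the mapping come back as `None`, so they can be counted or filtered before any pathway analysis.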


I think this answer was not intended for this question.


Hi prp291,

The 100bp set has already been published, so what do you intend to do with the data?


Actually, I want to identify the tissue-specific genes in my plants. The 100 bp and 50 bp datasets cover different kinds of tissue, so my goal is to merge both datasets and then identify the tissue-specific genes. Thanks.


Ok, I misunderstood your original post, sorry. As Devon Ryan said above, you will have batch effects, so my initial thinking was that the datasets would be difficult to compare without accounting for those effects, unless you go beyond gene expression trend analysis and map the different sets of genes with SPIA to determine common pathways. Since you want tissue-specific genes, it seems you are limited to your two 50 bp sets if you cannot account for the batch effects.
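One quick way to see whether the batch effect can be accounted for at all is to check if any tissue was sequenced in both datasets. If none was, tissue and batch are completely confounded and a batch term cannot be estimated separately from the tissue differences. The sketch below does that check on a sample sheet; the sample names, batch labels, and tissues are invented for illustration and are not taken from the actual experiments.

```python
from collections import defaultdict


def shared_tissues(samples):
    """Return the sorted list of tissues that occur in more than one batch.

    `samples` is an iterable of (sample_name, batch, tissue) tuples.
    An empty result means batch and tissue are fully confounded.
    """
    batches_by_tissue = defaultdict(set)
    for _name, batch, tissue in samples:
        batches_by_tissue[tissue].add(batch)
    return sorted(t for t, b in batches_by_tissue.items() if len(b) > 1)


if __name__ == "__main__":
    # Hypothetical sample sheet: two batches with no tissue in common.
    sheet = [
        ("s1", "new_50bp", "tissueA"),
        ("s2", "new_50bp", "tissueB"),
        ("s3", "published_100bp", "tissueC"),
        ("s4", "published_100bp", "tissueD"),
    ]
    overlap = shared_tissues(sheet)
    if overlap:
        print("Tissues present in both batches:", overlap)
    else:
        print("No shared tissue: batch and tissue are confounded.")
```

If at least one tissue does appear in both datasets, it can serve as the link for including batch as a covariate in the model; with no overlap, the comparison is restricted to within-dataset contrasts, as noted above.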
