Can I compare RNAseq data from two different studies statistically?
3.0 years ago
steel1990 ▴ 20

I want to compare clinical bulk RNA-seq data from two different studies. One has 18 samples of tumour A, and the other has 20 samples of tumour Ab (a subcategory of A). Both studies use similar library prep and sequencing platforms. Because the samples come from different studies, are any conclusions I draw essentially confounded, so that I cannot be sure batch effects are not the reason for any differentially expressed genes (DEGs)?

I am relatively new to bioinformatics so any help/advice at all is greatly appreciated.

sequencing statistics genomics RNAseq RNA • 914 views
3.0 years ago
Steven Lakin ★ 1.8k

You can perform any analysis or meta-analysis you'd like, as long as you accurately report the limitations and the measures you took to mitigate batch effects. First, make sure you normalize your data appropriately to control for differences in sequencing depth. Second, most differential expression software lets you specify a multiple regression formula that includes your experimental factors, for example:

formula = ~ 1 + Treatment + OtherCondition

In this case, you could label each sample with the dataset it came from (DatasetID) and add that term to the formula to help control for batch effects:

formula = ~ 1 + Treatment + OtherCondition + DatasetID
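To make the effect of the extra term concrete, here is a minimal sketch using plain least squares on one gene. Everything in it is an assumption for illustration: the toy values, the variable names, and the use of ordinary least squares on log-expression rather than the count models that dedicated differential expression packages fit.

```python
import numpy as np

# Hypothetical toy data for ONE gene across 8 samples (noise-free on purpose).
# treatment: 0 = control, 1 = treated; dataset: 0 = study 1, 1 = study 2.
treatment = np.array([0, 0, 0, 1, 0, 1, 1, 1])
dataset = np.array([0, 0, 0, 0, 1, 1, 1, 1])

true_treatment_effect = 1.0  # assumed log fold change
true_batch_shift = 2.0       # assumed offset between the two studies
log_expr = 5.0 + true_treatment_effect * treatment + true_batch_shift * dataset


def fit(columns):
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(log_expr))] + columns)
    coef, *_ = np.linalg.lstsq(design, log_expr, rcond=None)
    return coef


naive = fit([treatment])              # ~ 1 + Treatment
adjusted = fit([treatment, dataset])  # ~ 1 + Treatment + DatasetID

print("Treatment effect without DatasetID:", round(float(naive[1]), 3))
print("Treatment effect with DatasetID:   ", round(float(adjusted[1]), 3))
```

Because treated samples are over-represented in the second study here, the fit without DatasetID absorbs part of the study offset into the treatment coefficient (about 2.0 rather than the assumed 1.0), while the fit with DatasetID recovers the assumed value.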

For software that allows random effects, it might be even better to model DatasetID as a random effect. However, with only two datasets you may be forced to use a fixed effect instead, since two groups are too few to estimate a random-effect variance.
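A fixed DatasetID term is only useful if it can be told apart from the factor you care about. One quick check, sketched here with a hypothetical helper `is_estimable`, is whether the design matrix has full column rank. The first layout uses the sample counts from the question; the second is an invented layout for contrast.

```python
import numpy as np


def is_estimable(*covariates):
    """Return True if the design matrix [1, covariates...] has full column rank."""
    n = len(covariates[0])
    design = np.column_stack([np.ones(n), *covariates])
    return bool(np.linalg.matrix_rank(design) == design.shape[1])


# Layout from the question: 18 samples of subtype A in one study,
# 20 samples of subtype Ab in the other (0/1 coding is an assumption).
subtype = np.array([0] * 18 + [1] * 20)
dataset = np.array([0] * 18 + [1] * 20)
confounded_ok = is_estimable(subtype, dataset)

# Hypothetical alternative: both studies contain some samples of each subtype.
subtype_mixed = np.array([0] * 12 + [1] * 6 + [0] * 5 + [1] * 15)
mixed_ok = is_estimable(subtype_mixed, dataset)

print("Subtype + DatasetID estimable (question layout):", confounded_ok)
print("Subtype + DatasetID estimable (mixed layout):   ", mixed_ok)
```

In the question's layout the subtype and dataset columns are identical, so the check fails and no formula can separate the two effects; in the mixed layout both terms can be estimated.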

However, one difficulty you may run into is if one study has an experimental factor that the other does not, since you can't compare data across studies if one study didn't track that information (at least, not without more complex modeling). The same problem arises when a factor of interest takes only one value within each study: it is then perfectly confounded with DatasetID, and the model cannot separate the two effects. The best-case scenario is if you're just looking to find DEGs, and there are few to no complex experimental features in either dataset.
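For the depth normalization mentioned as the first step, a minimal counts-per-million sketch looks like this. The count matrix is made up, and CPM is only the simplest option; packages such as DESeq2 (median-of-ratios size factors) and edgeR (TMM) use more robust scaling, so treat this as an illustration of the idea rather than a recommended pipeline.

```python
import numpy as np

# Hypothetical raw count matrix: rows = genes, columns = samples.
# The three samples have the same composition but different sequencing depth.
counts = np.array([
    [10, 20, 100],
    [30, 60, 300],
    [60, 120, 600],
], dtype=float)

library_sizes = counts.sum(axis=0)   # total reads per sample
cpm = counts / library_sizes * 1e6   # counts per million, per sample
log_cpm = np.log2(cpm + 1.0)         # pseudocount of 1 avoids log2(0)

print("Library sizes:", library_sizes)
print("log2(CPM + 1):")
print(log_cpm)
```

After scaling, the three columns are identical even though the raw depths differ tenfold, which is the property you want before comparing samples across studies.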
