Batch effect removal
3.2 years ago
BioQueen ▴ 30

Hi! I'm new to bioinformatics and I'm working with 6 different RNA-seq (high-throughput) studies from GEO. 3 of the GSEs contain gene expression data for tumors and the other 3 contain healthy tissue.

I'm going to do batch correction, and I'm wondering: do I merge all the datasets first and then do normalization and batch correction on everything together? Or do I merge the 3 tumor GSEs and do normalization and batch correction on that merged dataset, do the same separately for the 3 healthy-tissue GSEs, and then merge the two results, if that makes sense?

Thanks!

batch-correction RNA-seq • 2.3k views

You must make sure that all of the datasets follow the same protocol and library preparation steps. Only then can you apply batch correction. Otherwise your data will give completely unreliable output, and you may never realize it.

3.2 years ago
ATpoint 85k

I'm going to do batch correction

No, you aren't. You cannot randomly collect datasets and then expect to run some stats magic that makes them comparable. You need identical wet-lab processing for a fair comparison. Otherwise batch effects obscure the results. You cannot correct for them here because each batch (= each study) is nested within the condition (tumor/normal). This is a very common problem, and the only way around it is to either find a study that produced case and control in one go, or to generate the data yourself with a proper study design. With the data above you have a fully confounded design, and there is nothing you can do about it.
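To make the nesting argument concrete, here is a minimal sketch (not taken from any particular package) that builds the design matrix a linear model would need in order to estimate both a condition effect and a per-study batch effect. The helper names `design_matrix` and `condition_is_estimable`, and the figure of 4 samples per study, are assumptions made up for illustration. When every study contains only tumor or only healthy samples, the condition column is an exact linear combination of the intercept and the batch columns, so the matrix loses rank and the two effects cannot be separated.

```python
import numpy as np

def design_matrix(study, is_tumor):
    """Columns: intercept, condition (tumor = 1), one dummy per study except the first."""
    study = np.asarray(study)
    is_tumor = np.asarray(is_tumor, dtype=float)
    studies = np.unique(study)
    intercept = np.ones(len(study))
    # The first study is the reference level, so it gets no dummy column.
    batch = [(study == s).astype(float) for s in studies[1:]]
    return np.column_stack([intercept, is_tumor, *batch])

def condition_is_estimable(study, is_tumor):
    """True only if the design matrix has full column rank."""
    X = design_matrix(study, is_tumor)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Hypothetical layout: 6 studies with 4 samples each.
study = np.repeat(np.arange(6), 4)

# Nested design, as in the question: studies 0-2 are all tumor, 3-5 all healthy.
nested_tumor = (study < 3).astype(int)

# Crossed design: every study contains both tumor and healthy samples.
crossed_tumor = np.tile([1, 1, 0, 0], 6)

print(condition_is_estimable(study, nested_tumor))   # False
print(condition_is_estimable(study, crossed_tumor))  # True
```

In the nested case the matrix has 7 columns but rank 6, which is the algebraic form of "fully confounded": any difference between tumor and healthy can be explained equally well as a difference between studies.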

Oh, I see you asked this before and the answer was the same:

Batch effects

Difference between dataset analysis


But my only option is to merge different studies, as I can't find enough samples in a single dataset. How do I solve this?


Is this a pure dry lab project you're working on?


Yes, only computational analysis.


Yeah, that limits a lot of things, unfortunately. I would discuss with your supervisor whether you can restrict the project to something for which proper data exist and which has not been shown before. After all, computational analysis is just a starting point and needs experimental validation. If you start with batch-confounded data, it is very likely that you will stack up uncertain results, and in the end you might just be investigating batch effects rather than genuine biology. There is probably no magical way to make the data you need usable, simply because case and control always come from different batches. It is what it is. Be careful not to spend much time on suboptimal data; rather, try to adjust the topic if possible.
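If you do go looking for proper data, one quick screen is to check which candidate studies contain both case and control samples before spending any time on them. The sketch below assumes you have already collected (study, condition) pairs from the sample annotations by hand; the function name `studies_with_both_conditions`, the study labels, and the condition labels are all hypothetical placeholders, not real GEO accessions.

```python
from collections import defaultdict

def studies_with_both_conditions(samples, required=("tumor", "healthy")):
    """Return the studies whose samples cover every required condition."""
    seen = defaultdict(set)
    for study, condition in samples:
        seen[study].add(condition)
    return sorted(s for s, conds in seen.items() if set(required) <= conds)

# Hypothetical sample annotations, one (study, condition) pair per sample.
samples = [
    ("study_A", "tumor"), ("study_A", "tumor"),
    ("study_B", "healthy"), ("study_B", "healthy"),
    ("study_C", "tumor"), ("study_C", "healthy"),
]

print(studies_with_both_conditions(samples))  # ['study_C']
```

Only studies that pass this check can contribute a within-study tumor vs. healthy comparison; passing it says nothing about whether the samples inside the study were processed together, so the methods section still needs reading.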
