Joining RNA-seq libraries from different experiments
8.9 years ago
Joel TM ▴ 60

Good day to all, I am looking for insights about how to approach my issue. I am sure some of you have gone through this step at some point or perhaps there are related posts here I couldn't find that you know about.

I have libraries from lung tumor samples with 100M-200M reads each. I want to test for differential expression against normal/healthy lung samples. I found a public RNA-seq dataset for the latter, but it comprises libraries of only 15M-20M reads each.

Would that kind of analysis/comparison be reliable at all? If so, what is the best way to approach this?

Thank you for the mentorship,

Regards,
Joel

RNA-Seq differential-expression Normalization • 1.8k views
8.8 years ago

As Goutham states, if it's just a library size issue, then most sequencing normalisation methods account for that (see sizeFactors in DESeq2's manual for more information). However, it's not normally that simple when combining data from different experiments: there are often differences in chemistry, sample prep, instrument, day of the week, temperature in the room, etc. which add variation, commonly known as 'batch effects'.
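To make the library-size point concrete, here is a minimal Python sketch of the median-of-ratios idea behind DESeq2's size factors. It is an illustration only, not DESeq2's actual code: the function name `size_factors` and the toy counts are made up for this example, and genes with a zero count in any sample are simply skipped.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw gene counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Per-gene log geometric mean across samples, using only genes
    # that have a non-zero count in every sample.
    usable_genes = []
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            usable_genes.append(g)
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))

    if not usable_genes:
        raise ValueError("no gene has non-zero counts in every sample")

    # A sample's size factor is the median ratio of its counts
    # to the per-gene geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in zip(usable_genes, log_geo_means)
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Toy data (hypothetical): same expression profile, 10x difference in depth.
toy_counts = {
    "deep_tumour": [100, 200, 50, 400],
    "shallow_healthy": [10, 20, 5, 40],
}
toy_factors = size_factors(toy_counts)

# Dividing raw counts by the size factor puts both samples on a common scale.
normalised = {
    s: [c / toy_factors[s] for c in toy_counts[s]] for s in toy_counts
}
```

In a real analysis you would let DESeq2 do this for you (`estimateSizeFactors`). The sketch only shows why a depth difference on its own is not a problem.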

Provided that both experiments contain the same sample types, i.e. experiment A has healthy and tumour samples and experiment B also has healthy and tumour samples, you can account for that variation using an additive model. Even if an experiment contains only tumour or only healthy samples, you could block by experiment and still estimate the variance, but not as reliably.
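Before fitting an additive model, it can help to check how condition and experiment overlap in the sample sheet. The sketch below assumes a two-condition design (tumour vs healthy). The function names and the sample tuples are hypothetical and not part of any package. If no experiment contains both conditions, condition and experiment are completely confounded, and an additive model cannot separate the biological effect from the batch effect.

```python
from collections import defaultdict


def experiments_with_all_conditions(samples):
    """Return the experiments that contain every condition in the sheet.

    samples: list of (sample_id, experiment, condition) tuples.
    """
    conditions_by_experiment = defaultdict(set)
    for _sample_id, experiment, condition in samples:
        conditions_by_experiment[experiment].add(condition)

    all_conditions = set().union(*conditions_by_experiment.values())
    return sorted(
        experiment
        for experiment, conditions in conditions_by_experiment.items()
        if conditions == all_conditions
    )


def is_confounded(samples):
    """True if no experiment has both conditions (two-condition design)."""
    return not experiments_with_all_conditions(samples)


# Hypothetical sample sheet: all tumours from experiment A, all healthy from B.
confounded_sheet = [
    ("T1", "A", "tumour"),
    ("T2", "A", "tumour"),
    ("N1", "B", "healthy"),
    ("N2", "B", "healthy"),
]

# Hypothetical sample sheet: both experiments contain both conditions.
crossed_sheet = [
    ("T1", "A", "tumour"),
    ("N1", "A", "healthy"),
    ("T2", "B", "tumour"),
    ("N2", "B", "healthy"),
]

print(is_confounded(confounded_sheet))
print(is_confounded(crossed_sheet))
```

With the second layout, the corresponding DESeq2 design would be along the lines of `~ experiment + condition`.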

Basically, all this comes down to how you design your model. I'd recommend the DESeq2 workflow: align with your favourite splice-aware aligner, count using htseq-count or Rsubread, then follow the DESeq2 vignette.


Thank you both for your time. That's exactly what I did; I was on the right path. I wasn't at all sure, though, whether anything else needed to be done to make the two experiments "compatible". Thanks again.

8.8 years ago

The normalisation methods account for the variation in read depth (library size) and work pretty well up to a 10-fold difference. But you can always check the clustering or PCA plots to get an idea of how the samples look. I am not sure about other artefacts like batch effects.

To put it another way, if you are concerned only about different library sizes, the normalisation methods will take care of that.
