My work involves identifying differentially expressed genes (DEGs) across 3 conditions using microarray data. I had planned to use existing data from GEO for this purpose. For microarray, most datasets include all 3 conditions in the same experiment, which suits my purpose well. I have now read that RNA-seq has an advantage over microarray for detecting DEGs. However, all 3 conditions are rarely present within the same RNA-seq experiment; a typical experiment has only 2 of the 3. The experiments also differ in sequencing instrument and library preparation method. Can anyone suggest how I should plan my further work if I want to compare microarray and RNA-seq DEGs, or whether I should shift entirely to RNA-seq? Also, in most of the papers I have read, the groups use their own samples: they prepare all the conditions themselves and sequence them. Is it a good idea to work with existing RNA-seq data?
I don't think I agree with this. Library prep is likely to have a big impact, especially if you compare ribo depletion with polyA sequencing. But even within the same strategy I expect biases.
Including this in the design model might work, but I bet that the biological subgroups will be confounded / not independent from these technical subgroups.
But if you have some data to back your answer, please share.
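A quick way to see whether the confounding concern applies to a given sample sheet is to check which conditions are linked to each other through shared technical groups. The sketch below is only an illustration: the function name `connected_condition_groups` and the condition / kit / study labels are made up, and it relies on the standard result that, in an additive model (condition + batch), a contrast between two conditions is estimable only if the two conditions are connected through batches they share with each other or with intermediate conditions.

```python
from collections import defaultdict


def connected_condition_groups(samples):
    """Group conditions that are linked through shared batches.

    samples: iterable of (condition, batch) pairs, one per sample.
    Returns a sorted list of sorted lists of condition names. Conditions
    in different groups are confounded with batch: their difference cannot
    be separated from the batch effect in an additive model.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Each sample links its condition to its batch in a bipartite graph.
    for condition, batch in samples:
        union(("condition", condition), ("batch", batch))

    groups = defaultdict(set)
    for kind, name in list(parent):
        if kind == "condition":
            groups[find((kind, name))].add(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample sheets (labels are illustrative only).
fully_confounded = [("A", "kit1"), ("A", "kit1"), ("B", "kit2"), ("B", "kit2")]
two_of_three = [("A", "study1"), ("B", "study1"), ("B", "study2"), ("C", "study2")]

print(connected_condition_groups(fully_confounded))  # [['A'], ['B']]
print(connected_condition_groups(two_of_three))      # [['A', 'B', 'C']]
```

In the second layout, each study holds 2 of the 3 conditions, yet all three end up in one group because condition B bridges the two studies. The A versus C comparison is then only indirect, so it will be less precise than a comparison made within a single study.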
True, but the OP should therefore elaborate on the specific library prep methods in the RNA-seq samples of interest. I made an assumption that these library prep methods were just different versions of the same kit and/or were targeting the same RNA species. I made this assumption because I had initially assumed that it was obvious that library prep methods targeting different RNA species would not be compatible.
I do have valid data that shows how the inclusion of sequencer instrument, library prep method (all ribosome depletion), and read type (single/paired-ends) in the design model can remove these effects.
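To make the idea of "including it in the design model" concrete, here is a minimal sketch on made-up, noise-free numbers for a single gene. It is not the data referred to above and not what a dedicated differential expression package does internally; it only fits an ordinary least-squares model with an intercept, a condition term and one technical term (here called `instrument`, a hypothetical 0/1 covariate) to show how the adjusted condition effect differs from a naive difference of means.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("rank-deficient design: covariates are confounded")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy data, constructed as: log-expression = 5 + 1*condition + 3*instrument.
condition = [0, 0, 0, 1, 1, 1]
instrument = [0, 0, 1, 0, 1, 1]   # unbalanced across conditions
log_expr = [5, 5, 8, 6, 9, 9]

# Naive estimate: difference of group means, ignoring the instrument.
treated = [v for v, c in zip(log_expr, condition) if c == 1]
control = [v for v, c in zip(log_expr, condition) if c == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

# Adjusted estimate: condition coefficient from intercept + condition + instrument.
design = [[1, c, i] for c, i in zip(condition, instrument)]
beta = ols(design, log_expr)

print("naive condition effect:", naive)               # inflated by the instrument imbalance
print("adjusted condition effect:", round(beta[1], 6))  # close to the true value of 1
```

The `ValueError` branch is the situation raised earlier in the thread: if every sample of a condition comes from its own instrument or kit, the design matrix loses rank and no model can separate the two effects.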
Dear Kevin,
I made an initial attempt with HISAT2, StringTie and Ballgown. StringTie reports values as FPKM or TPM. If I understand correctly, these are normalised values. Do I need to do any further normalisation?
As for the library preparation, I mean the various kits used, such as TruSeq, Universal kit, etc.
Can you share a paper that does this sort of work?
To do this, my advice is to not use HISAT2 and to not use FPKM / TPM. If you follow my approach (above), your life will be a lot easier. Both the FPKM and TPM normalisation strategies have come under much criticism in recent years, and many avoid them. They are certainly not ideal for what you want to do.
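For reference, the sketch below shows the usual textbook definitions of FPKM and TPM computed from raw fragment counts and feature lengths. It is an illustration under stated assumptions, not StringTie's internal implementation (which works from its own coverage estimates); the function names and the example numbers are made up.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of feature per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    denom = sum(rates)
    return [rate / denom * 1e6 for rate in rates]


# Hypothetical library with two genes of equal length.
small = tpm([100, 200], [1000, 1000])

# Same two genes with identical counts, plus one very highly expressed gene.
large = tpm([100, 200, 9700], [1000, 1000, 1000])

print(small[0], large[0])  # the first gene's TPM drops although its count is unchanged
```

Both measures are scaled within a sample, so a gene's value depends on everything else in that library. This is one reason they are awkward for comparing samples across studies with different library preparations, and why count-based methods with between-sample normalisation are usually preferred for differential expression.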