I don't know whether a straightforward differential expression analysis (e.g. some version of a t-test) would work well here, but you may be able to do a more sophisticated analysis in which you compare your tumor data to your normal data by clustering. Many papers look for cancer signatures this way, though they usually require a lot of samples.
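As a rough illustration of the clustering idea, here is a minimal sketch that checks whether each sample's most correlated neighbour carries the same tumor/normal label. The sample names and counts are made-up toy values and the function names are hypothetical; a real analysis would use properly normalized data, many more genes and samples, and an established clustering package.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def nearest_neighbor_labels(expr, labels):
    """Map each sample to (its own label, label of its most correlated other sample)."""
    # log2(count + 1) keeps a few highly expressed genes from dominating
    logged = {s: [math.log2(v + 1) for v in counts] for s, counts in expr.items()}
    result = {}
    for s in logged:
        best = max(
            (t for t in logged if t != s),
            key=lambda t: pearson(logged[s], logged[t]),
        )
        result[s] = (labels[s], labels[best])
    return result

# Toy expression values (assumed, not real data): one list of gene counts per sample
expr = {
    "tumor_1": [500, 20, 300, 5],
    "tumor_2": [450, 25, 350, 8],
    "normal_1": [50, 200, 30, 100],
    "normal_2": [60, 180, 40, 120],
}
labels = {
    "tumor_1": "tumor",
    "tumor_2": "tumor",
    "normal_1": "normal",
    "normal_2": "normal",
}

neighbors = nearest_neighbor_labels(expr, labels)
for sample, (own, nearest) in neighbors.items():
    print(sample, own, nearest)
```

If many samples end up closest to a sample from the other group, that is a hint that something other than tumor status (batch, protocol, handling) is driving the signal.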
It is important, if you are going to analyze the data directly, to make sure that the libraries for the cancer samples and for the normal comparison samples are all prepared in the same way. You would need to do a lot of extra normalization if you want to mix data from a poly A TruSeq protocol with data from, e.g., a protocol that uses RiboZero.
Also, it is important to know that differences in sample handling can introduce big artifacts into the data. Fresh frozen tissue will usually be in better shape than FFPE samples, but even then things like how long it took to process the tissue will affect the data. Without good handling the RNA will break up into alphabet soup. Then, if you use a poly A protocol, you will have a huge 3' bias because the 5' end is no longer joined to the poly A tail. This will show up in the data as length bias when you compare the samples, and it needs to be normalized out before analysis. This is an issue because normal tissue often comes from deceased donors, and obviously it is difficult to just go in and take the tissue.
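One simple way to see the 3' bias is to compare read coverage over the 3' half of a transcript with coverage over the 5' half. The sketch below assumes you already have per-base coverage ordered 5' to 3'; the function name and the coverage values are invented for illustration, and dedicated QC tools report more robust gene-body coverage metrics.

```python
def three_prime_bias(coverage):
    """Ratio of mean coverage in the 3' half to mean coverage in the 5' half.

    `coverage` is per-base read depth along one transcript, ordered 5' -> 3'.
    A value near 1 suggests even coverage; values well above 1 suggest 3' bias.
    """
    if len(coverage) < 2:
        raise ValueError("need at least two positions")
    half = len(coverage) // 2
    five_prime = coverage[:half]
    three_prime = coverage[len(coverage) - half:]
    mean5 = sum(five_prime) / len(five_prime)
    mean3 = sum(three_prime) / len(three_prime)
    if mean5 == 0:
        # No reads at the 5' end at all: bias is undefined or unbounded
        return float("inf") if mean3 > 0 else float("nan")
    return mean3 / mean5

# Toy coverage profiles (assumed values, not real data)
intact = [10, 11, 9, 10, 10, 11, 10, 9]
degraded = [1, 1, 2, 3, 6, 10, 14, 15]

intact_ratio = three_prime_bias(intact)
degraded_ratio = three_prime_bias(degraded)
print(intact_ratio, degraded_ratio)
```

Comparing this kind of ratio between the tumor and normal groups before any expression analysis shows whether degradation differs systematically between them.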
There are computational ways to smooth out these differences and get meaningful results. Some are described in the GTEx papers. But it is better to consider these things at the design phase so you can minimize them if possible.
Finally, big numbers are your friend.
I assume that you have already looked through existing RNA-Seq datasets to see if the data you need to answer your question already exists. You may also want to look at Oncomine. It also includes a lot of microarray studies and the data is pretty easy to interrogate. The cancer you are looking at might be in there. Existing datasets are also good for telling you how many replicates you are going to need.
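To give a feel for how an existing dataset can inform replicate numbers, here is a back-of-the-envelope sketch. It uses the standard normal approximation for a two-group comparison of a single gene on the log2 scale, which is a simplification I am assuming here: it ignores multiple testing and the count-based models that RNA-Seq tools actually use. The function name and the example numbers are hypothetical; `sd` would be the between-individual standard deviation you estimate from the existing data.

```python
import math
from statistics import NormalDist

def replicates_per_group(sd, delta, alpha=0.05, power=0.8):
    """Approximate samples per group needed to detect a mean difference `delta`.

    sd    : between-sample standard deviation of log2 expression (assumed equal in both groups)
    delta : log2 fold change you want to be able to detect
    alpha : two-sided significance level for a single gene (no multiple-testing correction)
    power : desired probability of detecting the difference
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Example values (assumed): a two-fold change (log2 fold change of 1)
n_noisy = replicates_per_group(sd=1.0, delta=1.0)
n_quiet = replicates_per_group(sd=0.5, delta=1.0)
print(n_noisy, n_quiet)
```

The point is mostly the scaling: halving the variability cuts the required number of replicates to about a quarter, which is why noisy human tissue needs so many more samples than cell lines.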
Dear Nazanin, Hi.
I guess in this case you need multiple biological replicates from normal individuals to reduce the bias from individual variation in gene expression (and maybe your next question will be "to pool or not to pool?").
The candidate genes that you are aiming for may influence your design, too. If you are searching for new up-regulated or down-regulated gene(s), minimizing individual variation is more important.
You can also check the pipelines of some papers or databases to see if their “primary tumor” and “solid tissue normal” samples were from the same individuals or not.
~ Best
Dear Farbod, Hi,
Thank you for your help
Regards
Nazanin