Hi everyone
My lab collected blood plasma cfRNA samples from breast cancer patients and non-cancer patients as controls. The PI designed a custom gene-chip to sequence 65 genes he predicts will be upregulated in cancer patients. He did this to save cost (cheaper than sequencing entire transcriptome) and to reduce noise (since his previous study showed that cfRNA data can be very noisy).
We now have the data and I'm meant to start analysing it, but I have no idea how to normalise it...
No housekeeping genes were sequenced, and many of the genes are expected to be differentially expressed. This makes TMM, RPKM, and other commonly used methods like DESeq2 inappropriate.
Any idea what I could do?
I thought perhaps of applying a CLR or log transform and then doing a Welch t-test between the two groups.
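To make the idea concrete, here is a minimal sketch of a per-sample CLR transform followed by a per-gene Welch test. The counts, the pseudocount of 0.5, and the helper names `clr` and `welch_t` are made up for illustration and are not from our data. Note that CLR centres each sample on the geometric mean of the panel itself, so the values are only relative to the other genes on the chip.

```python
import math
from statistics import mean, variance

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's gene counts."""
    # The pseudocount (an assumed value) avoids log(0) for undetected genes.
    logs = [math.log(c + pseudocount) for c in counts]
    centre = mean(logs)  # log of the sample's geometric mean
    return [v - centre for v in logs]

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical toy counts: rows are samples, columns are 3 genes.
cancer = [[120, 30, 50], [150, 28, 45], [130, 35, 55]]
control = [[60, 32, 52], [70, 30, 48], [65, 29, 50]]

cancer_clr = [clr(sample) for sample in cancer]
control_clr = [clr(sample) for sample in control]

# Test each gene separately on its CLR values.
for g in range(3):
    t, df = welch_t([s[g] for s in cancer_clr], [s[g] for s in control_clr])
    print(f"gene {g}: t = {t:.2f}, df = {df:.1f}")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and also returns a p-value. With 65 genes tested, the p-values would still need a multiple-testing correction.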
Thank you in advance for your feedback.
So it's only 65 genes, and all of them are expected to be DE? If so, this is about the worst possible design. I see no way to analyse this for DE, since there is no reference. No matter which test you run, it all comes down to the same question: what is the baseline? You have none, so no analysis can be done. Absolutely terrible design...
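A small sketch of why the missing baseline matters, using made-up abundances for a hypothetical 4-gene panel: if every gene on the panel goes up by the same factor, a CLR transform (which centres each sample on its own geometric mean) cancels the change completely.

```python
import math

def clr(values):
    """Centred log-ratio transform of one sample (no zeros assumed)."""
    logs = [math.log(v) for v in values]
    centre = sum(logs) / len(logs)  # log of the geometric mean
    return [v - centre for v in logs]

# Hypothetical abundances, not real data.
control = [10.0, 20.0, 40.0, 80.0]
cancer = [2.0 * v for v in control]  # every gene truly doubled

# Largest per-gene difference between the two CLR profiles.
shift = max(abs(a - b) for a, b in zip(clr(cancer), clr(control)))
print(f"largest CLR difference: {shift:.2e}")
```

The difference is zero up to floating-point error, even though every gene doubled. Without a reference outside the panel, only changes relative to the other panel genes are visible.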
RNA-seq these days costs $200 per sample at commercial providers, so I can hardly imagine the PI saved costs here.