I have scRNA-seq data. I intend to contextualise my data with bulk RNA-seq, i.e., merge the scRNA-seq data with bulk RNA-seq data. My idea is to collapse all the cells from my scRNA-seq experiment into a single point, and then merge that point with the bulk RNA-seq data.
I've done the first step by calculating the mean of all the cells from scRNA, resulting in a single point.
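For readers following along, the collapsing step could look roughly like the sketch below. It is only an illustration under assumptions: the gene names and counts are made-up toy values, `pseudobulk_cpm` is a hypothetical helper, and it sums UMI counts per gene before scaling to counts per million (CPM). After that scaling, summing and averaging give the same profile, because they differ only by a constant factor (the number of cells).

```python
def pseudobulk_cpm(counts_by_cell):
    """Collapse per-cell UMI counts into one counts-per-million profile.

    counts_by_cell: list of dicts, one per cell, mapping gene name -> UMI count.
    Genes missing from a cell are treated as zero for that cell.
    """
    totals = {}
    for cell in counts_by_cell:
        for gene, umi in cell.items():
            totals[gene] = totals.get(gene, 0) + umi

    library_size = sum(totals.values())
    if library_size == 0:
        raise ValueError("no UMI counts to aggregate")

    # Scale so that the profile sums to one million.
    return {gene: 1e6 * umi / library_size for gene, umi in totals.items()}


# Toy data (hypothetical): three cells, three genes.
cells = [
    {"GeneA": 3, "GeneB": 0, "GeneC": 1},
    {"GeneA": 1, "GeneB": 2},
    {"GeneA": 0, "GeneB": 1, "GeneC": 2},
]
profile = pseudobulk_cpm(cells)
```

Note that this only puts the single-cell data on a per-million scale; it does not make UMI-based values comparable to RSEM expected counts or TPM.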
I have gathered bulk RNA-seq data, as FASTQ files, from a database. I then aligned the reads and extracted the counts with RSEM.
My problem is that the data don't match up. This is because scRNA-seq uses UMIs, which approach gene counting differently from RSEM's expected counts or TPM values.
I don't know how to normalise the two types of data to be able to compare them. I don't even know if it's possible.
If someone could help me, or just give me a tip, that would be great!
Not an expert in scRNA-seq, but still my two cents. (lol, 5 years later: by now I am an expert in scRNA-seq, and I still stand by my answer.)
scRNA-seq is typically 3'-end enriched, while bulk RNA-seq covers fragmented full-length transcripts/cDNA. You would also need to ensure that the downloaded data are single-end, as scRNA-seq (at least 10x) is effectively single-end.
There is a strong confounding effect between your data and the downloaded data, because different specimens, labs, kits, and sequencing regimes were used. If you download a couple of independent bulk RNA-seq datasets for the same cell type but made with different kits, and perform PCA, you will see that they cluster notably by dataset rather than by cell type. At least this is my experience. The effect is probably even stronger between bulk and scRNA-seq.
scRNA-seq is zero-inflated, while bulk RNA-seq, at normal depth, is not.
Bioinformatic processing is not the same.
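The observation above about samples grouping by dataset rather than by cell type can be checked without a full PCA. The sketch below swaps PCA for a simpler stand-in: it compares the average Pearson correlation of sample pairs that share a dataset against pairs that share a cell type. Everything in it is an assumption for illustration: the labels, the toy log-expression values (deliberately constructed to show a dominant dataset effect), and the helper names.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_correlation(samples, same_index):
    """Average correlation over sample pairs that share one label only.

    samples: list of ((dataset, cell_type), values) tuples.
    same_index: 0 to pair samples sharing a dataset, 1 for a shared cell type.
    Pairs must differ in the other label.
    """
    other = 1 - same_index
    values = [
        pearson(xa, xb)
        for (la, xa), (lb, xb) in combinations(samples, 2)
        if la[same_index] == lb[same_index] and la[other] != lb[other]
    ]
    return sum(values) / len(values)


# Toy log-expression values (hypothetical), four genes per sample.
samples = [
    (("lab1", "T cell"), [5, 1, 6, 2]),
    (("lab1", "B cell"), [5, 2, 6, 1]),
    (("lab2", "T cell"), [1, 5, 2, 6]),
    (("lab2", "B cell"), [2, 5, 1, 6]),
]
same_dataset = mean_correlation(samples, same_index=0)
same_cell_type = mean_correlation(samples, same_index=1)
```

If `same_dataset` comes out clearly higher than `same_cell_type` on real data, that is a sign the dataset of origin dominates the biology, which is the situation described above.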
You would be better off doing some kind of meta-analysis, such as checking whether the prominent genes from your scRNA-seq data are also present in the bulk data, or are differentially expressed genes (DEGs) in bulk between two conditions. I do not see a reasonable way of comparing counts between bulk and scRNA-seq.
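A minimal sketch of that kind of check, assuming you already have one expression profile per technology. It never compares values across technologies directly: it takes the top genes from the single-cell profile and asks whether each one is detected in bulk above a threshold. The gene names, the values, the `min_bulk` cutoff of 1.0, and both helper functions are hypothetical choices for illustration.

```python
def top_genes(expression, n):
    """Return the names of the n most highly expressed genes."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    return ranked[:n]


def marker_overlap(sc_profile, bulk_profile, n=3, min_bulk=1.0):
    """Check which top single-cell genes are also detected in bulk.

    Returns the detected genes and the fraction of top genes detected.
    min_bulk is an arbitrary detection threshold on the bulk scale.
    """
    markers = top_genes(sc_profile, n)
    detected = [g for g in markers if bulk_profile.get(g, 0.0) >= min_bulk]
    fraction = len(detected) / len(markers) if markers else 0.0
    return detected, fraction


# Toy profiles (hypothetical): single-cell in CPM, bulk in TPM.
sc_profile = {"GeneA": 400000.0, "GeneB": 350000.0, "GeneC": 200000.0, "GeneD": 50000.0}
bulk_profile = {"GeneA": 120.5, "GeneB": 0.2, "GeneC": 35.0, "GeneD": 900.0}

detected, fraction = marker_overlap(sc_profile, bulk_profile, n=3, min_bulk=1.0)
```

The same idea extends to intersecting the single-cell gene list with a DEG list from a bulk comparison, which likewise stays within each technology's own scale.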
Sometimes I just don't understand why people think they can normalize everything...
While I agree with the statement, it would be more productive to explain why this is not possible here ;-)
Because scRNA-seq and bulk RNA-seq differ from the very beginning, it is highly likely that the batch effect is larger than the biological effect. And there is no way to distinguish the batch effect from the biological effect in this comparison.