9 months ago
Mo
Hi,
I have RNA sequencing data from 5 independent experiments. There are ~200 genes that I am interested in comparing between these 5 experiments. I want to make bar graphs of the same genes from the different experiments and see whether they are up- or downregulated. The RNA-seq data is already normalised by the genomic DNA.
The problem I am having is that in one experiment the data values look like this:
gene 1: 0.02
gene 2: 0.05
gene 3: 0.10
In the second they look like this:
gene 1: 3.5
gene 2: 5.8
gene 3: 10.6
My question is: how can I normalise the data from each experiment to a scale of 0.0 to 1.0, so that all genes are on the same scale across experiments?
Many thanks,
Have you performed QC and exploratory data visualization to examine the gene counts from these 5 independent experiments? This is necessary before performing any data integration steps. It is also important that the raw sequencing data were all processed with the same pipeline. Is that the case? Even then there will likely be batch effects to account for. Have you looked into this? You should not move on to data integration before addressing these aspects.
Yes, I have completed all those steps. I have basecalled the data, filtered the reads after FastQC, mapped them to the reference database, retained the primary alignments with MAPQ > 10, visualised and quantified the reads, and then normalised with genomic DNA. At this stage I just want to normalise the values to a scale of 0 to 1. Thanks
So there are no batch effects present in the data?
What do you mean by 'normalizing with genomic DNA'?
At the moment I can't think of a reason why you need to scale if you've already normalized by sequencing depth.
Regarding scaling (i.e. setting the range of values between 0 and 1), what have you tried? There are plenty of resources online for this procedure.
More importantly, what is your downstream analysis? Is this for visualization? If so, I typically convert gene expression to a z-score, calculated individually for each gene using the mean and standard deviation of that gene's expression across the samples.
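To make the two options concrete, here is a minimal sketch of min-max scaling (range 0 to 1) and of a per-gene z-score, using only the Python standard library. The function names `min_max_scale` and `z_score` and the layout of the `experiments` dictionary are assumptions made for illustration; the numbers are the ones quoted in the question. Whether to scale within each experiment or per gene across experiments is a choice that depends on what the plot is meant to show, so treat this as one possible approach rather than the recommended one.

```python
from statistics import mean, stdev


def min_max_scale(values):
    """Rescale a list of numbers linearly onto the range 0.0 to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to rescale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)  # stdev needs at least two values
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


# Values quoted in the question; the other three experiments are not shown.
experiments = {
    "experiment 1": {"gene 1": 0.02, "gene 2": 0.05, "gene 3": 0.10},
    "experiment 2": {"gene 1": 3.5, "gene 2": 5.8, "gene 3": 10.6},
}

# Option A: min-max scaling within each experiment.
for name, genes in experiments.items():
    scaled = min_max_scale(list(genes.values()))
    print(name, dict(zip(genes, (round(x, 3) for x in scaled))))

# Option B: z-score for each gene across the experiments.
for gene in experiments["experiment 1"]:
    across = [experiments[e][gene] for e in experiments]
    print(gene, [round(z, 3) for z in z_score(across)])
```

With only two experiments the z-score is not very informative, since each gene simply gets one positive and one negative value of equal size; it becomes more useful once all 5 experiments are included.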
So these RNAs are not from endogenous genes, but rather from plasmid libraries expressed in human cells. I have separately sequenced this plasmid library on the nanopore (as DNA) to get the copy number of each gene in it. By "normalizing with genomic DNA", I meant that I have normalised raw mRNA read counts (nanopore mRNA sequencing) for these genes by the plasmid copy numbers (I have divided the mRNA copy numbers by the plasmid copy numbers).
I have not tried any online resources, as I was really confused about how to do this properly, hence I am looking for advice.
It is just for visualisation, I want to check if each gene has different expression levels in independent experiments.
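For readers following along, the normalisation described in this reply (mRNA read count divided by plasmid copy number, per gene) could be sketched roughly as follows. The function name `normalise_by_copy_number` and all counts are made up for illustration and are not the poster's data; how to treat genes with zero plasmid copies is also an assumption.

```python
def normalise_by_copy_number(mrna_counts, plasmid_copies):
    """Return mRNA count divided by plasmid copy number for each gene."""
    ratios = {}
    for gene, count in mrna_counts.items():
        copies = plasmid_copies.get(gene, 0)
        # A gene with no detected plasmid copies has an undefined ratio,
        # so it is recorded as None instead of raising a division error.
        ratios[gene] = count / copies if copies > 0 else None
    return ratios


# Made-up counts, for illustration only.
mrna_counts = {"gene 1": 40, "gene 2": 150, "gene 3": 12}
plasmid_copies = {"gene 1": 2000, "gene 2": 3000, "gene 3": 0}

print(normalise_by_copy_number(mrna_counts, plasmid_copies))
```

Ratios produced this way depend on the sequencing depth of each mRNA run, which may be part of why values from different experiments end up on very different scales.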
I am realizing now that when you said "5 independent experiments" you likely meant replicate samples, not 5 separate, independently collected data sets downloaded from public repositories. Hence all my questions about processing and batch effects.