Hi,
I am comparing the 1000 bp regions upstream of the TSS among three species (without replicates). To compare the upstream 1000 bp with the Ka/Ks of each gene (pairwise comparisons between species), I normalized the counts in the 1000 bp upstream region using the RPKM formula.
I was wondering whether this is the best normalization measure I can use in the absence of replicates, or whether there are other normalization methods I should try, and if so, which ones.
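For reference, here is a minimal sketch of the RPKM calculation for a single upstream window. It assumes you already have the raw read count in the window and the total number of mapped reads for the sample. The function name and the numbers are hypothetical.

```python
def rpkm(region_reads: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """RPKM = reads / (region length in kb * total mapped reads in millions)."""
    length_kb = region_length_bp / 1e3
    depth_millions = total_mapped_reads / 1e6
    return region_reads / (length_kb * depth_millions)


# Hypothetical example: 250 reads in a 1000 bp upstream window,
# 20 million mapped reads in the sample.
print(rpkm(250, 1000, 20_000_000))  # 12.5
```

Every window here is exactly 1000 bp, so the length term is the same constant for all regions. In that case RPKM is numerically identical to CPM. It corrects only for sequencing depth, not for composition differences between samples.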
Regards
How do you get scaling factors from non-peak regions? Could you please elaborate? Also, as far as I know, TMM/RLE and quantile normalization require replicates.
PS: I did not have ChIP input samples.
Replicates aren't needed or even used in TMM, RLE, or quantile normalization. All you need are multiple samples, which can be the ones being compared.
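To show that no replicates are involved, here is a minimal sketch of RLE size factors, the median-of-ratios approach used by DESeq/DESeq2. Each column of the count matrix is simply a different sample. The function name is hypothetical, and this is an illustration rather than the edgeR or DESeq2 code itself.

```python
import numpy as np


def rle_size_factors(counts: np.ndarray) -> np.ndarray:
    """RLE (median-of-ratios) size factors for a regions x samples count matrix."""
    # Only regions with a non-zero count in every sample can enter the geometric mean.
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    # Per-region geometric mean across samples acts as a pseudo-reference sample.
    log_geo_mean = log_counts.mean(axis=1)
    # Size factor of each sample = median ratio of its counts to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_mean[:, None], axis=0))


# Two samples (no replicates); sample 2 is sequenced twice as deeply.
print(rle_size_factors(np.array([[10, 20], [30, 60], [5, 10]])))
# approximately [0.707, 1.414]
```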
To get scaling factors from non-peak regions, you would first call peaks and then use the counts outside of them for the normalization.
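Here is a minimal sketch of that first step. It assumes genome-wide counts in fixed-width bins and a list of called peak intervals. Both inputs and the function name are hypothetical, and in practice an interval tool such as bedtools would more likely do the overlap step. Bins that overlap any peak are dropped, and the remaining counts are kept for normalization.

```python
import numpy as np


def non_peak_bin_counts(bin_counts, bin_size, peaks):
    """Return counts of bins that do not overlap any called peak.

    bin_counts: 1-D array of read counts in consecutive bins starting at position 0.
    bin_size:   bin width in bp (assumed constant).
    peaks:      list of (start, end) half-open peak intervals in bp.
    """
    bin_counts = np.asarray(bin_counts)
    starts = np.arange(len(bin_counts)) * bin_size
    ends = starts + bin_size
    in_peak = np.zeros(len(bin_counts), dtype=bool)
    for p_start, p_end in peaks:
        # A bin overlaps a peak if the two half-open intervals intersect.
        in_peak |= (starts < p_end) & (ends > p_start)
    return bin_counts[~in_peak]


background = non_peak_bin_counts(np.array([5, 50, 60, 4, 6]), 100, [(120, 280)])
print(background)  # [5 4 6]
```

The sum of the returned counts can serve as an effective library size for that sample. Alternatively, the per-bin counts can be passed to TMM or RLE.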
Since you lack inputs, you're going to have more work on your hands when it comes to validation.
In my case I don't have multiple samples within a species; I have different histone modifications in three species (for a particular tissue) that I am trying to compare.
Hi Devon,
Could you elaborate further on my question?
Thanks
Not unless you'd like me to elaborate on something in particular.
I have only one sample for each species. In that case, what would be the most appropriate way to normalize:
CPM or TMM?
Is it possible to use TMM on such data, and if so, how?
Sure:
Alternatively, play around with scale factors until things look right.
So are you saying that I should use the peak regions as sample 1 and the non-peak regions as sample 2 for TMM normalization?
No, you would use the non-peak regions for all of the samples. TMM would give you scaling factors accordingly that you would then need to apply to the peak counts. This is the same as how ERCC spike-ins are used in RNAseq.
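Here is a rough sketch of that idea. It assumes a common set of non-peak regions has been counted in every sample. `tmm_factor` is a simplified, unweighted version of TMM; edgeR's `calcNormFactors` additionally applies precision weights by default. The trimming fractions match edgeR's defaults. Both function names are hypothetical.

```python
import numpy as np


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Simplified, unweighted TMM factor of sample `obs` relative to `ref`."""
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    keep = (obs > 0) & (ref > 0)
    p_obs = obs[keep] / obs.sum()
    p_ref = ref[keep] / ref.sum()
    m = np.log2(p_obs / p_ref)          # M: log fold change per region
    a = 0.5 * np.log2(p_obs * p_ref)    # A: average log abundance per region
    # Trim the most extreme M and A values, then average what remains.
    m_lo, m_hi = np.quantile(m, [logratio_trim, 1 - logratio_trim])
    a_lo, a_hi = np.quantile(a, [sum_trim, 1 - sum_trim])
    core = (m >= m_lo) & (m <= m_hi) & (a >= a_lo) & (a <= a_hi)
    return 2 ** m[core].mean()


def normalize_peaks(peak_counts, nonpeak_counts, ref_index=0):
    """Scale peak counts using factors learned on shared non-peak regions.

    peak_counts, nonpeak_counts: lists of 1-D arrays, one per sample.
    Assumption: nonpeak_counts[i] covers the same regions in every sample.
    """
    ref = nonpeak_counts[ref_index]
    normalized = []
    for peaks, background in zip(peak_counts, nonpeak_counts):
        # Effective library size = background total adjusted by the TMM factor.
        effective_size = background.sum() * tmm_factor(background, ref)
        normalized.append(np.asarray(peaks, dtype=float) / effective_size * 1e6)
    return normalized
```

This mirrors how spike-in counts are used. The scaling is learned where no biological signal is expected and then transferred to the peaks.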
Sorry, I am still not able to follow. I have only one sample for each histone modification in each species. So when you say I need to use the non-peak regions for all of the samples, do you mean I should combine the non-peak regions from all the different histone modifications? Could you point me to an example?
Ah, right, I'd forgotten the context of your question. You'll just have to play around with TMM a bit if you want to use it. There are no examples of this that I'm aware of, but the general idea would be to ignore the fact that your non-peak regions are in different areas for each sample and to just use counts in some fixed number of them per sample. I don't have the time to put together a long example of this, unfortunately.
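Here is one possible reading of that suggestion, sketched under assumptions that are not in the thread. It draws the same number of non-peak bins from each sample with a fixed seed. It then sorts each sample's counts so that rows are paired by rank rather than by genomic position. The rank pairing and the function name are illustrative assumptions.

```python
import numpy as np


def matched_background(nonpeak_counts, n_regions, seed=0):
    """Build an n_regions x samples matrix from per-sample non-peak bin counts.

    nonpeak_counts: list of 1-D arrays; each sample may have a different number
    of non-peak bins, located in different parts of the genome.
    """
    rng = np.random.default_rng(seed)
    columns = []
    for counts in nonpeak_counts:
        counts = np.asarray(counts)
        # Draw the same fixed number of bins from every sample (no replacement).
        picked = rng.choice(counts, size=n_regions, replace=False)
        # Assumption: rows share no genomic identity, so pair them by rank instead.
        columns.append(np.sort(picked))
    return np.column_stack(columns)
```

The resulting matrix can then go into a TMM or RLE factor calculation. Both methods discard bins with zero counts, so for sparse ChIP background `n_regions` may need to be large.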