7.3 years ago
saamar.rajput
I have RNA-seq raw count data and I want to test a piece of software by comparing its results on data with and without added noise in the counts. For this I need to add some noise to my RNA-seq raw counts. Can somebody suggest how to add noise to my data at the transcript count level?
Real data already contain a lot of noise, or better, sources of variation, so you will not have data 'without noise' in the first place. If you look at repeated measurements for the same genes, you will see that they are not identical, even for technical replicates. If you add more noise, there is no guarantee that the outcome would be in any way similar to a real dataset.
It might make good sense to add noise to simulated read counts. For that, you first need to choose a distribution to sample from, e.g. Poisson or negative binomial, and then draw the simulated counts for each replicate given the parameters of that distribution for each gene. The Poisson has only one parameter, lambda, which is both its mean and its variance, so in R that would simply be calling
rpois(n=10,lambda=1)
to generate 10 replicates of a read count with expected value 1. Note that you cannot 'add' the noise value to a 'real' value like in
x' := x+rpois(n=10,lambda=1)
because the resulting variable x' would not be Poisson distributed; the same holds for the negative binomial. If you want to make it really nice, you can draw the mean and dispersion parameters of the distribution from some prior.
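To make the last point concrete, here is a minimal sketch in R of drawing per-gene means and dispersions from priors and then sampling negative binomial counts for each replicate. The priors (log-normal for the mean, gamma for the dispersion), their parameter values, the matrix dimensions and the helper `simulate_counts` are all assumptions chosen for illustration; they are not estimated from any real dataset, and you would want to tune them to resemble your own data.

```r
# Sketch: simulate a genes x replicates count matrix from a negative binomial.
set.seed(42)                      # reproducible draws

n_genes <- 1000                   # assumed number of genes
n_reps  <- 6                      # assumed number of replicates

# Assumed priors (illustrative values only):
# log-normal for the per-gene mean, gamma for the per-gene dispersion.
gene_mean  <- rlnorm(n_genes, meanlog = 4, sdlog = 1.5)
dispersion <- rgamma(n_genes, shape = 2, rate = 10)

# Hypothetical helper. rnbinom() with mu/size has variance mu + mu^2 / size,
# so size = 1 / dispersion. mu and size are recycled gene by gene, and the
# matrix is filled column by column, so each row is one gene and each column
# is one replicate.
simulate_counts <- function(mu, disp, n_reps) {
  counts <- rnbinom(n = length(mu) * n_reps, mu = mu, size = 1 / disp)
  matrix(counts, nrow = length(mu), ncol = n_reps)
}

# Same means, two noise levels: the second inflates every dispersion 5-fold.
low_noise  <- simulate_counts(gene_mean, dispersion, n_reps)
high_noise <- simulate_counts(gene_mean, dispersion * 5, n_reps)
```

Because the "truth" (`gene_mean`) is the same in both matrices, the software under test can be run on `low_noise` and `high_noise` and the results compared directly; the factor of 5 is arbitrary.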
What do you suggest then?
I have expanded a bit on the original post, hope it helps.
Instead of using some existing RNA-seq data, try simulating it yourself; you can then control the "noise" through the parameterisation. Something like Polyester should do that.
... or ART. However, these tools create reads, which then have to be (pseudo)aligned and counted again. But this is maybe a more realistic setting anyway.