Question

Inserting Noise into RNA-seq raw counts

7.2 years ago

saamar.rajput ▴ 80

I have RNA-seq raw count data and I want to test a piece of software by comparing its results on the counts with noise and without noise. For this I need to add some noise to my RNA-seq raw counts. Can somebody suggest how to add noise to my data at the transcript count level?

RNA-Seq rna-seq next-gen • 2.7k views

Real data already contain a lot of noise, or better, sources of variation, so you will not have data 'without noise' in the first place. If you look at repeated measurements of the same genes, you will see that they are not identical, even between technical replicates. If you add more noise on top, there is no guarantee that the outcome will be in any way similar to a real dataset.

It can make good sense to add noise to simulated read counts instead. For that you first choose a distribution, e.g. Poisson or negative binomial, and then draw the simulated counts for each replicate from it, given the parameters of the distribution for each gene. The Poisson has only one parameter, the mean lambda, so in R you would simply call rpois(n=10, lambda=1) to generate 10 replicates of a read count with expected value 1. Note that you cannot 'add' the noise value to a 'real' value as in x' := x + rpois(n=10, lambda=1), because the resulting x' would not be Poisson distributed: the noise is never negative, so for a given x it only shifts the count upwards (mean x+1, variance 1). The same holds for the negative binomial.
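To make this concrete, here is a minimal sketch in base R. The gene means, the dispersion value and all variable names are made up for illustration; they are not estimated from any real dataset. It draws replicate counts per gene from a Poisson and from a negative binomial, and then shows what happens when Poisson noise is simply added to a fixed count.

```r
set.seed(1)  # make the sketch reproducible

n_reps <- 10  # replicates per gene

# Hypothetical per-gene expected counts and one shared NB dispersion
gene_mean  <- c(1, 10, 50, 200, 1000)
dispersion <- 0.1  # NB variance = mu + dispersion * mu^2

# Poisson: the only parameter is lambda, the expected count of the gene
pois_counts <- t(sapply(gene_mean, function(m) rpois(n = n_reps, lambda = m)))

# Negative binomial: in rnbinom's mu/size parameterisation, size = 1 / dispersion
nb_counts <- t(sapply(gene_mean, function(m) {
  rnbinom(n = n_reps, mu = m, size = 1 / dispersion)
}))

gene_ids <- paste0("gene", seq_along(gene_mean))
rownames(pois_counts) <- gene_ids
rownames(nb_counts)   <- gene_ids

# Adding Poisson noise to a fixed "real" count only shifts it upwards
x       <- 50
x_prime <- x + rpois(n = 1000, lambda = 1)
mean(x_prime)  # close to x + 1, not x
var(x_prime)   # close to 1, far below what a Poisson with mean ~50 would give
min(x_prime)   # never below x
```

Both count matrices have one row per gene and one column per replicate, so they can be fed to a tool that expects a genes-by-samples count table.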

If you want to make it really nice, you can draw each gene's mean and dispersion parameters from some prior distribution.
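A rough sketch of that idea, again in base R. The choice of priors here (log-normal for the means, gamma for the dispersions) and their parameter values are assumptions picked purely for illustration, not a recommendation; in practice you might fit them to estimates from a real dataset.

```r
set.seed(2)  # make the sketch reproducible

n_genes <- 1000
n_reps  <- 3

# ASSUMPTION: log-normal prior on gene means, gamma prior on dispersions.
# The parameter values are arbitrary illustrative choices.
gene_mean  <- rlnorm(n_genes, meanlog = 4, sdlog = 2)
dispersion <- rgamma(n_genes, shape = 2, rate = 10)  # prior mean 0.2

# rnbinom is vectorised over mu and size, so each call draws one replicate
# for all genes, with every gene using its own mean and dispersion
counts <- sapply(seq_len(n_reps), function(r) {
  rnbinom(n = n_genes, mu = gene_mean, size = 1 / dispersion)
})
dimnames(counts) <- list(paste0("gene", seq_len(n_genes)),
                         paste0("rep", seq_len(n_reps)))

head(counts)
```

Changing the dispersion prior is then the knob for "more" or "less" noise: larger dispersions give more replicate-to-replicate variation around the same gene means.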

7.2 years ago by Michael 55k

What do you suggest then?

7.2 years ago by saamar.rajput ▴ 80

I have expanded a bit on the original post, hope it helps.

7.2 years ago by Michael 55k

Instead of using existing RNA-seq data, try simulating it yourself; you can then control the "noise" through the simulation parameters. Something like Polyester should do that.

7.2 years ago by andrew.j.skelton73 6.6k

... or ART. However, these tools create reads, which then have to be (pseudo-)aligned and counted again. That may be a more realistic setting anyway.

7.2 years ago by Michael 55k