I am trying to apply Salmon to a very small, artificial in-silico genome in order to test an in-development pipeline. Is there a lower limit on the number of reads that the bias-correction options (particularly --gcBias and --posBias, but also --seqBias) need in order to work reasonably well? For example, is 100,000 reads too few? Is one million? I notice that --seqBias is documented to use the first million reads: is that a minimum for proper functioning?
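For concreteness, this is roughly how the pipeline invokes Salmon (a sketch with placeholder index, read, and output paths; -l A just lets Salmon infer the library type):

```python
# Minimal sketch of the Salmon invocation under test; all paths are placeholders.
import subprocess

cmd = [
    "salmon", "quant",
    "-i", "toy_index",        # index built from the small artificial genome
    "-l", "A",                # let Salmon infer the library type
    "-1", "reads_1.fq.gz",    # paired-end reads, mate 1
    "-2", "reads_2.fq.gz",    # paired-end reads, mate 2
    "--seqBias",              # sequence-specific bias correction
    "--gcBias",               # fragment GC bias correction
    "--posBias",              # positional bias correction
    "-o", "quant_toy",
]
subprocess.run(cmd, check=True)
```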
Similarly, is there a minimum number of distinct genes and/or transcripts needed for these corrections to be meaningful? Would you expect them to work adequately if only a few dozen transcripts were expressed?
On a related note, does Salmon produce any output describing the size of the observed biases or the amount of 'correction' applied? I'm interested in per-sample values that could be compared across samples to indicate how strong a bias is present, or how much Salmon was able to compensate for it.
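To illustrate the kind of comparison I have in mind: the only measure I've come up with so far is to quantify the same sample twice, with and without --gcBias, and summarize the shift between the resulting quant.sf files. A sketch (placeholder paths; it assumes only the documented quant.sf columns Name, Length, EffectiveLength, TPM, NumReads):

```python
# Sketch: summarize how much the --gcBias run shifted estimates relative to an
# otherwise identical run without the flag. Paths are placeholders.
import csv
import math

def load_tpm(quant_sf):
    """Read transcript-level TPM values from a Salmon quant.sf file."""
    tpm = {}
    with open(quant_sf) as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            tpm[row["Name"]] = float(row["TPM"])
    return tpm

with_bias = load_tpm("quant_gcbias/quant.sf")
without_bias = load_tpm("quant_plain/quant.sf")

# Absolute log2 fold-change per transcript, skipping zero estimates.
shifts = sorted(
    abs(math.log2(with_bias[tx] / without_bias[tx]))
    for tx in with_bias
    if with_bias[tx] > 0 and without_bias.get(tx, 0) > 0
)
median_shift = shifts[len(shifts) // 2] if shifts else 0.0
print(f"transcripts compared: {len(shifts)}; "
      f"median |log2 TPM shift|: {median_shift:.3f}")
```

But a measure that Salmon reports directly would be preferable to this kind of indirect comparison.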
These are stranded, paired-end bulk RNA-seq data, if that is relevant.
Thanks, Mike! I've copied the question and your answer over to the GitHub Q&A and added some detail there as well!