I am trying to apply Salmon to a very small, artificial in-silico genome in order to test an in-development pipeline. Is there a lower limit on the number of reads that the bias-correction options (particularly --gcBias and --posBias, but also --seqBias) need in order to work reasonably well? For example, is 100,000 reads too few? Is one million? I notice that --seqBias is documented to use the first million reads: is that a minimum for proper functioning?
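For concreteness, this is roughly how the pipeline invokes Salmon (a sketch with placeholder index, read, and output paths; -l A just lets Salmon infer the library type):

```python
# Minimal sketch of the Salmon invocation under test; all paths are placeholders.
import subprocess

cmd = [
    "salmon", "quant",
    "-i", "toy_index",        # index built from the small artificial genome
    "-l", "A",                # let Salmon infer the library type
    "-1", "reads_1.fq.gz",    # paired-end reads, mate 1
    "-2", "reads_2.fq.gz",    # paired-end reads, mate 2
    "--seqBias",              # sequence-specific bias correction
    "--gcBias",               # fragment GC bias correction
    "--posBias",              # positional bias correction
    "-o", "quant_toy",
]
subprocess.run(cmd, check=True)
```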
Similarly, is there a minimum number of distinct genes and/or transcripts needed for these corrections to be meaningful? Would you expect them to work adequately if only a few dozen transcripts were expressed?
On a related note, does Salmon produce any output describing the size of the observed biases or the amount of 'correction' applied? I'm interested in per-sample values that could be compared across samples to indicate how strong a bias is present, or how much Salmon was able to compensate for it.
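To illustrate the kind of comparison I have in mind: the only measure I've come up with so far is to quantify the same sample twice, with and without --gcBias, and summarize the shift between the resulting quant.sf files. A sketch (placeholder paths; it assumes only the documented quant.sf columns Name, Length, EffectiveLength, TPM, NumReads):

```python
# Sketch: summarize how much the --gcBias run shifted estimates relative to an
# otherwise identical run without the flag. Paths are placeholders.
import csv
import math

def load_tpm(quant_sf):
    """Read transcript-level TPM values from a Salmon quant.sf file."""
    tpm = {}
    with open(quant_sf) as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            tpm[row["Name"]] = float(row["TPM"])
    return tpm

with_bias = load_tpm("quant_gcbias/quant.sf")
without_bias = load_tpm("quant_plain/quant.sf")

# Absolute log2 fold-change per transcript, skipping zero estimates.
shifts = sorted(
    abs(math.log2(with_bias[tx] / without_bias[tx]))
    for tx in with_bias
    if with_bias[tx] > 0 and without_bias.get(tx, 0) > 0
)
median_shift = shifts[len(shifts) // 2] if shifts else 0.0
print(f"transcripts compared: {len(shifts)}; "
      f"median |log2 TPM shift|: {median_shift:.3f}")
```

But a measure that Salmon reports directly would be preferable to this kind of indirect comparison.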
These are stranded, paired-end bulk RNA-seq data, if that is relevant.
Thanks, Mike! I've copied the question and your answer over to the GitHub Q&A and added some detail there as well!