7.0 years ago
newscient
Hi, I am trying to simulate single-end sequencing reads from a reference genome, but uniformly distributed. I have the desired read length and number of reads. I could easily use DWGSIM (https://github.com/nh13/DWGSIM) to generate reads randomly, but I am looking for a tool that makes it easier to get uniform coverage of the genome.
Thanks in advance!
what do you mean by "uniform" coverage?
Sampling reads using a uniform distribution sounds better!?
OK, just checking to make sure :-) Because if read start positions are drawn from a uniform distribution, the coverage at any given site will be (approximately) Poisson distributed, not constant.
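To illustrate the point above, here is a minimal sketch (not taken from any of the tools mentioned in this thread) that draws read starts uniformly at random and tallies per-base depth. The function names and the choice to treat the genome as circular (to avoid edge effects) are assumptions made for the example. For a Poisson-like depth, the variance across sites should come out roughly equal to the mean rather than zero.

```python
import random


def simulate_depth(genome_length, read_length, n_reads, seed=0):
    """Per-base depth when read starts are sampled uniformly at random.

    Assumption for this sketch: the genome is circular, so reads that run
    past the end wrap around and every site is treated the same way.
    """
    rng = random.Random(seed)
    depth = [0] * genome_length
    for _ in range(n_reads):
        start = rng.randrange(genome_length)  # uniform start position
        for offset in range(read_length):
            depth[(start + offset) % genome_length] += 1
    return depth


def mean_and_variance(values):
    """Population mean and variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


if __name__ == "__main__":
    depth = simulate_depth(genome_length=10_000, read_length=100, n_reads=1_000)
    mean, variance = mean_and_variance(depth)
    print(f"mean depth = {mean:.2f}, variance = {variance:.2f}")
    print(f"min depth = {min(depth)}, max depth = {max(depth)}")
```

The mean depth is fixed by the inputs (number of reads times read length, divided by genome length), but individual sites scatter around it, which is the non-uniformity being discussed.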
I coded gargammel which is a simulator for ancient DNA:
https://grenaud.github.io/gargammel/
It can be used for modern DNA as well, though: just turn off the ancient DNA idiosyncrasies. It uses ART to simulate sequencing errors. I know that ART can simulate different coverage. I do not know if ART can add adapters when the fragment length is less than the read length; gargammel does this, though. gargammel also allows you to specify the desired coverage.
Do you want the probability distribution of read sampling at each position to be a uniform distribution, or do you want uniform coverage across your genome? If you want uniform coverage, then you can't use a random generator. Instead you have to place reads at a fixed spacing, where coverage = read length / spacing. If you want uniform coverage of 50x using 100bp reads then you'll have to generate 1 read every 2 bases; uniform 100x coverage with 100bp reads requires simulating 1 read at each genomic position; and so on.
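The fixed-spacing idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the behaviour of any existing simulator: the function names are made up for the example, the genome is treated as circular so the ends get the same depth as the middle, the read length must be a multiple of the coverage, and the genome length must be a multiple of the resulting step. No sequencing errors are introduced.

```python
def tiled_reads(genome, read_length, coverage):
    """Return error-free reads starting every `read_length // coverage` bases.

    Assumptions for this sketch: circular genome, read_length divisible by
    coverage, and len(genome) divisible by the step between read starts.
    """
    if read_length > len(genome):
        raise ValueError("read_length must not exceed the genome length")
    if read_length % coverage != 0:
        raise ValueError("read_length must be a multiple of coverage")
    step = read_length // coverage
    if len(genome) % step != 0:
        raise ValueError("genome length must be a multiple of the step")
    wrapped = genome + genome[:read_length]  # lets reads wrap past the end
    return [wrapped[s:s + read_length] for s in range(0, len(genome), step)]


def tiled_depth(genome_length, read_length, step):
    """Per-base depth produced by reads starting every `step` bases."""
    depth = [0] * genome_length
    for start in range(0, genome_length, step):
        for offset in range(read_length):
            depth[(start + offset) % genome_length] += 1
    return depth


if __name__ == "__main__":
    genome = "ACGT" * 50  # toy 200 bp genome
    reads = tiled_reads(genome, read_length=100, coverage=50)
    depth = tiled_depth(len(genome), read_length=100, step=2)
    print(len(reads), "reads; depth range:", min(depth), "to", max(depth))
```

On a linear genome without the wrap-around, the same spacing gives the target depth everywhere except within one read length of each end, where coverage tapers off.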
You can also try randomreads.sh from the BBMap suite. Check the in-line help for various options.
Is there a non-random equivalent? OP appears to want perfectly uniform coverage, and to get that you can't use a random sampling strategy.
randomreads.sh is the name of the program. It has many options to generate simulated data.