Simulation of single-end reads from reference genome
7.0 years ago
newscient ▴ 20

Hi, I am trying to simulate single-end sequencing reads from a reference genome, but uniformly distributed. I have the desired read length and number of reads. I could easily use DWGSIM (https://github.com/nh13/DWGSIM) to generate reads randomly, but I am looking for a tool that would make it easier to get the uniform coverage of the genome that I want.

Thanks in advance!

simulation reads • 2.3k views

What do you mean by "uniform" coverage?


Sampling reads using a uniform distribution sounds better!?


OK, just checking to make sure :-) Because if read positions are sampled from a uniform distribution, the coverage at any given site will be (approximately) Poisson distributed rather than constant.

I coded gargammel, which is a simulator for ancient DNA:

https://grenaud.github.io/gargammel/

It can be used for modern DNA as well, though; just remove the ancient DNA idiosyncrasies. It uses ART to simulate sequencing errors. I know that ART can simulate different coverage levels. I do not know if ART can add adapters when the fragment length is less than the read length; gargammel does this, though. gargammel also allows you to specify the desired coverage.
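
The Poisson remark above can be checked with a quick simulation. This is only a minimal sketch, not how gargammel or ART work internally: the function name `simulate_coverage` and all parameter values are made up for illustration, and it assumes error-free reads on a linear genome with start positions drawn uniformly at random.

```python
import random
from statistics import mean, pvariance

def simulate_coverage(genome_len, read_len, n_reads, seed=0):
    """Per-base depth when read start positions are drawn uniformly at random."""
    rng = random.Random(seed)
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_len + 1)
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        diff[start] += 1
        diff[start + read_len] -= 1
    # Running sum of the difference array gives the depth at each base.
    coverage, depth = [], 0
    for d in diff[:genome_len]:
        depth += d
        coverage.append(depth)
    return coverage

if __name__ == "__main__":
    # Hypothetical numbers: 10,000 reads of 100 bp on a 100 kb genome (~10x).
    cov = simulate_coverage(genome_len=100_000, read_len=100, n_reads=10_000)
    interior = cov[100:-100]  # skip the ends, where fewer reads can overlap
    print(f"mean depth = {mean(interior):.2f}, variance = {pvariance(interior):.2f}")
```

With these settings the mean depth should be about 10, and the variance should be of similar size, which is what a Poisson distribution predicts. So the sampling is uniform, but the resulting per-base coverage is not flat.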


Do you want the probability distribution of read sampling at each position to be a uniform distribution, or do you want uniform coverage across your genome? If you want uniform coverage, then you can't use a random generator. If you want uniform coverage of 50x using 100bp reads, then you'll have to generate 1 read every 2 bases; uniform 100x coverage with 100bp reads requires simulating 1 read at each genomic position; and so on.
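
The arithmetic above (spacing between read starts = read length / coverage) can be sketched as a deterministic tiling. This is a rough illustration only, not an existing tool: the helper names `tiled_read_starts` and `simulate_tiled_reads` are hypothetical, and it assumes a linear genome held in memory as a string, an integer coverage, and error-free forward-strand reads.

```python
from fractions import Fraction

def tiled_read_starts(genome_len, read_len, coverage):
    """Evenly spaced read start positions giving `coverage`x depth.

    Consecutive starts are read_len / coverage bases apart,
    e.g. 100 bp reads at 50x -> one read every 2 bases.
    """
    step = Fraction(read_len, coverage)  # exact spacing, may be fractional
    starts = []
    k = 0
    while True:
        start = int(k * step)  # floor of the exact position
        if start + read_len > genome_len:
            break  # read would run off the end of the genome
        starts.append(start)
        k += 1
    return starts

def simulate_tiled_reads(genome, read_len, coverage):
    """Cut error-free reads out of `genome` at the tiled start positions."""
    starts = tiled_read_starts(len(genome), read_len, coverage)
    return [genome[s:s + read_len] for s in starts]

if __name__ == "__main__":
    genome = "ACGT" * 250  # toy 1 kb "genome"
    reads = simulate_tiled_reads(genome, read_len=100, coverage=50)
    print(len(reads), "reads, first start positions:",
          tiled_read_starts(len(genome), 100, 50)[:5])
```

Note that on a linear genome the first and last `read_len` bases still end up with lower depth, because fewer reads can overlap them; only the interior is perfectly flat.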


You can also try randomreads.sh from the BBMap suite. Check the in-line help for the various options.


Is there a non-random equivalent? OP appears to want perfectly uniform coverage, and to get that you can't use a random sampling strategy.


randomreads.sh is the name of the program. It has many options to generate simulated data.
