NGS reads simulation
9.8 years ago
wangyi2412 ▴ 240

Hi, everyone!

I am evaluating the performance of my algorithm, for which I need simulated reads. I looked up the simulators used by the 1000 Genomes Project, but people there said they were outdated and suggested finding a newer one.

I have heard of a program called ART, but cannot find it on the web.

I also read some similar papers that used simulation, but none of them stated which existing software or data sets they used.

Besides out-of-the-box software or datasets for the simulation, I would like to know more about the details of, and the principles behind, the simulation.

The setup is as follows:

  1. First, construct a diploid human genome (considering only SNPs/indels, not other types of variation).
  2. Generate templates with Gaussian-distributed lengths, drawn with equal probability from the 4 strands of DNA (+/- strands of the two homologous chromosomes), with an error rate similar to that of a sequencing machine such as the Illumina HiSeq 2000.
  3. Get 100 bp reads from each template.
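
To make steps 2 and 3 concrete, here is a minimal sketch in Python, not a replacement for a real simulator. The function names, the template length parameters (mean 300, s.d. 30) and the uniform substitution-only error model are my own assumptions for illustration; a real Illumina error profile is position- and quality-dependent.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_read(haplotypes, rng, mean_len=300, sd_len=30,
                  read_len=100, error_rate=0.01):
    """Draw one template and return a single read from its 5' end.

    haplotypes: the two homologous sequences (each at least read_len long).
    All numeric defaults are illustrative assumptions.
    """
    # Pick one of the two homologous copies with equal probability.
    hap = rng.choice(haplotypes)
    # Gaussian template length, clamped so a full read always fits.
    length = int(round(rng.gauss(mean_len, sd_len)))
    length = max(read_len, min(length, len(hap)))
    start = rng.randint(0, len(hap) - length)
    template = hap[start:start + length]
    # Pick the + or - strand with equal probability (4 strands in total).
    if rng.random() < 0.5:
        template = reverse_complement(template)
    # Copy the first read_len bases, adding uniform substitution errors.
    read = []
    for base in template[:read_len]:
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        read.append(base)
    return "".join(read)

rng = random.Random(42)
haps = ["ACGT" * 250, "TTGCA" * 200]  # toy haplotypes, not real data
print(simulate_read(haps, rng))
```

Sampling the haplotype first and the strand second gives each of the 4 strands probability 1/4, which matches the "equal probability" requirement in step 2.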

The key question is how to construct the diploid genome so that it best resembles a "typical" person in the population under study. Does anyone have any ideas? Should I randomly select base pairs to differ from the reference with a probability equal to the mutation rate, say 1%? The mutation rate should differ between regions, though, so how can that scenario be simulated? Or, as long as the study does not depend on the distribution of the variants, can this be omitted?
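
One way to picture the region-dependent version is the sketch below. It is only an illustration under assumptions of mine: `make_diploid`, the `region_rates` format and the `het_fraction` parameter are hypothetical names, the rates in the example are placeholders rather than measured values, and only SNPs are handled (indels are left out).

```python
import random

def make_diploid(reference, region_rates, rng, het_fraction=0.67):
    """Return two haplotype strings carrying SNPs placed at per-region rates.

    region_rates: list of (start, end, rate) half-open intervals, assumed
    non-overlapping; positions outside every interval stay as reference.
    het_fraction: assumed share of SNPs that are heterozygous.
    """
    hap_a = list(reference)
    hap_b = list(reference)
    for start, end, rate in region_rates:
        for pos in range(start, min(end, len(reference))):
            # Each position mutates independently with the region's rate.
            if rng.random() >= rate:
                continue
            ref_base = reference[pos]
            alt = rng.choice([b for b in "ACGT" if b != ref_base])
            if rng.random() < het_fraction:
                # Heterozygous SNP: change only one homologous copy.
                rng.choice((hap_a, hap_b))[pos] = alt
            else:
                # Homozygous SNP: change both copies.
                hap_a[pos] = alt
                hap_b[pos] = alt
    return "".join(hap_a), "".join(hap_b)

rng = random.Random(7)
ref = "ACGT" * 500  # toy reference, not real data
hap_a, hap_b = make_diploid(ref, [(0, 1000, 0.001), (1000, 2000, 0.01)], rng)
print(sum(x != y for x, y in zip(hap_a, ref)), "differences on copy A")
```

The per-region rates would have to come from somewhere (for example, variant densities observed in the population of interest); the sketch only shows the mechanics once they are chosen.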

Thank you very much!

Yi

sequencing simulation • 13k views
9.8 years ago
Felix Francis ▴ 600

Here is a list of genetic simulation resources. I believe several of them would suit your needs.

EDIT by @RamRS: I moved the content to a GitHub Gist on 06-Apr-2022. Please let me know if you'd like to edit. In fact, feel free to copy the raw content from the Gist to your own gist and replace the link in this post.

Whoa! That's one gigantic list! Goes to show the richness in just about any subdomain of bioinformatics.

Five years later, this is still a very impressive list. I have nothing to add except a (unfortunately not so recent) paper that might be of use as an additional reference:

https://dx.doi.org/10.1038%2Fnrg.2016.57

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224698/

Originally posted by @Joseph Hughes

BBMap's RandomReads: Generates single-ended or paired Illumina reads, or PacBio reads, from a genome. Also has a metagenome mode.

9.8 years ago
rtliu ★ 2.2k

Try the read simulators listed on omictools.com, including ART, wgsim, etc.
