Question

Using simNGS To Generate Reads

4

13.9 years ago

Travis ★ 2.9k

Hi,

Does anyone have experience of using simNGS (http://www.ebi.ac.uk/goldman-srv/simNGS/) to generate simulated Illumina reads?

I am attempting to generate a set of 5 million test reads using the human genome as input. The program seems to generate only one read (or read pair) per input sequence, i.e. per chromosome. I have tried to find a way of altering the parameters to get more reads, but without success.

Can anyone advise?

next-gen sequencing read simulation illumina • 5.0k views

ADD COMMENT • link updated 11.9 years ago by guillemch ▴ 140 • written 13.9 years ago by Travis ★ 2.9k

score 8 · Answer 1 · 2011-06-02

13.9 years ago

Botond Sipos ★ 1.7k

The simNGS package comes with two binaries:

simLibrary, which simulates library construction. It takes the reference genome as input and outputs fragments with the size distribution specified by the command line parameters (or the defaults).
simNGS, which simulates sequencing and basecalling. It takes the fragments as input and, in paired-end mode, generates two reads from the ends of each fragment.

If you give simNGS a genome FASTA file as input, then it is expected to give you one read pair per chromosome, as it will interpret the chromosomes as fragments.

What you need is something like this:

simLibrary -n [number_of_fragments] reference.fas | simNGS -o fastq -p paired [runfile] > simulated_reads.fq

I would recommend reading the full documentation of simLibrary and simNGS before using them just to make sure that you get the output you expect.

ADD COMMENT • link 13.9 years ago by Botond Sipos ★ 1.7k

1

Thanks a lot. I will give the documentation a closer look - I've just been very short on time.

ADD REPLY • link 13.9 years ago by Travis ★ 2.9k

0

Is there any means of ensuring that the read names output are unique? I'm running into problems downstream with non-unique read names...

ADD REPLY • link 13.9 years ago by Travis ★ 2.9k

0

I guess your problem is that reads with identical end points get the same ID. Unfortunately, with the current release there is no way to guarantee that the IDs are unique. But it is easy to post-process the output to make them unique, as the reads from the same pair are printed out consecutively.
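
A minimal sketch of that kind of post-processing in Python, assuming the output is plain four-line FASTQ and that in paired mode the two mates are written as consecutive records. The function name and the `p<index>_` prefix are only illustrative and not part of simNGS; it is worth checking the actual header format of your output first.

```python
def uniquify_fastq_names(src, dst, paired=True):
    """Copy four-line FASTQ records from src to dst, prefixing each read
    name with a running index so that names become unique.

    Assumption: with paired=True, mates are consecutive records, so both
    mates of a pair receive the same index and stay recognisable as a pair.
    Returns the number of records written.
    """
    record_no = 0
    while True:
        header = src.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = src.readline()
        plus = src.readline()
        qual = src.readline()
        if not header.startswith("@") or not plus.startswith("+") or not qual:
            raise ValueError(f"malformed FASTQ record near record {record_no}")
        # Two consecutive records share one index in paired mode.
        index = record_no // 2 if paired else record_no
        # Prefix the name; anything after it (e.g. a mate suffix) is untouched.
        dst.write(f"@p{index}_{header[1:]}")
        dst.write(seq)
        # Write a bare '+' so the separator cannot disagree with the new name.
        dst.write("+\n")
        dst.write(qual)
        record_no += 1
    return record_no
```

To use it as a filter, call `uniquify_fastq_names(sys.stdin, sys.stdout)` (after `import sys`) and pipe the simNGS output through the script. The index goes in front of the name so that whatever the tool puts at the end of the name is left as it was.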

ADD REPLY • link 13.9 years ago by Botond Sipos ★ 1.7k

0

No problem - I can fix it manually!

ADD REPLY • link 13.9 years ago by Travis ★ 2.9k

score 1 · Answer 2 · 2011-06-02

13.9 years ago

Benm ▴ 710

You can try this one, https://sourceforge.net/projects/simulateseq/files/0.2.2/

ADD COMMENT • link 13.9 years ago by Benm ▴ 710

score 0 · Answer 3 · 2013-05-13

Hi!

It has been a long time since this thread was opened, but I arrived here because I was facing the same problem. I was getting non-unique read IDs when the organism from which I was generating the reads had more than one chromosome. I've made a patch to fix this (here); you can apply it to the latest version of simNGS.

Hope it helps!