Using simNGS To Generate Reads
13.5 years ago
Travis ★ 2.8k

Hi,

Does anyone have experience of using simNGS (http://www.ebi.ac.uk/goldman-srv/simNGS/) to generate simulated Illumina reads?

I am attempting to generate a set of 5 million test reads using the human genome as input. The program seems to generate only one read (or read pair) per input sequence, i.e. per chromosome. I have tried altering the parameters to get more reads, but without success.

Can anyone advise?

next-gen sequencing read simulation illumina • 4.7k views
13.5 years ago
Botond Sipos ★ 1.7k

The simNGS package comes with two binaries:

  • simLibrary, which simulates library construction. It takes as input the reference genome and outputs fragments with size distribution specified by the command line parameters (or defaults).

  • simNGS, which simulates sequencing and basecalling. It takes the fragments as input and, in paired-end mode, generates two reads from the ends of each fragment.

If you give simNGS a genome as a FASTA file, then it is expected to give you one read pair per chromosome, because it interprets each chromosome as a fragment.

What you need is something like this:

simLibrary -n [number_of_fragments] reference.fas | simNGS -o fastq -p paired [runfile] > simulated_reads.fq

I would recommend reading the full documentation of simLibrary and simNGS before using them just to make sure that you get the output you expect.
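Since each fragment yields two reads in paired-end mode, the value to pass as [number_of_fragments] depends on whether the target counts individual reads or read pairs. A minimal sketch of that arithmetic, assuming the 5 million figure from the question means individual reads (the function name is made up for illustration and is not part of simNGS):

```python
def fragments_needed(total_reads: int, paired: bool = True) -> int:
    """Return how many fragments to request for a target number of reads.

    Assumes one read per fragment in single-end mode and two reads per
    fragment in paired-end mode, as described in the answer above.
    """
    if total_reads < 0:
        raise ValueError("total_reads must be non-negative")
    reads_per_fragment = 2 if paired else 1
    # Ceiling division, so an odd target is rounded up to a whole fragment.
    return -(-total_reads // reads_per_fragment)


print(fragments_needed(5_000_000))  # 2500000
```

If the target is 5 million read pairs instead, the fragment count is simply 5 million.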

Thanks a lot. I will give the documentation a closer look - I've just been very short on time.

Is there any means of ensuring that the read names output are unique? I'm running into problems downstream with non-unique read names...

I guess your problem is that reads with identical end points get the same ID. Unfortunately, the current release has no way to guarantee that IDs are unique. It is easy to post-process the output to make them unique, though, because the two reads of a pair are printed consecutively.
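One way to do that post-processing is sketched below in Python. It relies only on the property mentioned above (mates are printed consecutively) and appends a running pair counter to each read name. It assumes unwrapped four-line FASTQ records and treats a trailing /1 or /2 as the mate suffix; the exact ID layout simNGS produces is not shown in this thread, so check it against your own output first. The function name and the `_p<N>` tag are made up for illustration.

```python
from typing import Iterable, Iterator


def uniquify_paired_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines (without newlines) with a pair counter in each name.

    Assumes 4-line records and that the two reads of a pair are adjacent,
    so records 0 and 1 form pair 0, records 2 and 3 form pair 1, and so on.
    """
    stripped = (raw.rstrip("\r\n") for raw in lines)
    # Blank lines (e.g. at the end of the file) are skipped.
    for index, line in enumerate(l for l in stripped if l):
        record, field = divmod(index, 4)
        if field == 0:
            if not line.startswith("@"):
                raise ValueError(f"expected '@' header, got {line!r}")
            pair = record // 2
            name, sep, rest = line[1:].partition(" ")
            # Keep a trailing /1 or /2 mate suffix at the end of the name.
            mate = ""
            if name.endswith(("/1", "/2")):
                name, mate = name[:-2], name[-2:]
            line = f"@{name}_p{pair}{mate}{sep}{rest}"
        elif field == 2:
            # A bare "+" is valid and avoids a stale copy of the old name.
            line = "+"
        yield line


# Example use (the output file name is a placeholder):
# with open("simulated_reads.fq") as src, open("unique_reads.fq", "w") as dst:
#     dst.writelines(line + "\n" for line in uniquify_paired_fastq(src))
```

Both mates of a pair receive the same counter, so tools that match mates by name should still be able to pair them.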

No problem - I can fix it manually!

13.5 years ago
Benm ▴ 710

You could also try this one: https://sourceforge.net/projects/simulateseq/files/0.2.2/

11.6 years ago
guillemch ▴ 140

Hi!

It has been a long time since this thread was opened, but I arrived here because I was facing the same problem. I was getting non-unique read IDs whenever the organism I was generating reads from had more than one chromosome. I've made a patch to fix this (here); you can apply it to the latest version of simNGS.

Hope it helps!
