I am trying to generate a set of 5 million test reads using the human genome as input. The program seems to generate only one read (or read pair) per sequence, i.e. per chromosome. I have looked for a parameter that would produce more reads, but without success.
Two programs are involved:
simLibrary, which simulates library construction. It takes the reference genome as input and outputs fragments whose size distribution is set by the command line parameters (or their defaults).
simNGS, which simulates sequencing and basecalling. It takes the fragments as input and, in paired-end mode, generates two reads from the ends of each fragment.
If you give a genome as a fasta file directly to simNGS, then it is expected to give you one read pair per chromosome, because it interprets the chromosomes as fragments.
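To make the two-stage idea concrete, here is a toy sketch in Python. It is not how simLibrary or simNGS are implemented, and every name and default in it (sample_fragments, read_pair, mean_len, sd_len) is made up for illustration. It only shows the logic described above: fragments are drawn from the reference first, and each fragment then yields one read from each end.

```python
import random

def sample_fragments(chromosomes, n_fragments, mean_len=400, sd_len=50, seed=0):
    """Toy library construction: draw fragments at random positions.

    chromosomes maps a name to its sequence. Fragment lengths follow a
    normal distribution (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    names = list(chromosomes)
    # Longer chromosomes contribute proportionally more fragments.
    weights = [len(chromosomes[name]) for name in names]
    fragments = []
    for _ in range(n_fragments):
        name = rng.choices(names, weights=weights)[0]
        seq = chromosomes[name]
        # Clamp the length so the fragment always fits in the chromosome.
        length = min(len(seq), max(1, round(rng.gauss(mean_len, sd_len))))
        start = rng.randrange(len(seq) - length + 1)
        fragments.append((name, start, seq[start:start + length]))
    return fragments

def read_pair(fragment, read_len):
    """Toy paired-end sequencing: one read from each end of a fragment.

    The second read is the reverse complement of the fragment's 3' end.
    No sequencing errors or quality values are simulated.
    """
    complement = str.maketrans("ACGT", "TGCA")
    forward = fragment[:read_len]
    reverse = fragment[-read_len:].translate(complement)[::-1]
    return forward, reverse

genome = {"chr1": "ACGT" * 500, "chr2": "TTGCA" * 200}
fragments = sample_fragments(genome, n_fragments=10)
pairs = [read_pair(seq, read_len=50) for _, _, seq in fragments]
```

In this picture, passing whole chromosomes to read_pair gives exactly one pair per chromosome, which matches the behaviour described in the question. In paired-end mode, 5 million reads correspond to 2.5 million fragments.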
It has been a long time since this thread was opened, but I arrived here because I was facing the same problem: I was getting non-unique read IDs when the organism I was generating reads from had more than one chromosome. I've made a patch to fix this (here); you can apply it to the latest version of simNGS.
Thanks a lot. I will give the documentation a closer look - I've just been very short on time.
Is there any means of ensuring that the read names output are unique? I'm running into problems downstream with non-unique read names...
I guess your problem is that reads with identical end points get the same ID. Unfortunately, the current release has no way to guarantee that the IDs are unique. But it is easy to post-process the output to make them unique, because the two reads of a pair are printed consecutively.
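A minimal sketch of that post-processing in Python, under some assumptions: the output is four-line FASTQ, the two mates of a pair are consecutive records, and a mate suffix, if present, is a trailing /1 or /2 on the read name. The function name uniquify_fastq is made up for this example. It appends a running counter to each ID, shared by both mates of a pair.

```python
import io

def uniquify_fastq(src, dst, paired=True):
    """Append a running counter to every read ID in a four-line FASTQ stream.

    Returns the number of records written.
    """
    record_index = 0
    while True:
        header = src.readline()
        if not header:
            break
        # Sequence, separator and quality lines are copied unchanged.
        body = [src.readline() for _ in range(3)]
        # Mates are consecutive, so both records of a pair share one counter.
        counter = record_index // 2 if paired else record_index
        name, sep, rest = header.rstrip("\n").partition(" ")
        # Keep a trailing /1 or /2 mate suffix at the very end of the name.
        suffix = ""
        if name.endswith(("/1", "/2")):
            name, suffix = name[:-2], name[-2:]
        dst.write(f"{name}_{counter}{suffix}{sep}{rest}\n")
        dst.writelines(body)
        record_index += 1
    return record_index

# Two pairs that share the same ID, as in the problem described above.
example = io.StringIO(
    "@chr1/1\nACGT\n+\nIIII\n@chr1/2\nTTTT\n+\nIIII\n"
    "@chr1/1\nGGGG\n+\nIIII\n@chr1/2\nCCCC\n+\nIIII\n"
)
fixed = io.StringIO()
n_records = uniquify_fastq(example, fixed)
fixed_headers = fixed.getvalue().splitlines()[0::4]
```

For real files, open the input and output with open() and pass the file objects in place of the StringIO buffers. For single-end output, call it with paired=False so every record gets its own counter.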
No problem - I can fix it manually!