My current command is the following: in=./ERR1701760.fastq out=stdout.fastq overwrite=t samplereadstarget=10000 sampleseed=0 > x.fq
How can I increase the speed of this program for sampling reads? The reason I'm using it is that sometimes I have interleaved reads and I'll need the `R1` reads.
Are there any parameters I can adjust? I know I can just take the first 10000 reads, which would be much faster, but I want to be able to use different random seeds here.
There are 109,100,547 (single-end?) reads in ERR1701760.fastq. I originally thought these were paired-end since they are HiSeq reads, but I feel like I only downloaded the forward reads.
(base) -bash-4.1$ head ERR1701760.fastq
@ERR1701760.1 1 length=143
+ERR1701760.1 1 length=143
@ERR1701760.2 2 length=151
+ERR1701760.2 2 length=151
@ERR1701760.3 3 length=156
I don't think you can do anything in terms of program options to speed things up.
Why is speed an issue, BTW? Most BBMap suite programs are plenty fast even when single-threaded (as this one is).
These are indeed paired-end reads. Looks like you only got the forward reads.
OK, that's what I thought too regarding the reads. I figured out one way to make it quicker while still being able to randomly sample the reads: I basically just cut the file down to a smaller read set first, so I can subsample from that and still use random seeds.
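For anyone who wants the same two-step idea without an intermediate file, here is a minimal Python sketch. It is not what BBMap does internally and not necessarily the exact command used above. It assumes plain (uncompressed) FASTQ with exactly four lines per record, and the names `read_fastq`, `sample_fastq`, and `max_records` are made up for illustration. It reads at most `max_records` records from the top of the file and reservoir-samples `n` of them with a fixed seed.

```python
import itertools
import random


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped 4-line records)."""
    while True:
        record = tuple(itertools.islice(handle, 4))
        if len(record) < 4:
            return  # end of file (or a truncated trailing record)
        yield record


def sample_fastq(handle, n, seed=0, max_records=None):
    """Reservoir-sample up to n records, reading at most max_records from the top."""
    rng = random.Random(seed)  # private generator so the seed is reproducible
    records = read_fastq(handle)
    if max_records is not None:
        # Hypothetical speed-up: only consider the first max_records records.
        records = itertools.islice(records, max_records)
    reservoir = []
    for i, record in enumerate(records):
        if i < n:
            reservoir.append(record)
        else:
            # Keep the new record with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return reservoir
```

Calling `sample_fastq(fh, 10000, seed=0, max_records=1_000_000)` on an open file handle would return 10000 records drawn from the first million (the one-million cap is an arbitrary example value), and changing `seed` gives a different draw. One caveat: a sample restricted to the start of the file is random within that subset only, not across all 109,100,547 reads, so it may not be representative if the reads at the top of the file differ from the rest.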