I am looking for a script to sample reads from paired FASTQ files. I understand that 'seqtk' may be what I'm looking for. How can I get it running on my computer (Windows or Ubuntu)? Thank you. jd
Just to cover the 'how to obtain the code' part: you will need to have git installed, and then clone the seqtk repository:
git clone https://github.com/lh3/seqtk.git
then switch to the seqtk directory and make it:
cd seqtk
make
Now you can invoke it as such:
~/mypath/to/seqtk/seqtk
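If you would rather type just seqtk (as in the sampling commands further down the thread), one option is to put the build directory on your PATH. This is only a sketch: it assumes the repository was cloned and built in ~/mypath/to/seqtk, which is the placeholder path from above, so substitute your real location.

```shell
# Assumption: seqtk was cloned and built in ~/mypath/to/seqtk
# (the placeholder path used above; replace it with your own directory).
export PATH="$PATH:$HOME/mypath/to/seqtk"
```

This only lasts for the current shell session; adding the same line to ~/.bashrc should make it persistent on Ubuntu.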
Just make sure you use the same seed for both the forward and reverse files, so that the same reads are selected from each. For example, to get 10,000 reads from files 1.fq and 2.fq:
$ seqtk sample -s11 1.fq 10000 > 1_10k.fq
$ seqtk sample -s11 2.fq 10000 > 2_10k.fq
I remember looking at the seqtk sample code in the past and, if I recall correctly, the default seed is 11, so it may just work as intended without setting the seed. Still, I always set it explicitly as above to be certain. As for compiling the code, just give it a try and let us know if you have problems (it should be as simple as typing make on Ubuntu).
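If you want to double-check that the two subsampled files really stayed in sync, something like the following could be used. It is a rough sketch, not part of seqtk: the function names read_names and same_read_order are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines) whose read names match between mates apart from an optional /1 or /2 suffix.

```shell
# Hypothetical helpers (not part of seqtk).

# Print the read name of every record: header lines are every 4th line
# starting at line 1; keep the first whitespace-delimited token and drop
# a trailing /1 or /2 mate suffix if present.
read_names() {
    awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }' "$1"
}

# Succeed only if both FASTQ files list the same read names in the same order.
same_read_order() {
    [ "$(read_names "$1")" = "$(read_names "$2")" ]
}
```

With the files from the example above, you would run: same_read_order 1_10k.fq 2_10k.fq && echo "pairs in sync"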
Shameless plug: if you're interested in random sampling you could also use famas (latest dist 0.0.4), which fully supports paired-end reads, gzip in- and output etc.
To extract every 10th read (on average) from the gzipped paired-end input s1.fq.gz and s2.fq.gz you would run:
famas -i s1.fq.gz -j s2.fq.gz -o s1_10s.fq.gz -p s2_10s.fq.gz -s 10 --no-filter
Andreas
If you're using OS X, you can install seqtk via the Homebrew package manager. Once Homebrew is installed, you can "tap" the science package. Then enter the following into your terminal:
brew install seqtk
oh the neat stuff that I find reading this site - I am going to put it to good use since we need this greatly