Cut fastq files
12 months ago
sansan96

Hello, I have a couple of fastq files (transcriptomes) of approximately 20 million reads, and I want to extract 5-10 thousand reads to test-assemble them on my laptop. Is this possible?

I just want to do an exercise with a few reads.

12 months ago
gunzip -c in.fastq.gz | paste - - - - | head -n 10000 | tr "\t" "\n" | gzip > sub.fastq.gz
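How the one-liner above works: `paste - - - -` joins every four consecutive lines (one FASTQ record) into a single tab-separated line, so `head -n 10000` counts reads rather than lines, and `tr` restores the four-line layout. Note that this takes the first reads in the file, not a random sample. A small self-contained check on synthetic data could look like this; the file names `demo.fastq.gz` and `demo_sub.fastq.gz` and the toy reads are made up for illustration, and it assumes no tabs appear inside the header lines.

```shell
#!/usr/bin/env bash
# Build a tiny synthetic FASTQ with 100 reads (toy data for illustration only)
for i in $(seq 1 100); do
  printf '@read%d\nACGTACGT\n+\nIIIIIIII\n' "$i"
done | gzip > demo.fastq.gz

# paste - - - -  : join each 4-line record into one tab-separated line
# head -n 10     : keep the first 10 records (reads), not a random sample
# tr '\t' '\n'   : split each record back into its 4 lines (assumes no tabs in headers)
gunzip -c demo.fastq.gz | paste - - - - | head -n 10 | tr '\t' '\n' | gzip > demo_sub.fastq.gz

# 10 reads x 4 lines per read = 40 lines
gunzip -c demo_sub.fastq.gz | wc -l
```

With `head -n 10000` as in the original command, the output holds 10,000 reads, which is within the 5-10 thousand range asked about.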

Why not just head -n 40000 on the unzipped fastq stream directly?
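Since every FASTQ record spans exactly four lines (assuming the common unwrapped format, with no multi-line sequences), `head -n 40000` on the decompressed stream should yield the same 10,000 reads as the `paste`/`tr` round trip. A quick check on toy data; the file names below are hypothetical:

```shell
#!/usr/bin/env bash
# Toy FASTQ with 100 four-line records (illustrative data only)
for i in $(seq 1 100); do
  printf '@read%d\nACGTACGT\n+\nIIIIIIII\n' "$i"
done | gzip > toy.fastq.gz

# Approach 1: plain head, 4 lines per read -> 10 reads = 40 lines
gunzip -c toy.fastq.gz | head -n 40 > head_sub.fq

# Approach 2: the paste/head/tr round trip, counting records instead of lines
gunzip -c toy.fastq.gz | paste - - - - | head -n 10 | tr '\t' '\n' > paste_sub.fq

# Both files should be byte-identical
if cmp -s head_sub.fq paste_sub.fq; then echo "identical"; else echo "different"; fi
```

The `paste` form earns its keep mainly when you want to operate on whole records at once (filtering, sampling), since each read is then a single line.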


Thanks for your answer, AT.


Thanks for your response. This command extracts the first 1000 reads from the fastq file, right?

12 months ago
GenoMax

Use reformat.sh from the BBMap suite. The options relevant for sampling are:

reformat.sh -Xmx4g in=file.fq.gz out=sampled.fq.gz RELEVANT_OPTIONS_BELOW

reads=-1                Set to a positive number to only process this many INPUT reads (or pairs), then quit.
skipreads=-1            Skip (discard) this many INPUT reads before processing the rest.
samplerate=1            Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1           Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0     (sbt) Exact number of OUTPUT bases desired.
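With the options above, a reproducible fixed-size sample would presumably be `reformat.sh in=file.fq.gz out=sampled.fq.gz samplereadstarget=10000 sampleseed=7` (the seed value is arbitrary). If the two transcriptome files are the R1/R2 halves of a paired-end run (an assumption; the question does not say), both files must receive the same keep/drop decisions so mates stay paired. The sketch below illustrates the idea behind `samplerate` plus `sampleseed` in plain awk; it is not reformat.sh's implementation. Seeding awk's generator identically and drawing exactly one random number per record reproduces the same decisions in both files. The function `sample_fraction`, the file names and the seed are hypothetical.

```shell
#!/usr/bin/env bash
# Toy paired-end files: read names match between R1 and R2 (illustrative data)
for i in $(seq 1 200); do
  printf '@pair%d/1\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > toy_R1.fq
for i in $(seq 1 200); do
  printf '@pair%d/2\nTGCATGCA\n+\nIIIIIIII\n' "$i"
done > toy_R2.fq

# Hypothetical helper: keep each record with probability RATE.
# srand(SEED) makes rand() reproducible (within the same awk implementation),
# and rand() is called exactly once per record, so both files get identical decisions.
sample_fraction() {
  awk -v seed="$2" -v rate="$3" '
    BEGIN { srand(seed) }
    NR % 4 == 1 { keep = (rand() < rate) }   # decide on the header line
    keep                                     # print all 4 lines of kept records
  ' "$1"
}

sample_fraction toy_R1.fq 7 0.1 > sub_R1.fq
sample_fraction toy_R2.fq 7 0.1 > sub_R2.fq

# Mates should still line up: strip the /1 and /2 suffixes and compare names
awk 'NR % 4 == 1 { sub(/\/1$/, ""); print }' sub_R1.fq > names_R1.txt
awk 'NR % 4 == 1 { sub(/\/2$/, ""); print }' sub_R2.fq > names_R2.txt
cmp -s names_R1.txt names_R2.txt && echo "pairs in sync"
```

A fraction gives only an approximate read count; for roughly 10,000 reads out of 20 million, a rate of about 0.0005 would be the equivalent setting.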
12 months ago
size_t

To get random reads from a fastq file, use seqtk: seqtk sample read.fq.gz 1000000 | gzip > sub_reads.fq.gz
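`seqtk sample` takes either a fraction or an integer read count (the command above asks for 1,000,000 reads; a value such as 10000 matches the question). For paired files, the seqtk documentation recommends passing the same `-s` seed to both runs so that mates are sampled together. Drawing an exact number of reads in a single pass is usually done with reservoir sampling, which I believe is what seqtk uses for integer counts. Below is a minimal awk sketch of reservoir sampling (Algorithm R) over FASTQ records, not seqtk's actual code; the file names, `K` and `SEED` are illustrative.

```shell
#!/usr/bin/env bash
# Toy FASTQ with 1000 reads (illustrative data only)
for i in $(seq 1 1000); do
  printf '@read%d\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > toy_in.fq

K=50      # number of reads to keep (hypothetical; ~10000 for the question)
SEED=11   # fixed seed for a reproducible sample

# paste joins each 4-line record into one line so awk sees one record per line.
# Reservoir sampling (Algorithm R): keep the first K records, then let record i
# replace a random slot with probability K/i.
paste - - - - < toy_in.fq |
  awk -v k="$K" -v seed="$SEED" '
    BEGIN { srand(seed) }
    NR <= k { res[NR] = $0; next }
    {
      j = int(rand() * NR) + 1          # uniform integer in 1..NR
      if (j <= k) res[j] = $0           # happens with probability k/NR
    }
    END { n = (NR < k) ? NR : k; for (i = 1; i <= n; i++) print res[i] }
  ' |
  tr '\t' '\n' > toy_sub.fq

# K reads x 4 lines per read
wc -l < toy_sub.fq
```

Only `K` records are held in memory, so 10,000 reads is trivial even on a laptop; the sampled reads come out in reservoir order rather than file order, which does not matter for a test assembly.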
