Question

Cut fastq files

0

12 months ago

sansan96 ▴ 130

Hello, I have a couple of fastq files of approximately 20 million reads (transcriptomes) and I want to extract 5-10 thousand reads to test-assemble them on my laptop. Is this possible?

I just want to do an exercise with a few reads.

files fastq Cut • 1.0k views

ADD COMMENT • link 12 months ago by sansan96 ▴ 130

1

Review all your previous questions: add a comment or validate the answers! Warning: Mate records missing HTSEQ; Merge common elements in R; Kallisto abundance.tsv; Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match; Trinity Insilico Normalization

We already asked for it: Trinity Insilico Normalization

ADD REPLY • link 12 months ago by Pierre Lindenbaum 164k

2

12 months ago

GenoMax 147k

Use reformat.sh from the BBMap suite. The relevant options for sampling are listed below the command.

reformat.sh -Xmx4g in=file.fq.gz out=sampled.fq.gz RELEVANT_OPTIONS_BELOW

reads=-1                Set to a positive number to only process this many INPUT reads (or pairs), then quit.
skipreads=-1            Skip (discard) this many INPUT reads before processing the rest.
samplerate=1            Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1           Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0     (sbt) Exact number of OUTPUT bases desired.

ADD COMMENT • link 12 months ago by GenoMax 147k
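For paired-end data, the options listed above can be combined roughly as follows. This is only a sketch: the function name, the file names, and the seed value are hypothetical, and it assumes reformat.sh is on your PATH. It uses `in2=`/`out2=` for the second mate, `samplereadstarget` for the number of pairs, and a fixed `sampleseed` so the subsample is reproducible.

```shell
#!/bin/sh
# Sketch: subsample N read pairs from two paired FASTQ files with reformat.sh.
# Function name, output names, and seed are illustrative assumptions.
sample_pairs_bbmap() {
    r1=$1   # first-mate FASTQ
    r2=$2   # second-mate FASTQ
    n=$3    # number of read pairs to keep
    reformat.sh -Xmx4g in="$r1" in2="$r2" \
        out=sampled_1.fq.gz out2=sampled_2.fq.gz \
        samplereadstarget="$n" sampleseed=42
}

# Example call (hypothetical file names):
# sample_pairs_bbmap reads_1.fq.gz reads_2.fq.gz 10000
```

Because both mates go through a single reformat.sh call, the pairing should be preserved in the two output files.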

1

12 months ago

size_t ▴ 120

To get random reads from a fastq file, use seqtk: seqtk sample read.fq.gz 1000000 | gzip > sub_reads.fq.gz

ADD COMMENT • link 12 months ago by size_t ▴ 120
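If the reads are paired, the seqtk command above needs to be run once per mate file with the same random seed (`-s`), otherwise the two subsamples will not contain matching pairs. A minimal sketch, assuming seqtk is installed; the function name, output file names, and default seed are hypothetical:

```shell
#!/bin/sh
# Sketch: subsample the same N read pairs from two mate files with seqtk.
# Using the same seed for both calls keeps the mates in sync.
subsample_pair() {
    r1=$1            # first-mate FASTQ
    r2=$2            # second-mate FASTQ
    n=$3             # number of read pairs to keep
    seed=${4:-100}   # any fixed integer works; 100 is an arbitrary default
    seqtk sample -s"$seed" "$r1" "$n" | gzip > sub_1.fq.gz
    seqtk sample -s"$seed" "$r2" "$n" | gzip > sub_2.fq.gz
}

# Example call (hypothetical file names):
# subsample_pair reads_1.fq.gz reads_2.fq.gz 10000
```

For the 5-10 thousand reads asked about in the question, pass 10000 (or 5000) instead of 1000000.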

score 2 · Accepted Answer · 2023-11-16

2

12 months ago

Pierre Lindenbaum 164k

gunzip -c in.fastq.gz | paste - - - - | head -n 10000 | tr "\t" "\n" | gzip > sub.fastq.gz

ADD COMMENT • link 12 months ago by Pierre Lindenbaum 164k

1

Why not just head -n 40000 on the unzipped fastq stream directly?

ADD REPLY • link 12 months ago by ATpoint 85k
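For a FASTQ file with exactly four lines per record, the two approaches give the same result: taking the first n records after `paste - - - -` is equivalent to taking the first 4*n lines. The sketch below checks this on a tiny generated file; the file names and the 5-read test input are made up for the demonstration.

```shell
#!/bin/sh
# Build a tiny gzipped FASTQ (5 reads, 4 lines each) to compare both approaches.
workdir=$(mktemp -d)
i=1
while [ "$i" -le 5 ]; do
    printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
    i=$((i + 1))
done | gzip > "$workdir/in.fastq.gz"

n=3  # number of reads to keep

# Approach 1: join each 4-line record onto one line, keep the first n records,
# then split the records back into 4 lines.
gunzip -c "$workdir/in.fastq.gz" | paste - - - - | head -n "$n" | tr '\t' '\n' \
    > "$workdir/paste.fastq"

# Approach 2: keep the first 4*n lines directly.
gunzip -c "$workdir/in.fastq.gz" | head -n "$((4 * n))" > "$workdir/head.fastq"

# Compare the two outputs byte for byte.
cmp "$workdir/paste.fastq" "$workdir/head.fastq" && echo "identical"
```

Note that both variants take the reads from the top of the file rather than a random sample, and neither handles multi-line FASTQ records.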

0

Thanks for your answer, AT.

ADD REPLY • link 12 months ago by sansan96 ▴ 130

0

Thanks for your response. This command extracts the first 1000 reads from the fastq file, right?

ADD REPLY • link 12 months ago by sansan96 ▴ 130
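One way to check how many reads a subsampled file actually contains is to count its lines and divide by four. A small sketch, assuming standard 4-line FASTQ records; `count_reads` and the demo file are hypothetical names used only for illustration:

```shell
#!/bin/sh
# Count reads in a gzipped FASTQ file, assuming 4 lines per record.
count_reads() {
    lines=$(gunzip -c "$1" | wc -l)
    echo $((lines / 4))
}

# Demo on a tiny generated file with 3 reads.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\nIIII\n' \
    | gzip > "$tmp/demo.fastq.gz"
count_reads "$tmp/demo.fastq.gz"   # prints 3
```

Running `count_reads sub.fastq.gz` on the output of any of the commands in this thread shows how many reads were kept.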