Question

Take A Subset Of A Fastq Paired-End Sample


12.3 years ago

dfernan ▴ 770

Hi,

I have two compressed paired-end FASTQ files from an Illumina HiSeq RNA-seq experiment, i.e., pair.1.fastq.gz and pair.2.fastq.gz.

The files are very large, so I want to take a subset of a few thousand to a few million reads from each of them (keeping the pairs together) and use that for testing and debugging.

The results should be two paired-end files, i.e., pair.test.1.fastq.gz and pair.test.2.fastq.gz.

I'd be happy to hear some suggestions on how to do this or hear about tools available, thanks!

paired-end fastq rna-seq illumina • 15k views

Written 12.3 years ago by dfernan; updated 2.1 years ago by Ram.


Duplicate of: Selecting random pairs from fastq?

Written 12.3 years ago by Pierre Lindenbaum.


Thanks Pierre, I didn't realize someone else had already asked about this!

Written 12.3 years ago by dfernan.

Ram · Answer 1 · 2013-03-19


12.3 years ago

Rahul Sharma ▴ 660

Hi,

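Assuming the reads are in the same order in both files, I would just take the first reads of each file. A FASTQ record is 4 lines, so 4,000,000 lines is the first 1,000,000 reads (i.e., 1,000,000 pairs):

# head stops once it has 4,000,000 lines, so the rest of the file is never decompressed
zcat pair.1.fastq.gz | head -n 4000000 | gzip > pair.test.1.fastq.gz
zcat pair.2.fastq.gz | head -n 4000000 | gzip > pair.test.2.fastq.gz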

Thanks,
Rahul

Written 12.3 years ago by Rahul Sharma; updated 2.1 years ago by Ram.
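Because this only works if both files list the pairs in the same order, it is worth checking the subsets before using them. Below is a minimal bash sketch of such a check; the function names `fastq_ids` and `pairs_in_sync` are hypothetical, not part of any tool. It assumes plain 4-line FASTQ records, and before comparing it strips the leading `@`, anything after the first whitespace, and a trailing `/1` or `/2` mate suffix from each header.

```shell
#!/usr/bin/env bash
# Sketch: check that two (optionally gzipped) FASTQ files list their reads
# in the same pair order, by comparing read IDs record by record.
# Assumes standard 4-line FASTQ records (no wrapped sequence lines).

# Print one read ID per record: the header line without "@", without
# anything after the first whitespace, and without a /1 or /2 mate suffix.
# gzip -cdf decompresses .gz input and passes plain text through unchanged.
fastq_ids() {
  gzip -cdf -- "$1" |
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }'
}

# Return 0 if both files have the same IDs in the same order, 1 otherwise.
pairs_in_sync() {
  cmp -s <(fastq_ids "$1") <(fastq_ids "$2")
}
```

For example: pairs_in_sync pair.test.1.fastq.gz pair.test.2.fastq.gz && echo "pairs in sync"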


Hi, thanks a lot. However, I'm not sure the reads are in the same order in both files, and I need to be sure the pairs stay correctly matched...

Written 12.3 years ago by dfernan; updated 2.1 years ago by Ram.

score 0 · Answer 2 · 2023-05-31


2.1 years ago

sebastian.gregoricchio ▴ 30

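I believe a better option for paired-end data is fastq-sample from fastq-tools, which samples read pairs at random instead of just taking the first reads in the files (which may not be representative of the whole run):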

fastq-sample -n 5000000 pair_R1.fastq.gz pair_R2.fastq.gz -o pair_5M_R

Written 2.1 years ago by sebastian.gregoricchio.
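If fastq-tools is not available, a similar random subsample of pairs can be sketched with standard command-line tools. This is a minimal sketch under several assumptions: GNU coreutils `shuf`, 4-line FASTQ records with no tab characters inside them, and R1/R2 files listing the pairs in the same order. `sample_pairs` is a hypothetical helper name. Each pair is flattened onto a single line, so mates are always kept or dropped together.

```shell
#!/usr/bin/env bash
# Sketch: randomly sample N read pairs from two paired FASTQ files.
# Assumes: 4-line FASTQ records, R1 and R2 in the same pair order,
# no tab characters inside records, and GNU coreutils `shuf`.

# Usage: sample_pairs R1 R2 N OUT1 OUT2
sample_pairs() {
  local r1=$1 r2=$2 n=$3 out1=$4 out2=$5
  # Join each 4-line record into one tab-separated line, then put the
  # R1 and R2 records of the same pair side by side on one line.
  paste <(gzip -cdf -- "$r1" | paste - - - -) \
        <(gzip -cdf -- "$r2" | paste - - - -) |
    # Keep n pair-lines chosen uniformly at random (unseeded).
    shuf -n "$n" |
    # Split each pair-line back into two 4-line FASTQ records.
    awk -F '\t' -v o1="$out1" -v o2="$out2" '{
      print $1 "\n" $2 "\n" $3 "\n" $4 > o1
      print $5 "\n" $6 "\n" $7 "\n" $8 > o2
    }'
}
```

For example: sample_pairs pair.1.fastq.gz pair.2.fastq.gz 1000000 pair.test.1.fastq pair.test.2.fastq && gzip pair.test.1.fastq pair.test.2.fastq. Because `shuf` is unseeded here, each run draws a different subset.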