Question

extracting reads from fastq file based on read_id

0

Entering edit mode

8.4 years ago

sumithrasank75 ▴ 140

I have a file with a set of read ids called tmp. I want to extract reads with these read ids from another fastq file called reads.fastq. When I use grep -f tmp reads.fastq I get only the first line (header) of the read. How can I get the output in full fastq format.

next-gen • 7.8k views

ADD COMMENT • link updated 8.4 years ago by Charles Plessy ★ 2.9k • written 8.4 years ago by sumithrasank75 ▴ 140

score 4 · Answer 1 · 2016-07-15

4

Entering edit mode

8.4 years ago

GenoMax 147k

grep -A 3 seq_ID your file should get you the full fastq record. Iterate over the ID's you have.

filterbyname.sh from BBMap would also work with your list of ID's in a file.

ADD COMMENT • link 8.4 years ago by GenoMax 147k

0

Entering edit mode

Today I learnt about -A. Thanks.

ADD REPLY • link 8.4 years ago by poisonAlien ★ 3.2k

score 1 · Answer 2 · 2016-07-15

1

Entering edit mode

8.4 years ago

Charles Plessy ★ 2.9k

If your tmp and reads.fastq have their reads in the same order, you can try our syncpairs tools.

ADD COMMENT • link 8.4 years ago by Charles Plessy ★ 2.9k

score 0 · Answer 3 · 2016-07-15

0

Entering edit mode

8.4 years ago

brent_wilson ▴ 140

For bash, I would use a nice one-liner:

for x in $(cat tmp); do reads.fastq | grep $x -A 3; done

Of course you can redirect to wherever you want:

for x in $(cat tmp); do reads.fastq | grep $x -A 3; done > tmp2

I hope that this helps!

Brent Wilson, PhD | Project Scientist | Cofactor Genomics

4044 Clayton Ave. | St. Louis, MO 63110 | tel. 314.531.4647

Catch the latest from Cofactor on our blog.

ADD COMMENT • link 8.4 years ago by brent_wilson ▴ 140

2

Entering edit mode

For m desired reads out of a fastq file with n total reads, this takes O(mn) time, as opposed to the -f flag given to grep (or just using BBMap), which is likely O(n).

ADD REPLY • link 8.4 years ago by Devon Ryan 104k