How To Extract A Subset Of Reads In Fastq Using An Id List?
12.5 years ago
Luke ▴ 240

Hello! I obtained a list of unmapped read IDs from my BAM file, and I want to remap only the unmapped reads with different parameters. How can I extract this subset of reads from my original FASTQ file? Thank you in advance, Luke

fastq bam • 19k views

I have a post here which addresses part of this question

9.5 years ago

I also wrote a program for this purpose, distributed with BBMap. Usage:

filterbyname.sh in=reads.fq out=filtered.fq names=names.txt include=t

The include flag switches between including and excluding the names in names.txt (which can, alternatively, be another FASTQ or FASTA file). The tool also supports paired input/output, and names that are substrings or superstrings of read IDs.
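For readers who would rather script it, here is a minimal Python sketch of the same include/exclude idea applied to a pair of files. It is not how filterbyname.sh is implemented: the function names (records, read_name, filter_pairs) are hypothetical, names are matched exactly rather than as substrings, and it assumes unwrapped four-line records whose mates share a name apart from an optional /1 or /2 suffix.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List, Set, Tuple


def records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (no line wrapping assumed)."""
    it = iter(lines)
    while True:
        rec = list(islice(it, 4))
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def read_name(header: str) -> str:
    """First header token without the leading '@' and any /1 or /2 mate suffix."""
    name = header.lstrip("@").split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def filter_pairs(
    r1: Iterable[str], r2: Iterable[str], names: Set[str], include: bool = True
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield mate pairs whose name is in names (include=True) or not in it (include=False)."""
    for rec1, rec2 in zip_longest(records(r1), records(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of records")
        name = read_name(rec1[0])
        if name != read_name(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0].strip()} vs {rec2[0].strip()}")
        # Exact match only; the include flag flips keep/drop for the whole pair.
        if (name in names) == include:
            yield rec1, rec2
```

File handles are iterables of lines, so you can pass two opened FASTQ files directly and call writelines on each yielded record with two output handles.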


Thank you for this excellent tool, which is ridiculously fast compared to the scripts I've been using to achieve this goal.

12.5 years ago
Arun 2.4k

I prefer writing my own little snippets, but this is also possible with Biopieces. This reply is from SEQanswers (by maasha), pasted here for convenience.

First you need a file with the FASTQ sequence names (or IDs, if you like) that you are interested in, one per line. Then:

read_fastq -i in.fastq | grab -E ids.txt | write_fastq -xo out.fastq

Check out grab for details.
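In the spirit of writing your own little snippet, a minimal Python equivalent of the pipeline above could look like this. It is a sketch, not Biopieces' grab: the helper names are hypothetical, it assumes unwrapped four-line records, and it treats the first whitespace-delimited token of the header (minus any /1 or /2 suffix, since aligners typically drop that from BAM read names) as the read ID.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set


def normalize(header: str) -> str:
    """Reduce a header or ID line to a bare read name (assumed naming scheme)."""
    name = header.lstrip("@").split()[0]
    # Drop a trailing mate suffix, which BAM read names usually lack.
    return name[:-2] if name.endswith(("/1", "/2")) else name


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Load one ID per line, skipping blank lines."""
    return {normalize(line) for line in lines if line.strip()}


def filter_fastq(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield the lines of every 4-line FASTQ record whose read name is in wanted."""
    it = iter(lines)
    while True:
        record: List[str] = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        # Set membership keeps this a single streaming pass with O(1) lookups.
        if normalize(record[0]) in wanted:
            yield from record
```

With files on disk, open ids.txt, in.fastq and out.fastq and call out_handle.writelines(filter_fastq(in_handle, read_ids(ids_handle))).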

12.5 years ago

It is simpler to go back to the original .bam and pull out only the unmapped entries: samtools view -f4 (which keeps records with the 0x4 "unmapped" FLAG bit set) should do it. Then you can use something like Picard's SamToFastq to convert back to FASTQ format, if you need to. (Some software, like Velvet, accepts .bam as input directly.)
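To show what that route does, here is a minimal Python sketch that reads SAM text (for example, the output of samtools view on the original .bam), keeps records with the 0x4 flag set, and emits them as FASTQ. It is an illustration, not Picard's SamToFastq: the function name is hypothetical, it produces single-end output only, it assumes every record carries base qualities, and it undoes the reverse-complementing of records flagged 0x10, similar to SamToFastq's default re-reversing behaviour.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4  # FLAG bit selected by samtools view -f4
REVERSE = 0x10  # FLAG bit: SEQ/QUAL stored reverse-complemented

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record string per unmapped SAM record."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines (a QNAME cannot start with '@')
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not flag & UNMAPPED:
            continue
        if qual == "*":
            raise ValueError(f"{qname}: no base qualities stored")
        if flag & REVERSE:
            # Restore the read as it came off the sequencer.
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{qname}\n{seq}\n+\n{qual}\n"
```

Feeding it sys.stdin and writing each yielded string to stdout turns it into a filter for samtools view output.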

12.5 years ago
Luke ▴ 240

I've found a quick solution with cdbfasta and cdbyank tools.

First, index your FASTQ file with cdbfasta; then you can retrieve the reads by ID with cdbyank. For more info, see http://sourceforge.net/projects/cdbfasta/

Thank you,
Luke
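The cdbfasta/cdbyank approach boils down to "index once, then fetch by ID". A rough Python sketch of that idea follows. It keeps a plain in-memory dict of byte offsets instead of cdbfasta's on-disk database, the function names are hypothetical, and it assumes unwrapped four-line records whose header starts with '@'. Because it works on bytes or an mmap object, the FASTQ file itself does not have to be loaded into memory.

```python
import mmap
from typing import Dict, Tuple, Union

Buffer = Union[bytes, mmap.mmap]
Index = Dict[str, Tuple[int, int]]


def build_index(buf: Buffer) -> Index:
    """Map each read name to the (start, end) byte span of its 4-line record."""
    index: Index = {}
    pos, size = 0, len(buf)
    while pos < size:
        start = pos
        header_end = 0
        for line_no in range(4):  # assumes unwrapped four-line records
            nl = buf.find(b"\n", pos)
            pos = size if nl == -1 else nl + 1
            if line_no == 0:
                header_end = pos
        # Read name = first token of the header, without the leading '@'.
        name = buf[start + 1:header_end].split()[0].decode()
        index[name] = (start, pos)
    return index


def fetch(buf: Buffer, index: Index, name: str) -> bytes:
    """Return the raw bytes of one record by read name (KeyError if absent)."""
    start, end = index[name]
    return buf[start:end]
```

With a file on disk, open it in binary mode, map it with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ), build the index once, then call fetch for each wanted ID.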
