extract reads with given size range
1
0
Entering edit mode
8.4 years ago
apelin20 ▴ 480

Hello,

I did a smallRNA-Seq experiment where the sequencing provider was supposed to sequence small RNAs 15-25 nt. After trimming adapters, I see that there are reads ranging in size from 15 to 35 bp, but also reads that are 50 bp (the full length of the read). Since I sent total RNA, I assume the 50bp are RNA species other than miRNA/smallRNA. I was to extract all reads <40bp and >49bp in 2 separate files.

The only way I can figure to do this is to convert to fasta, determine the length of each read using a tool, use R to create my 2 lists of reads based on sizes, save the list and use seqtk to extract from fastq. This sounds long and silly.

Any alternatives?

Thanks

RNA-Seq smallRNA-Seq fastq • 4.9k views
ADD COMMENT
1
Entering edit mode

Instead you can directly extract reads which are of particular length.

and most importantly, a previous dicussion: Filtering Fastq Sequences Based On Lengths

ADD REPLY
1
Entering edit mode

You can also use the BBMap package like this:

reformat.sh in=reads.fq out=filtered.fq minlength=15 maxlength=25

But, you can just as easily do that at the same time as adapter trimming if you use BBDuk, which also supports those flags. Both Reformat and BBDuk are many times faster than prinseq or fastx.

ADD REPLY
1
Entering edit mode
8.4 years ago
iraun 6.2k

Have you tried awk solution to parse fastq files?

awk 'BEGIN {OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) < 40) {print header, seq, qheader, qseq}}' input.fq > lessthan40bp.fq

awk 'BEGIN {OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) > 49) {print header, seq, qheader, qseq}}' input.fq > morethan49bp.fq
ADD COMMENT

Login before adding your answer.

Traffic: 1980 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6