Hello,
I did a small RNA-Seq experiment in which the sequencing provider was supposed to sequence small RNAs of 15-25 nt. After trimming adapters, I see reads ranging in size from 15 to 35 bp, but also reads that are 50 bp (the full length of the read). Since I sent total RNA, I assume the 50 bp reads are RNA species other than miRNAs/small RNAs. I want to extract all reads <40 bp and all reads >49 bp into 2 separate files.
The only way I can figure out to do this is to convert to FASTA, determine the length of each read using a tool, use R to create my 2 lists of reads based on size, save the lists and use seqtk to extract those reads from the FASTQ. This sounds long and silly.
Any alternatives?
Thanks
Instead, you can directly extract reads of a particular length.
And, most importantly, see this previous discussion: Filtering Fastq Sequences Based On Lengths
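If you would rather not install anything, the convert/measure/filter/seqtk round trip can be collapsed into one pass over the FASTQ. Below is a minimal plain Python 3 sketch, assuming uncompressed FASTQ with exactly four lines per record (no wrapped sequences). The function names are made up for illustration, and reads of 40-49 nt are simply skipped.

```python
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not wrapped."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not qual:
            raise ValueError(f"truncated record: {header.strip()!r}")
        yield header, seq, plus, qual


def split_by_length(fin: TextIO, short_out: TextIO, long_out: TextIO,
                    short_max: int = 39, long_min: int = 50) -> Tuple[int, int]:
    """Copy reads of length <= short_max to short_out and >= long_min to long_out.

    Reads in between (40-49 nt with the defaults) are skipped.
    Returns (short reads written, long reads written).
    """
    n_short = n_long = 0
    for rec in read_fastq(fin):
        length = len(rec[1].rstrip("\r\n"))  # sequence length without the newline
        if length <= short_max:
            short_out.writelines(rec)
            n_short += 1
        elif length >= long_min:
            long_out.writelines(rec)
            n_long += 1
    return n_short, n_long
```

To use it, open the trimmed FASTQ and two output files (names are up to you) and call `split_by_length(fin, short_out, long_out)`. It returns how many reads went into each file. For gzipped input, `gzip.open(path, "rt")` gives a text handle that works the same way.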
You can also use the BBMap package like this:
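Assuming reformat.sh accepts `in=`, `out=`, `maxlength=` and `minlength=` (check its usage text for your BBMap version), two runs give the two files: one keeping reads up to 39 bp, one keeping reads of 50 bp and up. Since the rest of this thread has no shell code, here is a small Python sketch that builds those command lines and can run them. It assumes reformat.sh is on your PATH, and the file names are hypothetical.

```python
import subprocess
from typing import List


def reformat_commands(in_fq: str, short_fq: str, long_fq: str,
                      short_max: int = 39, long_min: int = 50) -> List[List[str]]:
    """Return one reformat.sh command per length bin (flag names assumed)."""
    return [
        # keep only reads no longer than short_max (< 40 bp with the default)
        ["reformat.sh", f"in={in_fq}", f"out={short_fq}", f"maxlength={short_max}"],
        # keep only reads at least long_min long (> 49 bp with the default)
        ["reformat.sh", f"in={in_fq}", f"out={long_fq}", f"minlength={long_min}"],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn; raises CalledProcessError if one fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

`" ".join(cmd)` on each entry shows the equivalent shell line; for example, the first command for `reformat_commands("trimmed.fq", "short.fq", "long.fq")` is `reformat.sh in=trimmed.fq out=short.fq maxlength=39`.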
But you can just as easily do that at the same time as adapter trimming if you use BBDuk, which also supports those flags. Both Reformat and BBDuk are many times faster than prinseq or fastx.
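To illustrate the idea of trimming and length-binning in a single pass, here is a Python sketch. It is not what BBDuk does: BBDuk matches adapters by k-mers, while this sketch swaps in naive exact matching of the full adapter, or of an adapter prefix of at least `min_overlap` bases at the 3' end. The adapter is a required argument that you must take from your library prep kit, and the function names are hypothetical.

```python
from typing import TextIO, Tuple


def trim_3prime(seq: str, adapter: str, min_overlap: int = 5) -> int:
    """Return the length of seq after removing the 3' adapter (exact match only)."""
    hit = seq.find(adapter)
    if hit != -1:
        return hit  # full adapter found: cut just before it
    # the read may end partway into the adapter: try the longest prefix first
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)  # no adapter found, keep the whole read


def trim_and_split(fin: TextIO, short_out: TextIO, long_out: TextIO, adapter: str,
                   short_max: int = 39, long_min: int = 50,
                   min_overlap: int = 5) -> Tuple[int, int]:
    """Trim each read, then write it to short_out (<= short_max) or long_out (>= long_min).

    Assumes 4-line FASTQ records. Reads trimmed to zero length, and reads whose
    trimmed length falls between the two cut-offs, are dropped.
    Returns (short reads written, long reads written).
    """
    n_short = n_long = 0
    while True:
        header = fin.readline()
        if not header:
            break  # end of input
        seq = fin.readline().rstrip("\r\n")
        fin.readline()  # '+' separator; rewritten as a bare '+' below
        qual = fin.readline().rstrip("\r\n")
        cut = trim_3prime(seq, adapter, min_overlap)
        if cut == 0:
            continue  # read was all adapter
        record = f"{header.rstrip()}\n{seq[:cut]}\n+\n{qual[:cut]}\n"
        if cut <= short_max:
            short_out.write(record)
            n_short += 1
        elif cut >= long_min:
            long_out.write(record)
            n_long += 1
    return n_short, n_long
```

The point of the design is that each read is inspected once: the trimmed length decides which file it goes to, so no intermediate trimmed FASTQ is written. For real data, a dedicated trimmer such as BBDuk handles sequencing errors in the adapter far better than this exact-match sketch.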