Dear All,
I have been trying to filter reads from the miRNA-Seq FASTQ files that we received. The read structure looks like the one shown in the figure below. I can use Cutadapt to remove the adapter (we have the adapter sequence) and retain reads of 15-55 nt using the -m and -M options. Before this filtering step, I want to remove the common sequence (we know the sequence) and the UMI. I have tried the SeqKit grep option: seqkit grep -rvip ATCTGTAGGCAGGATCAAT s1.fq.gz -o s1.clean.fq.gz, but the cleaned output FASTQ file looks almost identical to the input FASTQ file. It seems I am missing something.
Are there any tools that I can use to remove the common sequence and the UMI before I proceed to trim reads with Cutadapt?
Many thanks
I think the problem with that command is that you will just remove the common sequence and keep smRNA+UMI+adapter.
I would just use the common sequence as the adapter sequence if you don't care about removing PCR duplicates through the UMIs. Or just add this to your command:
I got an error message when I tried seqkit with *: zsh: no matches found: ATCTGTAGGCAGGATCAAT*
Hi, sorry, I got mixed up with seqkit. seqkit grep cannot do that: it can only return the full sequence, not a subset. You could use seqkit locate on a FASTA file (converting from FASTQ to FASTA first) to do this operation, but you would lose the read quality information. I recommend just using the common sequence as an adapter:
Sorry about the mixup!
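To make the "common sequence as adapter" idea concrete, here is a minimal Python sketch of what that trimming amounts to: cut each read at the common sequence, keep the part before it together with its qualities, and apply the 15-55 nt length filter. In Cutadapt terms this roughly corresponds to passing the common sequence with -a alongside -m 15 -M 55. Several things here are assumptions rather than facts about your library: that the common sequence is the one from the seqkit command above, that the UMI is the 12 nt immediately after it, and that reads without the common sequence should be dropped (Cutadapt keeps untrimmed reads unless told otherwise). The function names are made up for illustration, and the match is exact, whereas Cutadapt tolerates mismatches.

```python
import gzip
from typing import Iterator, Optional, Tuple

# Assumption: the common sequence is the pattern used in the seqkit command.
COMMON_SEQ = "ATCTGTAGGCAGGATCAAT"
# Assumption: the UMI is the 12 nt directly after the common sequence.
UMI_LEN = 12


def trim_read(seq: str, qual: str, common: str = COMMON_SEQ,
              umi_len: int = UMI_LEN, min_len: int = 15,
              max_len: int = 55) -> Optional[Tuple[str, str, str]]:
    """Return (insert_seq, insert_qual, umi), or None if the read is rejected."""
    pos = seq.upper().find(common)  # exact match only, first occurrence
    if pos == -1:
        return None  # common sequence not found
    if not (min_len <= pos <= max_len):
        return None  # insert outside the 15-55 nt window
    umi_start = pos + len(common)
    umi = seq[umi_start:umi_start + umi_len]
    # Slice sequence and qualities identically so they stay in sync.
    return seq[:pos], qual[:pos], umi


def read_fastq(handle) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


def clean_fastq(in_path: str, out_path: str) -> int:
    """Write trimmed reads to out_path and return how many were kept."""
    kept = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for header, seq, qual in read_fastq(fin):
            result = trim_read(seq, qual)
            if result is None:
                continue
            insert, insert_qual, umi = result
            # Append the UMI to the read name so it survives the trimming.
            name, _, comment = header.partition(" ")
            new_header = f"{name}_{umi}" + (f" {comment}" if comment else "")
            fout.write(f"{new_header}\n{insert}\n+\n{insert_qual}\n")
            kept += 1
    return kept
```

With the file names from the question, this would be called as clean_fastq("s1.fq.gz", "s1.clean.fq.gz"). Keeping the UMI in the read name is optional; it only matters if you later want to remove PCR duplicates.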
Thanks a lot. I could try this option as well.
Which miRNA-seq library prep kit are you using? I'm a shill for miRge3.0 - thought it was super easy to get your data processed (if you have a well studied model organism) although hard to customize for more complex/downstream applications.
QIAseq miRNA Library Kit.
Good call, from personal experience that's the one that worked the best. I'd still check out the miRge3.0 pipeline, they have a one-liner that works near perfectly for the QIAseq kit.