Trimming of reads in miRNA-Seq data
16 months ago
Ezhil La ▴ 40

Dear All,

I have been trying to trim reads in the FASTQ files from a miRNA-Seq run that we received. The read structure is shown in the figure below. I can use Cutadapt to remove the adapter (we have the adapter sequence) and keep reads of 15-55 nt with the -m and -M options. Before that step, I want to remove the common sequence (which we know) and the UMI. I tried seqkit grep: seqkit grep -rvip ATCTGTAGGCAGGATCAAT s1.fq.gz -o s1.clean.fq.gz, but the cleaned FASTQ file looks almost identical to the input. It seems I am missing something.

Are there any tools that I can use to remove the common sequence and the UMI before I proceed to trim reads with Cutadapt?

Many thanks

[Figure: read structure]


I think the problem with that command is that you will just remove the common sequence and keep smRNA+UMI+adapter.

I would just use the common sequence as the adapter sequence if you don't care about removing PCR duplicates through the UMIs. Or just add this to your command:

seqkit grep -rvip ATCTGTAGGCAGGATCAAT* s1.fq.gz -o s1.clean.fq.gz

I got an error message when I tried seqkit with the trailing *: zsh: no matches found: ATCTGTAGGCAGGATCAAT*
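The zsh error comes from the shell, not from seqkit: an unquoted * is treated as a file-name glob and expanded (or rejected) before the program ever runs. The sketch below shows this with echo only; the scratch directory and the file name in it are made up for the demonstration.

```shell
# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
: > ATCTGTAGGCAGGATCAAT_notes.txt   # hypothetical file that happens to match the glob

# Unquoted: the shell replaces the pattern with matching file names.
unquoted=$(echo ATCTGTAGGCAGGATCAAT*)
# Quoted: the pattern reaches the program exactly as typed.
quoted=$(echo 'ATCTGTAGGCAGGATCAAT*')

echo "unquoted -> $unquoted"
echo "quoted   -> $quoted"
```

When no file matches, zsh stops with "no matches found" while bash passes the text through unchanged, so quoting the pattern is the portable choice. Quoting only fixes the shell error, though; it does not make seqkit grep trim reads.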


Hi, sorry, I got mixed up about seqkit. seqkit grep cannot do that: it keeps or discards whole reads and cannot return only part of a read. You could use seqkit locate on a FASTA file (converting from FASTQ to FASTA) for this operation, but you would lose the quality information for the reads. I recommend just using the common sequence as an adapter:

cutadapt -a ATCTGTAGGCAGGATCAAT -o s1.clean.fq.gz s1.fq.gz

Sorry about the mixup!
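To make the difference between filtering and trimming concrete, here is a toy sketch using only awk and printf. It is not seqkit or cutadapt: matching is exact (the real tools tolerate mismatches), and the two reads, the make_read helper and the temporary file are invented for illustration. As far as I know, seqkit grep also matches against read IDs unless -s/--by-seq is given, which would explain why the original command changed so little, but that is worth checking against the seqkit documentation.

```shell
common=ATCTGTAGGCAGGATCAAT

# Hypothetical helper: write one FASTQ record whose quality string is all 'I'.
make_read() {
  printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr 'ACGTN' 'IIIII')"
}

fq=$(mktemp)
{
  make_read read1 "TGAGGTAGTAGGTTGTATAGTT${common}ACGTACGTACGT"  # contains the common sequence
  make_read read2 "ACGTTGCAACGTTGCAACGTTGCA"                     # does not contain it
} > "$fq"

# Filtering (what an inverted grep on the sequence does): drop the whole read.
filtered=$(awk -v pat="$common" '
  NR % 4 == 1 { id = $0 }
  NR % 4 == 2 { seq = $0 }
  NR % 4 == 3 { plus = $0 }
  NR % 4 == 0 { if (index(seq, pat) == 0) print id "\n" seq "\n" plus "\n" $0 }
' "$fq")

# Trimming (3-prime adapter style): keep the read, cut the match and everything
# after it from both the sequence and the quality line.
trimmed=$(awk -v pat="$common" '
  NR % 4 == 1 { id = $0 }
  NR % 4 == 2 { seq = $0; cut = index(seq, pat) }
  NR % 4 == 3 { plus = $0 }
  NR % 4 == 0 {
    qual = $0
    if (cut > 0) { seq = substr(seq, 1, cut - 1); qual = substr(qual, 1, cut - 1) }
    print id "\n" seq "\n" plus "\n" qual
  }
' "$fq")

printf 'filtered:\n%s\n\ntrimmed:\n%s\n' "$filtered" "$trimmed"
```

Filtering leaves only read2, while trimming keeps both reads and shortens read1 to its insert, which is the behaviour wanted here.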


Thanks a lot. I could try this option as well.


Which miRNA-seq library prep kit are you using? I'm a shill for miRge3.0: I found it super easy to get data processed (if you have a well-studied model organism), although it is hard to customize for more complex or downstream applications.


QIAseq miRNA Library Kit.


Good call. From personal experience, that's the one that worked best. I'd still check out the miRge3.0 pipeline; they have a one-liner that works near perfectly for the QIAseq kit.

16 months ago
GenoMax 148k

to remove the common sequence and the UMI

You can use bbduk.sh from the BBMap suite to do this. Try

bbduk.sh -Xmx2g in=your.fq.gz out=clean.fq.gz literal=ATCTGTAGGCAGGATCAAT ktrim=r k=7

I suggest that you stay with bbduk.sh for the remaining steps you need to do as well.


Thanks a lot. I tried with

bbduk.sh -Xmx27g in=s1.fastq.gz out=s1.bbduk.fastq.gz literal=ATCTGTAGGCAGGATCAAT ktrim=r k=7 minlen=15

It removed 50.56% of the input reads (the bbduk output is below). Is this normal?

  • Input: 26325090 reads 1956237471 bases.
  • KTrimmed: 26221546 reads (99.61%) 1624914349 bases (83.06%)
  • Total Removed: 13309234 reads (50.56%) 1624914349 bases (83.06%)
  • Result: 13015856 reads (49.44%) 331323122 bases (16.94%)

Is there any parameter to filter out reads beyond the length of 55 (maximum length)?

Many thanks
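The numbers in that summary can be checked with a few lines of shell arithmetic. This only re-derives values from the counts quoted above; the variable names are arbitrary.

```shell
# Counts copied from the bbduk summary quoted above.
input_reads=26325090;   input_bases=1956237471
removed_reads=13309234; removed_bases=1624914349
result_reads=13015856;  result_bases=331323122

# 1 if the summary is internally consistent (input - removed = result), else 0.
reads_ok=$(( input_reads - removed_reads == result_reads ))
bases_ok=$(( input_bases - removed_bases == result_bases ))

# Share of reads discarded, and mean read length before and after trimming.
removed_pct=$(awk -v r="$removed_reads" -v n="$input_reads" 'BEGIN { printf "%.2f", 100 * r / n }')
mean_in=$(awk -v b="$input_bases" -v n="$input_reads" 'BEGIN { printf "%.1f", b / n }')
mean_out=$(awk -v b="$result_bases" -v n="$result_reads" 'BEGIN { printf "%.1f", b / n }')

echo "consistent: reads=$reads_ok bases=$bases_ok"
echo "removed: ${removed_pct}% of reads"
echo "mean length: ${mean_in} nt before, ${mean_out} nt after"
```

Nearly every read (99.61%) was trimmed, and the surviving reads average roughly 25 nt, down from about 74 nt. The reads counted under Total Removed are most likely those that ended up shorter than minlen=15 after trimming, since that is the only filter in the command; whether half the library being that short is expected depends on the sample and the kit.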


Is there any parameter to filter out reads beyond the length of 55 (maximum length)?

You can add maxlength=55 to the command.
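Put together with the earlier command, the call would look something like the sketch below. The file names and options are the ones already used in this thread, with maxlength=55 appended; the wrapper that only prints the command when bbduk.sh or the input file is missing is just a convenience for trying it safely.

```shell
# Command assembled from the options discussed above; file names as in the thread.
cmd="bbduk.sh -Xmx27g in=s1.fastq.gz out=s1.bbduk.fastq.gz literal=ATCTGTAGGCAGGATCAAT ktrim=r k=7 minlen=15 maxlength=55"

# Run it only if the tool and the input are present; otherwise just show it.
if command -v bbduk.sh >/dev/null 2>&1 && [ -f s1.fastq.gz ]; then
  $cmd
else
  echo "would run: $cmd"
fi
```

Both length limits are applied to the reads after trimming, so this should keep reads of 15 to 55 nt in a single pass.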


Thanks a lot.
