Question

Length of miRNA reads after preprocessing steps

10.1 years ago

Maguelonne ▴ 20

Hi all,

I'm new to miRNA analysis. I'm working with reads from a 55-cycle single-read sequencing run, and I think I have a problem. After the preprocessing steps (removing low-quality reads and trimming the 5' and 3' adapters), 50% of the reads are still 55 nucleotides long, i.e. about 1,000,000 out of 2,000,000 reads.

I understand that these reads should be removed, since they cannot be miRNAs, but is it normal for length filtering to remove so many reads? And what could these reads correspond to?
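A quick way to see how widespread the problem is: tabulate the read lengths after trimming. The sketch below is a minimal awk-based example, assuming uncompressed FASTQ with plain 4-line records; `read_length_hist` is a hypothetical helper name, not part of cutadapt or FastQC.

```shell
# Print a read-length histogram for a FASTQ file: length, read count, percent of reads.
# Assumes plain 4-line FASTQ records (no wrapped sequence lines); hypothetical helper.
read_length_hist() {
    awk 'NR % 4 == 2 { n[length($0)]++; total++ }
         END { for (len in n) printf "%d\t%d\t%.1f\n", len, n[len], 100 * n[len] / total }' "$1" |
        sort -n -k1,1
}
```

Usage: `read_length_hist your_trimmed_file.fastq`. For trimmed miRNA libraries one would expect a peak around 20-24 nt; a large peak at the full read length (55 nt here) means those reads were left untouched by adapter trimming.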

Thanks in advance for your help!

MR

miRNA seq RNA-Seq preprocessing length filtering • 6.8k views

Which adapter-trimming tool did you use?

Nicolas Rosewick · 10.1 years ago

cutadapt. The adapter sequences that appeared as overrepresented sequences in the FastQC report disappear after trimming, which means it went well, right?

Maguelonne · 10.1 years ago
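The adapter dropping out of FastQC's overrepresented-sequences list is a good sign, but it does not show how many reads still carry adapter sequence. A minimal sketch for counting them directly, assuming 4-line FASTQ and exact matching only (reads with sequencing errors inside the adapter are missed); `count_adapter_reads` is a hypothetical helper, and the adapter prefix is a placeholder for the first bases of your own kit's 3' adapter.

```shell
# Count reads whose sequence still contains the given adapter prefix (exact match only).
# Usage: count_adapter_reads <fastq> <adapter_prefix>; hypothetical helper, 4-line FASTQ assumed.
count_adapter_reads() {
    awk -v ad="$2" 'NR % 4 == 2 && index($0, ad) > 0 { hits++ } END { print hits + 0 }' "$1"
}
```

After a successful trim this count should be close to zero.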

Ram · Answer 1 · 2014-10-28

10.1 years ago

Manvendra Singh ★ 2.2k

I think a somewhat similar question was posted here, where Ryan gave the reason.

I suggested the following there, and I think it should work in your case too:

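```shell
# Trim the 3' adapter with cutadapt:
#   --discard-untrimmed drops reads in which no adapter was found (e.g. full-length 55-nt reads)
#   --minimum-length/--maximum-length keep only trimmed reads of 20-30 nt
cutadapt --discard-untrimmed --minimum-length=20 --maximum-length=30 -a <adapter_sequence> In_seq.fastq > your_trimmed_file.fastq
# Download rRNA and tRNA sequences and build a bowtie index from them (/path_to/bowtieindex/r_tRNA),
# then keep only the reads that do NOT align to them (--un); the alignments themselves are discarded
bowtie --seedlen=23 --un output_file.fastq /path_to/bowtieindex/r_tRNA your_trimmed_file.fastq > /dev/null
```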

Your output_file.fastq should then be in better shape for alignment.

HTH

Hi Manvendra,

Why should we use up to 30 bp as the maximum length when mature miRNAs are at most about 24 bp long?

Thanks,
Robert

robertmukiibi2012 · 6.4 years ago

Ram · Answer 2 · 2014-10-27

10.1 years ago

seta ★ 1.9k

Hi, if you did miRNA sequencing, most of your sequences should be about 20-23 bp long after trimming, though you may see sequences of up to 35 bp if you did small RNA sequencing. The reads with an unusual length (55 bp) in your data can result from adapter dimerization and have to be removed, so during trimming you can set a threshold that keeps only sequences 15-40 bp long to get rid of unwanted sequences.
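If the reads have already been trimmed, the 15-40 bp window can also be applied afterwards. A minimal awk sketch, assuming plain 4-line FASTQ records; `filter_by_length` is a hypothetical helper. With cutadapt, the same effect comes from `--minimum-length`/`--maximum-length` at trimming time, as in the command in Answer 1.

```shell
# Keep only FASTQ records whose sequence length lies in [min, max] (inclusive).
# Usage: filter_by_length <fastq> <min> <max>; writes kept records to stdout.
# Hypothetical helper; assumes plain 4-line FASTQ records.
filter_by_length() {
    awk -v min="$2" -v max="$3" '
        { rec[NR % 4] = $0 }    # store the current line at its position in the record
        NR % 4 == 0 {           # the quality line closes a 4-line record
            len = length(rec[2])
            if (len >= min + 0 && len <= max + 0)
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
        }' "$1"
}
```

Usage: `filter_by_length your_trimmed_file.fastq 15 40 > your_filtered_file.fastq`.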

Dear seta,

I have some human non-coding RNA-seq data. To get differential expression of miRNAs, should I filter for reads between 18 and 30 nt long before starting? Why is the average sequence length 51? And can I use these data for differential expression of lncRNAs too?

Edalat · 8.3 years ago

Ram · Answer 3 · 2014-10-28

10.1 years ago

Maguelonne ▴ 20

I actually just found the answer: these 50% of reads are phiX contamination! I didn't suspect it because I thought phiX reads were "rare" after demultiplexing, but it seems to be a well-known problem.
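To check what the full-length reads are before discarding them, one option is to pull out the most frequent 55-nt sequences and BLAST or align them (e.g. against the phiX174 genome). A minimal sketch assuming plain 4-line FASTQ; `top_sequences_of_length` is a hypothetical helper name.

```shell
# List the most frequent sequences among reads of exactly the given length,
# as "count<TAB>sequence", most frequent first.
# Usage: top_sequences_of_length <fastq> <read_length> <how_many>; hypothetical helper.
top_sequences_of_length() {
    awk -v len="$2" 'NR % 4 == 2 && length($0) == len + 0' "$1" |
        sort | uniq -c | sort -k1,1nr |
        awk -v n="$3" 'NR <= n + 0 { print $1 "\t" $2 }'
}
```

Usage: `top_sequences_of_length your_trimmed_file.fastq 55 10` prints the ten most frequent 55-nt reads with their counts.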
