Question

Comparing novel miRNAs from miRDeep2

1

6.0 years ago

glady ▴ 320

Hello,

How can I compare the novel miRNAs detected by miRDeep2 across different samples (based on their read counts)? For example, I have 3 samples (sample_1, sample_2, sample_3) from Condition-A. How can I compare the read counts of, say, "novel_miR_1" from sample_1 with those in the other 2 samples? I can't do this using the provisional IDs, since they are different in every sample. Is there any other way to match these novel miRNAs across samples?

Thank you.

RNA-Seq • 2.5k views

ADD COMMENT • link updated 6.0 years ago by Emilio Marmol ▴ 180 • written 6.0 years ago by glady ▴ 320

score 1 · Answer 1 · 2019-01-28

1

6.0 years ago

Emilio Marmol ▴ 180

As far as I know, there is no easy way to do this. You could try to track where these reads come from in the genome alignment and merge the candidates that cluster in the same region.
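
To make the merging idea concrete, here is a minimal sketch, not anything miRDeep2 provides. It assumes you have already extracted, for each novel candidate, its sample, provisional ID, chromosome, strand and precursor coordinates (1-based, inclusive) into plain tuples; the function name `merge_loci`, the `novel_locus_N` naming and the example coordinates are all made up for illustration.

```python
from collections import defaultdict


def merge_loci(candidates):
    """Group candidates whose coordinates overlap on the same chromosome and strand.

    candidates: iterable of (sample, provisional_id, chrom, strand, start, end),
    with 1-based inclusive coordinates (an assumption of this sketch).
    Returns one dict per merged locus, with a shared ID and its member candidates.
    """
    by_chrom = defaultdict(list)
    for sample, prov_id, chrom, strand, start, end in candidates:
        by_chrom[(chrom, strand)].append((start, end, sample, prov_id))

    loci = []
    for (chrom, strand), items in sorted(by_chrom.items()):
        items.sort()  # sort by start position
        current = None
        for start, end, sample, prov_id in items:
            if current is not None and start <= current["end"]:
                # Overlaps the open locus: extend it.
                current["end"] = max(current["end"], end)
            else:
                # No overlap: open a new locus.
                current = {"chrom": chrom, "strand": strand,
                           "start": start, "end": end, "members": []}
                loci.append(current)
            current["members"].append((sample, prov_id))

    # Assign a sample-independent name to every merged locus.
    for i, locus in enumerate(loci, 1):
        locus["locus_id"] = f"novel_locus_{i}"
    return loci


# Hypothetical candidates, for illustration only.
candidates = [
    ("sample_1", "chr1_101", "chr1", "+", 1000, 1060),
    ("sample_2", "chr1_877", "chr1", "+", 1005, 1065),
    ("sample_3", "chr1_42", "chr1", "+", 5000, 5060),
    ("sample_1", "chr2_9", "chr2", "-", 300, 360),
]
loci = merge_loci(candidates)
for locus in loci:
    print(locus["locus_id"], locus["chrom"], locus["start"], locus["end"], locus["members"])
```

Any overlap at all merges two candidates here; if that is too permissive for your data, you could require a minimum overlap instead.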

miRDeep2 predicts miRNAs by treating each group of technical replicates as a single batch, and the prediction process is independent for each group. I personally find the miRDeep2 output quite obscure and elusive if you want to identify novel miRNA candidates.

You could try a different approach instead of integrating everything as a whole:

(1) First, run a novel miRNA prediction to get new candidates from your small RNA-seq data. Several tools do this. We recently developed a comprehensive pipeline that you can check: https://github.com/emarmolsanchez/eMIRNA

(2) Once you have some novel candidates, add their coordinates to the miRNA GTF you use for quantification and run a quantification tool (e.g. HTSeq or StringTie) on reads that have been adapter-trimmed with, for instance, Cutadapt. This will allow you to quantify known and novel miRNAs in your data.

(3) Feed the quantification matrices into differential expression software such as DESeq2 or edgeR.
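
For step (2), a minimal sketch of turning novel candidates into GTF records that can be appended to an existing annotation. The nine tab-separated columns are standard GTF; everything else is an assumption of this sketch: the helper name `gtf_line`, the `novel` source label, the candidate names and coordinates, and the choice of `exon` as the feature type with a `gene_id` attribute (which matches the htseq-count defaults as far as I know, but should be matched to whatever your existing GTF uses).

```python
def gtf_line(chrom, start, end, strand, name, source="novel", feature="exon"):
    """Format one candidate as a 9-column GTF record (1-based, inclusive coordinates)."""
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    fields = [chrom, source, feature, str(start), str(end), ".", strand, ".", attributes]
    return "\t".join(fields)


# Hypothetical novel candidates: (name, chrom, strand, start, end).
novel_candidates = [
    ("novel_locus_1", "chr1", "+", 1000, 1065),
    ("novel_locus_2", "chr2", "-", 300, 360),
]

gtf_lines = [gtf_line(chrom, start, end, strand, name)
             for name, chrom, strand, start, end in novel_candidates]

# These lines can then be appended to a copy of the known-miRNA GTF.
print("\n".join(gtf_lines))
```

Using the same shared name for a candidate in every sample is what makes the resulting count matrix comparable across samples.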

ADD COMMENT • link 6.0 years ago by Emilio Marmol ▴ 180

0

Thank you very much for your reply. Yes, I agree that the miRDeep2 output is quite obscure and elusive for novel miRNAs, because the provisional IDs contain a random number after the chromosome number, which is different in every sample.

What I did was remove the miRNAs that had the same precursor sequence and select one from each group of miRNAs that overlapped by more than 50%. In this way I was able to remove the duplicates. Then I renamed all the novel miRNAs across samples based on the miRNA sequence and their coordinates.
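
Roughly, the filtering described above could be sketched like this. It is an illustration under assumptions rather than the exact procedure used: the overlap is measured relative to the shorter of the two precursors, the candidate kept from each redundant group is the one with the highest score, and the field names (`name`, `chrom`, `strand`, `start`, `end`, `precursor_seq`, `score`) and example values are hypothetical.

```python
def overlap_fraction(a_start, a_end, b_start, b_end):
    """Overlap length divided by the length of the shorter interval (1-based, inclusive)."""
    overlap = min(a_end, b_end) - max(a_start, b_start) + 1
    if overlap <= 0:
        return 0.0
    shorter = min(a_end - a_start, b_end - b_start) + 1
    return overlap / shorter


def is_redundant(cand, kept, min_fraction):
    """True if cand shares a precursor sequence or overlaps > min_fraction with a kept candidate."""
    for k in kept:
        if k["precursor_seq"] == cand["precursor_seq"]:
            return True
        same_strand = k["chrom"] == cand["chrom"] and k["strand"] == cand["strand"]
        if same_strand and overlap_fraction(
                k["start"], k["end"], cand["start"], cand["end"]) > min_fraction:
            return True
    return False


def pick_representatives(candidates, min_fraction=0.5):
    """Keep the best-scoring candidate from each redundant group."""
    kept = []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if not is_redundant(cand, kept, min_fraction):
            kept.append(cand)
    return kept


# Hypothetical candidates pooled from several samples.
pooled = [
    {"name": "a", "chrom": "chr1", "strand": "+", "start": 1000, "end": 1060,
     "precursor_seq": "AAA", "score": 5.0},
    {"name": "b", "chrom": "chr1", "strand": "+", "start": 1005, "end": 1065,
     "precursor_seq": "AAC", "score": 3.0},
    {"name": "c", "chrom": "chr2", "strand": "+", "start": 1000, "end": 1060,
     "precursor_seq": "AAA", "score": 1.0},
    {"name": "d", "chrom": "chr1", "strand": "-", "start": 1000, "end": 1060,
     "precursor_seq": "GGG", "score": 2.0},
]
representatives = pick_representatives(pooled)
print([c["name"] for c in representatives])
```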

But I will consider your suggestion of trying other software that predicts new candidates from small RNA-seq data. I can try this, possibly compare it with the novel miRNA results from miRDeep2, and pick the most likely candidates.

ADD REPLY • link 6.0 years ago by glady ▴ 320

0

Be careful with removing quantification entries from your matrix. If the same miRNAs have been quantified as both mature and precursor forms, or if some sequences quantified as separate entries really belong to the same transcript, the values in the quantification matrix will be skewed and not representative of the real situation.

The proper thing to do is to quantify unique transcripts as a whole from the beginning.

ADD REPLY • link 6.0 years ago by Emilio Marmol ▴ 180