Question

Filter out miRNA from ncRNA dataset

15 months ago

binaryCode ▴ 10

Hello everyone,

I have a question regarding the filtering of a ncRNA dataset containing miRNA. I want to get rid of plant-derived miRNAs. My approach is to use Bowtie2:

Index: Based on "fused" miRNA fastas to receive one continuous sequence (instead of many small miRNA fasta - I did this step to improve the outcome of the alignment)
Sequences: ncRNA

My code:

bowtie2 -f -p 15 --very-sensitive-local -x ./index/miRNA -U ./data/ncRNA.fa -S ./alignment/ncRNA_miRNA.sam

with almost no reported alignments:

    3106286 reads; of these:
      3106286 (100.00%) were unpaired; of these:
        3105977 (99.99%) aligned 0 times
        308 (0.01%) aligned exactly 1 time
        1 (0.00%) aligned >1 times
    0.01% overall alignment rate

When I go the other way around, using the ncRNA dataset as the reference and aligning the miRNA data against it, I receive the following report:

    3769 reads; of these:
      3769 (100.00%) were unpaired; of these:
        2789 (74.00%) aligned 0 times
        148 (3.93%) aligned exactly 1 time
        832 (22.07%) aligned >1 times
    26.00% overall alignment rate

If I'm not wrong, my alignment is missing a lot of the miRNAs in my ncRNA dataset. I would appreciate some hints on how to improve the alignment so that it catches as many miRNAs as possible, even if that means accepting some false positives.
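To put the two reports side by side, here is a small sketch that recomputes the overall alignment rate from the counts bowtie2 printed above. The function name `overall_rate` is only illustrative and is not part of bowtie2.

```python
def overall_rate(total, aligned_once, aligned_multi):
    """Percentage of input sequences with at least one reported alignment."""
    return 100.0 * (aligned_once + aligned_multi) / total


# ncRNA sequences aligned against the fused miRNA index
forward = overall_rate(3106286, 308, 1)
# miRNA sequences aligned against the ncRNA reference
reverse = overall_rate(3769, 148, 832)

print(f"ncRNA vs. miRNA index: {forward:.2f}%")  # 0.01%
print(f"miRNA vs. ncRNA index: {reverse:.2f}%")  # 26.00%
```

In absolute numbers, 980 of the 3,769 miRNAs find a match somewhere in the ncRNA set, while only 309 ncRNA sequences are flagged in the other direction.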

Update: Initially, I used mature miRNA sequences for this alignment. Following a suggestion from one of the commenters, I tried it with hairpin sequences from miRBase and got a solid number of filtered sequences. Since I want my dataset to be as thoroughly purged of miRNA as possible, I am satisfied with this result, even though it will include some false positives. Thanks to everyone who helped out.
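Once an alignment against the hairpin index exists, the filtering step itself comes down to dropping every ncRNA record that has a mapped entry in the SAM file. The following is a minimal sketch, not the exact procedure used here. It assumes a standard SAM file in which bit 0x4 of the FLAG column marks an unmapped sequence, and it assumes that the read name in the SAM file equals the FASTA header up to the first whitespace. The function names and the toy records are hypothetical.

```python
def mapped_ids(sam_lines):
    """Collect the names of sequences with at least one reported alignment."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # bit 0x4 set means "unmapped"
            ids.add(name)
    return ids


def filter_fasta(fasta_lines, drop_ids):
    """Yield the FASTA lines of all records whose ID is not in drop_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the ID is the header text up to the first whitespace.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            keep = seq_id not in drop_ids
        if keep:
            yield line


# Toy data (made up for illustration): nc1 aligned, nc2 did not.
sam = [
    "@HD\tVN:1.0\n",
    "nc1\t0\thairpinA\t5\t42\t4M\t*\t0\t0\tACGT\t*\n",
    "nc2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\t*\n",
]
fasta = [">nc1 some description\n", "ACGT\n", ">nc2\n", "GGCC\n"]

kept = list(filter_fasta(fasta, mapped_ids(sam)))
print("".join(kept), end="")
```

bowtie2 also offers a `--un` option that writes the unpaired reads that failed to align to a separate file, which may make a post-hoc step like this unnecessary.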

bowtie bowtie2 miRNA filtering ncRNA • 872 views

ADD COMMENT • link updated 15 months ago by ATpoint 88k • written 15 months ago by binaryCode ▴ 10

Index: Based on "fused" miRNA fastas to receive one continuous sequence (instead of many small miRNA fasta - I did this step to improve the outcome of the alignment)

Do you have a reference for that? I have never heard of this strategy.

How long are the reads?

ADD REPLY • link 15 months ago by ATpoint 88k

I'm guessing the OP is using miRNA hairpin references? OP, did you replace U's with T's in the RNA reference? Are you using a library prep that will capture mature miRNA?

ADD REPLY • link 15 months ago by Kevin ▴ 100

I'm using mature miRNA sequences and didn't see the point of using the hairpin sequences. Note that right now I'm preparing an ncRNA filter dataset for my actual small RNA-seq data, to get rid of all ncRNA except miRNA. All U's were replaced with T's. Could you please elaborate on your last question?
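For reference, the U-to-T replacement can be sketched as follows. This is a minimal illustration rather than the script actually used; the function name and the example record are made up. Header lines are left untouched so that a "U" in a sequence name is not altered.

```python
def rna_to_dna(fasta_lines):
    """Replace U/u with T/t on sequence lines; leave header lines unchanged."""
    table = str.maketrans("Uu", "Tt")
    for line in fasta_lines:
        yield line if line.startswith(">") else line.translate(table)


# Hypothetical record; the header deliberately contains a "U".
records = [">example_U_miRNA\n", "UGACAGAAGAGAGUGAGCAC\n"]
converted = list(rna_to_dna(records))
print("".join(converted), end="")
```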

ADD REPLY • link 15 months ago by binaryCode ▴ 10

What kind of RNA-seq is that? Standard RNA-seq uses an RNA extraction kit that does not capture short RNAs well (e.g. < 200 bp), and the standard library prep also selects for RNAs of a certain size, so short RNAs are not captured. That means you need a dedicated kit and prep for short RNAs.

ADD REPLY • link 15 months ago by ATpoint 88k

I wasn't following any reference when I decided to fuse the miRNA data. It's just that I don't get any alignments when I align the ncRNA sequences against the normal miRNAs. My guess was that alignments are normally run against longer references rather than short sequences, so short miRNA references might somehow conflict with bowtie2's algorithm, leading to low scores and no reported alignments.

I hope it is clear that I'm not talking about normal reads obtained from an RNA-seq experiment. I'm preparing an ncRNA filter dataset for my actual small RNA-seq data. Filter dataset = ncRNA (from various ncRNA databases) - miRNA (plant-derived, from miRBase)

My stats are:

- My reads/ncRNA: num_seqs (3,106,286), min_len (19), avg_len (376.5), max_len (4,361)
- My reference/miRNAs: num_seqs (3,769), min_len (17), avg_len (21.6), max_len (28)
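Length statistics like these can be recomputed directly from a FASTA file. Below is a minimal sketch; the key names mirror the ones above, while the function name and the toy input are hypothetical. It handles sequences that are wrapped over several lines.

```python
def fasta_length_stats(fasta_lines):
    """Return num_seqs, min_len, avg_len and max_len for FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may span several lines
    if current is not None:
        lengths.append(current)
    if not lengths:
        return {"num_seqs": 0, "min_len": 0, "avg_len": 0.0, "max_len": 0}
    return {
        "num_seqs": len(lengths),
        "min_len": min(lengths),
        "avg_len": round(sum(lengths) / len(lengths), 1),
        "max_len": max(lengths),
    }


# Toy input: record "a" is wrapped over two lines (6 nt), record "b" has 10 nt.
toy = [">a\n", "ACGT\n", "AC\n", ">b\n", "ACGTACGTAC\n"]
stats = fasta_length_stats(toy)
print(stats)
```

With an average reference length of about 22 nt and an average query length of several hundred nt, most of each ncRNA sequence cannot match any single mature miRNA, which is one plausible reason why so few alignments are reported in that direction.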

ADD REPLY • link 15 months ago by binaryCode ▴ 10