Question

How do you classify miRNAs? What is the best way to classify hypothetical miRNAs: blastn-short or Rfam?


10.4 years ago

margxenscienculo

Hello everyone. Let me explain my question. I have about 200 sequences that were previously predicted as pre-microRNAs by miR-BAG. I then ran these precursors through miRdup to obtain the mature microRNAs. Now I want to know which of them are already listed (known) miRNAs. First, I ran blastn -task blastn-short -evalue 10 (v. 2.2.30) against the mature miRNAs from miRBase.org. I wonder whether you use this strategy and which options you usually set, for example:

-evalue 10
-penalty -4?
-word_size 4 or 7?
-ungapped?
-reward 1?

I ask because full coverage of the query is not necessary: some isomiRs have slight mismatches. But if the seed region (nucleotides 2-7) is fully covered, then it belongs to the same miRNA family, doesn't it? Or do I have to use Rfam instead? Regards.
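A minimal Python sketch of the seed check described above. It assumes the seed is nucleotides 2-7 of the mature sequence (as in the question; some authors use 2-8) and that the miRBase mature sequences have already been loaded into a dict. The function names and the toy sequences are hypothetical. Sharing a seed only suggests family membership; it does not prove it.

```python
# Sketch: assign predicted mature miRNAs to known miRNAs that share their seed.
# Assumption: seed = nucleotides 2-7 (1-based, inclusive) of the mature sequence.
from collections import defaultdict


def seed(seq: str, start: int = 2, end: int = 7) -> str:
    """Return the seed as upper-case DNA letters (U converted to T)."""
    s = seq.upper().replace("U", "T")
    return s[start - 1:end]


def build_seed_index(reference: dict[str, str]) -> dict[str, list[str]]:
    """Map each seed to the names of the reference miRNAs that carry it."""
    index: dict[str, list[str]] = defaultdict(list)
    for name, mature in reference.items():
        index[seed(mature)].append(name)
    return dict(index)


def classify_by_seed(queries: dict[str, str],
                     index: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each query, list reference miRNAs with the same seed (empty = no match)."""
    return {qid: index.get(seed(q), []) for qid, q in queries.items()}


if __name__ == "__main__":
    # Hypothetical names and made-up sequences, for illustration only.
    reference = {
        "ref-miR-A": "UCAGGAUACGUCAGUCAAGCUA",
        "ref-miR-B": "AGCAUGCAUCCGAUAGCAUCGU",
    }
    queries = {
        "query_1": "TCAGGATTTTGGGCCCAAATTT",
        "query_2": "TGGCGGTGAGCAGAATAATTG",
    }
    print(classify_by_seed(queries, build_seed_index(reference)))
    # {'query_1': ['ref-miR-A'], 'query_2': []}
```

In practice `reference` would hold the miRBase mature sequences, and a query with an empty list would be a candidate novel miRNA (or one whose seed is shifted, e.g. a 5' isomiR).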

Here is one example, together with some of the blastn (v. 2.2.30) output parameters:

query= ENTRY327687_trypsinogen_354_553

Length=22
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

  dme-miR-4941-5p MIMAT0020148 Drosophila melanogaster miR-4941-5p    24.3    0.21
  dvi-miR-9538-3p MIMAT0035619 Drosophila virilis miR-9538-3p         22.3    0.83
  mmu-miR-7677-5p MIMAT0029868 Mus musculus miR-7677-5p               22.3    0.83
  mmu-miR-6922-3p MIMAT0027745 Mus musculus miR-6922-3p               22.3    0.83
  gga-miR-6708-5p MIMAT0025818 Gallus gallus miR-6708-5p              22.3    0.83
  mtr-miR2590j MIMAT0021334 Medicago truncatula miR2590j              20.3    3.3  
  mtr-miR2590i MIMAT0021333 Medicago truncatula miR2590i              20.3    3.3  
  mtr-miR2590h MIMAT0021332 Medicago truncatula miR2590h              20.3    3.3  
  tca-miR-3813-5p MIMAT0018648 Tribolium castaneum miR-3813-5p        20.3    3.3  
  mtr-miR2598 MIMAT0013301 Medicago truncatula miR2598                20.3    3.3  
  oan-miR-181c-3p MIMAT0007059 Ornithorhynchus anatinus miR-181c-3p   20.3    3.3  


Query_29  2   TGGCGGTGAGCAGAATAATTG  22
20244     19   GGCGGTGAGCAG          8
34141     12            CAGAATAATTG  22
28818     4       GGTGAGCAGAA        14
26045     13      GGTGAGCAGAA        3
25597     11    GCGGTGAGCAG          21
21071     12           GCAGAATAAT    21
21070     12           GCAGAATAAT    21
21069     12           GCAGAATAAT    21
18383     10         GAGCAGAATA      1
13217     19           GCAGAATAAT    10
7532      10  TGGCGGTGAG             1

Lambda      K        H
    1.37    0.711     1.31

Gapped
Lambda      K        H
    1.37    0.711     1.31

Effective search space used: 4258254
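One way to automate the seed-coverage check on BLAST results, sketched in Python. It assumes blastn was run with `-outfmt 6` (the standard 12 tabular columns) instead of the pairwise report shown above. The filtering rule (plus strand, ungapped, same register, positions 2-7 covered) is my own reading of "seed covered fully", and the demo lines are hypothetical, not taken from the run above. The tabular format does not say where mismatches fall, so a mismatch inside the seed would still pass; checking that would need the aligned sequences (e.g. adding `qseq` and `sseq` to a custom `-outfmt`).

```python
# Sketch: keep blastn-short hits (-outfmt 6, default 12 columns) whose
# alignment spans the seed (positions 2-7) of both query and subject.
import csv
import io
from typing import Iterator, NamedTuple, TextIO

FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(handle: TextIO) -> Iterator[Hit]:
    """Yield one Hit per non-comment line of BLAST tabular output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        r = dict(zip(FIELDS, row))
        yield Hit(r["qseqid"], r["sseqid"], int(r["gapopen"]),
                  int(r["qstart"]), int(r["qend"]),
                  int(r["sstart"]), int(r["send"]),
                  float(r["evalue"]), float(r["bitscore"]))


def covers_seed(hit: Hit, seed_start: int = 2, seed_end: int = 7) -> bool:
    """True if an ungapped plus-strand hit spans the seed in the same register.

    Mismatches inside the seed are NOT checked: -outfmt 6 does not report
    where they fall.
    """
    plus_strand = hit.sstart <= hit.send
    same_register = hit.qstart == hit.sstart  # query pos i pairs with subject pos i
    return (plus_strand and hit.gapopen == 0 and same_register
            and hit.qstart <= seed_start and hit.qend >= seed_end)


if __name__ == "__main__":
    demo = io.StringIO(
        "q1\tref1\t100.00\t12\t0\t0\t1\t12\t1\t12\t0.5\t24.3\n"
        "q1\tref2\t100.00\t11\t0\t0\t12\t22\t12\t22\t3.3\t20.3\n"
        "q1\tref3\t100.00\t11\t0\t0\t4\t14\t1\t11\t0.83\t22.3\n"
    )
    print([h.sseqid for h in parse_outfmt6(demo) if covers_seed(h)])  # ['ref1']
```

Requiring the same register is deliberately strict: a 5' shift moves the seed, so such a hit may belong to a different family even if most of the sequence aligns.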


Written 10.4 years ago by margxenscienculo; updated 3.1 years ago by Ram.


Hi,

I don't fully understand the goal. Do you have a few sequences, or a lot of them? Do you want to know which of them are miRNAs, or do you want to predict new miRNAs?

Do you want to know which miRNA each one is most similar to, or only the family?

There are many tools, but they all differ, and the right one depends on your final goal and on your type of data. If you add this information, maybe I can tell you more.

cheers

Written 10.4 years ago by Lorena Pantano.


Hi. I have about 200 sequences that were previously predicted as pre-microRNAs by miR-BAG. I then ran these precursors through miRdup to obtain the mature microRNAs. Now I want to know which of them are already listed (known) miRNAs.

Written 10.4 years ago by margxenscienculo; updated 3.1 years ago by Ram.

Ram · Accepted Answer · 2014-12-02


10.4 years ago

Asaf

I think BLAST is not the right tool, since it requires an exact word match to start an alignment. You should try a simple global alignment algorithm such as EMBOSS needle.
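To illustrate the global-alignment idea, here is a plain Needleman-Wunsch sketch in Python. It is not a reimplementation of EMBOSS needle: needle uses affine gap penalties (default gap open 10, gap extend 0.5), whereas this sketch uses a single linear gap penalty. The match/mismatch scores of 5/-4 mirror the EDNAFULL values for A/C/G/T; the gap value of -8 and the function names are arbitrary, hypothetical choices.

```python
# Sketch: Needleman-Wunsch global alignment with a linear gap penalty.
def needleman_wunsch(a: str, b: str, match: int = 5, mismatch: int = -4,
                     gap: int = -8) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for the best global alignment."""
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner (global, end to end).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(aln_a: str, aln_b: str) -> float:
    """Fraction of alignment columns with identical (non-gap) residues."""
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return same / len(aln_a) if aln_a else 0.0


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ACGT", "AGU")
    print(s, x, y, identity(x, y))  # 7 ACGT A-GT 0.75
```

Unlike BLAST, this always aligns the two short sequences end to end, so a 22 nt query gets a score against every reference even when no exact word is shared.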

Rfam would be a good choice as well (or even better), but you will probably have more success with the precursors than with the mature miRNAs, since they contain more information.

Clustering of miRNAs is usually done by their seed sequences, but this is not always reliable: some miRNAs can be in the same family (same seed) yet have different targets because their 3' end changes specificity, or vice versa (different families, same targets).

Written 10.4 years ago by Asaf; updated 3.1 years ago by Ram.


For example, with Rfam: do you mean that I have to run cmsearch and keep the hits to miRNA families, like this? Or could I find the strict miRNA families in here?

See lines 3-5 below:

NTRY330561_PREDICTED:_uncharacterized_protein_LOC656533_347_546 -         GlmY_tke1            RF00128    cm        1       65       66      169      +    3'    3 0.41   0.0    5.7       3.7 ?   -
ENTRY297043_69_268                                               -         GlmY_tke1            RF00128    cm        1       65      110      175      +    3'    3 0.39   0.0    5.2       4.7 ?   -
ENTRY321165_247_446  -         mir-103              RF00129    cm        1       78        4       63      +    no    1 0.50   0.0   12.2      0.42 ?   -
ENTRY321165_246_445  -         mir-103              RF00129    cm        1       78      196      137      -    no    1 0.50   0.0   12.2      0.42 ?   -
ENTRY303864_84_283   -         mir-192              RF00130    cm       38      106      183      136      -    5'    2 0.38   0.0    7.8       8.1 ?   -
ENTRY331359_vitellogenin_receptor_422_621 -         mir-30               RF00131    cm       43       60       31       48      +    5'    2 0.50   0.0    8.7       5.6 ?   -
ENTRY285405_26_225   -         snoZ196              RF00134    cm        1       85      170       87      -    no    1 0.31   0.0   14.5         1 ?   -
ENTRY330459_---NA---_34_233 -         snoZ223              RF00135    cm        1       94      155      112      -    no    1 0.45   0.0   10.9       2.3 ?   -
ENTRY312250_169_368                                                                      -         SNORD81              RF00136    cm        1       77      138      105      -    no    1 0.35   0.0   13.1      0.81 ?   -
ENTRY331587_insulin-like_growth_factor-binding_protein_complex_acid_labile_chain_584_783 -         SNORD81              RF00136    cm        1       77       81       29      -    no    1 0.30   0.0   13.0      0.89 ?   -
ENTRY327687_trypsinogen_precursor_of_antryp7_354_553 -         SNORD83              RF00137    cm        1       78      180      145      -    no    1 0.31   0.0   14.4      0.22 ?   -
ENTRY331153_isoform_b_701_900            -         Alpha_RBS            RF00140    cm       62      110        1       45      +    5'    2 0.33   0.0    8.1      0.66 ?   -
ENTRY330501_protein_rtf2_homolog_157_356 -         Alpha_RBS            RF00140    cm        1       64      137      200      +    3'    3 0.39   0.0    6.4       1.8 ?   -
ENTRY285405_26_225   -         SNORD34              RF00147    cm        1       80      149       83      -    no    1 0.27   0.0   15.3      0.14 ?   -
ENTRY292611_92_291   -         SNORD34              RF00147    cm        1       80      108      178      +    no    1 0.31   0.0   14.1      0.28 ?   -
ENTRY316886_63_262   -         SNORD34              RF00147    cm        1       80       31      127      +    no    1 0.29   2.3   10.8       1.9 ?   -
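If you go the cmsearch route, here is a small Python sketch for pulling the miRNA-family hits out of `--tblout` output like the table above. It assumes the 18-column cmsearch tabular format (your sequence first, Rfam family third, E-value in column 16, inclusion flag in column 17). The regex for recognising miRNA family names is a rough, hypothetical heuristic, not an official Rfam classification. Note that every hit in the table above is flagged `?` (reported, but below Infernal's inclusion threshold) rather than `!`, so none of them would pass with `require_included=True`.

```python
# Sketch: keep miRNA-family hits from cmsearch --tblout output.
import re
from typing import Iterable, Iterator, NamedTuple

# Hypothetical heuristic: most Rfam miRNA families are named mir-*, MIR*, let-7, lin-4.
MIRNA_FAMILY = re.compile(r"^(mir|MIR|let-7|lin-4)")


class CmHit(NamedTuple):
    target: str     # your sequence
    family: str     # Rfam model name, e.g. mir-103
    accession: str  # Rfam accession, e.g. RF00129
    seq_from: int
    seq_to: int
    strand: str
    score: float
    evalue: float
    inc: str        # "!" = passes inclusion threshold, "?" = reported only


def parse_tblout(lines: Iterable[str]) -> Iterator[CmHit]:
    """Parse cmsearch --tblout lines (18 whitespace-separated columns)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split(maxsplit=17)  # the last column (description) may contain spaces
        yield CmHit(f[0], f[2], f[3], int(f[7]), int(f[8]), f[9],
                    float(f[14]), float(f[15]), f[16])


def mirna_hits(hits: Iterable[CmHit], max_evalue: float = 0.01,
               require_included: bool = True) -> list[CmHit]:
    """Keep hits to miRNA-like families at or below an E-value cutoff."""
    return [h for h in hits
            if MIRNA_FAMILY.match(h.family)
            and h.evalue <= max_evalue
            and (h.inc == "!" or not require_included)]


if __name__ == "__main__":
    rows = [
        "ENTRY321165_247_446  -  mir-103  RF00129  cm  1  78  4  63  +  no  1 0.50  0.0  12.2  0.42 ?  -",
        "ENTRY303864_84_283   -  mir-192  RF00130  cm 38 106 183 136  -  5'  2 0.38  0.0   7.8   8.1 ?  -",
        "ENTRY285405_26_225   -  SNORD34  RF00147  cm  1  80 149  83  -  no  1 0.27  0.0  15.3  0.14 ?  -",
    ]
    hits = list(parse_tblout(rows))
    print([h.target for h in mirna_hits(hits, max_evalue=1.0, require_included=False)])
    # ['ENTRY321165_247_446']
    print(mirna_hits(hits))  # []
```

The cutoffs here are placeholders; relaxing them mainly shows how weak these particular hits are.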

Written 10.4 years ago by margxenscienculo; updated 3.1 years ago by Ram.


Yes. I don't know how you can locate pre-miRNAs in the database, though.

Written 10.4 years ago by Asaf.


Sorry, I edited the main message to clear up the misunderstanding. I have about 200 sequences that were previously predicted as pre-microRNAs by miR-BAG. I then ran these precursors through miRdup to obtain the mature microRNAs. Now I want to know which of them are already listed (known) miRNAs. Anyway, thanks. I will go back and read about cmscan and cmsearch from Infernal (Rfam). Sometimes I am not sure whether what I am doing is correct or not; that is, I need the approval and advice of others.

Written 10.4 years ago by margxenscienculo; updated 3.1 years ago by Ram.

Ram · Accepted Answer · 2014-12-04

My two cents.

If you want to obtain homologs: I once tried BLAT with the parameters stepSize=2 and tileSize=8 to get all possible alignments. I would follow two strategies:

1. Map my precursors to the miRBase precursors to obtain homologous precursors, keeping perhaps anything above 80% identity.
2. Map my miRNAs to the miRBase precursors and check whether they fall inside the corresponding mature sequences, allowing an error of 3 nt at the borders. That is, if the mature sequence of that precursor starts at position 23, I would accept any of my sequences starting from position 20 to 26. The same applies to the 3' end.
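The second strategy boils down to a coordinate check once your miRNAs are mapped onto a miRBase precursor. A Python sketch, assuming you already have the 1-based start/end of each hit on the precursor and of each annotated mature on that precursor (for example from the miRBase annotation files). The function name and the toy coordinates are hypothetical; the 3 nt tolerance is the one suggested above.

```python
# Sketch: decide which annotated mature miRNA a mapped sequence corresponds to,
# allowing a fixed tolerance at both borders (1-based, inclusive coordinates).
def assign_mature(hit_start: int, hit_end: int,
                  matures: dict[str, tuple[int, int]],
                  tolerance: int = 3) -> list[str]:
    """Return the matures whose 5' and 3' ends both lie within `tolerance`
    nt of the hit's ends on the same precursor."""
    return [name for name, (start, end) in matures.items()
            if abs(hit_start - start) <= tolerance
            and abs(hit_end - end) <= tolerance]


if __name__ == "__main__":
    # Toy precursor with a 5p mature at 23-44 and a 3p mature at 61-82.
    matures = {"mir-X-5p": (23, 44), "mir-X-3p": (61, 82)}
    print(assign_mature(20, 45, matures))  # ['mir-X-5p']  (start 20 = 23 - 3 is allowed)
    print(assign_mature(19, 44, matures))  # []            (start 19 is 4 nt off)
```

Hits on the opposite strand of the precursor should be discarded before this step, since the coordinates would not be comparable.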

In both cases you need to clean your results to keep the highest-scoring hits, and check them to see whether they make sense.
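For the clean-up step, a minimal Python sketch that keeps only the top-scoring hit(s) per query. It assumes the hits are already parsed into (query, target, score) tuples; how the score is computed (BLAST bit score, a score derived from BLAT PSL columns, percent identity) is up to you, and the function name is hypothetical. Ties are kept so you can inspect them by hand.

```python
# Sketch: keep the highest-scoring target(s) for each query, keeping ties.
from typing import Iterable


def best_hits(hits: Iterable[tuple[str, str, float]]) -> dict[str, list[str]]:
    """Map each query to the target(s) with its highest score."""
    best: dict[str, tuple[float, list[str]]] = {}
    for query, target, score in hits:
        if query not in best or score > best[query][0]:
            best[query] = (score, [target])
        elif score == best[query][0]:
            best[query][1].append(target)
    return {query: targets for query, (_, targets) in best.items()}


if __name__ == "__main__":
    demo = [("q1", "a", 24.3), ("q1", "b", 22.3), ("q1", "c", 24.3), ("q2", "d", 20.3)]
    print(best_hits(demo))  # {'q1': ['a', 'c'], 'q2': ['d']}
```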