Parsing RNA Secondary Structure Annotations?
14.1 years ago
Michael ▴ 110

Hello,

Is there a program that will label whether a nucleotide is in a hairpin, bulge, etc. in an RNA secondary structure, or is there an option for mfold or RNAfold that does this? I'm looking for ASCII output something like this, but anything that's relatively easy to parse would be fine:

U start-of-bulge
A in-bulge
A end-of-bulge
:

Basically, I'm looking for a plain-text annotation of the secondary structure.

Thanks!

EDIT: mfold output as requested by Pierre:

sequence used:

UGGAAGAAGCUCUGGCAGCUUUUUAAGCGUUUAUAUAAGAGUUAUAUAUAUGCGCGUUCCA

predicted structure:

mfold output

thermodynamic details:

Structural element      δG      Information
External loop         -1.70   2 ss bases & 1 closing helices.
Stack                 -3.30   External closing pair is G2-C60
Stack                 -2.40   External closing pair is G3-C59
Helix                 -5.70   3 base pairs.
Multi-loop             2.60   External closing pair is A4-U58
                              7 ss bases & 3 closing helices.
Stack                 -3.40   External closing pair is G27-C55
Stack                 -2.40   External closing pair is C28-G54
Stack                 -2.50   External closing pair is G29-C53
Helix                 -8.30   4 base pairs.
Interior loop          1.70   External closing pair is U30-G52
Stack                 -1.30   External closing pair is U32-A50
Stack                 -1.10   External closing pair is A33-U49
Stack                 -1.30   External closing pair is U34-A48
Stack                 -1.10   External closing pair is A35-U47
Stack                 -1.30   External closing pair is U36-A46
Helix                 -6.10   6 base pairs.
Hairpin loop           5.60   Closing pair is A37-U45
Stack                 -1.30   External closing pair is G6-U22
Stack                 -0.90   External closing pair is A7-U21
Stack                 -2.10   External closing pair is A8-U20
Stack                 -3.40   External closing pair is G9-C19
Stack                 -2.10   External closing pair is C10-G18
Helix                 -9.80   6 base pairs.
Hairpin loop           5.50   Closing pair is U11-A17
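
For anyone who wants to turn a table like this into dot-bracket notation, here is a rough sketch. It assumes the text layout shown above, where every "closing pair is X<i>-Y<j>" phrase names one base pair and the stacks plus the loop-closing pairs together cover each helix. The function names are made up for illustration and are not part of mfold.

```python
import re

# Matches both "External closing pair is G2-C60" and "Closing pair is A37-U45".
PAIR_RE = re.compile(r"closing pair is\s+[A-Za-z](\d+)-[A-Za-z](\d+)", re.IGNORECASE)


def pairs_from_details(text):
    """Collect the 1-based base pairs named in the thermodynamic details text."""
    pairs = set()
    for match in PAIR_RE.finditer(text):
        i, j = int(match.group(1)), int(match.group(2))
        pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def to_dot_bracket(pairs, length):
    """Write nested 1-based pairs as a dot-bracket string of the given length."""
    structure = ["."] * length
    for i, j in pairs:
        structure[i - 1] = "("
        structure[j - 1] = ")"
    return "".join(structure)
```

Calling to_dot_bracket(pairs_from_details(details_text), 61) on the full table above should give a dot-bracket string for the 61-nt sequence, which the dot-bracket-based tools can then annotate.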
rna parsing • 13k views

can you please post a sample of mfold output ?


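Why not use regular expression scanning? For example, a pure hairpin structure has all of its "(" before all of its ")", like (.(((....)).)). If there is a ")...(" pattern, one stem closes before another opens, so the structure contains more than one stem-loop rather than a single hairpin.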
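
A minimal sketch of that idea in Python, assuming plain Vienna dot-bracket input with only "(", ")" and "." characters. The function names are hypothetical. Note that regular expressions alone can find unpaired runs on one strand of a stem, but telling a bulge from one side of an interior loop still requires knowing which bases pair with each other.

```python
import re


def is_single_hairpin(structure):
    """True when every '(' precedes every ')', i.e. one stem-loop
    (possibly interrupted by bulges or interior loops)."""
    balanced = structure.count("(") == structure.count(")")
    return balanced and re.fullmatch(r"[.(]*\(\.*\)[.)]*", structure) is not None


def has_branch(structure):
    """True when a ')' is followed by a '(' with only dots in between,
    meaning one stem closes before another opens."""
    return re.search(r"\)\.*\(", structure) is not None


def one_sided_runs(structure):
    """Return 0-based (start, end) spans of unpaired runs sitting between two
    '(' (5' strand) and between two ')' (3' strand). Each run is a bulge or
    one side of an interior loop."""
    five_prime = [m.span() for m in re.finditer(r"(?<=\()\.+(?=\()", structure)]
    three_prime = [m.span() for m in re.finditer(r"(?<=\))\.+(?=\))", structure)]
    return five_prime, three_prime
```

For the example (.(((....)).)) this reports a single hairpin with one unpaired run on each strand.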


@Pierre Lindenbaum: I've updated the post with the secondary structure and thermodynamic details. There are a number of different formats that mfold can produce though, like ct, Vienna, and a number of others that I don't really know what to do with -- I've only used mfold for the graphical output of the secondary structure.


@Ning-yi Shao: I hadn't thought of that because I'm not really familiar with Vienna notation. Is there any existing work on doing something like this?

11.7 years ago
juniper- ▴ 60

This reply is a little late to the party, but I recently created a script that does exactly this. You can input a structure in dot-bracket notation (e.g. ...(((...))).. ) and get an annotation that looks like this:

[me@computer forgi]$ echo "...(((..))).." | python examples/dotbracket_to_bulge_graph.py -
length 13
define f1 0 4
define h0 6 9
define s0 4 6 9 11
define t1 11 14
connect s0 f1 h0 t1

Lines beginning with 'define' carry the information you're interested in. For example, the 'f1' line indicates that nucleotides 1-3 are in the 5' unpaired region of the structure, and 'h0' is a hairpin that includes nucleotides 7 and 8. The stem, 's0', includes nucleotides 4-6 and 9-11. Finally, the 3' unpaired region, 't1', covers nucleotides 12-13. One idiosyncrasy of this annotation is that everything except stems extends one nucleotide beyond what one would expect its boundaries to be.
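
If you want one label per nucleotide from that output, something like the following should work. It is only a sketch based on the description above: it assumes 'define' lines carry start/end numbers in pairs, that stems (names starting with 's') use inclusive 1-based bounds, and that every other element extends one nucleotide past its real boundaries. The helper names are invented for this example; other versions of the script may format things differently.

```python
def parse_defines(text):
    """Map each element name to a list of inclusive 1-based (start, end) ranges."""
    elements = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "define":
            continue
        name = fields[1]
        bounds = [int(value) for value in fields[2:]]
        ranges = []
        for start, end in zip(bounds[0::2], bounds[1::2]):
            if not name.startswith("s"):
                # Non-stem elements overshoot by one nucleotide on each side.
                start, end = start + 1, end - 1
            if start <= end:
                ranges.append((start, end))
        elements[name] = ranges
    return elements


def per_nucleotide(elements, length):
    """Return a list with the element name for nucleotides 1..length."""
    labels = [None] * (length + 1)  # index 0 is unused
    for name, ranges in elements.items():
        for start, end in ranges:
            for position in range(start, end + 1):
                labels[position] = name
    return labels[1:]
```

For the 13-nt example this gives f1 for nucleotides 1-3, s0 for 4-6 and 9-11, h0 for 7-8 and t1 for 12-13.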

For even greater simplicity, you can just get an output that matches the dotbracket notation:

[me@computer forgi]$ python examples/dotbracket_to_element_string.py -s examples/input/1y26_ss.dotbracket
(((((((((...((((((.........))))))........((((((.......))))))..)))))))))
sssssssssmmmsssssshhhhhhhhhssssssmmmmmmmmsssssshhhhhhhssssssmmsssssssss

Here s indicates a stem, m a multiloop, h a hairpin, i an interior loop, and so on.
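
To show how such an element string can be derived, here is a small standalone sketch. It is not the script's actual code. It labels every paired base s, then walks each loop and counts the stems branching off it: none means hairpin (h), one means interior loop or bulge (i), two or more means multiloop (m). Treating unpaired bases before the first pair as f, after the last pair as t, and between exterior stems as m is an assumption of this sketch.

```python
def pair_table(structure):
    """Return a list where entry i is the 0-based partner of i, or -1 if unpaired."""
    partner = [-1] * len(structure)
    stack = []
    for i, char in enumerate(structure):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return partner


def element_string(structure):
    """Label each nucleotide: s stem, h hairpin, i interior/bulge, m multiloop,
    f 5' unpaired, t 3' unpaired."""
    partner = pair_table(structure)
    n = len(structure)
    labels = ["s" if p >= 0 else "?" for p in partner]
    paired = [i for i, p in enumerate(partner) if p >= 0]
    first, last = (paired[0], paired[-1]) if paired else (n, n)
    todo = [(-1, n)]  # (-1, n) stands for the exterior loop
    while todo:
        lo, hi = todo.pop()
        unpaired, branches = [], 0
        k = lo + 1
        while k < hi:
            if partner[k] < 0:
                unpaired.append(k)
                k += 1
            else:
                # A stem branches off this loop; visit the loop it encloses later.
                branches += 1
                todo.append((k, partner[k]))
                k = partner[k] + 1
        for u in unpaired:
            if lo < 0:
                labels[u] = "f" if u < first else "t" if u > last else "m"
            else:
                labels[u] = "h" if branches == 0 else "i" if branches == 1 else "m"
    return "".join(labels)
```

On the 1y26 structure shown above, this sketch reproduces the element string printed by the script.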

There's a tutorial and documentation available here.


Thanks for this tool -- it's working really well and I never would have found it if it weren't for this post.


Does this software still work? It looks like the threedee section was removed, and I cannot install it now.


It does! For the examples given here, it should work fine. The threedee stuff was just moved to the development branch until it becomes more stable.


I have tried. Without threedee stuff, the software does not work.


Well that's unfortunate. If you're still interested in using it and can't get it to work, please send me an email (at the bottom of the project web page) with the error you get and I'll try and fix it ASAP.

13.9 years ago
Sequencegeek ▴ 740

Hi there!

My peer and I actually wrote a quick Python script to do just that, and I've included it below. I'll warn you, though, it is not the cleanest script...

To use: (Assuming the script is named annotateFold.py)

python annotateFold.py '(((((...)))))'

The output is the annotation for each nucleotide (nt), with positions numbered from 0.

These are the annotation definitions:

  • stemLoop: nt is part of a stem-loop
  • bulge: nt is on a bulge
  • loop: nt is in loop of stem-loop
  • stem: nt is on the stem part of stem-loop (NOT in bulge or loop)
  • ss: nt is in a single-stranded region (whereas a bulge has nt across from it)
  • ds: nt is in a double stranded region, but not immediately next to a stem-loop

Note: each nucleotide can have more than one annotation. For example, a bulge in a stem-loop will have both 'stemLoop' and 'bulge'.

If you're comfortable with Python, it shouldn't be too hard to edit it and get exactly what you want...

Feel free to ask for any clarifications and GOOD LUCK!



Hi, I'm wondering about the contents of the function 'updateminorspace'.

14.1 years ago
Casbon ★ 3.3k

Vienna

Input string (upper or lower case); @ to quit
....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8
CCCCCCCCCCCGGGGGGGGG
length = 20
CCCCCCCCCCCGGGGGGGGG
((((((((....))))))))
 minimum free energy = -20.40 kcal/mol
14.0 years ago

You can also use UNAfold. The link is below:

http://www.bioinfo.rpi.edu/applications/hybrid/download.php

RNAstructure also has a user-friendly interface. You can try both of these. The link is below:

http://rna.urmc.rochester.edu/RNAstructure.html

14.0 years ago

Hi Michael,

The answer is no. I've used most structural RNA programs quite extensively (Vienna, RNAstructure, Kinefold, mfold, COVE, PKNOTS and others), and none of them outputs this kind of information. But you can use the Vienna rnalib to easily parse RNAfold output and extract any information you like. You just need to devise a proper classification scheme. rnalib is written in C and is nicely documented.

Anyway, there's no ready solution to this. Maybe a script associated with a paper . . .

13.8 years ago
Cjt ▴ 370

You might also have a look at RNAeval of the Vienna package. The given result looks like:

 External loop                           :  -110
 Interior loop (  1, 73) GC; (  2, 72) GC:  -330
 Interior loop (  2, 72) GC; (  3, 71) GC:  -330
 Interior loop (  3, 71) GC; (  4, 70) CG:  -340
 Interior loop (  4, 70) CG; (  5, 69) UA:  -210
 Interior loop (  5, 69) UA; (  6, 68) AU:  -130
 Interior loop (  6, 68) AU; (  7, 67) UA:  -110
 Interior loop ( 10, 26) GC; ( 11, 25) CG:  -340
 Interior loop ( 11, 25) CG; ( 12, 24) UA:  -210
 Interior loop ( 12, 24) UA; ( 13, 23) CG:  -240
 Hairpin  loop ( 13, 23) CG              :   490
 Interior loop ( 28, 44) CG; ( 29, 43) AU:  -210
 Interior loop ( 29, 43) AU; ( 30, 42) CG:  -220
 Interior loop ( 30, 42) CG; ( 31, 41) CG:  -330
 Interior loop ( 31, 41) CG; ( 32, 40) CG:  -330
 Hairpin  loop ( 32, 40) CG              :   500
 Interior loop ( 50, 66) GC; ( 51, 65) CG:  -340
 Interior loop ( 51, 65) CG; ( 52, 64) UA:  -210
 Interior loop ( 52, 64) UA; ( 53, 63) GC:  -210
 Interior loop ( 53, 63) GC; ( 54, 62) AU:  -240
 Hairpin  loop ( 54, 62) AU              :   480
 Multi    loop (  7, 67) UA              :   170
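
This listing is fairly easy to parse. Below is a rough sketch, assuming each loop record sits on a single line in the form "Interior loop (  1, 73) GC; (  2, 72) GC:  -330". The regular expression and function name are my own guesses fitted to the sample above, not an official RNAeval format specification. Since this listing uses "Interior loop" for plain stacked pairs too, the sketch tells stacks, bulges and true interior loops apart by counting the unpaired bases between the two pairs. The energies are kept as the integers printed, which appear to be in units of 0.01 kcal/mol.

```python
import re

LOOP_RE = re.compile(
    r"^\s*(External|Interior|Hairpin|Multi)\s+loop"
    r"(?:\s*\(\s*(\d+),\s*(\d+)\)\s*([A-Z]{2}))?"   # closing pair, if any
    r"(?:;\s*\(\s*(\d+),\s*(\d+)\)\s*([A-Z]{2}))?"  # enclosed pair, if any
    r"\s*:\s*(-?\d+)\s*$"
)


def parse_rnaeval(text):
    """Return one dict per loop line: kind, closing pair, inner pair, energy."""
    records = []
    for line in text.splitlines():
        match = LOOP_RE.match(line)
        if not match:
            continue
        kind, i, j, _, p, q, _, energy = match.groups()
        closing = (int(i), int(j)) if i else None
        inner = (int(p), int(q)) if p else None
        if kind == "Interior" and inner:
            # Unpaired bases on the 5' and 3' sides between the two pairs.
            left = inner[0] - closing[0] - 1
            right = closing[1] - inner[1] - 1
            if left == 0 and right == 0:
                kind = "stack"
            elif left == 0 or right == 0:
                kind = "bulge"
            else:
                kind = "interior"
        else:
            kind = kind.lower()
        records.append(
            {"kind": kind, "closing": closing, "inner": inner, "energy": int(energy)}
        )
    return records
```

From the closing and inner pairs of each record you can then assign a label to every nucleotide in between.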