Table With snoRNABase Information?
11.9 years ago
Leandro Lima ▴ 970

Hello!

I have a list of snoRNABase IDs (https://www-snorna.biotoul.fr/browse.php?sno=CDBox), and I need to get the information available on pages like this one:

https://www-snorna.biotoul.fr/plus.php?id=SNORD125

I'm writing a program to get it, but... I was wondering if someone here has already done it.

annotation • 1.6k views

Thanks for following up and posting the solution.

11.9 years ago
Leandro Lima ▴ 970

The code, in Python.

# download_snoRNABase_info.py
# Created in: Jan 16, 2013
# Last modified in: Jan 16, 2013
# Leandro Lima <llima@cipe.accamargo.org.br>

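from lxml.html import parse, document_fromstring
from twill.commands import go, show
import twill
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
# Silence twill's console output
twill.set_output(StringIO())

BASE_URL = 'https://www-snorna.biotoul.fr/plus.php?id='
# Local copy of the C/D box listing page (browse.php?sno=CDBox), saved beforehand
listing = parse('snoRNABase_CDBox.html').getroot()
tr = listing.find_class('traitbleu')

# The file is tab-separated despite the .csv extension
with open('snoRNA_info.csv', 'w') as output:
    output.write('name\tsno_id\tdescription\tbiotype\n')
    for td in tr[0].getchildren():
        # Reading ids: each link in the listing is one snoRNA name
        for a in td.cssselect('a'):
            name = a.text
            # Fetch the detail page for this snoRNA and parse its HTML
            go(BASE_URL + name)
            page2 = document_fromstring(show())
            # The second 'tablecadre' table holds the ID and the description
            table = page2.find_class('tablecadre')[1]
            tr2 = table.getchildren()[2]
            sno_id = tr2[0].getchildren()[0].text
            description = tr2[1].text
            print('%s %s %s' % (name, sno_id, description))
            output.write('%s\t%s\t%s\t%s\n' % (name, sno_id, description, 'snoRNA'))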
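
The script above builds each tab-separated line by hand, so a description that happens to contain a tab or a line break would break the column layout. A minimal sketch of the same output step using the standard `csv` module follows. The helper name `write_rows` and the sample row values are made up for illustration and do not come from snoRNABase.

```python
import csv
import io

# Same column order as the header written by the script above
FIELDS = ['name', 'sno_id', 'description', 'biotype']

def write_rows(rows, handle):
    """Write (name, sno_id, description) tuples as tab-separated lines."""
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    writer.writerow(FIELDS)
    for name, sno_id, description in rows:
        # Collapse stray whitespace and line breaks scraped from the HTML
        clean = ' '.join((description or '').split())
        writer.writerow([name, sno_id, clean, 'snoRNA'])

# Usage with an in-memory buffer; the row below is a placeholder, not real data
buffer = io.StringIO()
write_rows([('SNORD_EXAMPLE', 'SR0000000', 'example  description\n')], buffer)
```

In the real script, `handle` would be the open `snoRNA_info.csv` file instead of the in-memory buffer.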
11.9 years ago
Leandro Lima ▴ 970

Solved.

=)

The result, in case you need it:

www.vision.ime.usp.br/~llima/snoRNA_info.csv
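
If you want to look up entries by name, something like the sketch below should work. It assumes the file is tab-separated with the header `name`, `sno_id`, `description`, `biotype`, as written by the script. The function name `load_snorna_table` and the sample row are placeholders for illustration only.

```python
import csv
import io

def load_snorna_table(handle):
    """Return a dict mapping each snoRNA name to its row (as a dict)."""
    reader = csv.DictReader(handle, delimiter='\t')
    return {row['name']: row for row in reader}

# Stand-in for the downloaded file; the values are made up
sample = io.StringIO(
    'name\tsno_id\tdescription\tbiotype\n'
    'SNORD_EXAMPLE\tSR0000000\texample description\tsnoRNA\n'
)
table = load_snorna_table(sample)
```

With a local copy of the file, pass `open('snoRNA_info.csv')` instead of the in-memory sample.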
