Question

Clustering of ncRNA

1

11.6 years ago

cacaucenturion ▴ 250

I want to cluster a set of human ncRNAs into different families based on sequence similarity. How should I do it, and which software would be suitable? Thanks very much!

• 2.4k views

ADD COMMENT • link updated 11.6 years ago by Neilfws 49k • written 11.6 years ago by cacaucenturion ▴ 250

score 2 · Answer 1 · 2013-04-01

11.6 years ago

Ryan Thompson ★ 3.6k

You might want to map your ncRNAs to the genome and then cluster based on overlap, using something like blockbuster.

ADD COMMENT • link 11.6 years ago by Ryan Thompson ★ 3.6k

0

Thanks! Actually, what I meant was that I want to cluster the RNAs into families. Sorry for not being clear.

ADD REPLY • link 11.6 years ago by cacaucenturion ▴ 250

score 2 · Answer 2 · 2013-04-02

11.6 years ago

Paul Gardner ▴ 190

I'd build HMMs/CMs from alignments of the RNAs. Then iteratively add similar sequences to the largest clusters until you're done.

E.g. run hmmbuild -> hmmsearch -> hmmalign -> (predict a secondary structure w/ e.g. RNAalifold) -> make a stockholm alignment w/ secondary structure annotation -> cmbuild -> cmsearch -> cmalign. Repeat last 3 stages until convergence. Repeat with unaligned sequences.
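The "repeat until convergence" part of this can be sketched in Python. This is only a rough illustration of the control flow, not the actual HMMER/Infernal workflow: `grow_family`, `cluster_all`, `toy_search` and `toy_pick_seed` are hypothetical names, and `toy_search` uses shared k-mers as a stand-in for the build/search/align steps. In practice the search function would wrap calls to the real tools.

```python
def kmers(seq, k=4):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_search(members, pool, k=4, min_shared=0.5):
    """Stand-in for build + search: 'model' is just the members' k-mers.

    A sequence is a hit if at least min_shared of its k-mers occur in the
    model. A real pipeline would build an HMM/CM and search with it instead.
    """
    model = set().union(*(kmers(m, k) for m in members))
    hits = set()
    for seq in pool:
        words = kmers(seq, k)
        if words and len(words & model) / len(words) >= min_shared:
            hits.add(seq)
    return hits


def grow_family(seed, pool, search, max_rounds=20):
    """Rebuild the model and search again until membership stops changing."""
    members = set(seed)
    for _ in range(max_rounds):
        expanded = members | set(search(members, pool))
        if expanded == members:  # converged: no new sequences recruited
            break
        members = expanded
    return members


def toy_pick_seed(remaining):
    """Hypothetical seed choice: one arbitrary but deterministic sequence."""
    return {min(remaining)}


def cluster_all(pool, pick_seed, search):
    """Grow one family at a time, then repeat on the unassigned sequences."""
    remaining = set(pool)
    families = []
    while remaining:
        family = grow_family(pick_seed(remaining), remaining, search)
        families.append(family)
        remaining -= family  # the seed is always removed, so this terminates
    return families


if __name__ == "__main__":
    seqs = ["ACGUACGUACGU", "ACGUACGUACGA", "GGGGCCCCGGGG", "GGGGCCCCGGGC"]
    for family in cluster_all(seqs, toy_pick_seed, toy_search):
        print(sorted(family))
```

The seed choice matters a lot here. The toy version just takes one sequence, whereas a seed taken from an initial alignment or a rough first-pass clustering would be closer to what is described above.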

ADD COMMENT • link 11.6 years ago by Paul Gardner ▴ 190

0

Thanks very much! So if I have a lot of RNA sequences and do not know which families they belong to, how can I build an HMM? Thanks!

ADD REPLY • link 11.5 years ago by cacaucenturion ▴ 250

0

Actually, my question is: which sequences should I choose to start making the alignments and building the HMM? Can I use other software such as CD-HIT to get an initial clustering, and then use the largest cluster to build the HMM? Thanks!

ADD REPLY • link 11.5 years ago by cacaucenturion ▴ 250

Ram · Answer 3 · 2013-04-02

In general, the answer to "how do I cluster sequences?" is CD-HIT.

In this case, specifically CD-HIT-EST. From the applications page:

CD-HIT-EST has been used in clustering many types of sequences such as Expressed Sequence Tags (ESTs), MicroRNAs (miRNAs) (RNA, 2007 13:170-187), oligonucleotide probes (Bioinformatics, 2007 23:1195), 16S rRNA sequences (Nature, 2009, 457:480).
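For a feel of what this kind of clustering does, here is a minimal Python sketch of greedy incremental clustering, the general strategy CD-HIT is based on: sort sequences longest first, make the first one a representative, and put each later sequence into the first cluster whose representative it matches above a threshold. Everything named here is an assumption for illustration. In particular `toy_identity` compares positions without any alignment, which is not how CD-HIT computes identity, so use the real tool for actual data.

```python
def toy_identity(a, b):
    """Fraction of matching positions over the shorter sequence (no alignment).

    Toy stand-in only; a real tool aligns the sequences first.
    """
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0.0
    matches = sum(x == y for x, y in zip(shorter, longer))
    return matches / len(shorter)


def greedy_cluster(seqs, threshold=0.8):
    """Greedy incremental clustering: longest sequences become representatives."""
    clusters = {}  # representative -> list of members (representative first)
    for seq in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters.items():
            if toy_identity(rep, seq) >= threshold:
                members.append(seq)  # join the first cluster that is close enough
                break
        else:
            clusters[seq] = [seq]  # no match: seq starts a new cluster
    return clusters


if __name__ == "__main__":
    seqs = ["ACGUACGUAC", "ACGUACGUAA", "GGGGCCCCGG", "ACGUACGU"]
    for rep, members in greedy_cluster(seqs).items():
        print(rep, members)
```

The resulting clusters could then serve as a rough starting point, for example taking the largest one as the seed for building a model.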