Hierarchical Clustering
13.3 years ago
Sanju ▴ 90

Hi all

I have a dataset of protein sequences in FASTA format, and my aim is to find the non-redundant sequences in it. I have computed the pairwise sequence similarity percentages and stored the results in an Excel sheet. My professor told me to use R to do hierarchical clustering (single linkage method); I don't want to use any other software for this. I also have to create a dendrogram. How can I do hierarchical clustering of protein sequences in R? Could you give me an R script for this?

I would like to get the R script for

  1) Reading an Excel file

  2) Hierarchical clustering (single linkage)

  3) Phylogenetic analysis

  4) Creating a dendrogram

Please help me.

r programming clustering tree • 6.1k views

Is this not the same question you already asked once? Hierarchial Clustering

13.3 years ago
Philippe ★ 1.9k

Hi,

To answer your question more precisely, you can use the following functions:

  1. You can read any delimited text file with the read.table() function. If you export your Excel sheet as a CSV file, you can use a wrapper like read.csv() or read.csv2(), whose default parameters might be those you need (read.csv2() expects semicolon-separated fields and a decimal comma).

  2. Once your data are loaded into R, you can cluster them with the hclust() function, which takes a dissimilarity object such as the one returned by dist() or as.dist(). It implements single linkage clustering among other methods; you select it with method = "single".

  3. Phylogenetic analysis is a broad description. Depending on what you want to do, the ape package might be helpful.

  4. To create a dendrogram, you can pass the output of the hclust() function directly to the plot() function. For example, with d a dissimilarity object: h <- hclust(d, method = "single"); plot(h) # Will plot a dendrogram
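As a rough illustration of steps 1, 2 and 4 together, here is a minimal sketch. It assumes the Excel sheet has been exported to CSV as a square matrix of percent similarities with sequence IDs in the first column and in the header row. The toy values, the sequence names and the "100 - similarity" conversion to a distance are assumptions made for the example, not something given in the question.

```r
# Toy 4 x 4 percent-similarity matrix (values made up for illustration).
ids <- c("seqA", "seqB", "seqC", "seqD")
sim <- matrix(c(100,  95,  40,  35,
                 95, 100,  42,  38,
                 40,  42, 100,  90,
                 35,  38,  90, 100),
              nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Write and re-read a CSV file to mimic a sheet exported from Excel.
csv_file <- tempfile(fileext = ".csv")
write.csv(sim, csv_file)
sim <- as.matrix(read.csv(csv_file, row.names = 1, check.names = FALSE))

# hclust() works on dissimilarities, so convert similarity to a distance.
d <- as.dist(100 - sim)

# Single linkage clustering, then draw the dendrogram.
h <- hclust(d, method = "single")
plot(h, main = "Single linkage", xlab = "", sub = "",
     ylab = "100 - % similarity")
```

With a real file, only the read.csv() call is needed; the write.csv() step is there just to make the example self-contained.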

In general, if you need more information about one of the functions you have to use, you can read the associated help file with the help() function or the ? operator. Example for hclust(): ?hclust; help(hclust) # Two different ways to read the help file for the hclust function
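For the stated aim of finding non-redundant sequences, one possible approach is to cut the tree at a chosen height with cutree() and keep one member per cluster. The sketch below uses a made-up similarity matrix, an arbitrary 90% similarity cutoff, and the "first member of each cluster" rule as the representative; all three are assumptions for illustration. The last part, which converts the tree to Newick with ape, only runs if the ape package is installed.

```r
# Toy percent-similarity matrix (values made up for illustration).
ids <- c("seqA", "seqB", "seqC", "seqD")
sim <- matrix(c(100,  95,  40,  35,
                 95, 100,  42,  38,
                 40,  42, 100,  90,
                 35,  38,  90, 100),
              nrow = 4, byrow = TRUE, dimnames = list(ids, ids))
h <- hclust(as.dist(100 - sim), method = "single")

# Cut at distance 10: sequences chained together by links of at least
# 90% similarity end up in the same cluster. The cutoff is arbitrary.
clusters <- cutree(h, h = 10)

# Keep the first member of each cluster as its representative.
representatives <- names(clusters)[!duplicated(clusters)]
print(representatives)

# Optional: convert to a phylo object and get a Newick string with ape.
if (requireNamespace("ape", quietly = TRUE)) {
  newick <- ape::write.tree(ape::as.phylo(h))
  print(newick)
}
```

Note that single linkage chains clusters through their closest pair, so two sequences in the same cluster can be less similar to each other than the cutoff suggests.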


Thank you very much for your answer.


Dear friend,

Could you please provide a sample script for me? I am a beginner in programming. Please help me.


I'm sorry, but I don't think that providing a full script will help you. At least 90% of what you ask can be done with the functions I gave you. Looking at these functions, trying them yourself and reading the help files as I mentioned will allow you to achieve your goal and improve your programming skills. If you are really stuck on one precise point, then you can ask for help with it. Also, if your professor asked you to use R, maybe he can help you with any special needs you might have.


Dear friend,

I am really sorry for the trouble. I will try.

13.3 years ago
Assa Yeroslaviz ★ 1.9k

Have a look here:

hierarchial-clustering

Aleksandr Levchuk has already written a script for BLASTing sequences and building a hierarchical clustering with R. (Using the search option would have saved you time.)

13.3 years ago

MCL clustering is also a very popular solution (http://micans.org/mcl/), but it works with graphs/networks instead of the more traditional dendrograms. See the manual here: http://micans.org/mcl/man/mclcm.html

The algorithm author (Stijn van Dongen) has also provided an R script here http://www.bigre.ulb.ac.be/Users/jvanheld/BMOL-F-501/practicals/r_scripts/mcl.R

Not sure if this is useful in this context, but something to consider for future work perhaps?


Thank you very much
