I have a dataset of protein sequences in FASTA format, and my aim is to find the non-redundant sequences in it. I have computed the pairwise sequence similarity percentages and stored the results in an Excel sheet. My professor told me to use R to do hierarchical clustering (single linkage method). I don’t want to use any other software for this. I also have to create a dendrogram. How can I do hierarchical clustering of protein sequences in R? Could you give me an R script for this?
To answer your question more precisely, you can use the following functions:
You can read any delimited text file with the read.table() function. If your input file is a CSV file, you can use a wrapper such as read.csv() or read.csv2(), whose default parameters might be those you need.
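As a minimal sketch of the reading step, assuming the Excel sheet has been exported to CSV as a square matrix with sequence IDs in the first row and first column (the IDs, the values, and the temporary file below are made up so that the example runs on its own):

```r
# Hypothetical example data: a small symmetric matrix of pairwise
# similarity percentages, written to a temporary CSV file so that
# the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  ",seqA,seqB,seqC",
  "seqA,100,95,40",
  "seqB,95,100,42",
  "seqC,40,42,100"
), csv_path)

# row.names = 1 uses the first column as row labels;
# check.names = FALSE keeps the sequence IDs exactly as written.
sim <- as.matrix(read.csv(csv_path, row.names = 1, check.names = FALSE))
print(sim)
```

With your own data you would replace csv_path with the path to your exported file. If your file uses semicolons and decimal commas, read.csv2() is probably the better fit.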
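Once your data is loaded into R, you can cluster it with the hclust() function. Note that hclust() expects a dissimilarity object (as produced by dist() or as.dist()), so similarity percentages have to be converted to distances first. Single linkage is one of the methods it implements; it is not the default, so select it with method = "single".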
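Here is a small sketch of that step on made-up numbers. The sequence IDs and similarity values are hypothetical, and using 100 minus the similarity percentage as the distance is only one simple choice, not something the question prescribes:

```r
# Hypothetical pairwise similarity percentages for four sequences.
ids <- c("seqA", "seqB", "seqC", "seqD")
sim <- matrix(c(
  100,  95,  40,  38,
   95, 100,  42,  41,
   40,  42, 100,  90,
   38,  41,  90, 100
), nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# hclust() works on dissimilarities, so turn similarity into distance.
# 100 - similarity is an assumption made for this sketch.
d <- as.dist(100 - sim)

# Single linkage: two clusters are joined at the distance between
# their closest pair of members.
h <- hclust(d, method = "single")
print(h$height)  # distances at which the successive merges happen
```

The heights tell you at which distance each merge happened, which is useful later when you decide where to cut the tree.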
Phylogenetic analysis is a broad term. Depending on what you want to do, the ape package might be helpful.
To create a dendrogram, you can pass the output of the hclust() function directly to the plot() function. For example:
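# Assumes data is a square numeric matrix of similarity percentages;
# hclust() needs distances, hence the conversion with as.dist().
h <- hclust(as.dist(100 - data), method = "single")
plot(h) # Will plot a dendrogram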
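Since the stated aim is a non-redundant set, one possible follow-up is to cut the tree at a chosen threshold with cutree() and keep one sequence per cluster. This is only a sketch: the data, the 90% threshold, and the rule of keeping the first member of each cluster are all assumptions you would adapt to your own needs.

```r
# Hypothetical pairwise similarity percentages for four sequences.
ids <- c("seqA", "seqB", "seqC", "seqD")
sim <- matrix(c(
  100,  95,  40,  38,
   95, 100,  42,  41,
   40,  42, 100,  90,
   38,  41,  90, 100
), nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Distance taken as 100 - similarity (an assumption for this sketch).
h <- hclust(as.dist(100 - sim), method = "single")

# Draw the dendrogram; hang = -1 puts all labels on the same baseline.
plot(h, hang = -1, main = "Single linkage", ylab = "100 - % similarity")

# Cut at a hypothetical redundancy threshold: sequences linked at
# >= 90% similarity (distance <= 10) end up in the same cluster.
clusters <- cutree(h, h = 10)

# Keep the first member of each cluster as its representative.
representatives <- names(clusters)[!duplicated(clusters)]
print(representatives)
```

Keep in mind that single linkage chains clusters together: two sequences can land in the same cluster without being directly similar, as long as they are connected through intermediates above the threshold.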
In general, if you need more information about one of the functions you have to use, you can read its help file using the help() function or the ? operator. Example for hclust():
?hclust;
help(hclust) # Two different ways to read the help files for the hclust function
I'm sorry, but I don't think that providing a full script will help you. At least 90% of what you ask can be done with the functions I gave you. Looking at these functions, trying things yourself, and reading the help files as I mentioned will allow you to achieve your goal and improve your programming skills. If you are really stuck on one precise point, then you can ask for help with it. Also, if your professor asked you to use R, maybe he can help you with any special needs you might have.
Aleksandr Levchuk has already written a script for blasting and constructing a hierarchical clustering with R.
(Using the search option would have saved you time.)
Is this not the same question you already asked once? Hierarchial Clustering