Hi,
I am trying to do a multiple sequence alignment (MSA) in R.
When I run ClustalW and try to view my alignment, nothing is shown. Is that because the file is too big and ClustalW is still running? If so, is there any way to see the progress of the alignment?
I am doing the following:
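library(msa)  # msa also loads Biostrings, which provides readDNAStringSet()
# system.file() only finds files inside an installed package; for your own data,
# replace this with the path to your FASTA file, e.g. "allSequences.fasta"
mySequenceFile <- system.file("sequences", "allSequences.fasta", package = "msa")
# Nucleotide genomes, so read them as DNA (readAAStringSet() is for proteins)
mySequence <- readDNAStringSet(mySequenceFile)
mySequence
myAlignment <- msa(mySequence, "ClustalW")
myAlignment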
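If it looks like nothing is happening, a quick sanity check is to look at how many sequences there are and how long they are before aligning, and to time the call. The sketch below assumes the msa and Biostrings Bioconductor packages are installed; the three short sequences are made-up toy data standing in for your FASTA file. `verbose = TRUE` is an existing msa() argument that makes it report more about what it is doing, but it is not a progress bar, so don't count on it showing a percentage.

```r
library(Biostrings)  # DNAStringSet(), width()
library(msa)         # msa()

# Hypothetical toy data; for real data use
# mySequence <- readDNAStringSet("allSequences.fasta")
mySequence <- DNAStringSet(c(
  seqA = "ACGTACGTACGTACGTAACG",
  seqB = "ACGTACGTTCGTACGTAACG",
  seqC = "ACGAACGTACGTACTTAACG"
))

# Number and lengths of the sequences: complete coronavirus genomes are
# roughly 30 kb each, which is what makes ClustalW slow on them
print(length(mySequence))
print(width(mySequence))

# Report extra information while aligning and measure the elapsed time
timing <- system.time(
  myAlignment <- msa(mySequence, "ClustalW", verbose = TRUE)
)
print(timing[["elapsed"]])
print(myAlignment)
```

While the call is running, the R console stays busy; the elapsed time is only printed once msa() returns.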
Thanks
I left the PC running for about 20 minutes and it aligned everything. R was showing at the top that it was still running, and I didn't notice. I will try to align it with Muscle.
What should I use instead of ClustalW for this type of sequence?
Thanks
You mean besides clustalo and muscle? Also, you didn't tell us what type of sequences you have.
It is a FASTA file with multiple nucleotide genomes of COVID-related coronaviruses from https://www.ncbi.nlm.nih.gov/nuccore (humans, pangolins and bats).
The file is here: https://drive.google.com/file/d/1AsR5pSmgMOuBxAFrnE9OaGqVSuwLBqvp/view?usp=sharing
By type I actually meant the size of the sequences, so that may have been unclear. As I indicated above, these programs are not meant for aligning very long sequences, and definitely not for aligning large genomes.
Also, I gave you two choices (clustalo and muscle) other than ClustalW. Either one should do a good job and be faster than ClustalW. Are you asking for other options?
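To compare these options on your own data, each of them can be run through the same msa() call and timed. A minimal sketch, again on made-up toy sequences and assuming the msa and Biostrings packages; "ClustalW", "ClustalOmega" and "Muscle" are the method names msa() accepts.

```r
library(Biostrings)  # DNAStringSet()
library(msa)         # msa()

# Hypothetical toy data; replace with readDNAStringSet("allSequences.fasta")
mySequence <- DNAStringSet(c(
  seqA = "ACGTACGTACGTACGTAACG",
  seqB = "ACGTACGTTCGTACGTAACG",
  seqC = "ACGAACGTACGTACTTAACG",
  seqD = "ACGAACGTACCTACTTAACG"
))

methods <- c("ClustalW", "ClustalOmega", "Muscle")

# Run each method once and keep its alignment and elapsed wall-clock time
results <- lapply(methods, function(m) {
  elapsed <- system.time(aln <- msa(mySequence, m))[["elapsed"]]
  list(alignment = aln, elapsed = elapsed)
})
names(results) <- methods

# Elapsed seconds per method
print(sapply(results, function(r) r$elapsed))
```

On toy data all three finish almost instantly; the differences only show up with many long sequences, and the resulting alignments can differ between methods.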
Yes, because once I finished aligning everything and plotting the trees, the differences between MEGA and R are quite significant, even though I used the same genomes from the same FASTA file.
FASTA file aligned with Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/), tree built in MEGA:
By using R:
library(seqinr)
library(adegenet)
library(ape)
library(ggtree)
library(DECIPHER)
library(Biostrings)
library(viridis)
library(ggplot2)
library(msa)
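mySequenceFile <- system.file("sequences", "allSequences.fasta", package = "msa")
mySequence <- readDNAStringSet(mySequenceFile)
mySequence
myAlignment <- msa(mySequence, "ClustalOmega")
myAlignment
# Convert the msa result into a seqinr alignment object
alignment <- msaConvert(myAlignment, "seqinr::alignment")
# seqinr's "similarity" option is meant for protein sequences; use "identity" for DNA
distMatrix <- dist.alignment(alignment, "identity")
# hclust() uses complete linkage by default, which is a clustering rule,
# not one of the phylogenetic methods MEGA offers (e.g. neighbor-joining)
clustering <- hclust(distMatrix)
plot(clustering)
dendrogram <- as.dendrogram(clustering)
phylotree <- as.phylo(clustering)
plot(phylotree, type = "radial")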
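Because hclust() groups sequences by a clustering rule rather than a phylogenetic method, its tree may look different from one built in MEGA even from the same alignment. One option worth trying is a neighbor-joining tree from ape::nj() on the same distance matrix. The sketch below uses four made-up aligned sequences built with seqinr::as.alignment() in place of the msaConvert() output (the names and sequences are hypothetical), and prints the tree in Newick format with ape::write.tree(), which MEGA can open for a side-by-side comparison.

```r
library(seqinr)  # as.alignment(), dist.alignment()
library(ape)     # nj(), as.phylo(), Ntip(), write.tree()

# Hypothetical toy alignment standing in for the msaConvert() result
toyAlignment <- as.alignment(
  nb  = 4,
  nam = c("human", "pangolin", "bat1", "bat2"),
  seq = c("ACGTACGTACGTACGTAACG",
          "ACGTACGTTCGTACGTAACG",
          "ACGAACGTACGTACTTAACG",
          "ACGAACGTACCTACTTAACG")
)

# Pairwise distances based on sequence identity (suitable for DNA)
distMatrix <- dist.alignment(toyAlignment, "identity")

# Clustering tree as in the code above (complete linkage by default)
hcTree <- as.phylo(hclust(distMatrix))

# Neighbor-joining tree from the same distances
njTree <- nj(distMatrix)

# Plot both trees next to each other
par(mfrow = c(1, 2))
plot(hcTree)
title("hclust (complete linkage)")
plot(njTree, type = "unrooted")
title("Neighbor-joining")

# Newick text; save it with write.tree(njTree, file = "nj_tree.nwk")
cat(write.tree(njTree), "\n")
```

Neighbor-joining returns an unrooted tree, so it may need rooting (for example with ape::root() and an outgroup) before comparing it with a rooted tree from MEGA.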
By using BLASTN with an extra sequence against the file with all the sequences:
Never mind, I was being a bit silly and plotting the wrong thing. After playing around with the trees, I managed to get the same tree for the aligned sequences.