R, msa and sequence alignment progress?
2.8 years ago

Hi,

I am trying to do some msa using R.


When I try to use ClustalW and view my alignment, it doesn't show anything. Is that because the file is too big and ClustalW is still running? If so, is there any way to see the progress of the alignment?

I am doing the following:

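library(msa)

# Note: system.file() only finds files that ship inside the installed msa
# package; it returns "" if the file is not there.
mySequenceFile <- system.file("sequences", "allSequences.fasta", package = "msa")

# The file holds nucleotide genomes, so read it as DNA
# (readAAStringSet() is for protein sequences).
mySequence <- readDNAStringSet(mySequenceFile)
mySequence

myAlignment <- msa(mySequence, "ClustalW")
myAlignment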

Thanks

msa clustaW R • 2.4k views
2.8 years ago
Mensur Dlakic ★ 28k

There isn't enough information here. Aligning 21 sequences that are 500 bp long is not the same as aligning 21 sequences that are 30 kb long. I am only guessing, but you seem to be in the latter category. If so, most common alignment programs (such as ClustalW) are not meant for aligning sequences of that length, especially if there are many of them.

If you want a visual progress update, I suggest you run the programs directly from the command line rather than through the R interface. Both Clustal Omega (aka clustalo) and Muscle print progress to the screen, and both are more recent and likely better aligners than ClustalW.
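Both routes can be sketched from an R session. This is a minimal sketch, not a tested recipe: it assumes the msa package is installed (its msa() function takes a verbose argument that should print progress messages) and, for the second route, that a clustalo binary is on the PATH. The file names allSequences.fasta and aligned.fasta and the helper clustalo_args() are placeholders made up for illustration.

```r
# Build the argument vector for a command-line Clustal Omega run.
# -i / -o: input and output files, -v: verbose output, --force: overwrite output.
clustalo_args <- function(infile, outfile, threads = 1L) {
  c("-i", infile, "-o", outfile, paste0("--threads=", threads), "-v", "--force")
}

infile  <- "allSequences.fasta"  # placeholder: path to your own FASTA file
outfile <- "aligned.fasta"       # placeholder: where to write the alignment

# Route 1: stay in R and ask msa() for progress messages.
if (file.exists(infile) && requireNamespace("msa", quietly = TRUE)) {
  library(msa)  # also attaches Biostrings, which provides readDNAStringSet()
  seqs <- readDNAStringSet(infile)
  aln  <- msa(seqs, method = "ClustalW", verbose = TRUE)
}

# Route 2: call the clustalo binary; its messages stream to the console.
if (file.exists(infile) && nzchar(Sys.which("clustalo"))) {
  system2("clustalo", args = clustalo_args(infile, outfile, threads = 2L))
}
```

Both branches are skipped silently if the input file or the tool is missing, so nothing runs until the placeholder paths point at real files.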


I left the PC running for about 20 minutes and it aligned everything. On the top it was showing R was running stuff and I didn't notice. I will try to align it with Muscle.


What should I use instead of ClustalW for this type of sequence?

Thanks


You mean besides clustalo and muscle? Also, you didn't tell us what type of sequences you have.


It is a FASTA file with multiple nucleotide covid genomes (humans, pangolins and bats) from https://www.ncbi.nlm.nih.gov/nuccore

File is here https://drive.google.com/file/d/1AsR5pSmgMOuBxAFrnE9OaGqVSuwLBqvp/view?usp=sharing


By type I actually meant size of sequences, so that may have been unclear. As I indicated above, these programs are not meant for aligning very long sequences, and definitely not for aligning large genomes.

Also, I gave you two choices (clustalo and muscle) other than ClustalW. Either one should do a good job and be faster than ClustalW. Are you asking for other options?
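Since the number and length of the sequences is what matters here, a quick way to report them is sketched below. It uses base R only, and the helper name fasta_lengths() and the toy file are made up for illustration. If Biostrings is already loaded, width(mySequence) on the object returned by readDNAStringSet() should give the same numbers.

```r
# Return a named integer vector of sequence lengths for a FASTA file.
fasta_lengths <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)            # record index for every line
  seq_lines <- !is_header & record > 0   # sequence lines that follow a header
  residues <- nchar(gsub("\\s", "", lines[seq_lines]))
  lens <- tapply(residues, record[seq_lines], sum)
  ids <- sub("^>(\\S*).*$", "\\1", lines[is_header])
  out <- setNames(integer(length(ids)), ids)  # records without sequence stay 0
  out[as.integer(names(lens))] <- as.integer(lens)
  out
}

# Toy example with two made-up records.
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">seqA toy record", "ACGTAC", "GT", ">seqB", "ACG"), tmp)
lens <- fasta_lengths(tmp)
print(lens)       # one length per record
print(range(lens))
```

Posting the record count and the range of lengths along with the question makes it much easier to say whether a given aligner is a sensible choice.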


Yes, because once I finished aligning everything and plotting the trees, the differences between MEGA and R are quite significant, even though it is the same FASTA file with the same genomes.

Fasta file aligned using https://www.ebi.ac.uk/Tools/msa/clustalo/ + MEGA phylogenetic tree


By using R:

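library(seqinr)
library(adegenet)
library(ape)
library(ggtree)
library(DECIPHER)
library(Biostrings)
library(viridis)
library(ggplot2)
library(msa)

# system.file() only finds files that ship inside the installed msa package.
mySequenceFile <- system.file("sequences", "allSequences.fasta", package = "msa")
mySequence <- readDNAStringSet(mySequenceFile)
mySequence

myAlignment <- msa(mySequence, "ClustalOmega")
myAlignment

# Convert to a seqinr alignment and compute pairwise distances.
# "identity" is used here because seqinr documents "similarity" as being
# for protein sequences only (the original post used "similarity").
alignment <- msaConvert(myAlignment, "seqinr::alignment")
distMatrix <- dist.alignment(alignment, "identity")

# Hierarchical clustering (complete linkage by default).
clustering <- hclust(distMatrix)
plot(clustering)
dendrogram <- as.dendrogram(clustering)

# Convert to a phylo object for plotting with ape.
phylotree <- as.phylo(clustering)
plot(phylotree, type = "radial")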
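One thing worth checking when two tools give different trees from the same sequences is the tree-building step rather than the alignment. hclust() defaults to complete-linkage clustering, while MEGA offers methods such as neighbor joining, UPGMA and maximum likelihood. Which method was used in MEGA is not stated in this thread, so treat the comparison below as an assumption-laden sketch. The distance values are made up.

```r
# Toy distances between four hypothetical sequences (values are made up).
m <- matrix(c(0, 3, 7, 9,
              3, 0, 6, 8,
              7, 6, 0, 5,
              9, 8, 5, 0),
            nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))
d <- as.dist(m)

tree_complete <- hclust(d)                      # default: complete linkage
tree_upgma    <- hclust(d, method = "average")  # UPGMA

# Same distances, different linkage rule, different merge heights.
print(tree_complete$height)
print(tree_upgma$height)

# Neighbor joining, only if the ape package is installed.
if (requireNamespace("ape", quietly = TRUE)) {
  tree_nj <- ape::nj(d)
  ape::plot.phylo(tree_nj, type = "unrooted")
}
```

If the trees still disagree after matching the method, the distance measure and the alignment itself are the next things to compare.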


By using BLASTN with an extra sequence vs. the all-sequences file:


Never mind, I was actually being a bit silly and plotting the wrong thing. I managed to get the same tree for the aligned sequences after playing around with the trees.
