So I have this script that reads a fasta file containing sequences from species belonging to a single genus, then aligns them, creates a distance matrix and then a NJ tree using the "nj" function from the "ape" package.
library(msa)
library(ape)
library("Biostrings")
fastaFile = readDNAStringSet("genus.fasta")
seq_name = names(fastaFile)
sequence = paste(fastaFile)
genus_df <- data.frame(seq_name, sequence)
mySequences <- readDNAStringSet("genus.fasta")
alignment1 <- msa(mySequences,type="dna",method="ClustalW")
converted<- msaConvert(alignment1,type="ape::DNAbin")
class(converted) <- "matrix"
converted <- as.DNAbin(converted)
matrix_dist<-dist.dna(converted, model = "K80",as.matrix = T)
nj<-nj(matrix_dist)
plot(nj)
I am trying to check, for each species present in the NJ tree, which species are monophyletic in the tree and which aren't. For that I am using the "is.monophyletic" function also found in "ape":
species<-unique(genus_df$seq_name)
for (i in species){
print(paste(is.monophyletic(nj, tips=i),i))
The code seems to be working, but the output I am getting is not congruent with my interpretation of the plotted tree.
Output:
[1] "TRUE Trachinotus_blochii"
[1] "TRUE Trachinotus_baillonii"
[1] "TRUE Trachinotus_mookalee"
[1] "TRUE Trachinotus_ovatus"
[1] "TRUE Trachinotus_carolinus"
[1] "TRUE Raja_binoculata"
I am not sure if the problem is with my interpretation or the code. Since the tree is unrooted, I also tried to use the "root" function from "ape", using the Raja sequence as outgroup, but I still got the same output (TRUE for all species being monophyletic).
Thank you in advance for any suggestion!