Hello, I need help plotting a phylogenetic tree for mitochondrial DNA. I have a VCF file with more than 10,000 samples, and each sample has an assigned sub-haplogroup (stored in a separate file). The variants and their genotypes were called following GATK best practices, and I determined the haplogroups for all samples with the command-line version of HaploGrep. Now I want to visualize these haplogroups in a phylogenetic tree. I have tried various methods in R without success. Here is my approach: I used packages such as ape, factoextra, ggsci, and ggplot2, and I created a data frame with sample IDs as row names and genotypes as columns.
Brief commands used:
# Load data
library(vcfR)
vcf <- read.vcfR("~/ngs/20240224/20240319-MT/trees/chrMT.vcf.gz")
# Extract sample IDs (the first column of the gt slot is FORMAT)
sample_ids <- colnames(vcf@gt)[-1]
# Extract genotype data, dropping the FORMAT column
genotype_data <- vcf@gt
genotype_df <- genotype_data[, -1]
My data frame looks like this.
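To show the layout I mean, here is a minimal sketch on toy data. The sample names, variant names, and genotype values are all made up, and I am assuming simple haploid "0"/"1" calls; real GT strings from a VCF may need extra parsing first (for example with vcfR's extract.gt).

```r
# Toy GT matrix in the same orientation as vcf@gt[, -1]:
# variants in rows, samples in columns. All values are made up.
gt <- matrix(
  c("0", "1", "1",
    "0", "0", "1",
    "1", "1", NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("var1", "var2", "var3"), c("S1", "S2", "S3"))
)

# Transpose so that samples become rows and variants become columns
genotype_df <- as.data.frame(t(gt), stringsAsFactors = FALSE)

# Recode the allele strings as numbers so that dist() can use them
genotype_df[] <- lapply(genotype_df, as.numeric)

print(genotype_df)
```

With this orientation, dist() compares samples with each other rather than variants.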
# Compute the distance matrix
dist_matrix <- dist(genotype_df)
# Perform hierarchical clustering
hc <- hclust(dist_matrix)
# Convert to dendrogram
dendrogram <- as.dendrogram(hc)
hc <- hclust(dist_matrix, method = "average")
# Plot
library(factoextra)
fviz_dend(x = hc, cex = 0.4, lwd = 0.09, h = 7.5, k_colors = c("jco"), rect = TRUE, rect_border = "jco", type = "circular", rect_fill = TRUE)
Please see the attached plot.
The branch tips appear to be individual samples, but I would like to label the nodes and branches with haplogroups.
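To make the goal concrete, here is a rough sketch on toy data of the kind of labelling I am after. It is not a working solution for my real data. The genotype matrix and the `haplo` lookup (sample ID to haplogroup) are hypothetical stand-ins for my VCF genotypes and HaploGrep results, and I am assuming ape's as.phylo(), plot(), getMRCA() and nodelabels() are the right tools for this.

```r
library(ape)

# Toy numeric genotypes: samples in rows, variants in columns (made-up values)
geno <- matrix(
  c(0, 0, 1, 1,
    0, 0, 1, 0,
    1, 1, 0, 0,
    1, 1, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3", "S4"), paste0("var", 1:4))
)

# Hypothetical lookup: sample ID -> haplogroup (would come from the HaploGrep output)
haplo <- c(S1 = "H1a", S2 = "H1a", S3 = "U5b", S4 = "U5b")

# Cluster the samples and convert the result to a phylo object
hc <- hclust(dist(geno), method = "average")
tree <- as.phylo(hc)

# Haplogroup of each tip, in the tree's tip order, and one colour per haplogroup
groups <- haplo[tree$tip.label]
pal <- setNames(rainbow(length(unique(groups))), unique(groups))
tip_cols <- unname(pal[groups])

# Show haplogroup names instead of sample IDs at the tips, in a circular layout
tree$tip.label <- unname(groups)
plot(tree, type = "fan", tip.color = tip_cols, cex = 0.8)

# Label the most recent common ancestor node of each haplogroup.
# This is only meaningful where a haplogroup forms a single clade in the tree.
for (g in unique(groups)) {
  node <- getMRCA(tree, which(groups == g))
  if (!is.null(node)) nodelabels(g, node = node, cex = 0.7)
}
```

In this toy example each haplogroup forms its own clade, so every tip and one internal node per group carries a haplogroup name. That is what I would like to get for the full data set.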
I would truly appreciate your help in producing this plot.
Thanks