Question

How to distinguish the haplotype network based on the sampling location?

2.9 years ago

amirandi1808 ▴ 20

I have 20 DNA sequences from 4 sampling locations, 5 individuals per location. I would like to build a haplotype network in RStudio. According to the analysis, I have 8 haplotypes. However, I cannot assign a different color to each location, especially when a haplotype is shared between locations.

Here is my script

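library("ape")
library("pegas")

# Read the alignment (20 sequences)
Naso <- read.dna("FIKS YA.fasta", format = "fasta")
Naso

# Collapse the sequences into haplotypes and build the network
NasoHaps <- haplotype(Naso)
NasoHaps
NasoNet <- haploNet(NasoHaps)

# Population and region labels: 4 groups of 5 individuals each
pop <- rep(paste0("pop", 1:4), each = 5)
region <- rep(c("regA", "regB", "regC", "regD"), each = 5)
table(region, pop)

h <- haplotype(Naso)
h

# Number of differing sites between haplotypes, then a randomized minimum spanning tree
d <- dist.dna(NasoHaps, "N")
nt <- rmst(d, quiet = TRUE)
nt
plot(nt)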

This is the output

Could anyone help me distinguish each haplotype by sampling location and give each location a different color? Please note: these instructions are ambiguous:

pop <- rep(paste0("pop", 1:4), each = 5)

region <- rep(c("regA", "regB", "regC", "regD"), each = 5)

haplotype Ape network Pegas • 847 views

score 0 · Answer 1 · 2022-02-19

Hi amirandi, I wrote some R code using your variable names to answer your question. First, we need a hap.pies table, which is used to set the location and color for each pie.

> hap.pies <- with(  
> stack(setNames(attr(NasoHaps, "index"), seq_along(attr(NasoHaps, "index")))),  
> table(hap = as.numeric(as.character(ind)), pop = samples[values, "region"]) )

Let me explain the code. hap.pies is the table we need in order to attach the location (or any other information) to each haplotype.

> attr(NasoHaps, "index")

The haplotype object returned by haplotype(Naso) stores each haplotype together with the positions of the FASTA sequences that carry it. The format is like: index 1 (haplotype 1): sequence 1; sequence 2. samples is a table holding the region information for each FASTA sequence, in the same order as the FASTA file. It contains two columns: sequence id and region.
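
To make the structure concrete, here is a minimal base-R sketch of the same step on made-up data. The list hap_index is a hypothetical stand-in for attr(NasoHaps, "index"), the id column of samples is an assumed name, and which sequence falls in which haplotype is invented for illustration; only the 4 regions of 5 individuals and the 8 haplotypes come from the question.

```r
# Hypothetical stand-in for attr(NasoHaps, "index"): one element per haplotype,
# each holding the positions (in the FASTA file) of the sequences carrying it.
hap_index <- list(
  c(1, 2, 6),      # haplotype 1: shared between regA and regB
  c(3, 4),
  5,
  c(7, 8, 9, 11),  # haplotype 4: shared between regB and regC
  10,
  c(12, 13, 14),
  c(15, 16, 17),
  c(18, 19, 20)
)

# Assumed layout of `samples`: one row per sequence, in FASTA order.
samples <- data.frame(
  id     = paste0("seq", 1:20),
  region = rep(c("regA", "regB", "regC", "regD"), each = 5)
)

# Long format: one row per sequence, with the haplotype number in `ind`.
long <- stack(setNames(hap_index, seq_along(hap_index)))

# Cross-tabulate haplotype against region: rows are haplotypes, columns are regions.
hap.pies <- table(
  hap = as.numeric(as.character(long$ind)),
  pop = samples[long$values, "region"]
)
print(hap.pies)
```

Each row of hap.pies then gives the slice sizes of one pie. With the real objects, and assuming I remember the pegas plotting arguments correctly, the table would be passed as plot(NasoNet, size = attr(NasoNet, "freq"), pie = hap.pies, bg = cols), where cols is a vector with one color per region, followed by legend("topright", legend = colnames(hap.pies), fill = cols) so that the legend matches the slices.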