Question

A Converter From Sequence Data Files (Nexus, Phylip, Or Any Other Sequence Format) To Haplotype Or 0/1 Infinite-Sites Data?

12.2 years ago

Sang Chul Choi • 0

Hi,

I wonder if there are tools for converting sequence data to 0/1 infinite-sites data. I could write a script to do this; in fact I wrote one once, but I have since lost track of it. Now I need to write one again, so I wonder if there are tools that people tend to use.

Thank you for your answers.

sequence haplotype conversion • 4.9k views

updated 12.2 years ago by David W • written 12.2 years ago by Sang Chul Choi

score 1 · Answer 1 · 2013-05-23

I've used the R packages pegas and ape to do this. Pegas provides the function haplotype(), which collapses the data to unique sequences and records which individuals carry each one, which makes it all straightforward.

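#load the packages; woodmouse is an example data set shipped with ape
> library(ape)
> library(pegas)
> data(woodmouse)

#example sequence data, use read.dna() to get sequences from file
> seq_data <- woodmouse[sample(1:15, 100, replace = TRUE), ]
> h <- haplotype(seq_data)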

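#turn the haplotype object into a 0/1 matrix:
#one row per individual, one column per haplotype
> tab <- sapply(attr(h, 'index'), function(i)
                  sapply(1:nrow(seq_data), function(j) sum(i==j)))
#the exact rows shown depend on the random sample drawn above
> head(tab[,1:5])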
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    0    0    0    0
## [2,]    1    0    0    0    0
## [3,]    0    1    0    0    0
## [4,]    0    0    1    0    0
## [5,]    0    1    0    0    0
## [6,]    0    0    0    1    0

#rows are individuals; each should have one and only one haplotype
> all(rowSums(tab)==1)
##[1] TRUE

#label the rows with their sequence name
> rownames(tab) <- labels(seq_data)
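Note that the matrix above codes haplotype membership (individuals by haplotypes). If what you want is a per-site 0/1 coding in the infinite-sites sense, with one column per segregating site, here is a rough base-R sketch. The helper name to_binary_sites is made up for this example and is not part of pegas or ape. It assumes the alignment is a character matrix (for a DNAbin matrix, as.character(seq_data) gives one), it keeps only sites with exactly two states among A, C, G and T, and it arbitrarily codes the state of the first sequence as 0, which is not necessarily the ancestral state.

```r
# Hypothetical helper (not from pegas or ape): per-site 0/1 coding.
# 'aln' is a character matrix: rows = sequences, columns = aligned sites.
to_binary_sites <- function(aln) {
  aln <- toupper(aln)
  # keep sites that contain only A/C/G/T and exactly two distinct states
  is_biallelic <- apply(aln, 2, function(site)
    all(site %in% c("A", "C", "G", "T")) && length(unique(site)) == 2)
  seg <- aln[, is_biallelic, drop = FALSE]
  # reference state = state of the first sequence at each kept site
  ref <- matrix(seg[1, ], nrow = nrow(seg), ncol = ncol(seg), byrow = TRUE)
  # 0 = same as the first sequence, 1 = the other state
  bin <- (seg != ref) * 1L
  # label columns with their position in the original alignment
  dimnames(bin) <- list(rownames(aln), which(is_biallelic))
  bin
}

# toy alignment: sites 2 and 3 are segregating, sites 1 and 4 are not
aln <- rbind(s1 = c("a", "c", "g", "t"),
             s2 = c("a", "c", "a", "t"),
             s3 = c("a", "t", "a", "t"))
bin <- to_binary_sites(aln)
```

Sites with gaps, ambiguity codes or more than two states are simply dropped here; whether that is appropriate depends on what the downstream program expects.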

If you make this conversion a lot, it's easy to wrap the steps in an R script that is run with Rscript and takes the input and output files as command-line arguments.
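As a minimal sketch of such a script: the file name seq2hap.R, the argument order and the helper membership_matrix are all assumptions made for this example, not something from pegas or ape. It reads the alignment with ape's read.dna(), which handles Phylip (interleaved or sequential), Clustal and FASTA but not Nexus, so a Nexus file would need a different reader first.

```r
#!/usr/bin/env Rscript
# Sketch only: the script name, argument order and helper are hypothetical.
# Assumed usage: Rscript seq2hap.R input.fasta output.tsv [format]

# Build the individuals x haplotypes 0/1 matrix from the 'index' attribute
# of a pegas haplotype object (a list with one vector of row indices
# per haplotype); n is the number of individuals.
membership_matrix <- function(index, n) {
  tab <- matrix(0L, nrow = n, ncol = length(index))
  for (k in seq_along(index)) tab[index[[k]], k] <- 1L
  tab
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 2) {
  # third argument is passed to read.dna(); defaults to FASTA here
  fmt <- if (length(args) >= 3) args[3] else "fasta"
  seq_data <- ape::read.dna(args[1], format = fmt)
  h <- pegas::haplotype(seq_data)
  ids <- labels(seq_data)
  tab <- membership_matrix(attr(h, "index"), length(ids))
  rownames(tab) <- ids
  # column names are placeholders chosen for this sketch
  colnames(tab) <- paste0("hap", seq_len(ncol(tab)))
  write.table(tab, file = args[2], sep = "\t", quote = FALSE, col.names = NA)
}
```

Keeping the matrix-building step in its own function means it can be checked on a toy index list without needing any sequence file.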