Hi,
Here is a python function that converts a ped file to a nexus file as used by SNAPP. I am not sure about the specific requirements of SplitTree, but you can try.
It assumes that your ped is encoded with 12 instead of ACGT. You obtain such a ped from a standard ped with
plink --file <in_filename> --recode12 --out <out_filename>
then run within python:
ped_to_nexus(ped_fn, nexus_fn, n_individuals=<number of individuals>)
The function is attached below.
Best,
Hannes
def ped_to_nexus(ped_fn, nexus_fn, individual_names=None, n_individuals=None,
max_name_len = 12, comments=None,dominant=False):
"""
Write snps from ped file to nexus alignment file.
Attention: The bed file has to be encoded with:
0.. missing
1.. first allele
2.. second allele
Such a ped file can be obtained from a regular ped file
with "plink --file <in_filename> --recode12 --out <out_filename>"
This function assumes that individuals are diploid.
The two haploid columns in the ped become a single genotype (0/1/2)
in the nexus file.
-------Parameters---------
individual_names... list of individual (or taxa) names
if not supplied, the names from the bed file are used
n_individuals... number of individuals (ntax in the nexus format)
(if not supplied, the length of indiviudal_names is used)
max_name_len... maximal number of characters of individual names,
used to determine intent of alignments
comments... list of strings that are added to the nexus header as comments
alternatively, it can be a single string
dominant... if True, the 0 allele is assumed to be dominant and the resulting
nexus file is encoded 0/1 instead of 0/1/2
"""
#
assert individual_names is not None or n_individuals is not None,\
"specify either individual names or the total number of individuals"
if individual_names is not None and n_individuals is not None:
assert len(individual_names) == n_individuals,\
"number of individual names must equal argument n_individuals"
if n_individuals is None:
n_individuals = len(individual_names)
if individual_names is None:
individual_names = [None]*n_individuals
with open(ped_fn,"r") as bf, open(nexus_fn,"w") as nf:
nf.write("#NEXUS\n")
if comments is not None:
if not isinstance(comments, basestring):
for comment in comments:
nf.write("[{}]\n".format(comment))
else:
nf.write("[{}]\n".format(comments))
for i,(line,name) in enumerate(zip(bf,individual_names)):
if i >= n_individuals:
break
fields = line.split()
if name is None:
name = fields[0]
n_whitespace = max_name_len + 1 - len(name)
assert n_whitespace >= 1, "max taxon name length is " \
+ str(max_name_len) + " but name is " + name
data = fields[6:]
if i == 0:
if dominant:
symbols = "01"
else:
symbols = "012"
nf.write("\nBegin data;\n"
" Dimensions ntax={} nchar={};\n"
' Format datatype=integerdata symbols="{}" missing=? gap=-;\n'
" Matrix\n".format(n_individuals,len(data)/2,symbols))
nf.write(name)
nf.write(" " * n_whitespace)
for i in xrange(0,len(data),2):
h0 = int(data[i])
h1 = int(data[i+1])
if h0!=0 and h1!=0:
gt = h0 + h1 - 2
if dominant:
#assume 0 allele is dominant
gt = int(gt/2)
gt = str(gt)
else:
gt = "?"
nf.write(gt)
nf.write("\n")
nf.write(" ;\nEnd;")
Is there a way to do this with standard 0/1 binary? Your method works with --recode12, but SNAPP requires 1/0's.
I can convert to BED format to get a binary file, but I have no way to convert to nexus from there. Help!
Any follow-up on this? It's been a long time this question was posted and maybe you found a way to do it. I would be glad to find a solution to it!