Hello,
I've been trying to solve this problem for a month now, so I thought it was time to ask for some help.
I've got a dataset that looks like this (anonymized with x):
ID ID-87xxxxx ID-88xxxxx ID-87xxxxx ID-96xxxxx
IndividualA 2 1 2 0
IndividualB 1 1 1 1
IndividualC 0 2 2 0
IndividualD 0 0 0 1
IndividualE 1 1 1 2
IndividualF 1 1 2 1
IndividualG 2 0 1 0
IndividualH 1 1 0 1
The 0, 1 and 2 depict zygosity. Each column represents a marker. For any marker, an individual's genotype is coded as the number of copies of the second allele, meaning:
0: homozygote for the first allele
1: heterozygote
2: homozygote for the second allele
5: Unknown
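For illustration, the coding above could be translated with a simple lookup. This is only a minimal Python sketch: the function name `to_genepop_row` and the allele labels `01`/`02` are made up for the example, and the output is assumed to follow GENEPOP's two-digit convention (sample name, a comma, then one genotype per marker, with `0000` for a missing genotype). It handles a single sample line and does not write the GENEPOP header, locus names or `Pop` separators.

```python
# Hypothetical lookup from the 0/1/2/5 allele-count coding to
# GENEPOP-style two-digit genotypes. The labels "01" (first allele)
# and "02" (second allele) are arbitrary choices for this sketch.
GENEPOP_CODES = {
    "0": "0101",  # homozygote for the first allele
    "1": "0102",  # heterozygote
    "2": "0202",  # homozygote for the second allele
    "5": "0000",  # unknown / missing genotype
}


def to_genepop_row(line):
    """Convert one line such as 'IndividualA 2 1 2 0' to a GENEPOP-style sample line."""
    name, *calls = line.split()
    # A KeyError here means the line contains a code other than 0, 1, 2 or 5.
    genotypes = [GENEPOP_CODES[call] for call in calls]
    return f"{name} , " + " ".join(genotypes)


print(to_genepop_row("IndividualA 2 1 2 0"))
```

With 55k+ markers per line, the same function can be applied line by line while streaming the file, so the whole dataset never has to sit in memory.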
I have 55k+ SNPs and several thousand individuals (each with its own unique 14-character code).
My questions:
- What is the name of this type of data? (Is it allele count?)
- How do I convert this kind of data into something else? I am going to use NeEstimator, Structure and other software, and none of them accepts this format. It would be great to convert it to an intermediate format that I can then convert to whatever I need (I know GENEPOP does this well).
- Is there any program that makes use of this format?
Thank you for reading, and for any help you may provide. I have tried looking for answers to these questions for a long time now.
See Roslin Bioinformatics - Law's Laws: http://bioinformatics.roslin.ac.uk/lawslaws/
Haha, I see I'm not the only one who's had to struggle with this. Thanks for the laugh though.