I work with a non-model organism which is absent from online genome browsers. I have genotypes at 5000 loci for 100 haploid individuals. This is currently in the form of a table in R. Something like this:
             ind1  ind2  ind3  [...]  ind100
locus 1      A     G     A            M       (M = genotype unknown)
locus 2      C     C     M            T
locus 3      G     M     C            G
[...]
locus 5000   -     T     -            T
I performed a crude GWAS-type analysis (my individuals are split into 2 groups; such an analysis is easy with haploids). Now I want to visualize the genotypes in a genome browser (Savant, IGV, or something similar). The easiest approach seems to be to export the data so that I get one "track" with all individuals from one group and a second "track" with the individuals from the other group.
Which output format is my best bet? Googling suggests that VCF is the most appropriate, but I remain very confused by the format specification: can I make one VCF per group? Will I then be able to see, for each group, how many individuals have which genotype? Or would another format be better?
What would a line from such a file look like?
Thanks in advance, Yannick
Mary, nice one. Thanks for the tip!
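For the "what would a line look like" part of the question, here is a minimal Python sketch of how one group's haploid table could be written as VCF. It rests on several assumptions that are not in the post: the function name `vcf_lines`, the scaffold name `scaffold_1`, and the position `101` are hypothetical placeholders (the table has no coordinates); both `M` and `-` are treated as missing calls; and, since there is no reference sequence here, the most frequent allele at each locus stands in for REF. Haploid calls are written as a single allele index in the GT field (`0` = REF, `1` = first ALT, `.` = missing), and the AC/AN INFO fields carry the per-group allele counts.

```python
from collections import Counter

# Assumption: both "M" and "-" mean "no call" in the genotype table.
MISSING = {"M", "-"}


def vcf_lines(loci, samples, chrom="scaffold_1"):
    """Yield VCF lines for haploid genotypes of one group.

    loci    -- list of (position, [one allele code per sample]) tuples
    samples -- sample names, in the same order as the allele lists
    chrom   -- hypothetical scaffold name; replace with real coordinates
    """
    yield "##fileformat=VCFv4.2"
    yield ('##INFO=<ID=AC,Number=A,Type=Integer,'
           'Description="Allele count in genotypes, per ALT allele">')
    yield ('##INFO=<ID=AN,Number=1,Type=Integer,'
           'Description="Total number of called alleles">')
    yield '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    header = ["#CHROM", "POS", "ID", "REF", "ALT",
              "QUAL", "FILTER", "INFO", "FORMAT"]
    yield "\t".join(header + list(samples))

    for pos, calls in loci:
        counts = Counter(c for c in calls if c not in MISSING)
        if not counts:
            continue  # no called genotype at this locus, nothing to write
        # Assumption: the most frequent allele stands in for REF
        # (ties are broken alphabetically to keep the output stable).
        alleles = sorted(counts, key=lambda a: (-counts[a], a))
        ref, alts = alleles[0], alleles[1:]
        index = {allele: str(i) for i, allele in enumerate(alleles)}
        # Haploid GT: a single allele index, or "." for a missing call.
        genotypes = [index.get(c, ".") for c in calls]
        info = "AN=%d" % sum(counts.values())
        if alts:
            info = "AC=%s;%s" % (",".join(str(counts[a]) for a in alts), info)
        fields = [chrom, str(pos), ".", ref, ",".join(alts) or ".",
                  ".", ".", info, "GT"]
        yield "\t".join(fields + genotypes)


# Locus 1 from the table above, for the four individuals shown.
group1 = [(101, ["A", "G", "A", "M"])]
for line in vcf_lines(group1, ["ind1", "ind2", "ind3", "ind100"]):
    print(line)
```

With these placeholder coordinates, the record for locus 1 has the tab-separated fields `scaffold_1 101 . A G . . AC=1;AN=3 GT 0 1 0 .`, so two individuals carry A, one carries G, and one is uncalled. Calling the function once per group, with only that group's columns, would give one file (and so one track) per group.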