I would like to conduct an allelic association test using my Illumina-generated NGS data. Briefly, I conducted targeted resequencing of a 1 Mb region in 100 horses. This was done by pooling the DNA of 4-5 horses per library, giving 24 indexed groups of 4-5 horses each. Variant calling, etc., was done assuming a ploidy of 8 or 10 as appropriate (two chromosomes per horse in the pool).
My VCF files therefore look as though they come from polyploid organisms, which PLINK v1.9 and VCFtools do not support. However, the data are only formatted as polyploid; the allele frequencies actually come from diploid organisms, so I figure there should be some way to manipulate the data to reflect that.
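As a rough illustration of the conversion being asked about, the sketch below turns one pooled genotype call into an allele count and frequency. It assumes (an assumption, not something stated in the post) that the caller's ploidy was set to 2 × the number of horses in the pool, so each allele slot in the GT field stands for one pooled chromosome, and that the GT string has already been pulled out of the VCF sample column. The function names are hypothetical. Keep in mind that a pooled call is itself an estimate from read counts, not an observed genotype.

```python
def pooled_alt_count(gt: str) -> tuple[int, int]:
    """Return (alt_allele_count, called_allele_count) for one pooled GT string.

    Hypothetical helper. Assumes ploidy = 2 x horses in the pool, so each
    allele slot represents one of the pool's chromosomes. Missing slots ('.')
    are skipped, and any non-reference allele index counts as ALT.
    """
    # Treat phased ('|') and unphased ('/') separators the same way.
    alleles = gt.replace("|", "/").split("/")
    called = [a for a in alleles if a != "."]
    alt = sum(1 for a in called if a != "0")
    return alt, len(called)


def pooled_alt_frequency(gt: str) -> float:
    """Estimated ALT allele frequency in the pool (NaN if nothing was called)."""
    alt, n = pooled_alt_count(gt)
    return alt / n if n else float("nan")


# A pool of 4 horses called with ploidy 8: 2 of 8 chromosomes carry ALT.
print(pooled_alt_count("0/0/1/0/1/0/0/0"))      # (2, 8)
print(pooled_alt_frequency("0/0/1/0/1/0/0/0"))  # 0.25
```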
Ultimately, I'd like to determine the allele frequency of each SNP in both my cases and my controls, and get the data into a format that PLINK would accept, but I'm not sure how. Is there a tool, or an efficient pipeline, that anyone could suggest? My searches have not turned up much...
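Because PLINK's association tests expect individual-level genotypes, one possible workaround is to compute the allelic test directly from pooled counts. The sketch below assumes each indexed pool contains only cases or only controls (an assumption; the post does not say how pools were assigned) and that per-pool ALT counts have already been extracted. It sums counts by phenotype and runs a standard 2×2 allelic chi-square test (1 df, no continuity correction). Treating pooled calls as exact counts ignores the extra sampling variance from pooling and sequencing depth, so p-values from this are likely too small; take it as a first-pass screen rather than a definitive method.

```python
import math


def pooled_allelic_test(pools):
    """2x2 allelic chi-square test on summed pooled allele counts.

    pools: iterable of (is_case, alt_count, called_allele_count), one tuple
    per indexed pool. Hypothetical input format; each pool is assumed to be
    all cases or all controls.
    """
    case_alt = case_n = ctrl_alt = ctrl_n = 0
    for is_case, alt, n in pools:
        if is_case:
            case_alt += alt
            case_n += n
        else:
            ctrl_alt += alt
            ctrl_n += n

    # 2x2 table: rows = cases/controls, columns = ALT/REF allele counts.
    a, b = case_alt, case_n - case_alt
    c, d = ctrl_alt, ctrl_n - ctrl_alt
    total = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin (e.g. a monomorphic site) leaves the test undefined.
        chi2 = p = float("nan")
    else:
        chi2 = total * (a * d - b * c) ** 2 / denom
        # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
        p = math.erfc(math.sqrt(chi2 / 2))
    return {
        "case_alt_freq": a / case_n if case_n else float("nan"),
        "control_alt_freq": c / ctrl_n if ctrl_n else float("nan"),
        "chi2": chi2,
        "p": p,
    }


# Two case pools and two control pools (ploidy 8 and 10); illustrative numbers only.
result = pooled_allelic_test([
    (True, 4, 8), (True, 5, 10),
    (False, 1, 8), (False, 2, 10),
])
print(result["chi2"])  # 4.5
```

The per-pool counts could come from a helper like `pooled_alt_count` applied to each sample's GT field; summing across pools before testing reflects the fact that the individual horses within a pool cannot be told apart.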
Why wouldn't you de-multiplex to get 100 different files, one for each horse, and map them as normal diploids?
Either my original post wasn't clear, or I'm missing something. DNA from 4-5 horses was pooled, and all of the DNA in that pool was then indexed with a single barcode, giving a total of 24 indexed groups that each contain several animals' DNA. I have no way of separating out an individual horse's reads (that I know of, at least!).