I have whole-genome sequencing data (Illumina paired-end, 150 bp) for a diverse set of highly polyploid accessions with various ploidy levels (2n=6-16x). I would like to know whether there is a method to get sequencing depths for combinations of alleles that occur together on the same read.
For example, for three SNPs with two alleles (A and G) within 100 bp, a given individual could have the following SNP genotyping:
- SNP1: depth_A=40, depth_G=60
- SNP2: depth_A=10, depth_G=90
- SNP3: depth_A=50, depth_G=50
I would like to obtain the following haplotype genotyping:
- Haplotype: depth_AAA=5, depth_AAG=2,...,depth_GGG=40
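To make the counting I have in mind concrete, here is a toy sketch in Python. It is only an illustration under strong assumptions, not an existing tool: reads are given as hypothetical `(start, sequence)` pairs that are already aligned without indels or clipping, mates of a pair are not merged, and the function name `haplotype_depths` is made up for this example. With real BAM files the read-to-reference coordinates would have to come from the alignment (CIGAR) instead.

```python
from collections import Counter

def haplotype_depths(reads, snp_positions):
    """Count allele combinations observed on single reads.

    reads: iterable of (start, sequence), where start is the 0-based
           reference position of the first base (toy assumption:
           ungapped alignment, no clipping).
    snp_positions: sorted 0-based reference positions of the SNPs.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)  # exclusive end on the reference
        # Only a read covering every SNP can report a full haplotype.
        if start <= snp_positions[0] and snp_positions[-1] < end:
            hap = "".join(seq[p - start] for p in snp_positions)
            counts[hap] += 1
    return counts

# Made-up reads over three SNPs at reference positions 2, 5 and 8.
reads = [
    (0, "CCACCACCAC"),  # carries A, A, A
    (0, "CCGCCGCCGC"),  # carries G, G, G
    (1, "CGCCGCCGC"),   # starts one base later, carries G, G, G
    (0, "CCACCACCGC"),  # carries A, A, G
    (4, "CACCGC"),      # does not cover the first SNP, so it is skipped
]

depths = haplotype_depths(reads, [2, 5, 8])
for hap, depth in sorted(depths.items()):
    print(f"depth_{hap}={depth}")
```

Reads that cover only some of the SNPs are dropped here; whether to keep them as partial haplotypes is a design choice I have not settled on.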
I hope this was clear, thanks for your help.
Thanks a million for this! I think I would have needed several months to code something comparable.
I have tried to run it on a test VCF dataset of my own with one individual and two SNPs:
With the proposed method, I obtain:
There seems to be an issue: it aggregates results only for the first category, "CG". I don't know how complicated this would be to fix.
Otherwise the results are very comprehensive, thanks again.
After some exchanges with Pierre, the program has been updated and the problem is now solved:
Thanks a lot to him!