3.3 years ago
ManuelDB
To check whether one of my applications works as expected, I need VCF files (at least 2 per person) from the same person, covering the same region of the genome. I think around 40 samples (20 people) would be enough.
Reason: I have developed an application that determines whether two VCF files belong to the same patient by comparing the variants in both files. I need these samples to measure the percentage of similarity between same-person files, and then compare those results with the ones obtained from VCF files of different people.
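For readers following along, the kind of comparison described above could be sketched roughly as below. This is not the poster's application, only a minimal illustration. It assumes single-sample, tab-separated VCF text with a GT entry in the FORMAT column, and the function names `read_genotypes` and `concordance` are hypothetical.

```python
def read_genotypes(vcf_text):
    """Map (chrom, pos, ref, alt) to the unordered genotype of the first sample."""
    genotypes = {}
    for line in vcf_text.splitlines():
        # Skip blank lines, meta-information lines and the column header.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, nothing to compare
        chrom, pos, _id, ref, alt = fields[:5]
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt = fields[9].split(":")[format_keys.index("GT")]
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call
        genotypes[(chrom, pos, ref, alt)] = tuple(sorted(alleles))
    return genotypes


def concordance(genotypes_a, genotypes_b):
    """Fraction of sites called in both files that carry the same genotype."""
    shared = genotypes_a.keys() & genotypes_b.keys()
    if not shared:
        return None  # no overlapping sites, similarity is undefined
    matches = sum(genotypes_a[key] == genotypes_b[key] for key in shared)
    return matches / len(shared)
```

Note that this sketch only looks at sites present in both files. A site that appears in one VCF but not the other may be a homozygous-reference genotype or simply an uncovered position, and the two cases cannot be told apart from the VCF alone.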
Note that to distinguish VCFs coming from the same or different people you don't need to compare them entirely; a few variants carefully chosen according to their population allele frequency are enough. The more variants you choose with MAF around 0.5, the higher the resolution of your method will be, and with just a few tens of variants you'd have more than enough statistical power to discriminate.
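To illustrate why MAF around 0.5 helps, here is a rough sketch of the chance that two unrelated people share the same genotype at a biallelic site. It assumes Hardy-Weinberg proportions, independent (unlinked) sites and no genotyping error, none of which is stated in the thread, and the function names are hypothetical.

```python
def match_probability(p):
    """Chance that two unrelated people share a genotype at one biallelic site.

    p is the allele frequency; genotype frequencies are assumed to follow
    Hardy-Weinberg proportions (p^2, 2pq, q^2).
    """
    q = 1.0 - p
    return p**4 + (2 * p * q) ** 2 + q**4


def random_match_probability(frequencies):
    """Chance of matching at every site, assuming the sites are independent."""
    probability = 1.0
    for p in frequencies:
        probability *= match_probability(p)
    return probability
```

Under these assumptions the per-site match probability is lowest at an allele frequency of 0.5, where it equals 0.375, so each additional site of that kind multiplies the chance of an accidental match between different people by 0.375.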
Thanks, Jorge Amigo, for your ideas. Do you know the best way to filter for variants with a MAF around 0.5? I have seen that some of the VCF files I have been using contain the field VF, but not all of them do. I am new to this field and still have much to learn...