Hi everyone,
I have multiple samples that were sequenced and subsequently genotyped, and I want to find potential sample mismatches by comparing the two experiments. I called SNPs with VarScan and then ran SnpSift concordance. I'm having trouble understanding what the results mean: the concordance by variant was not hard to understand, but the concordance by sample is confusing.
The following are the results from summary.txt:
Number of samples:
5 File genotype.vcf
5 File sequencing.vcf
5 Both files
Errors:
REF fields does not match 3
ALT field does not match 62
I transposed the by-sample table (concordance_by_summary.txt) for easier viewing:
sample samp1 samp2 samp3 samp4 samp5
MISSING_ENTRY_genotype.MISSING_ENTRY_sequencing 0 0 0 0 0
MISSING_ENTRY_genotype.MISSING_GT_sequencing 25186 21172 24599 23377 17533
MISSING_ENTRY_genotype.REF 0 0 0 0 0
MISSING_ENTRY_genotype.ALT_1 1950 3731 2160 2737 6039
MISSING_ENTRY_genotype.ALT_2 3224 5457 3601 4246 6788
MISSING_GT_genotype.MISSING_ENTRY_sequencing 5740110 5820568 5803803 5761099 5811924
MISSING_GT_genotype.MISSING_GT_sequencing 27360 26483 26457 24585 16087
MISSING_GT_genotype.REF 0 0 0 0 0
MISSING_GT_genotype.ALT_1 194 265 319 360 674
MISSING_GT_genotype.ALT_2 59 69 63 99 195
REF.MISSING_ENTRY_sequencing 0 0 0 0 0
REF.MISSING_GT_sequencing 0 0 0 0 0
REF.REF 0 0 0 0 0
REF.ALT_1 0 0 0 0 0
REF.ALT_2 0 0 0 0 0
ALT_1.MISSING_ENTRY_sequencing 4748629 4630439 4657942 4714946 4655508
ALT_1.MISSING_GT_sequencing 22123 19498 20188 17694 13286
ALT_1.REF 0 0 0 0 0
ALT_1.ALT_1 6004 7926 7401 10962 18038
ALT_1.ALT_2 811 1048 1008 1343 2711
ALT_2.MISSING_ENTRY_sequencing 3109366 3147098 3136360 3122060 3130673
ALT_2.MISSING_GT_sequencing 25032 23773 23969 19992 13804
ALT_2.REF 0 0 0 0 0
ALT_2.ALT_1 132 215 197 233 444
ALT_2.ALT_2 5371 7809 7484 11818 21847
ERROR 65 65 65 65 65
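If the goal is to spot mismatched samples, one way to summarise the by-sample table is to keep only the cells where both files have a called genotype and compute, per sample, the fraction whose genotype classes agree. This is a minimal Python sketch under stated assumptions: `parse_by_sample` and `agreement` are hypothetical helper names, and reading REF / ALT_1 / ALT_2 as "zero / one / two ALT alleles in the genotype" (with the part before the dot referring to genotype.vcf and the part after it to sequencing.vcf) is my interpretation, not something taken from the SnpSift documentation.

```python
from typing import Dict, List, Tuple

# ASSUMED meaning of the genotype classes in the by-sample table:
# REF = no ALT allele, ALT_1 = one ALT allele, ALT_2 = two ALT alleles.
# MISSING_ENTRY_* / MISSING_GT_* mean the site or the genotype is absent.
CALLED = {"REF", "ALT_1", "ALT_2"}


def parse_by_sample(text: str) -> Tuple[List[str], Dict[str, List[int]]]:
    """Parse a transposed table: 'sample s1 s2 ...' then 'CATEGORY n1 n2 ...'."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    samples = rows[0][1:]
    counts = {row[0]: [int(x) for x in row[1:]] for row in rows[1:]}
    return samples, counts


def agreement(samples: List[str], counts: Dict[str, List[int]]) -> Dict[str, float]:
    """Per sample: fraction of sites called in BOTH files whose classes match."""
    result = {}
    for i, sample in enumerate(samples):
        both = agree = 0
        for category, values in counts.items():
            if "." not in category:  # rows like ERROR are not genotype pairs
                continue
            # Left = class in genotype.vcf, right = class in sequencing.vcf.
            left, right = category.split(".", 1)
            if left in CALLED and right in CALLED:
                both += values[i]
                if left == right:
                    agree += values[i]
        result[sample] = agree / both if both else float("nan")
    return result


# Rows copied from the table in this post (samp1 and samp5 only).
TABLE = """
sample samp1 samp5
MISSING_GT_genotype.ALT_1 194 674
REF.REF 0 0
ALT_1.REF 0 0
ALT_1.ALT_1 6004 18038
ALT_1.ALT_2 811 2711
ALT_2.REF 0 0
ALT_2.ALT_1 132 444
ALT_2.ALT_2 5371 21847
ERROR 65 65
"""

samples, counts = parse_by_sample(TABLE)
for sample, rate in agreement(samples, counts).items():
    print(f"{sample}: {rate:.4f}")
```

The rate deliberately ignores sites missing from either file (the MISSING_ENTRY / MISSING_GT rows), since those say nothing about whether two calls agree; a swapped sample would presumably stand out with a noticeably lower rate than the others.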
I understand that ERROR = 65 is the number of REF and ALT fields (3 + 62) that don't match, but I don't understand which sample(s) these come from. Also, does SnpSift compare the samples with each other?
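Because REF and ALT are per-record columns in a VCF rather than per-sample fields, one way to see where the 3 + 62 errors come from is to list positions present in both files whose REF or ALT strings differ. This is a minimal Python sketch, assuming records are matched on (CHROM, POS) and that REF/ALT are compared as exact strings; that is a simplification and may not be exactly how SnpSift counts these errors (for example, a multi-allelic ALT such as "A,T" versus "A" is flagged here). `parse_sites`, `read_sites` and `mismatched_sites` are hypothetical helper names.

```python
import gzip
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int]  # (CHROM, POS)
Alleles = Tuple[str, str]  # (REF, ALT)


def parse_sites(lines: Iterable[str]) -> Dict[Site, Alleles]:
    """Map (CHROM, POS) -> (REF, ALT) for every data line of a VCF.

    Header lines are skipped; if a position appears more than once,
    the last record wins (a simplification).
    """
    sites = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        sites[(fields[0], int(fields[1]))] = (fields[3], fields[4])
    return sites


def read_sites(path: str) -> Dict[Site, Alleles]:
    """Read a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_sites(handle)


def mismatched_sites(
    a: Dict[Site, Alleles], b: Dict[Site, Alleles]
) -> List[Tuple[Site, str, Alleles, Alleles]]:
    """Sites present in both files whose REF (checked first) or ALT differ."""
    out = []
    for site in sorted(a.keys() & b.keys()):
        (ref_a, alt_a), (ref_b, alt_b) = a[site], b[site]
        if ref_a != ref_b:
            out.append((site, "REF", a[site], b[site]))
        elif alt_a != alt_b:
            out.append((site, "ALT", a[site], b[site]))
    return out
```

For example, `mismatched_sites(read_sites("genotype.vcf"), read_sites("sequencing.vcf"))` would list the candidate sites. Since the check never looks at the sample columns, any site it reports affects every sample in the same way, which would be consistent with the ERROR row being 65 for all five samples.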