I have a list of SNPs and indels from 4 different software.
IndelGenotyper (broad tool)—for calling somatic indels Bambino—for calling SNP/Indels Somaticsniper—SNP calling Varscan – indel/snp/LOH
Now my task is to compare SNP and indels from different software. I need some input as to how to do that. I only have output in VCF format for IndelGenotyper. The rest of the software have the following output format:
IndelGenotyper (VCF):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT normal tumor
1 177 . A AC . . N_AC=295,310;N_DP=867;N_MM=3.1084745,4.6678634;N_MQ=18.840677,20.062836;N_NQSBQ=26.988136,22.908127;N_NQSMM=0.0111864405,0.13539149;N_SC=12,283,116,441;T_AC=234,239;T_DP=691;T_MM=3.2051282,4.7743363;T_MQ=20.816238,21.225664;T_NQSBQ=27.663815,23.273888;T_NQSMM=0.011976048,0.13911061;T_SC=9,225,85,367 GT:GQ 0/1:0 0/1:0
1 230 . AC A . . N_AC=218,227;N_DP=663;N_MM=4.03211,4.456422;N_MQ=16.825687,20.529816;N_NQSBQ=32.26916,29.760443;N_NQSMM=0.002770083,0.04280557;N_SC=18,200,73,363;T_AC=162,170;T_DP=465;T_MM=4.351852,4.745763;T_MQ=18.141975,21.064407;T_NQSBQ=32.513077,30.064604;T_NQSMM=0.0124533,0.045261122;T_SC=5,157,34,261 GT:GQ 0/1:0 0/1:0
1 247 . TA T . . N_AC=156,162;N_DP=392;N_MM=4.467949,4.3130436;N_MQ=20.666666,19.878262;N_NQSBQ=30.026958,30.963959;N_NQSMM=0.019897304,0.07715736;N_SC=31,125,65,165;T_AC=104,113;T_DP=267;T_MM=4.2115383,4.766234;T_MQ=23.403847,20.603895;T_NQSBQ=31.615385,30.121395;T_NQSMM=0.011538462,0.06525038;T_SC=16,88,24,130 GT:GQ 0/1:0 0/1:0
1 254 . TA T . . N_AC=123,144;N_DP=294;N_MM=4.593496,4.613333;N_MQ=21.04878,21.833334;N_NQSBQ=30.16913,29.987564;N_NQSMM=0.018883415,0.0987564;N_SC=27,96,63,87;T_AC=96,108;T_DP=195;T_MM=4.8854165,4.2988505;T_MQ=24.5625,22.16092;T_NQSBQ=30.090431,29.74282;T_NQSMM=0.027339643,0.086142324;T_SC=19,77,23,64 GT:GQ 0/1:0 0/1:0
1 3820 . TC T . . N_AC=8,8;N_DP=13;N_MM=1.5,1.8;N_MQ=14.5,12.0;N_NQSBQ=33.7125,31.763159;N_NQSMM=0.0,0.0;N_SC=0,8,0,5;T_AC=8,8;T_DP=12;T_MM=1.25,1.25;T_MQ=10.625,13.25;T_NQSBQ=31.725,31.914286;T_NQSMM=0.0,0.028571429;T_SC=0,8,0,4 GT:GQ 0/1:0 0/1:0
1 235278 . C CT . . N_AC=2,2;N_DP=18;N_MM=1.5,0.4375;N_MQ=20.5,12.0625;N_NQSBQ=18.45,33.529034;N_NQSMM=0.0,0.006451613;N_SC=1,1,15,1;T_AC=5,5;T_DP=15;T_MM=1.6,1.1;T_MQ=26.0,26.5;T_NQSBQ=34.2,27.813187;T_NQSMM=0.0,0.054945055;T_SC=5,0,6,4 GT:GQ 0/1:0 0/1:0
somaticsniper:
Chromosome Position Reference base IUB genotype of tumor Somatic Score Tumor Consensus quality Tumor SNV quality Tumor RMS mapping quality Depth in tumor (# of reads crossing the position) Depth in normal (# of reads crossing the position)
1 109 A W 0 228 228 25 2345 3196
1 177 A M 0 228 228 23 1399 1770
1 180 T Y 0 228 228 22 1410 1781
1 250 A M 1 21 21 24 452 598
1 257 A M 1 9 9 25 391 508
1 291 C Y 0 59 176 25 224 312
Bambino:
NormalSample TumorSample Name Chr Pos Type Size Coverage Percent_alternative_allele Chr_Allele Alternative_Allele P-value Text unique_alternative_ids reference_normal_count reference_tumor_count alternative_normal_count alternative_tumor_count count_ref_normal_fwd count_ref_normal_rev count_ref_tumor_fwd count_ref_tumor_rev count_var_normal_fwd count_var_normal_rev count_var_tumor_fwd count_var_tumor_rev alternative_fwd_count alternative_rev_count alternative_bidirectional_confirmation broad_coverage broad_reference_normal_count broad_reference_tumor_count broad_alternative_normal_count broad_alternative_tumor_count unique_alt_read_starts unique_alt_read_starts_fwd unique_alt_read_starts_rev
P7norm.bam P7tum.bam chr1.109 chr1 109 SNP 1 2808 0.30477223 A T 1.0 CCTAACCCTAACCCTAACCC[A/T]ACCCTAACCCTAACCCTAAC 796 1113 810 484 359 570 543 404 406 217 267 170 189 387 456 1 3760 1513 1122 661 464 89 80 70
P7norm.bam P7tum.bam chr1.147 chr1 147 deletion 1 1302 0.343318 C 1.0 445 410 347 284 163 219 191 182 165 20 264 15 148 35 412 1 1513 520 465 333 195 85 27 76
P7norm.bam P7tum.bam chr1.178 chr1 178 insertion 1 1805 0.46371192 C 1.0 837 566 426 477 360 137 429 99 327 19 458 12 348 31 806 1 2491 838 681 544 428 75 25 72
P7norm.bam P7tum.bam chr1.231 chr1 231 deletion 1 1578 0.42712295 C 1.0 674 515 337 387 287 58 457 23 314 25 362 6 281 31 643 1 1819 660 460 398 301 75 25 63
P7norm.bam P7tum.bam chr1.248 chr1 248 deletion 1 814 0.4152334 A 1.0 338 243 183 194 144 31 212 15 168 26 168 12 132 38 300 1 931 324 248 207 152 77 26 66
P7norm.bam P7tum.bam chr1.255 chr1 255 deletion 1 509 0.45972496 A 1.0 234 149 96 123 111 34 115 11 85 14 109 11 100 25 209 1 625 224 147 135 119 69 20 60
P7norm.bam P7tum.bam chr1.304 chr1 304 insertion 1 83 0.25301206 T 1.0 21 44 19 14 7 14 30 3 16 8 6 4 3 12 9 1 380 199 158 16 7 19 12 9
P
Varscan (Indel):
chrom position ref var normal_reads1 normal_reads2 normal_var_freq normal_gt tumor_reads1 tumor_reads2 tumor_var_freq tumor_gt somatic_status variant_p_value somatic_p_value tumor_reads1_plus tumor_reads1_minus tumor_reads2_plus tumor_reads2_minus
1 146 A -C 533 270 33.62% */-C 409 161 28.25% */-C Germline 5.947213588506911E-148 0.9853916902840889 165 244 10 151
1 177 A +C 436 295 40.36% */+C 344 234 40.48% */+C Germline 2.573218751523094E-189 0.5036502499047071 49 295 9 225
1 230 A -C 613 218 26.23% */-C 436 162 27.09% */-C Germline 8.727652916131424E-128 0.381197718905699 36 400 5 157
1 247 T -A 329 156 32.16% */-A 257 104 28.81% */-A Germline 2.9363752902695665E-89 0.869105076258226 31 226 16 88
1 254 T -A 249 123 33.06% */-A 165 96 36.78% */-A Germline 1.2718779840140742E-76 0.18856251770578447 32 133 19 77
1 329 A -C 113 26 18.71% */-C 68 19 21.84% */-C Germline 2.491432783762119E-15 0.34120745355261556 28 40 9 10
Varscan (SNP):
chrom position ref var normal_reads1 normal_reads2 normal_var_freq normal_gt tumor_reads1 tumor_reads2 tumor_var_freq tumor_gt somatic_status variant_p_value somatic_p_value tumor_reads1_plus tumor_reads1_minus tumor_reads2_plus tumor_reads2_minus
1 109 A T 749 295 28.26% W 559 199 26.25% W Germline 1.6489449904910897E-166 0.8400429841827789 352 207 121 78
1 180 T C 463 184 28.44% Y 379 152 28.63% Y Germline 5.019474360557042E-114 0.4973950760340981 49 330 25 127
1 291 C T 50 97 65.99% Y 28 67 70.53% Y Germline 4.350249434195174E-69 0.27609006399506 7 21 32 35
1 297 C T 53 83 61.03% Y 31 59 65.56% Y Germline 5.264922068371216E-58 0.29228247724476564 7 24 27 32
1 303 C T 55 65 54.17% Y 25 50 66.67% Y Germline 5.348385335162754E-46 0.056903610901911816 8 17 24 26
1 309 C T 49 66 57.39% Y 25 48 65.75% Y Germline 4.640899271738838E-46 0.16096023989817923 7 18 21 27
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
i guess comparing SNP should be straight forward as i look at chromosomal position and the SNP letter (from/to) and if that matches then the SNPs match. Am i correct? I have no idea how to do that for indels? Any comments are highly appreciated.
Thank You All.
And let us know how it goes...