5.2 years ago
nkausthu
I have the following records in one of my gVCF files:
1 3753032 . GTTTT G,GT,GTTT,GTTTTT,GTTTTTT,GTTTTTTTT,<NON_REF>
1 10502954 . CTTTTT C,CT,CTTT,CTTTT,CTTTTTT,<NON_REF>
1 11272829 . T <NON_REF>
1 11272839 . G <NON_REF>
1 15978128 . T <NON_REF>
1 15978129 . T <NON_REF>
1 38332078 . T TCA,TCTCA,TCACACACACACACACACA,TC,<NON_REF>
1 67725648 . GAAAA G,GAA,GAAA,GAAAAA,GAAAAAA,<NON_REF>
1 72748277 . ATT A,AT,ATTT,ATTTT,ATTTTT,ATTTTTT,<NON_REF>
1 150782110 . CAAAAA C,CA,CAA,CAAA,CAAAA,<NON_REF>
1 155724315 . GTT G,GT,TTT,GTTTTT,GTTTTTT,<NON_REF>
1 158058266 . CTTTTTT C,CT,CTT,CTTT,CTTTT,CTTTTT,<NON_REF>
1 201082902 . C CAA,CAAA,<NON_REF>
1 212618993 . A C,<NON_REF>
1 237955682 . C CGTGT,CGTGTGT,<NON_REF>
2 27532239 . CAAA C,CA,CAA,CAAAAAAAAAAAAAAAAA,<NON_REF>
2 47641559 . TAAAAAA T,TA,TAA,TAAA,TAAAA,TAAAAA,<NON_REF>
2 100058714 . CAA C,CA,CAAA,CAAAA,CAAAAAAA,CAAAAAAAA,<NON_REF>
2 113303450 . T <NON_REF>
2 113303451 . G <NON_REF>
2 207998878 . AT A,ATT,ATTT,ATTTT,ATTTTT,ATTTTTT,<NON_REF>
2 231333532 . CAAA C,CAA,CAAAAAAA,<NON_REF>
3 42734487 . G <NON_REF>
3 42734750 . C A,<NON_REF>
3 42734751 . C <NON_REF>
3 47484723 . TACACACAC T,TAC,<NON_REF>
When we left-normalized with bcftools after joint genotyping, many false heterozygous calls were generated with no reads supporting the alternate allele, for example:
0/1:14,0:42:72:149,0,164
0/1:10,0:35:17:88,0,111
I suspect this is caused by incorrect splitting of multiallelic sites. Can anyone suggest a way to remove these variants from the downstream VCF file?
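To make the pattern concrete, here is a minimal Python sketch that flags a heterozygous call whose called ALT allele has zero supporting reads. It assumes the sample columns follow the FORMAT `GT:AD:DP:GQ:PL` (the FORMAT column is not shown above, so this is a guess based on the two example genotypes), and the function name `is_unsupported_het` is made up for illustration. It only inspects one sample field at a time and is not a complete VCF filter.

```python
def is_unsupported_het(sample_field, fmt="GT:AD:DP:GQ:PL"):
    """Return True for a diploid het call where a called ALT allele has AD == 0.

    fmt is an assumed FORMAT string; pass the real one from the VCF line.
    """
    values = dict(zip(fmt.split(":"), sample_field.split(":")))

    # Treat phased (|) and unphased (/) genotypes the same way.
    gt = values.get("GT", "./.").replace("|", "/").split("/")
    if len(gt) != 2 or "." in gt or gt[0] == gt[1]:
        return False  # missing, non-diploid, or homozygous call

    ad = values.get("AD", "")
    if not ad or "." in ad.split(","):
        return False  # no usable allelic depths

    depths = [int(x) for x in ad.split(",")]
    # Indices of the non-reference alleles that were actually called.
    alt_indices = [int(a) for a in gt if a != "0"]
    return any(i < len(depths) and depths[i] == 0 for i in alt_indices)


if __name__ == "__main__":
    for field in ("0/1:14,0:42:72:149,0,164", "0/1:10,0:35:17:88,0,111"):
        print(field, is_unsupported_het(field))
```

Both example genotypes above are flagged by this check, since each has an allelic depth of 0 for allele 1. Calls flagged this way could then be set to missing or dropped, depending on what the downstream analysis needs.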
Hello,
Please provide an example dataset that one can use directly for testing.
Thanks!
fin swimmer
I can provide the VCF file after left normalization. Is that sufficient?
Hello,
That's better than nothing, but the input VCF would be more useful. Reduce it to a few example lines that show your problem.