20 months ago
kamanovae
Hi!
I created a VCF file from rsIDs and information from dbSNP. Then I ran bcftools norm to make the file suitable for GATK:
bcftools norm -f ref/unmasked_ref.fa --check-ref w -m - rs_coord.hear_color_with_rs.vcf
But I still have "-" in the ALT column. I can't start indexing the file and I think that GATK will not process such lines.
How can I convert such lines from this:
chr22 42127856 rs5030867 T G . . RS=rs5030867
chr22 42128173 rs5030656 C CCTT . . RS=rs5030656
chr22 42128176 rs5030656 TCT - . . RS=rs5030656
chr22 42128242 rs35742686 T - . . RS=rs35742686
chr22 42128945 rs3892097 C A . . RS=rs3892097
chr22 42128945 rs3892097 C G . . RS=rs3892097
chr22 42128945 rs3892097 C T . . RS=rs3892097
chr22 42129084 rs5030655 A - . . RS=rs5030655
chr22 42130692 rs1065852 G A . . RS=rs1065852
chr22 42130692 rs1065852 G C . . RS=rs1065852
To this (add the preceding nucleotide from the reference before the REF and ALT alleles):
chr22 42129084 rs5030655 TA T . . RS=rs5030655
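One way to do this conversion outside bcftools is a small script that pads each "-" deletion with the preceding reference base. The following is only a minimal sketch, not a tested pipeline: the function name `fix_dash_alt` and the `fetch_base` callback are hypothetical, and the toy reference stands in for a real FASTA lookup (for example a wrapper around `pysam.FastaFile.fetch`, which uses 0-based half-open coordinates). It assumes tab-separated lines and that POS in the input is the first deleted base. Note that it moves POS one base to the left, because VCF expects POS to point at the first base of REF; the example line above keeps POS unchanged, so check which convention your coordinates follow.

```python
from typing import Callable


def fix_dash_alt(line: str, fetch_base: Callable[[str, int], str]) -> str:
    """Rewrite a VCF data line whose ALT is '-' as an anchored deletion.

    fetch_base(chrom, pos) is a hypothetical callback that returns the
    reference base at a 1-based position.
    """
    line = line.rstrip("\n")
    # Header lines and empty lines pass through unchanged.
    if not line or line.startswith("#"):
        return line

    fields = line.split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    if alt != "-":
        return line

    # Assumption: POS is the first deleted base, so the anchor is POS - 1.
    anchor = fetch_base(chrom, pos - 1)
    fields[1] = str(pos - 1)      # POS now points at the anchor base
    fields[3] = anchor + ref      # REF = anchor + deleted bases
    fields[4] = anchor            # ALT = anchor only
    return "\t".join(fields)


# Toy reference for illustration only; replace with a real FASTA lookup.
toy_ref = {"chrT": "ACGTAC"}


def toy_fetch(chrom: str, pos: int) -> str:
    # Convert the 1-based position to a 0-based string index.
    return toy_ref[chrom][pos - 1]


print(fix_dash_alt("chrT\t5\trsX\tA\t-\t.\t.\tRS=rsX", toy_fetch))
```

Lines with a normal ALT allele are returned unchanged, so the function can be applied to every line of the file before sorting and indexing.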
Have you tried --check-ref s? From the bcftools manual, it looks like "w" just reports a warning, without attempting to fix the reference allele as "s" does.
Unfortunately this approach doesn't work, because the REF alleles are fine. I didn't find a flag that could fix ALT.
Right, I misread the question, sorry about that :). FYI, it seems that the issue was addressed in this stackexchange post; have you tried the solutions suggested there?