I'm trying to use VCFtools to merge two large VCF files for the same species. The problem is that the order of the diploid calls for the reference sometimes differs between the two files (e.g. at pos 3445083 the reference call is C T in one file, whereas at pos 3445083 in the other VCF the reference call is T C). This throws VCFtools off when I attempt to merge, even though the reference calls are actually the same, with the error:
The reference prefixes do not agree: C vs T Failed on line 429790:4074186
Anyone else have this problem, or know of a workaround?
Any help is appreciated. Thanks!
I don't understand. Are the reference sequences used to call variants different in the two VCFs?
No, they are the same; they are just in a different order. In one file C is before T, while in the other T is before C (yet the reference is diploid for C and T in either case).
It cannot be in a different order unless you've been using buggy software. The REF column MUST be the reference allele.
Right, in one file the column REF has C T, while in the other REF reads T C for the same CHROM and POS. Note that I did not run this data through my own pipeline (although I wish I had). I am dealing with two VCF files that were created at two different times (using the same reference), and I want to combine them but am getting the error mentioned above. I believe the reads were originally aligned using BWA.
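Before attempting a workaround, it may help to list exactly which sites disagree. Below is a minimal Python sketch, assuming both files are plain-text, tab-delimited VCFs where REF is the fourth column; the function names and the toy records are made up for illustration, and the sketch holds every site in memory, so very large files may need a streaming approach instead.

```python
import io


def read_ref_alleles(handle):
    """Map (CHROM, POS) -> REF for every data line in a VCF text stream."""
    refs = {}
    for line in handle:
        # Skip meta-information lines, the header line, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        refs[(chrom, pos)] = ref.upper()
    return refs


def ref_mismatches(handle_a, handle_b):
    """Return (CHROM, POS, REF_a, REF_b) for shared sites whose REF differs."""
    refs_a = read_ref_alleles(handle_a)
    refs_b = read_ref_alleles(handle_b)
    shared = sorted(set(refs_a) & set(refs_b))
    return [
        (chrom, pos, refs_a[(chrom, pos)], refs_b[(chrom, pos)])
        for chrom, pos in shared
        if refs_a[(chrom, pos)] != refs_b[(chrom, pos)]
    ]


if __name__ == "__main__":
    # Toy records (hypothetical), standing in for the two real files.
    header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    vcf_a = io.StringIO(header + "chr1\t100\t.\tA\tG\t.\t.\t.\n"
                                 "chr1\t3445083\t.\tC\tT\t.\t.\t.\n")
    vcf_b = io.StringIO(header + "chr1\t100\t.\tA\tG\t.\t.\t.\n"
                                 "chr1\t3445083\t.\tT\tC\t.\t.\t.\n")
    for chrom, pos, ref_a, ref_b in ref_mismatches(vcf_a, vcf_b):
        print(f"{chrom}\t{pos}\t{ref_a}\t{ref_b}")
```

For real data, the two `io.StringIO` objects would be replaced with `open(path)` handles on the uncompressed files. If only a handful of sites turn up, checking them against the reference FASTA should show which file has REF and ALT swapped.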