My model system uses a reference genome from a population that is likely highly derived. I have whole-genome sequencing data from two other populations, one of which (call it PopB) is likely derived from the other (PopA), and both sets of reads were aligned to the reference. In short, I want to take a set of polymorphisms that are fixed or nearly fixed in PopA (and absent in the reference) and flip which allele is considered the reference in both sets of data. Several tests I want to run require identifying the ancestral state, or assume ref=ancestral, which is not the case for about 15% of the variants. I was able to make a list of chromosome/positions where I want to flip the ref/alt definitions, but I don't know of a way to use this list to actually flip them. I'd appreciate any help.
Hello mdstep,
if you already have a list of positions and the REF/ALT alleles you want to switch, you could try something like this:
bcftools annotate
to fill the ID column in the VCF file for the sites where you want to switch REF/ALT
bcftools +fixref file.bcf -Ob -o out.bcf -- -i List_of_1.vcf.gz
to switch REF/ALT based on the ID
fin swimmer
I can't find much documentation on flipping from a list, and the steps here are not very clear. I know how to flip ref/alt in Python, but I would like to be able to do it in bcftools for speed.
annotate_file_example.gz:
The run:
bcftools +fixref big_vcf_with_unflipped_aleles.vcf.gz -- -i annotate_file_example.gz > big_vcf_with_unflipped_aleles.vcf
I just get
Expected the -f option
Sorry, I didn't specify, it's all in VCF format.
Did you find any solution for this problem?
I'm in a similar situation. In my VCF file, I need to switch REF and ALT for some sites. This can be done with a script, but I was wondering if there is any bcftools function that can do that.
Thanks!
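For anyone landing here who is happy with the script route mentioned above, here is a minimal Python sketch of it. This is not a bcftools feature and not something confirmed earlier in the thread, just one way it could be done under some assumptions: the sites to flip are held as a set of (chromosome, position) pairs, only biallelic records are touched, and only REF, ALT and the GT field are changed. The function names (swap_gt, flip_record, flip_lines) are made up for illustration. Allele-ordered annotations such as AC/AF in INFO or AD/PL in FORMAT are NOT updated here, so they would need to be recomputed or dropped afterwards.

```python
def swap_gt(gt):
    """Swap allele indices 0 and 1 in a GT string such as 0/1, 1|0 or ./.

    Assumes a biallelic site, so the only allele indices present are 0 and 1.
    """
    return gt.translate(str.maketrans("01", "10"))


def flip_record(line, sites):
    """Return a VCF data line with REF/ALT and GT swapped if it is in `sites`.

    `sites` is a set of (chrom, pos) tuples with 1-based integer positions.
    Records not in the set, multiallelic records, and records without a
    concrete ALT allele are returned unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    if (chrom, pos) not in sites or "," in alt or alt in (".", "*"):
        return line
    fields[3], fields[4] = alt, ref
    # Genotype columns start at index 9; FORMAT is column index 8.
    if len(fields) > 9:
        fmt = fields[8].split(":")
        if "GT" in fmt:
            gt_idx = fmt.index("GT")
            for i in range(9, len(fields)):
                parts = fields[i].split(":")
                if gt_idx < len(parts):
                    parts[gt_idx] = swap_gt(parts[gt_idx])
                fields[i] = ":".join(parts)
    # NOTE: INFO (AC, AF, ...) and FORMAT fields other than GT (AD, PL, ...)
    # are left as they were and no longer match the new allele order.
    return "\t".join(fields) + "\n"


def flip_lines(lines, sites):
    """Yield header lines untouched and data lines passed through flip_record."""
    for line in lines:
        yield line if line.startswith("#") else flip_record(line, sites)
```

To use it, read the chromosome/position list into a set, open the VCF as text (gzip.open(path, "rt") for a .vcf.gz), and write the lines yielded by flip_lines to a new file. Unphased heterozygotes come out as 1/0 rather than 0/1, which is still valid VCF. It will be slower than bcftools on large files, so treat it as a fallback rather than a replacement.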