update genotypes in vcf
0
0
Entering edit mode
5.9 years ago
biosol ▴ 170

Hi all,

Recently, I've had some trouble with a VCF file in which a program had turn around my REF and ALT fields (e.g. REF=C, ALT=T, but should be REF=T, ALT=C) and also their related genotypes.

I've managed to turn around the REF and ALT fields, but I'm not able to update my genotypes in an efficient way. Right now, I'm using this command:

rs=$(cat listSNPs1.txt)
for i in $rs; do
        echo "SNP: $i"
        sed -i -e "/$i/{s/0\/0/2\/2/g; s/1\/1/0\/0/g; s/2\/2/1\/1/g}" myfile.vcf
done

listSNPs1.txt contains a list of the SNPs to be updated (one column, each line an rs). With this 0/0 is updated to 2/2, 1/1 is updated to 0/0, and 2/2 is finally updated to 1/1. This program is actually working, but it's gonna take ages, as I have more than 100.000 SNPs to update.

Does anyone know a better alternative? Maybe using vcftoolsor plinkbut, for the moment, I haven't been able to find any parameter that suits my necessities. Thank you very much in advanced :)

vcf genotypes • 2.0k views
ADD COMMENT
1
Entering edit mode

Maybe python should be a better option here imho. You do not have to read your vcf file for each SNPs in listSNPs1.txt, this is time consuming. Better to make a dictionnary with your SNPs then read your vcf file line by line and check if it exists in your dictionnary

ADD REPLY
0
Entering edit mode

Okey, right, thanks :)! But then, how do I change the genotypes? Still with my sed command?? Or somehow with Python?

ADD REPLY
1
Entering edit mode

You can use some regex in python too with re package. As you don't have many type of genotype, you can also write some if/else statements

ADD REPLY
0
Entering edit mode

Okey, I'll try that, thanks!

ADD REPLY
0
Entering edit mode

If your vcf file only have SNPs (or the REF/ALT swap is just a problem for SNPs) you can use bcftoolsto fix it.

$ bcftools +fixref input.vcf -- -f reference.fa -m flip > output.vcf
ADD REPLY

Login before adding your answer.

Traffic: 2365 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6