How do I get PyVCF to give me an error if the record it is fetching doesn't exist?
0
0
Entering edit mode
5.4 years ago
devenvyas ▴ 760

I have a Python script that is supposed to be reading some emit-all VCF files and outputting a transposed Plink file for biallelic sites. The VCFs were called using custom in-house callers and thus are slightly out of spec (since they are ancient DNA), so standard programs like vcftools or bcftools are of no use. Each individual is in their own VCF.

I am having some issues particularly when one or more individuals is missing a record line altogether (i.e., some VCFs are missing a line for chr21 base 9412435). In these cases, the script just moves on the next individual in the loop for the given site. Here is an excerpt of the tped. There are 24 individuals, so there should be 48 allele calls (or 0s). Note how the first two have the appropriate number of genotypes, but the last four don't.

21 21:9412269 0 9412269 A A A A A A A A 0 0 T T 0 0 A A 0 0 0 0 0 0 0 0 A A A A T T 0 0 T T T T A A A A A A A A A A T T
21 21:9412298 0 9412298 T T T T T T T T T T T T 0 0 T T T T 0 0 T T T T T T T T T T 0 0 T T T T T T T T T T T T T T T T
21 21:9412357 0 9412357 C C 0 0 0 0 C C C C C C 0 0 0 0 C C 0 0 0 0 0 0 C C 0 0 C C C C C C C C C C C C C C A A C C
21 21:9412435 0 9412435 0 0 0 0 0 0 0 0 T T 0 0 0 0 0 0 0 0 T T 0 0 0 0 0 0 0 0
21 21:9412498 0 9412498 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 G G 0 0 0 0 0 0 G G 0 0 0 0 0 0 G G 0 0 G G G G
21 21:9412503 0 9412503 0 0 0 0 C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C C 0 0 C C C C

I am trying to find a way to get pyvcf to do something along the lines of this:

for record in file_read[h].fetch(chromosome,pos-1,pos):
    try:
        if record "exists":
            ##figure out what genotype to add
        elif record "doesn't exist"
            outD=outD+' 0 0'

Anyone have any suggestions?

vcf • 1.0k views
ADD COMMENT
0
Entering edit mode

Hello,

with the code you are showing, it isn't clear to me how you parse your data. Can you please show some more?

In general: If you know how much column you expect check this for each line and handle differences appropriate.

Also using pandas dataframe might be useful in your case.

fin swimmer

ADD REPLY

Login before adding your answer.

Traffic: 1673 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6