Error while parsing vcf file
7.6 years ago
a.james ▴ 240

Hello,

I am trying to convert multiple VCF files into a tabular format, and I get an error while iterating over the rows of each file. Here is part of the script:

    import vcf  # PyVCF

    for files in results:
        # open each (uncompressed) VCF file in turn
        Va_BM = vcf.Reader(filename=files, compressed=False)
        # one record per variant line
        for variant in Va_BM:
            tumor_REL = variant.samples[0]
            normal_ID = variant.samples[1]

The error is:

ValueError                                Traceback (most recent call last)
/home/usr/Tools/anaconda3/lib/python3.4/site-packages/vcf/parser.py in _parse_samples(self, samples, samp_fmt, site)
    464                         try:
--> 465                             sampdat[i] = int(vals)
    466                         except ValueError:

ValueError: invalid literal for int() with base 10: '86,2'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-131-857e0813455d> in <module>()
      9 
     10     #  iterating lines in VCF file (one line = one variant)
---> 11     for variant in Va_BM:
     12         tumor_REL        = variant.samples[0]
     13         normal_ID       = variant.samples[1]

/home/usr/Tools/anaconda3/lib/python3.4/site-packages/vcf/parser.py in __next__(self)
    565 
    566         if fmt is not None:
--> 567             samples = self._parse_samples(row[9:], fmt, record)
    568             record.samples = samples
    569

I don't understand where I am going wrong. The VCF file is from Mutect2 and the file format is ##fileformat=VCFv4.2. Any help would be great. Thank you.

RNA-Seq sequencing SNP vcf • 3.3k views

Search your VCF for the string '86,2' in the genotypes. There is a problem with your VCF at this point: your parser expects an integer.
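One way to run that search, as a minimal sketch: it assumes a tab-separated, uncompressed VCF with sample columns starting at column 10. The helper name `find_sample_value` and the shortened demo record are made up for illustration.

```python
import io

def find_sample_value(lines, needle):
    """Yield (line_number, sample_index, sample_text) for each sample
    column that has `needle` as one of its colon-separated values."""
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip meta lines and the #CHROM header
        cols = line.rstrip("\n").split("\t")
        for idx, sample in enumerate(cols[9:]):
            if needle in sample.split(":"):
                yield line_no, idx, sample

# Shortened stand-in for the real file (INFO and FORMAT trimmed).
demo = io.StringIO(
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\tNORMAL\n"
    "1\t14748\t.\tG\tC\t.\t.\tECNT=1\tGT:AD:AF\t0/1:86,2:0.027\t0/0:176,5:0.032\n"
)
hits = list(find_sample_value(demo, "86,2"))
```

With a real file, pass an open file handle instead of the `io.StringIO` demo.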


That's the first line of my VCF file, I mean after the header, and I am wondering: 86,2 is in int format, isn't it?
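For illustration, `86,2` is two integers joined by a comma, not a single integer, which is why `int()` rejects it in the traceback above. A minimal sketch (the helper name `parse_int_list` is hypothetical, not part of PyVCF):

```python
def parse_int_list(value):
    """Split a comma-separated VCF value such as an AD field into ints."""
    return [int(v) for v in value.split(",")]

# int() on the whole string fails, as in the traceback.
try:
    int("86,2")
    raised = False
except ValueError:
    raised = True

# Splitting on the comma first gives one int per allele depth.
ad = parse_int_list("86,2")
```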


Why don't you show us the whole line?


Here is the whole line that it complains about:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  TUMOR   NORMAL

1   14748   .   G   C   .   alt_allele_in_normal;t_lod_fstar    ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=29.34;TLOD=4.58;CSQ=C|non_coding_transcript_exon_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000423562|unprocessed_pseudogene|10/10||||1284|||||||-1||HGNC|38034|||||||||||||||||||||||||||||,C|non_coding_transcript_exon_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000438504|unprocessed_pseudogene|12/12||||1398|||||||-1||HGNC|38034|||||||||||||||||||||||||||||,C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|||||||||||1078|1||HGNC|37102|||||||||||||||||||||||||||||,C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|||||||||||339|1||HGNC|37102|||||||||||||||||||||||||||||,C|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene||10/10||||||||||-1||HGNC|38034|||||||||||||||||||||||||||||,C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000515242|transcribed_unprocessed_pseudogene|||||||||||336|1||HGNC|37102|||||||||||||||||||||||||||||,C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000518655|transcribed_unprocessed_pseudogene|||||||||||339|1||HGNC|37102|||||||||||||||||||||||||||||,C|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000538476|unprocessed_pseudogene||12/12||||||||||-1||HGNC|38034|||||||||||||||||||||||||||||,C|non_coding_transcript_exon_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000541675|unprocessed_pseudogene|9/9||||1031|||||||-1||HGNC|38034|||||||||||||||||||||||||||||    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:`86,2`:0.027:1:1:0.500:2974,76:43:43  0/0:176,5:0.032:3:2:0.600:6048,155:92:84

You put the quotes around 86,2 to highlight the number, didn't you? Or are there really quotes in the VCF file (which would be the error)?
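A quick way to rule out stray quote characters, sketched under the assumption of a tab-separated file with samples from column 10 onward. The function name `lines_with_quotes` and the two tiny demo inputs are invented for this example.

```python
QUOTE_CHARS = set("`'\"")

def lines_with_quotes(lines):
    """Return 1-based numbers of data lines whose sample columns
    contain a backtick, single quote, or double quote."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # header lines are not checked
        samples = line.rstrip("\n").split("\t")[9:]
        if any(ch in QUOTE_CHARS for field in samples for ch in field):
            hits.append(line_no)
    return hits

# Two shortened demo inputs: one clean, one with backticks around the AD value.
clean = [
    "##fileformat=VCFv4.2\n",
    "1\t14748\t.\tG\tC\t.\t.\tECNT=1\tGT:AD\t0/1:86,2\t0/0:176,5\n",
]
quoted = [
    "##fileformat=VCFv4.2\n",
    "1\t14748\t.\tG\tC\t.\t.\tECNT=1\tGT:AD\t0/1:`86,2`\t0/0:176,5\n",
]
clean_hits = lines_with_quotes(clean)
quoted_hits = lines_with_quotes(quoted)
```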


No, I just highlighted it for Biostars. There are no quotes in the real VCF file.


@Pierre Lindenbaum, apparently the issue is that the vcf.Reader parser is not suited to VCF 4.2 files.
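If the parser cannot be used, one possible workaround for the tabular conversion is to split the columns by hand and keep the sample values as strings. This is only a sketch, not a replacement for a full VCF parser: it assumes a tab-separated, uncompressed file in which every record uses the same FORMAT keys, and the names `vcf_to_rows` and the `SAMPLE_KEY` column naming are choices made for this example.

```python
import csv
import io

def vcf_to_rows(lines):
    """Yield one flat dict per variant; sample values are left as strings."""
    sample_names = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            sample_names = cols[9:]  # e.g. TUMOR, NORMAL
            continue
        row = {"CHROM": cols[0], "POS": cols[1], "REF": cols[3], "ALT": cols[4]}
        keys = cols[8].split(":")  # FORMAT keys such as GT, AD, AF
        for name, sample in zip(sample_names, cols[9:]):
            for key, value in zip(keys, sample.split(":")):
                row[name + "_" + key] = value
        yield row

# Shortened stand-in for the Mutect2 record shown above.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\tNORMAL\n",
    "1\t14748\t.\tG\tC\t.\t.\tECNT=1\tGT:AD:AF\t0/1:86,2:0.027\t0/0:176,5:0.032\n",
]
rows = list(vcf_to_rows(demo))

# Write a tab-separated table; column order follows the first row.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]), delimiter="\t",
                        lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
table = out.getvalue()
```

Values such as `86,2` stay intact as text, so they can be split into separate reference and alternate depths later if needed.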
