I have a VCF file, and part of it looks like this:
;Gene.refGene=NONE,DDX11L1;GeneDetail.refGene=dist\x3dNONE\x3bdist\x3d1826;ExonicFunc.refGene=.;AAChange.refGene=.;
I need to extract Gene.refGene=NONE,DDX11L1, which sits between semicolons. I also need to extract ExonicFunc.refGene=. and AAChange.refGene=., which are likewise delimited by semicolons.
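Because the INFO text is a series of semicolon-separated key=value pairs, one way to get these fields without a regex is to split on ";" and keep only the chunks whose key is wanted. The following is a minimal sketch that assumes the INFO text is already isolated as a string. The names extract_fields and WANTED are hypothetical. The sample is written as a raw string so that \x3b stays literal text. In an ordinary Python literal, \x3b would become a real ";" and would split the GeneDetail field.

```python
# Hypothetical helper: keep only the INFO chunks whose key is in WANTED.
WANTED = ("Gene.refGene", "ExonicFunc.refGene", "AAChange.refGene")

def extract_fields(info, wanted=WANTED):
    """Return the 'key=value' chunks of a ';'-separated INFO string whose key is in `wanted`."""
    found = []
    for chunk in info.split(";"):
        key, sep, _ = chunk.partition("=")
        # Empty chunks (from leading/trailing ';') have no '=' and are skipped.
        if sep and key in wanted:
            found.append(chunk)
    return found

# Raw string: the \x3d / \x3b escapes stay as literal text, as in the file.
info = r";Gene.refGene=NONE,DDX11L1;GeneDetail.refGene=dist\x3dNONE\x3bdist\x3d1826;ExonicFunc.refGene=.;AAChange.refGene=.;"
print(extract_fields(info))
# ['Gene.refGene=NONE,DDX11L1', 'ExonicFunc.refGene=.', 'AAChange.refGene=.']
```

Matching on the exact key, rather than on a substring, keeps GeneDetail.refGene from being picked up by accident.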
I tried to do it like this:

    import sys
    import re

    def parse_vcf(vcf_file):
        pattern=re.compile(r'"([^;]*)"' , 'Gene.refGene')
        f=open(vcf_file , 'r')
        for line in f:
            if pattern.search(line):
                continue
        return

    if __name__ == '__main__':
        vcf=sys.argv[1]
        parse_vcf(vcf)

But it is not working. Thank you for your suggestions.
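The attempt above has a few problems. The second positional argument of re.compile is flags, which is an integer, so it cannot be a search string such as 'Gene.refGene'. The pattern r'"([^;]*)"' looks for double quotes, and the line contains none. When a match is found, the loop only calls continue, so nothing is ever printed or returned.

Below is a minimal sketch of a corrected script. It assumes Python 3 and a standard tab-separated VCF, where INFO is the eighth column of each data line. FIELD_RE, extract_info_fields and the usage message are illustrative names, not part of any library.

```python
import re
import sys

# Capture a whole 'Key=value' field. The field must start the INFO string or follow a ';'.
# The three key names come from the question; extend the alternation as needed.
FIELD_RE = re.compile(r"(?:^|;)((?:Gene|ExonicFunc|AAChange)\.refGene=[^;]*)")

def extract_info_fields(lines):
    """Yield the list of wanted INFO fields for each data line in an iterable of VCF lines."""
    for line in lines:
        if line.startswith("#"):          # skip meta-information and header lines
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:              # INFO is the 8th column of a VCF data line
            continue
        # With one capture group, findall returns just the captured 'Key=value' strings.
        yield FIELD_RE.findall(columns[7])

def parse_vcf(vcf_file):
    """Open a VCF file and yield the wanted INFO fields line by line."""
    with open(vcf_file) as f:
        yield from extract_info_fields(f)

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python parse_vcf.py FILE.vcf", file=sys.stderr)
    else:
        for fields in parse_vcf(sys.argv[1]):
            print("\t".join(fields))
```

The \. after the key name stops GeneDetail.refGene from matching the Gene alternative. The file reading is kept separate from the line parsing, so the parsing step can be tried on a list of strings without creating a file.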
Thank you so much, it is perfect! And I understand now; thank you for the comments as well!