I want to use IVAS for sQTL analysis, but it accepts only allele-encoded genotypes, i.e. each genotype must be two letters from A, C, G, T.
The format of my VCF file is like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 108 139 159 265
1 73 . C A 40 PASS . GT:DP:GQ 0|0:5:40 0|0:9:40 0|0:6:38 ./.:.:.
1 83 . T C,A 40 PASS . GT:DP:GQ 1|1:5:40 1|1:9:40 0|0:8:38 ./.:.:.
I want to convert the genotype format from numeric to letters, like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 108 139 159 265
1 73 . C A 40 PASS . GT:DP:GQ CC CC CA NA
1 83 . T C,A 40 PASS . GT:DP:GQ TC TC TT NA
Many thanks. The genotypes in my VCF file look like 0|0, 1|1, 2|2 and ./., so I replaced
    if gt == '0|0' or gt == '0/0':
        letter = ref*2
    elif gt == '0|1' or gt == '0/1':
        letter = ref+alt
    elif gt == '1|1' or gt == '1/1':
        letter = alt*2
by
    if gt == '0|0' or gt == '0/0':
        letter = ref*2
    elif gt == '1|1' or gt == '1/1':
        letter = ref+alt
    elif gt == '2|2' or gt == '2/2':
        letter = alt*2
Python 3 needs parentheses in the print statement, so I added them:
    print('only accept BIALLELIC SNP\nremove this site\n')
But I am getting this error:
Traceback (most recent call last):
  File "letter.py", line 76, in <module>
    Letter(vcf).write_out()
  File "letter.py", line 70, in write_out
    f.write(''.join(self.vcfhead))
  File "/home/waqas/miniconda3/lib/python3.6/gzip.py", line 260, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Actually, the script is meant to run under Python 2, not Python 3. By the way, your genotypes look a little odd. Is your VCF in the standard format? If so, I hope my code will solve your problem without any changes. But if you want 0 to represent hom-ref, 1 to represent het and 2 to represent hom-alt, I think you will run into data format problems when you use the script.
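For anyone who hits the same TypeError on Python 3: `gzip.open` defaults to binary mode, so writing a `str` to it fails. Below is a minimal sketch of the usual workaround. It assumes the script opens its output with `gzip.open` in binary mode (I have not seen that part of letter.py, so that is a guess), and the header lines and output path are made up for illustration.

```python
import gzip
import os
import tempfile

# Hypothetical header lines standing in for self.vcfhead in letter.py.
vcfhead = ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\tREF\tALT\n"]

# Hypothetical output path in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "out.vcf.gz")

# 'wt' opens the gzip stream in text mode, so str can be written directly.
# In binary mode ('w' or 'wb') the data would need .encode() first.
with gzip.open(out_path, "wt") as f:
    f.write("".join(vcfhead))

# Read it back in text mode to check the round trip.
with gzip.open(out_path, "rt") as f:
    written = f.read()
```

Running the original script under Python 2, as suggested above, avoids the issue without any code change.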
Many thanks for your response. I downloaded the VCF file from here. In the VCF file the genotype formats are like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 108 139 159 265
1 73 . C A 40 PASS . GT:DP:GQ 0|0:5:40 0|0:9:40 0|0:6:38 ./.:.:.
1 83 . T C,A 40 PASS . GT:DP:GQ 1|1:5:40 1|1:9:40 0|0:8:38 ./.:.:.
I was wondering whether your code looks for 1|0 or 0|1 and replaces it with a het genotype, because I tried to grep 0|1 and 1|0 in the VCF file and got no matches.
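In standard VCF, the GT numbers are allele indices rather than genotype classes: 0 is REF, 1 is the first ALT allele and 2 is the second ALT allele. So 1|1 is homozygous for the first ALT, and a het shows up as 0|1 or 1|0. Here is a minimal sketch of that lookup, assuming GT is the first FORMAT key and the columns are whitespace-separated as in the lines pasted above. `gt_to_letters` is a hypothetical helper, not part of the script discussed in this thread.

```python
import re

def gt_to_letters(ref, alt, sample_field, missing="NA"):
    """Translate one sample field such as '0|1:5:40' into allele letters."""
    alleles = [ref] + alt.split(",")   # index 0 = REF, 1.. = ALT alleles in order
    gt = sample_field.split(":")[0]    # assumption: GT is the first FORMAT key
    indices = re.split(r"[|/]", gt)    # accept phased (|) and unphased (/) calls
    if "." in indices:                 # missing call such as ./.
        return missing
    letters = [alleles[int(i)] for i in indices]
    # Only single-base alleles fit a two-letter encoding; treat indels as missing.
    if any(len(a) != 1 for a in letters):
        return missing
    return "".join(letters)

line = "1 83 . T C,A 40 PASS . GT:DP:GQ 1|1:5:40 1|1:9:40 0|0:8:38 ./.:.:."
cols = line.split()
ref, alt = cols[3], cols[4]
converted = [gt_to_letters(ref, alt, s) for s in cols[9:]]
print(converted)  # ['CC', 'CC', 'TT', 'NA']
```

Because the lookup goes through the allele list, multi-allelic sites such as `C,A` need no special branch: a 2|2 call at this site would come out as AA.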
I have a similar problem: I have downloaded a file that seems to be in phased nucleotide format (A|A, A|G, G|G, etc.). How could I modify this script to convert it back, based on the REF and ALT columns, to phased numeric values (0|0, 0|1, 1|1)?
I feel like it would need a lot of modification to determine which nucleotide matches ref or alt and then convert accordingly?
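The reverse direction may need less work than it sounds, since it comes down to looking up each nucleotide's position in the list [REF, ALT1, ALT2, ...]. Below is a rough sketch, assuming genotypes are written with an explicit `|` or `/` separator and that a missing allele is written as `.`. `letters_to_gt` is a hypothetical helper, not something from the script above.

```python
def letters_to_gt(ref, alt, genotype):
    """Map a nucleotide genotype such as 'A|G' to allele indices such as '0|1'."""
    alleles = [ref] + alt.split(",")          # index 0 = REF, 1.. = ALT alleles
    sep = "|" if "|" in genotype else "/"     # keep the phasing separator as given
    out = []
    for base in genotype.split(sep):
        if base == ".":                       # assumption: '.' marks a missing allele
            out.append(".")
        elif base in alleles:
            out.append(str(alleles.index(base)))
        else:
            # The allele matches neither REF nor any ALT at this site.
            raise ValueError("allele %r not in REF/ALT %r" % (base, alleles))
    return sep.join(out)

print(letters_to_gt("A", "G", "A|G"))  # 0|1
print(letters_to_gt("A", "G", "G|A"))  # 1|0
```

Raising an error for an unknown allele is a design choice for the sketch; depending on the data it may be preferable to emit a missing genotype instead.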
VCF format: https://faculty.washington.edu/browning/beagle/intro-to-vcf.html. It is useful.