I have a VCF file with the following structure:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2ms01e 2ms02g 2ms03g 2ms04h
2 1738 . A G 4693.24 PASS AC=2;AF=0.250;AN=8;set=Intersection GT:AD:DP:GQ:PB:PC:PG:PI:PL:PW 0|1:389,92:481:99:.,.,.,.,.:1.0:0|1:1020:1748,0,12243:0|1 0/0:318,0:318:99:.:.:0/0:.:0,120,1800:0/0 0|1:270,53:323:99:.,.,.,.,.:1.0:0|1:1258:990,0,9096:0|1 0/0:473,0:473:99:.:.:0/0:.:0,120,1800:0/0
2 1764 . A C 51892.85 PASS AC=5;AF=0.625;AN=8;set=Intersection GT:AD:DP:GQ:PB:PC:PG:PI:PL:PW 1|0:102,415:517:99:.,.,.,.,.:1.0:1|0:1020:12332,0,2817:1|0 1/1:0,356:356:99:.:.:1/1:.:12587,1069,0:1/1 1|0:65,301:366:99:.,.,.,.,.:1.0:1|0:1258:9337,0,1279:1|0 0/1:281,353:634:99:.:.:0/1:.:10325,0,7548:0/1
2 1921 . T C 4465.03 PASS AC=0;AF=0.00;AN=6;set=Intersection GT:AD:DP:GQ:PG:PL:PW 0/0:1,0:1:3:0/0:0,3,35:0/0 ./.:0,0:0:.:./.:0,0,0:./. 0/0:1,0:1:3:0/0:0,3,39:0/0 0/0:2,0:2:6:0/0:0,6,80:0/0
Problem: The number of fields in the FORMAT column (the 9th column, or index 8 with Python's zero-based indexing) isn't the same for all lines.
I want to read this file and extract the values of specific tags such as GT, PI and PG. However, not all of these tags are present on every line; when a tag is missing, I just want its value to default to '.'.
So the output file would have the following structure:
contig pos ref alt_My freq_My GT PI PG
2 1764 A C 0.250 1|0 1020 1|0
2 1921 T C 0.00 0/0 . 0/0
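The defaulting rule described above can be sketched in plain Python, independent of any VCF library. This is only an illustration: the function name extract_tags and its parameters are hypothetical, and it assumes the FORMAT column and each sample column are colon-separated in the same order, as in the lines shown above.

```python
def extract_tags(format_field, sample_field, tags=("GT", "PI", "PG"), default="."):
    """Return the values of `tags` for one sample column.

    A tag that is absent from the FORMAT column is replaced by `default`.
    """
    keys = format_field.split(":")      # e.g. ['GT', 'AD', 'DP', ...]
    values = sample_field.split(":")    # values in the same order as keys
    lookup = dict(zip(keys, values))
    return [lookup.get(tag, default) for tag in tags]


# FORMAT and first sample column of the line at position 1921 (no PI tag)
fmt = "GT:AD:DP:GQ:PG:PL:PW"
sample = "0/0:1,0:1:3:0/0:0,3,35:0/0"
print("\t".join(extract_tags(fmt, sample)))
```

Because the lookup is built per line, it does not matter that the FORMAT column has ten fields on some lines and seven on others.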
I am using the PyVCF module to read the file and extract this information. Below is my script:
import vcf

vcf1_data = vcf.Reader(open('MY.phased_variants.Final_sub.vcf', 'r'))
for record in vcf1_data:
    contig1 = record.CHROM
    pos1 = record.POS
    ref_allele1 = record.REF
    # ALT and INFO['AF'] are lists, so join them into comma-separated strings
    alt_alleles1 = ",".join(map(str, record.ALT))
    alt_freq1 = ",".join(map(str, record.INFO['AF']))
Now, I write these extracted values to an output text file as:
output = open("My_allele_table.txt", "a")
output.write("{}\t{}\t{}\t{}\t{}"
             .format(contig1, pos1, ref_allele1, alt_alleles1, alt_freq1))
Additionally, I append other values to the output file. But when doing so I get an AttributeError, since the PI field is not present on every line.
Incomplete solution: I added an exception handler for the error, but then it skips the whole sample, so nothing is written for a sample that lacks PI, not even its GT and PG values.
for sample in record.samples:
    try:
        output.write("\t{}\t{}\t{}".format(sample['GT'], sample['PI'], sample['PG']))
    except AttributeError:
        continue
output.write('\n')
Any help is appreciated!
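One possible direction, sketched here under assumptions: look up each tag separately so that a missing PI no longer discards GT and PG. The helper get_tag is hypothetical, and FakeCall is only a stand-in that mimics the behaviour reported above (a lookup of a missing tag raising AttributeError); it is not part of PyVCF, and the real sample objects may behave differently in other respects.

```python
from collections import namedtuple


def get_tag(sample, tag, default="."):
    """Look up `tag` on a sample; return `default` if it is missing or None."""
    try:
        value = sample[tag]
    except (AttributeError, KeyError):
        return default
    return default if value is None else value


class FakeCall:
    """Stand-in for a sample object, for demonstration only (not PyVCF)."""

    def __init__(self, **fields):
        # Store the tags in a namedtuple so a missing tag raises AttributeError
        self.data = namedtuple("CallData", list(fields))(**fields)

    def __getitem__(self, key):
        return getattr(self.data, key)


# A sample from a line whose FORMAT column has no PI tag
sample = FakeCall(GT="0/0", PG="0/0")
row = "\t".join(str(get_tag(sample, tag)) for tag in ("GT", "PI", "PG"))
print(row)
```

In the loop over record.samples, the three values could then be written with get_tag(sample, 'GT'), get_tag(sample, 'PI') and get_tag(sample, 'PG') in place of the single try/except around all three lookups.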
Why can't you simply use if sample['PI'] ... else ...?
@Goutham: Can you please provide more details on your code?