Entering edit mode
5.1 years ago
rthapa
▴
90
Hi,
I want to estimate SNP density per gene (>20,000 genes). I do have a gff file and a vcf file with SNPs position. I am trying to use the following python script but couldn't get the density. I am new to python so there could be something wrong with my script. Does anyone have any suggestions?
Thanks
import allel
allel.__version__
geneset = allel.gff3_to_dataframe('gene.gff3', attributes=['Name'])
geneset.head()
genes = geneset[geneset['type'] == 'gene']
callset = allel.read_vcf("SAP.vcf")
gt = allel.GenotypeArray(callset['calldata/GT'])
import numpy as np
import numcodecs
import sys
n_genes = len(genes)
for i, (_, gene) in genes.iterrows():
if geneset[0] == 'Chr01'
try:
loc_gene = pos.locate_range(gene.start, gene.end)
except KeyError:
pass
else:
n_variants[i] = loc_gene.stop - loc_gene.start