I need to annotate genes using the coordinates of the CNVs I have in a VCF file. I am fairly new to this, so I explain my approach and my problem below. I'm happy to receive feedback on both.
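As a starting point, here is a minimal stdlib-only sketch of pulling CNV intervals out of the VCF. It assumes (the question does not say this) that each CNV record stores its end coordinate as `END=` and its type as `SVTYPE=` in the INFO column, which is the usual convention for symbolic structural-variant alleles in VCF 4.x. `parse_cnv_records` is a hypothetical helper name, and the example record is made up.

```python
def parse_cnv_records(vcf_lines):
    """Yield (chrom, start, end, svtype) for each data line of a VCF.

    Assumption: each CNV stores its end as INFO END=<int> and its type as
    INFO SVTYPE=<str>. Records without END are treated as ending at POS.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags
        info_dict = {}
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value
        end = int(info_dict.get("END", pos))
        yield chrom, pos, end, info_dict.get("SVTYPE", ".")


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1200000\tcnv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1215000",
]
print(list(parse_cnv_records(example)))  # [('1', 1200000, 1215000, 'DEL')]
```

For symbolic alleles, POS conventionally points at the padding base just before the event; this sketch ignores that one-base offset.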
I downloaded a GTF file from http://ftp.ensembl.org/pub/release-105/gtf/homo_sapiens/; Homo_sapiens.GRCh38.105.chr.gtf contains all gene names and coordinates.
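Once gene coordinates are available, annotating a CNV essentially means finding the genes whose interval overlaps it. A minimal sketch, assuming both sets of coordinates are 1-based and inclusive (as in GTF and VCF) and that chromosome names match between the two files (Ensembl uses `1`, while some VCFs use `chr1`). `overlapping_genes` is a hypothetical helper, and the TNFRSF4 coordinates are taken from the sample GTF line in this question.

```python
def overlapping_genes(cnv, genes):
    """Return the names of genes whose interval overlaps the CNV interval.

    cnv   -- (chrom, start, end), 1-based inclusive like GTF/VCF coordinates
    genes -- iterable of (name, chrom, start, end), also 1-based inclusive
    """
    chrom, start, end = cnv
    hits = []
    for name, g_chrom, g_start, g_end in genes:
        # two closed intervals overlap iff each one starts before the other ends
        if g_chrom == chrom and g_start <= end and start <= g_end:
            hits.append(name)
    return hits


genes = [("TNFRSF4", "1", 1211340, 1214153)]
print(overlapping_genes(("1", 1200000, 1212000), genes))  # ['TNFRSF4']
print(overlapping_genes(("1", 1300000, 1400000), genes))  # []
```

The linear scan is fine for a handful of CNVs; for genome-wide annotation, sorting the genes by start or using an interval tree would scale better.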
Following the gffutils documentation, I need to convert this file into a database. This is where I am stuck.
The first lines of my file look like this:
1 ensembl_havana gene 1211340 1214153 . - . gene_id "ENSG00000186827"; gene_version "11"; gene_name "TNFRSF4"; gene_source "ensembl_havana"; gene_biotype "protein_coding";
1 ensembl_havana transcript 1211340 1214153 . - . gene_id "ENSG00000186827"; gene_version "11"; transcript_id "ENST00000379236"; transcript_version "4"; gene_name "TNFRSF4"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "TNFRSF4-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS11"; tag "basic"; transcript_support_level "1 (assigned to previous version 3)";
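Since the gene name sits in the ninth (tab-separated) attributes column, it can also be extracted without any library. A rough sketch using a regular expression; `parse_gtf_attributes` is a hypothetical helper, and values are stored as lists because keys such as `tag` can repeat on one line (as in the transcript line above).

```python
import re

# One GTF attribute: a key, whitespace, then a double-quoted value or a bare token
ATTR_RE = re.compile(r'(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;?')


def parse_gtf_attributes(column9):
    """Parse the 9th GTF column into a dict mapping key -> list of values."""
    attrs = {}
    for key, quoted, bare in ATTR_RE.findall(column9):
        attrs.setdefault(key, []).append(quoted if quoted else bare)
    return attrs


line = ('1\tensembl_havana\tgene\t1211340\t1214153\t.\t-\t.\t'
        'gene_id "ENSG00000186827"; gene_version "11"; gene_name "TNFRSF4"; '
        'gene_source "ensembl_havana"; gene_biotype "protein_coding";')
fields = line.split("\t")  # GTF columns are tab-separated in the real file
attrs = parse_gtf_attributes(fields[8])
print(attrs["gene_name"][0], attrs["gene_id"][0])  # TNFRSF4 ENSG00000186827
```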
The gene name is what I need, and it is in the last column (attributes). The documentation says:
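```
>>> import gffutils
>>> fn = gffutils.example_filename('ensembl_gtf.txt')
>>> # transform_func is a user-supplied function that takes a feature and
>>> # returns it (possibly modified); it must be defined before this call
>>> db = gffutils.create_db(fn, ":memory:",
... id_spec={'gene': 'gene_id', 'transcript': "transcript_id"},
... merge_strategy="create_unique",
... transform=transform_func,
... keep_order=True)
```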
I don't get any error when creating the database, but when I try to look up the gene by its name with `db["TNFRSF4"]`, I get this error:

    FeatureNotFoundError                      Traceback (most recent call last)
    <ipython-input-49-5186f0b4b314> in <module>
    ----> 1 db["TNFRSF4"]

    ~/opt/anaconda3/envs/RP/lib/python3.6/site-packages/gffutils/interface.py in __getitem__(self, key)
        278         # TODO: raise error if more than one key is found
        279         if results is None:
    --> 280             raise FeatureNotFoundError(key)
        281         return self._feature_returner(**results)
        282

    FeatureNotFoundError: TNFRSF4

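A likely explanation, judging from the `id_spec` in the snippet: gffutils takes each feature's primary key from the attribute named in `id_spec`, so with `{'gene': 'gene_id'}` the gene should be stored under `ENSG00000186827`, not `TNFRSF4`. Options worth checking in the gffutils docs are keying genes on `gene_name` instead, or keeping the IDs and querying by attribute. The stdlib sketch below only mimics that situation with plain dicts and does not call gffutils: a primary index keyed by gene ID plus a secondary name index that maps each name to a list, since gene names are not guaranteed to be unique. The record layout and function names are hypothetical.

```python
from collections import defaultdict


def build_indexes(gene_records):
    """Build a primary index keyed by gene_id and a secondary gene_name index.

    gene_records -- iterable of dicts with 'gene_id', 'gene_name', 'chrom',
                    'start' and 'end' keys (a hypothetical record layout).
    """
    by_id = {}
    ids_by_name = defaultdict(list)  # one name may map to several gene IDs
    for rec in gene_records:
        by_id[rec["gene_id"]] = rec
        ids_by_name[rec["gene_name"]].append(rec["gene_id"])
    return by_id, ids_by_name


def lookup_by_name(name, by_id, ids_by_name):
    """Resolve a gene name to its records through the secondary index."""
    return [by_id[gid] for gid in ids_by_name.get(name, [])]


records = [{"gene_id": "ENSG00000186827", "gene_name": "TNFRSF4",
            "chrom": "1", "start": 1211340, "end": 1214153}]
by_id, ids_by_name = build_indexes(records)
print("TNFRSF4" in by_id)  # False: the primary key is the Ensembl ID
print(lookup_by_name("TNFRSF4", by_id, ids_by_name)[0]["gene_id"])  # ENSG00000186827
```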
Many thanks for your help. I haven't tried it yet because, after thinking more about what I want (as I have asked here), this will run on Windows machines, and your approach would complicate the application I want to develop. I will use it for validation and let you know.
By the way, I didn't know about convert2bed. I have been reading your Git repo a bit and it looks great! I will use it in future code.
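Given the Windows constraint, one way to avoid an external binary is a small pure-Python conversion of the GTF gene lines to BED. This is a sketch, not a reimplementation of convert2bed: it emits only four columns (chrom, start, end, name), keeps only `gene` features, falls back to `gene_id` when a gene has no `gene_name`, and shifts the 1-based GTF start to BED's 0-based start (the end stays the same because BED ends are exclusive). `gtf_genes_to_bed` is a hypothetical helper.

```python
import re

GENE_NAME_RE = re.compile(r'gene_name "([^"]*)"')
GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')


def gtf_genes_to_bed(gtf_lines):
    """Yield BED lines (chrom, start, end, name) for 'gene' features of a GTF.

    GTF is 1-based with inclusive ends; BED is 0-based with exclusive ends,
    so only the start coordinate shifts by one.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene features
        match = GENE_NAME_RE.search(fields[8]) or GENE_ID_RE.search(fields[8])
        name = match.group(1) if match else "."
        start0 = int(fields[3]) - 1
        yield f"{fields[0]}\t{start0}\t{fields[4]}\t{name}"


sample = ['1\tensembl_havana\tgene\t1211340\t1214153\t.\t-\t.\t'
          'gene_id "ENSG00000186827"; gene_name "TNFRSF4";']
print(list(gtf_genes_to_bed(sample)))  # ['1\t1211339\t1214153\tTNFRSF4']
```

On the real file, the same generator can consume an open file handle for Homo_sapiens.GRCh38.105.chr.gtf, writing each yielded line plus `"\n"` to an output file.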