Hi all,
I’m going to extract all variants within some specific genes from the 1000 Genomes Project and gnomAD. To get their genomic coordinates, I used the main GTF annotation file for human from GENCODE (release 28, GRCh38) and made a BED file for the genes of interest. Here is the BED file for one of my genes (ISG15). As you can see, there are multiple records for the same gene. I think this redundancy comes from the different transcripts of the same gene; is that right? However, I’m confused about which start and end positions I should use for variant extraction. Could you please help me out?
chr1 1001137 1014541 ENSG00000187608.8 gene_name ISG15 . +
chr1 1001137 1001281 ENSG00000187608.8 gene_name ISG15 . +
chr1 1008193 1008279 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013983 1014541 ENSG00000187608.8 gene_name ISG15 . +
chr1 1014004 1014475 ENSG00000187608.8 gene_name ISG15 . +
chr1 1014004 1014007 ENSG00000187608.8 gene_name ISG15 . +
chr1 1014475 1014478 ENSG00000187608.8 gene_name ISG15 . +
chr1 1001137 1001281 ENSG00000187608.8 gene_name ISG15 . +
chr1 1008193 1008279 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013983 1014004 ENSG00000187608.8 gene_name ISG15 . +
chr1 1014475 1014541 ENSG00000187608.8 gene_name ISG15 . +
chr1 1001144 1014435 ENSG00000187608.8 gene_name ISG15 . +
chr1 1001144 1001263 ENSG00000187608.8 gene_name ISG15 . +
chr1 1008193 1008279 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013983 1014435 ENSG00000187608.8 gene_name ISG15 . +
chr1 1014004 1014435 ENSG00000187608.8 gene_name ISG15 . +
chr1 1014004 1014007 ENSG00000187608.8 gene_name ISG15 . +
chr1 1001144 1001263 ENSG00000187608.8 gene_name ISG15 . +
chr1 1008193 1008279 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013983 1014004 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013422 1014540 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013422 1013576 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013573 1013576 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013573 1013576 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013983 1014540 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013983 1014475 ENSG00000187608.8 gene_name ISG15 . +
chr1 1014475 1014478 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013422 1013573 ENSG00000187608.8 gene_name ISG15 . +
chr1 1014475 1014540 ENSG00000187608.8 gene_name ISG15 . +
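A minimal sketch, in Python, of collapsing records like the ones above into a single interval per gene (smallest start, largest end). It assumes the whitespace-separated layout shown in the post, with the Ensembl gene ID in the 4th field and the strand in the last; `gene_spans` is a hypothetical helper, not part of any library. The repeated rows are probably not only different transcripts but also different GTF feature types (gene, transcript, exon, CDS, UTR, start/stop codon); for example, the 3 bp rows at 1014004 and 1014475 look like the start and stop codons at the two ends of the CDS. If the goal is every variant anywhere in the gene, introns included, the full gene span is one reasonable choice.

```python
# A few of the ISG15 rows from the post (BED: 0-based start, half-open end).
# Layout assumed: chrom, start, end, gene ID, "gene_name", name, score, strand.
BED_LINES = """\
chr1 1001137 1014541 ENSG00000187608.8 gene_name ISG15 . +
chr1 1001137 1001281 ENSG00000187608.8 gene_name ISG15 . +
chr1 1008193 1008279 ENSG00000187608.8 gene_name ISG15 . +
chr1 1014004 1014007 ENSG00000187608.8 gene_name ISG15 . +
chr1 1001144 1014435 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013422 1014540 ENSG00000187608.8 gene_name ISG15 . +
chr1 1013573 1013576 ENSG00000187608.8 gene_name ISG15 . +
"""


def gene_spans(lines):
    """Collapse all records sharing a gene ID into one (chrom, start, end, strand) span.

    Hypothetical helper: the span runs from the smallest start to the largest
    end seen for that gene ID, so it includes introns.
    """
    spans = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        gene_id, strand = fields[3], fields[-1]
        if gene_id in spans:
            c, s, e, st = spans[gene_id]
            spans[gene_id] = (c, min(s, start), max(e, end), st)
        else:
            spans[gene_id] = (chrom, start, end, strand)
    return spans


spans = gene_spans(BED_LINES.splitlines())
print(spans)  # {'ENSG00000187608.8': ('chr1', 1001137, 1014541, '+')}
```

For ISG15 this gives chr1 1001137 1014541, which is the same as the first row of the BED file; keeping only the GTF rows whose 3rd column is `gene` before making the BED would give you that single record directly. If you only wanted exonic variants, you would keep and merge the exon rows instead.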
Also, to include the promoter region and get the variants in it, should I extend the gene start upstream by about 200 or 500 bp (which do you suggest)? Please let me know if there is anything else I should consider.
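Because ISG15 is on the + strand, "upstream" means toward smaller coordinates, so the promoter flank is subtracted from the start; for a gene on the - strand it would be added to the end instead. A minimal strand-aware sketch, assuming BED-style 0-based, half-open coordinates; `with_promoter` is a hypothetical helper, and the 500 bp default is only a placeholder for whichever flank size you settle on, not a recommendation.

```python
def with_promoter(chrom, start, end, strand, flank=500):
    """Extend a BED interval (0-based, half-open) upstream by `flank` bp.

    + strand: upstream is toward smaller coordinates, so the start moves left
    (clamped at 0). - strand: upstream is toward larger coordinates, so the
    end moves right. The end is NOT clamped to the chromosome length here.
    """
    if strand == "+":
        return chrom, max(0, start - flank), end, strand
    if strand == "-":
        return chrom, start, end + flank, strand
    raise ValueError(f"unknown strand: {strand!r}")


# ISG15 gene span from the post, extended by 500 bp and by 200 bp
print(with_promoter("chr1", 1001137, 1014541, "+", flank=500))  # ('chr1', 1000637, 1014541, '+')
print(with_promoter("chr1", 1001137, 1014541, "+", flank=200))  # ('chr1', 1000937, 1014541, '+')
```

If you prefer a command-line tool, `bedtools slop` with `-l <flank> -r 0 -s` and a genome file (`-g`) does the same strand-aware extension and also keeps intervals within chromosome ends.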
Thanks in advance
Although this is late, I just wanted to draw your attention to the genome build of the variants: make sure the gnomAD variants are not aligned to hg19/GRCh37 while you are using annotations from GRCh38.
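One quick way to check the build of a VCF (for example a gnomAD release) is the chr1 length declared in its `##contig` header lines: chr1 is 249,250,621 bp in GRCh37 and 248,956,422 bp in GRCh38. A minimal sketch, assuming the header has a `##contig` line for chr1 with `ID` listed before `length`; `guess_build` is a hypothetical helper, and a header without such a line returns `None`.

```python
import re

# chr1 length differs between the two builds, which makes it a handy fingerprint.
CHR1_LENGTH_TO_BUILD = {
    249250621: "GRCh37/hg19",
    248956422: "GRCh38/hg38",
}

# Matches "##contig=<ID=1,...length=N" or "##contig=<ID=chr1,...length=N" (not ID=10 etc.).
CONTIG_RE = re.compile(r"^##contig=<ID=(?:chr)?1,.*?length=(\d+)")


def guess_build(header_lines):
    """Guess the reference build from the chr1 ##contig line of a VCF header.

    Returns the build name, "unknown" if the chr1 length matches neither build,
    or None if no chr1 contig line is found before the #CHROM line.
    """
    for line in header_lines:
        m = CONTIG_RE.match(line)
        if m:
            return CHR1_LENGTH_TO_BUILD.get(int(m.group(1)), "unknown")
        if line.startswith("#CHROM"):
            break  # end of the header; stop before reading any records
    return None


# Made-up header lines, for illustration only
example = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=248956422>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
print(guess_build(example))  # GRCh38/hg38
```

For a bgzipped VCF you could pass `gzip.open(path, "rt")` as `header_lines`, since BGZF is gzip-compatible; `bcftools view -h` also prints the header for a quick look. Contig naming (`chr1` vs `1`) is sometimes used as a hint too, but it is not reliable on its own, because both builds are distributed with both naming styles.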