Gene Location File - One Gene One Genomic Interval
4.0 years ago
jon.klonowski ▴ 210

I am trying to make a gene location file with the following columns: Gene_Name Chromosome Start End

I need it to run a program that maps SNPs to genes for burden analysis. My problem is that the UCSC table output lists transcripts, and multiple transcripts can be attributed to a single gene. How do I consolidate them so that I end up with a "one gene, one genomic interval" model?

Best,

JFK

genomics genome sequencing SNP burden
4.0 years ago
wulj2 ▴ 50

Interesting problem to solve, but I think it will be easy.

4.0 years ago
jon.klonowski ▴ 210

My lab mate sent me an R script:

library(GenomicFeatures)
GENCODE_FILE.GR38 = "/data/projects/annotation/GENCODE/rel29/gencode.v29.annotation.gff3.gz"
#------------
# Build a transcript database from the GENCODE GFF3 annotation
db = makeTxDbFromGFF(GENCODE_FILE.GR38, format="gff3")
# One range per transcript
tx = transcripts(db, columns=c("tx_id", "tx_name"))
# One range per gene, spanning all of that gene's transcripts
# (by default, genes with transcripts on more than one chromosome or strand are dropped)
gene_ranges = genes(db, columns=c("tx_id", "tx_name"))

I got the most recent GENCODE release from: https://www.gencodegenes.org/human/
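For anyone who wants to do the collapsing directly on a transcript table (like the UCSC output) without building a database, here is a minimal base R sketch. It assumes "one gene, one interval" means taking the smallest transcript start and the largest transcript end per gene. The data frame, its column names, the gene names, and the coordinates are all made up for illustration, so adapt them to your own table.

```r
# Toy transcript table: gene names, column names, and coordinates are hypothetical.
tx <- data.frame(
  gene  = c("GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_B"),
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  start = c(100, 150, 500, 450, 700),
  end   = c(400, 600, 900, 800, 1200),
  stringsAsFactors = FALSE
)

# Collapse transcripts per gene/chromosome pair:
# smallest start and largest end across all transcripts.
starts <- aggregate(start ~ gene + chrom, data = tx, FUN = min)
ends   <- aggregate(end ~ gene + chrom, data = tx, FUN = max)
gene_loc <- merge(starts, ends, by = c("gene", "chrom"))

# A gene name that shows up on more than one chromosome has no single
# interval, so drop it here rather than guess (an assumption of this sketch).
ambiguous <- unique(gene_loc$gene[duplicated(gene_loc$gene)])
gene_loc <- gene_loc[!gene_loc$gene %in% ambiguous, ]

# Final column order: Gene_Name Chromosome Start End
gene_loc <- gene_loc[order(gene_loc$chrom, gene_loc$start),
                     c("gene", "chrom", "start", "end")]
print(gene_loc)
```

The result can be saved with write.table(gene_loc, "gene_locations.txt", sep="\t", quote=FALSE, row.names=FALSE). Note that the interval spans introns and any gaps between transcripts, which is usually what you want for mapping SNPs to genes, but check what your burden-analysis program expects.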
