I have a BED file containing sequencing gaps
17 41223034 41223081
13 32929236 32929276
17 7574019 7574057
I would like to annotate this BED file with gene symbol, transcript id (RefSeq) and map the genomic coordinates to the CDS.
I have been trying to use GenomicFeatures in R to do this, and have got part way.
# import BED as GRanges object
bedfile <- rtracklayer::import.bed('./data/test.bed')
# import refseq transcripts as TxDB object
refseq_txdb <- makeTxDbFromUCSC(genome="hg19", tablename="ncbiRefSeq")
# split cds by transcript
cds <- cdsBy(x = refseq_txdb, by = c("tx"), use.names=TRUE)
# map between genomic and cds coordinate system
bed2cds <- mapToTranscripts(x = bedfile, transcripts = cds, intronJunctions=TRUE)
This gives me a GRanges object that has the RefSeq IDs and CDS coordinates. However, I cannot easily find a way to update each entry of this object with Gene ID (Symbol) and Exon Number. And advice greatly appreciated.
GRanges object with 8 ranges and 3 metadata columns:
seqnames ranges strand | xHits transcriptsHits intronic
<Rle> <IRanges> <Rle> | <integer> <integer> <logical>
[1] NM_007298.3 1538-1584 - | 1 43630 FALSE
[2] NM_007294.4 4850-4896 - | 1 43631 FALSE
[3] NM_007297.4 4709-4755 - | 1 43632 FALSE
[4] NM_007299.4 1538-1584 - | 1 43633 FALSE
[5] NM_007300.4 4913-4959 - | 1 43634 FALSE
[6] NM_000059.3 7247-7286 + | 2 35159 FALSE
[7] NM_001276697.2 -1986--1987 - | 3 43034 FALSE
[8] NM_001126113.2 1042-1041 - | 3 43046 TRUE
-------