Hi,
I'm trying to annotate genomic intervals with their corresponding genes in R. My code:
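library(rtracklayer)                        # import.bed()
library(GenomicFeatures)                    # transcriptsByOverlaps()
library(TxDb.Hsapiens.UCSC.hg38.knownGene)  # hg38 UCSC knownGene annotation
db = TxDb.Hsapiens.UCSC.hg38.knownGene
bed = import.bed(bed_path)
genes = transcriptsByOverlaps(ranges = bed, x = db, maxgap=0, columns='gene_id')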
For each interval I get the coordinates of the overlapping transcripts (genes) instead of the interval itself. I want to keep the original interval and list the genes it overlaps.
For example, if my BED file looks like this:
chr1 152078793 156262976
I want to get:
chr1 152078793 156262976 gene1
chr1 152078793 156262976 gene2
chr1 152078793 156262976 gene3
and not:
chr1 152079557 152080903 gene1
chr1 152084141 152089064 gene2
That is, something like bedtools intersect with the -wo parameter.
Thank you!
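A minimal sketch of one way to keep the original intervals, assuming the Bioconductor package GenomicRanges is installed: instead of transcriptsByOverlaps(), call findOverlaps() between the BED ranges and a gene-level GRanges, then index the BED ranges with queryHits() so every output row keeps its own coordinates, and attach the gene ID through subjectHits(). This resembles bedtools intersect -wa -wb more than -wo, since no overlap-length column is added. The toy ranges and the IDs gene1/gene2/gene3 are hypothetical so the sketch runs without a BED file; the comments show where the real objects would plug in.

```r
library(GenomicRanges)

# Hypothetical stand-ins. With real data you might instead use:
#   bed   <- rtracklayer::import.bed(bed_path)
#   genes <- GenomicFeatures::genes(TxDb.Hsapiens.UCSC.hg38.knownGene)
bed <- GRanges(seqnames = "chr1",
               ranges = IRanges(start = 152078793, end = 156262976))
genes <- GRanges(
  seqnames = rep("chr1", 3),
  ranges = IRanges(start = c(152079557, 152084141, 160000000),
                   end   = c(152080903, 152089064, 160001000)),
  gene_id = c("gene1", "gene2", "gene3")
)

# One hit per (interval, gene) pair that overlaps
hits <- findOverlaps(bed, genes)

# Repeat each original interval once per overlapping gene, keeping its own coordinates
annotated <- bed[queryHits(hits)]
annotated$gene_id <- genes$gene_id[subjectHits(hits)]

annotated
```

Each row of `annotated` is the unchanged BED interval paired with one gene, which is the layout asked for above. `mergeByOverlaps(bed, genes)` should give a similar query/subject pairing as a DataFrame, if that form is more convenient.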
You can simply left-join the two tables (the input BED file and the output BED file).
You can use the left_join() function from the dplyr package, e.g.
Make sure to join on chr, st, and end, since the three columns together identify one interval. Give these columns the same names in both tables.
Thank you! But I'm not sure how that solves the problem. I have my original BED file and the output of transcriptsByOverlaps(), but the coordinates in the two tables are not identical, only overlapping. What I could do is intersect the two tables and recover the original intervals, but I'm not sure how to do that.
Right, a left join does not work for this problem.