Hello Everyone,
I have an output file from RnBeads, an R package, listing differentially methylated regions, but none of these regions come with gene IDs. I'm trying to use BEDOPS closest-features to find the closest gene for each region in my Excel dataset, but I haven't managed to get it to work. Every time I run closest-features, the output suggests that none of my regions matched anything in the reference file. I'm assuming this because every row ends with "|NA" in the last column.
So my question is: what am I doing wrong? Are my files in the wrong format? Did I download the correct file from Ensembl? Why is no closest region returned for any of my regions of interest?
The files below are available in this Dropbox folder: https://www.dropbox.com/sh/xa43onjwg3d7npv/AAAk8Wzwe-rj95eDrdHBuHona?dl=0
tiling_fourm.xlsx
<-- This Excel file contains the regions I'm trying to find the closest genes for.

tiling_forum.bed
<-- I removed the first row (headings), took the chromosome, start and end columns, saved them as a tab-delimited .txt file and changed the extension to .bed.

hg19.bed
<-- This is the reference file (analogous to the <inputfile> on the BEDOPS closest-features user guide page). I created it by going to https://genome.ucsc.edu/cgi-bin/hgTables and selecting the following: clade: Human, genome: Human, assembly: Feb. 2009 (GRCh37/hg19), group: Mapping and Sequencing, track: UCSC Genes, table: knownGene, region: genome, output format: selected fields from primary and related tables. The fields I selected were chrom, chromStart and chromEnd.

outputforum.txt
<-- This is the result file I get after running the following command:
closest-features --closest hg19.bed tiling_forum.bed > outputforum.txt
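The manual conversion described for tiling_forum.bed can be scripted and the result checked for common format problems. This Python sketch is an illustration under stated assumptions, not a confirmed fix for this data. It assumes the Excel sheet was exported as tab-delimited text with one header row and chromosome, start and end in the first three columns. The function names export_to_bed, bed_problems and shared_chromosomes are hypothetical. The sketch sorts rows to approximate BEDOPS sort-bed order (chromosome name as text, then start, then end), since BEDOPS tools expect sorted input. It also flags Windows line endings that an Excel export can leave behind, and it reports which chromosome names two files share. A naming mismatch such as "chr1" versus "1" would leave nothing on the same chromosome to compare.

```python
from __future__ import annotations


def export_to_bed(text: str) -> list[str]:
    """Convert tab-delimited export text (header row first) into sorted BED3 lines.

    Assumption: columns 1-3 hold chromosome, start and end.
    """
    rows = []
    # splitlines() also treats Windows "\r\n" endings as a single line break
    for line in text.splitlines()[1:]:  # [1:] skips the header row
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        rows.append((fields[0], int(fields[1]), int(fields[2])))
    # Tuple sort approximates sort-bed order: chromosome as text, then start, then end
    rows.sort()
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end in rows]


def bed_problems(lines: list[str]) -> list[str]:
    """Describe each problem found in BED lines; an empty list means none found."""
    problems = []
    previous = None
    for n, line in enumerate(lines, start=1):
        if line.endswith("\r"):
            problems.append(f"line {n}: carriage return (Windows line ending)")
            line = line.rstrip("\r")
        if not line:
            continue  # e.g. the empty string after a trailing newline
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append(f"line {n}: fewer than 3 tab-separated fields")
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append(f"line {n}: start/end are not integers")
            continue
        if start < 0 or end < start:
            problems.append(f"line {n}: invalid coordinates {start}-{end}")
        key = (fields[0], start, end)
        if previous is not None and key < previous:
            problems.append(f"line {n}: not in sort-bed order")
        previous = key
    return problems


def shared_chromosomes(a: list[str], b: list[str]) -> set[str]:
    """Chromosome names found in both files; an empty set suggests a naming mismatch."""
    names_a = {line.split("\t")[0] for line in a if line.strip()}
    names_b = {line.split("\t")[0] for line in b if line.strip()}
    return names_a & names_b
```

To check real files, read them with open(path, newline="").read().split("\n") so that carriage returns are preserved and visible to bed_problems. The default open() silently converts "\r\n" to "\n", which would hide them.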
As you can see, in the file that's produced every value in the last column ends with "|NA". Does that mean the region was not mapped?
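To reason about where "|NA" comes from, it can help to model what closest-features --closest reports. The first file on the command line is the input and the second is the query. Each input line is printed, followed by "|" and its closest query element, or NA when there is none (for example, when the query file has nothing on that chromosome). This is a brute-force Python sketch of that idea, not BEDOPS's implementation. The names closest_report and gap are hypothetical. Overlapping elements count as distance 0, and ties keep the first query element, which is this sketch's own choice. The example calls also show that swapping the two files changes whose elements receive a closest match.

```python
from __future__ import annotations


def gap(a: tuple, b: tuple) -> int:
    """Bases between two half-open BED intervals on one chromosome; 0 if they overlap or touch."""
    if a[2] <= b[1]:  # a ends before b starts
        return b[1] - a[2]
    if b[2] <= a[1]:  # b ends before a starts
        return a[1] - b[2]
    return 0  # overlapping


def closest_report(input_rows: list[tuple], query_rows: list[tuple]) -> list[str]:
    """For each input element, return 'input|closest query element' or 'input|NA'.

    O(len(input) * len(query)): for illustration only, not for genome-wide files.
    """
    report = []
    for row in input_rows:
        text = "\t".join(map(str, row))
        candidates = [q for q in query_rows if q[0] == row[0]]  # same chromosome only
        if not candidates:
            report.append(f"{text}|NA")
            continue
        best = min(candidates, key=lambda q: gap(row, q))  # min() keeps the first tie
        report.append(text + "|" + "\t".join(map(str, best)))
    return report


# Hypothetical toy data, not taken from the files in the post
genes = [("chr1", 1000, 5000), ("chr3", 10, 20)]
regions = [("chr1", 100, 200), ("chr1", 5200, 5300), ("chr2", 1, 2)]
print(closest_report(genes, regions))  # genes as input, regions as query
print(closest_report(regions, genes))  # regions as input, genes as query
```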
If anyone could help me out, it would be greatly appreciated.
Thanks